The Tutor, the Cyber-Laborer, and the Gap Nobody Has Built
A teacher in Delhi named Arjun straps an iPhone to his forehead and films himself folding laundry for $5–20 an hour. A computer-science student in Shanghai pilots a humanoid through the same microwave-door motion hundreds of times a day in a VR exoskeleton. They call themselves cyber-laborers. Their footage — over 160,000 hours per month from one company alone — feeds the vision-language-action models that will eventually run the robots replacing warehouse workers in South Carolina and line workers in Leipzig. The people training the machines and the people the machines displace are never in the same room. Call it what it is: offshoring the cognitive imprint. No trade-law category applies. The WTO does not know what to make of it.
This piece breaks down the loop. The demos are mostly single-take cherry-picks from a hundred attempts. The binding constraint isn’t the AI brain — it’s the fingers. The economics work in the US, break even in Europe, and don’t pencil out in India until 2028. And the highest-leverage layer in the stack — the one that would actually let the sector scale — hasn’t been built yet.
$6.9 billion went in. The economics still don’t work where it matters.
Venture capital put about $6.89 billion into humanoid robotics in 2025 across 44 funded companies — a 288% year-over-year increase. Figure AI closed a Series C at $39 billion. Unitree filed a $610M IPO on 335% revenue growth. The money is real. The question is whether it’s flowing to the right layer.
Three companies define the sector’s structural shape. Figure AI is the industrial-deployment leader — longest autonomous demo on record (a four-minute dishwasher cycle), live at BMW, building its own foundation model called Helix after ending its OpenAI partnership. Unitree is the cost leader — a G1 humanoid starts at $16,000, and more than 30 published academic papers in 2025 used it. The moat is becoming Linux-like: every graduate student in robotics trains on a Unitree. Sanctuary AI has the deepest manipulation IP — a 21-DoF hydraulic hand with 5-millinewton sensitivity, beyond anything Figure or 1X ships — and the worst balance sheet. Its January 2025 convertible note was only about 50% subscribed. The best hand in the sector is also its most likely acquisition target.
The fact that should discipline every other number: China led global humanoid shipments in 2025. AgiBot alone shipped 5,100+ units — 39% of global volume. The Western sector is not the global frontier. It holds a minority share of it.
The demos are mostly lies — not because the robot didn’t do the thing, but because it did the thing once
Everything in the sector runs on short video clips. A humanoid folds a shirt. A humanoid threads a bolt. The clips are slick, and understanding how to read them is the most important skill an outside observer can develop.
Four rules.
If a demo video does not include a success rate, assume 5–10%. The industry-insider figure is “cherry-pick one out of a hundred.” IEEE Spectrum’s investigation of Boston Dynamics’ Atlas parkour clips found the backflip worked roughly one time in twenty attempts.
If a humanoid is doing something graceful in a “home,” ask whether a remote operator is in the next room. When Optimus appeared to serve drinks at Tesla’s October 2024 “We, Robot” event, those humanoids were being driven by humans in motion-capture suits offstage. 1X’s NEO folded a shirt in its reveal video; the company’s own blog acknowledged roughly 60–70% autonomy with “Expert Mode” covering the rest.
A flat towel is not laundry. Laundry means a pile, with fitted sheets, tangled straps, and socks turned inside out. Every “laundry” video you have seen is a pile of pre-sorted flat garments. Research on simple cases — flat towels with two corners visible — bottoms out around 20–30% success when those corners are occluded. No humanoid has publicly demonstrated an unconstrained fitted-sheet fold. None.
If the video is silent about wall-clock time, the robot is probably three to ten times slower than a human. Time your own hands folding a t-shirt; it takes about six seconds.
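The arithmetic behind the first rule is worth making concrete. A minimal sketch, assuming an illustrative per-attempt success probability (the 5% figure is chosen to match the "one in twenty" Atlas anecdote, not any company's disclosure): the expected number of takes per clean clip is 1/p, and the chance that a hundred takes yields at least one clip is 1 − (1 − p)^100.

```python
# Demo-reel arithmetic under an assumed per-attempt success rate.
# The rates below are illustrative, not disclosed figures.

def expected_takes(p: float) -> float:
    """Expected number of attempts until the first success (geometric: 1/p)."""
    return 1.0 / p

def chance_of_one_clip(p: float, attempts: int) -> float:
    """Probability that at least one of `attempts` tries succeeds."""
    return 1.0 - (1.0 - p) ** attempts

for p in (0.05, 0.10, 0.50):
    print(f"success rate {p:.0%}: ~{expected_takes(p):.0f} takes per clip, "
          f"{chance_of_one_clip(p, 100):.1%} chance of one clip in 100 tries")
```

The point of the sketch: even at a 5% success rate, a studio that shoots a hundred takes is nearly guaranteed one flawless clip. The clip is real. The rate is the story.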

The honest operators in this market are Physical Intelligence, whose papers acknowledge first-try failures in print and run in homes the model has never seen, and the academic robotics community, whose benchmark papers publish failure modes alongside results. Everything else is a demo reel.
This time the technology is different. But the bottleneck isn’t where you think.
Every previous humanoid cycle ended the same way — impressive hardware, hopeless economics, a brain too brittle to run off-script. Three things changed simultaneously.
The brain: vision-language-action models trained on pixels replaced scripted state machines. PR2 folded a towel in 2010 by running a pre-computed routine matched to a specific camera rig. Helix folds laundry by feeding raw pixels into a 7-billion-parameter model that writes action sequences on the fly. One generalizes. The other doesn’t.
The cost curve: Chinese actuator, reducer, and battery supply chains compressed hardware cost by roughly an order of magnitude over five years. A PR2 was $400,000 in 2010. A G1 is $16,000 today.
The data: egocentric video at scale — Arjun in Delhi, a medical student in Nigeria, tens of thousands like them — combined with teleoperation in Chinese data centers and synthetic pipelines like NVIDIA’s GR00T-Mimic, which generates 780,000 trajectories in 11 hours.
All three had to arrive in the same window. Any one breaking — a trade restriction on Chinese components, a plateau in VLA scaling, a privacy backlash against egocentric video — punches a hole in the others. The optionality is real. The certainty is not.
But here’s the structural irony: the VLA brain has improved dramatically faster than the hand hardware. Foundation models advance on six-month release cycles. Hand hardware advances on fab-and-supply-chain timescales — three to five years per generation. In a system where one component bottlenecks the other, the slower one determines the pace.
Every failure in the demo reel is a finger failure
The bots can walk. Their torsos balance. Their arms reach with reasonable accuracy. What they cannot do, reliably, is place 27 joints’ worth of soft pressure on an object they did not already know was there. The failures cataloged in this piece are not failures of robots. They are failures of hands.
The human hand has 27 degrees of freedom and about 17,000 mechanoreceptors. Two million years of evolutionary tuning. The best shipping humanoid hand has 22 degrees of freedom and on the order of 100 tactile receptors.
The research literature has put a number on the gap. A 2025 study called ViTaL found that vision-only manipulation policies averaged 21% success on contact-rich tasks. Adding tactile sensing lifted the same policies to about 90%. A complementary study, VTLA, showed tactile-equipped systems exceeding 90% on peg-in-hole insertion — a task vision-only policies essentially fail at. Meta and GelSight have open-sourced a sensor called Digit 360 that detects forces down to one millinewton across more than 18 modalities.
The cost of adding tactile fingertips to a humanoid is modest. The benefit is enormous. Every serious 2027 humanoid will have them. The first robot that reliably folds a fitted sheet will get there because someone solved denser tactile arrays and cleaner tendon routing — not because someone trained a bigger model.
Silicon on fingertips, not transformers, is the binding constraint.

China’s data factories are an industrial policy, not a gig economy
China has stood up more than forty humanoid training centers through a mix of central and provincial funding. The AgiBot Shanghai Pudong facility runs about 100 robots and generates between 30,000 and 50,000 data points per day across more than a thousand task categories. The Sichuan center in Zigong opened in January 2026 at 6,000 square meters, targeting three million data entries per year. Similar facilities operate in Beijing, Wuxi, Wuhan, Guangzhou, and at least ten other cities. The central government has committed to creating more than one million new jobs in “data collection and robot-collaborative-operation engineering” under the 15th Five-Year Plan.
Nothing in the West compares. The Scale AI / Micro1 pipeline is a federated gig economy routed through a smartphone and a Wi-Fi connection. The Chinese network is an industrial policy. The delta shows up in the data: GO-1, an AgiBot foundation model pretrained on the 2,976-hour AgiBot World dataset, outperforms models pretrained on the federated academic corpus Open X-Embodiment by about 30% on downstream tasks. A single coordinated teleoperation corpus is beating the pooled output of 34 labs.

The workers next to the robots are invisible in the public record
Figure’s Helix has worked alongside humans on 30,000 BMW X3 vehicles. Digit has moved 100,000 totes at GXO. Apollo is running pilots at Mercedes Berlin-Marienfelde.
Search every English-language story published from these sites. Not a single line worker is quoted by name about what it is like to work next to a humanoid. The quotes come from BMW board members. From GXO’s chief automation officer. From Mercedes executives.
The unions are equally quiet. The UAW, asked directly about humanoid automation, “didn’t provide comment.” Germany’s IG Metall has issued no public statement on the Mercedes pilot. The Ford–UAW contract contains a clause requiring notification of “new or advanced technology” to enable “meaningful discussion of its impact.” No such discussion has entered the public record.
Whoever publishes the first on-the-record interview with a Spartanburg line worker working beside a Figure 02 — or finds a former cyber-laborer describing being laid off after the model trained on her hands went into production — will define how the rest of the decade reads the transition.
The robot is economic in the US, marginal in Europe, and doesn’t work in India
Agility publishes a $30-an-hour all-in RaaS price for Digit. Internal operating cost today is $10–12/hr; the long-run target is $2–3/hr. Against US fully loaded warehouse labor of roughly $30/hr, the robot is clearly economic today. Against EU labor at around €12/hr, it’s marginally break-even at operating cost. Against Indian warehouse labor at $2–3/hr, the humanoid does not break even — and won’t until operating cost drops below $2/hr, which is Agility’s stretch target for probably 2028 or later.

Two things the sector’s own slides get wrong. The 80% uptime claim does not match observed deployments: Figure at BMW runs 10-hour shifts five days a week, which is 50 of 168 hours — roughly 30% of the calendar week. Even measured against scheduled hours, field utilization lands at 40–60% because of charging, teleop handoffs, and single-shift rollouts. And the Indian wage crossover — the scenario the opening of this piece pointed at — is a late-decade risk, not a mid-decade one. Which does not make it less real. It just changes who has to think about it first.
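The regional break-even comparison above reduces to a few lines of arithmetic. A sketch using the round numbers quoted in this section (Agility's $10–12/hr operating cost, its $2–3/hr long-run target, and the rough regional wage figures); treat these as illustrations, not underwriting inputs:

```python
# Sketch of the regional break-even comparison, using the round numbers
# quoted in the text. Real underwriting would add utilization, maintenance,
# integration cost, and capex amortization.

ROBOT_OPERATING_COST = 11.0   # $/hr, midpoint of the quoted $10-12 range
ROBOT_LONG_RUN_TARGET = 2.5   # $/hr, midpoint of the $2-3 stretch target

labor_cost_per_hr = {         # fully loaded, rough figures from the text
    "US warehouse": 30.0,
    "EU warehouse": 13.0,     # ~EUR 12 at a rough exchange rate
    "India warehouse": 2.5,
}

for region, wage in labor_cost_per_hr.items():
    today = "pencils" if wage > ROBOT_OPERATING_COST else "does not pencil"
    later = "pencils" if wage > ROBOT_LONG_RUN_TARGET else "does not pencil"
    print(f"{region}: vs ${wage:.0f}/hr labor -> "
          f"today: {today}; at long-run target: {later}")
```

Run it and the section's conclusion falls out: only the US clears the bar today, Europe is a coin flip, and India stays out of reach until operating cost drops below the local wage itself.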
The grid ceiling nobody is modeling
Scale the energy. One million humanoids, each drawing roughly 11 kWh per day, consume around 4 terawatt-hours per year. Trivial.
One hundred million humanoids — the number Musk and Figure’s Brett Adcock routinely float for the mid-2030s — would draw roughly 401 TWh per year. That is about one and a half times the total annual electricity consumption of the United Kingdom.
A warehouse with 50 humanoids needs roughly 550 kWh per day of charging throughput: a modest average draw of about 23 kW, but a peak in the hundreds of kilowatts if the fleet charges in a single overnight window. A port with 5,000 humanoids pulls around 55 MWh per day, a multi-megawatt interconnect of its own. None of this shows up in any public humanoid roadmap. In the bull scenarios, grid interconnect becomes the sector’s first hard ceiling — a bureaucratic and capital-intensive ceiling that cannot be solved by scaling a foundation model.
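The scaling arithmetic in this section is easy to reproduce. A sketch, assuming the 11 kWh/robot/day figure used above and a rough 265 TWh for annual UK electricity consumption (a recent ballpark, not an official statistic):

```python
# Reproduce the grid-load arithmetic: kWh/day per robot -> TWh/yr per fleet.

KWH_PER_ROBOT_PER_DAY = 11    # figure used in the text
UK_ANNUAL_TWH = 265           # rough recent UK electricity consumption

def fleet_twh_per_year(robots: int) -> float:
    """Annual fleet consumption in terawatt-hours (1 TWh = 1e9 kWh)."""
    return robots * KWH_PER_ROBOT_PER_DAY * 365 / 1e9

print(f"1M robots:   {fleet_twh_per_year(1_000_000):.1f} TWh/yr")
print(f"100M robots: {fleet_twh_per_year(100_000_000):.1f} TWh/yr "
      f"(~{fleet_twh_per_year(100_000_000) / UK_ANNUAL_TWH:.1f}x UK consumption)")

# A 50-robot warehouse: daily throughput and average continuous draw.
warehouse_kwh_day = 50 * KWH_PER_ROBOT_PER_DAY
print(f"50-robot site: {warehouse_kwh_day} kWh/day, "
      f"~{warehouse_kwh_day / 24:.0f} kW average draw")
```

One million robots is a rounding error on a national grid; one hundred million is a UK and a half. The step between those two numbers is where interconnect queues start to bind.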
The highest-leverage layer hasn’t been built
The control layer — VLA stacks, orchestration, WMS integration — is consolidating into three or four closed ecosystems. None expose meaningful third-party SDKs. This is a winner-take-few market; most independent bets will be acquihires.
The data layer is being absorbed. NVIDIA is swallowing simulation, the foundation model, teleop infrastructure, and the inference silicon. Open datasets erode proprietary corpora. The Chinese state-backed data centers are a geopolitical moat, not a private-equity one.

Which leaves the financing and infrastructure layer. And this is where the gap is.
Robot-backed asset-backed securitization does not exist. No specialty fund is bundling Digit, Apollo, and Figure lease receivables into tradable notes. Nobody is running the “Ford Motor Credit for humanoids” — the structure that let auto manufacturers move inventory off balance sheet and scale to millions of units per year. Traditional equipment-leasing firms are pricing humanoids as generic industrial capex without humanoid-specific underwriting models. Meanwhile the underlying contracts are now genuinely securitizable: 1,250+ operating hours at BMW, 100,000+ totes at GXO on a multi-year RaaS contract with published hourly rates and named enterprise counterparties.
Aircraft ABS, solar ABS, equipment-lease securitization — each took 10 to 15 years to standardize. Humanoids are in year two. Whoever writes the first trust indenture, sets the first residual-value assumption, and negotiates the first buyback guarantee with Apptronik or Figure defines the covenant template the entire asset class copies.
That is a genuine, durable moat.
What to watch
BMW Leipzig, summer 2026. If Figure 03 scales past a single production line, the thesis that VLA-based policies can deploy into real industrial operations gets its first major validation outside Spartanburg.
Unitree’s IPO deployment. If IPO proceeds go into aggressive cost-down and G1 ASP crosses below $15K, the geography of labor compression changes.
The first robot-backed ABS. Someone will launch it in 2026 or 2027. Whoever gets there first sets the structure the rest of the asset class inherits.
And one non-investable trigger: the first public, reproducible benchmark in which a humanoid handles a pile of tangled laundry at over 70% success. No company has it. All of them are pointed at it. The first who lands it moves humanoids from warehouse to home in the public imagination.
Back in Delhi, Arjun is still folding laundry. His two-year-old daughter is still running into frame. Fifteen minutes of usable footage per hour. The reporter asked about the work; the quote that got printed was the one about his daughter. The question underneath was simpler: “How much content can be made in the home? How much content?”
That is not a question about robots. It’s a question about how many people like him the sector needs, and for how long. The answer, eventually, is: fewer, then none. The only question is when, and whether the people who do the work get to read the documents that decide their price.
In April 2026, they do not.
The Humanoid Loop: What the machines can’t do, who trains them, and where the money should go was originally published in Towards AI on Medium.