XOOMAR
Technology · June 17, 2026 · 10 min read · By XOOMAR Insights Team

XDOF Wrings $70M From Dirty Robot Training Data Race

Updated on June 17, 2026

$70 million is already chasing the least glamorous part of robotics: collecting enough robot training data to make physical AI work outside a demo room.

That is the real signal behind XDOF, the startup emerging from stealth with backing from Thrive Capital, Spark Capital, a16z, Lux, and WndrCo, according to TechCrunch. The company is betting that the next bottleneck in robotics won’t be chips or model architecture. It will be the dirty data loop: collecting, cleaning, annotating, evaluating, and repeating physical interactions at scale.

The timing matters. Two weeks ago, OpenAI said it would relaunch the robotics program it shuttered in 2021. That move fits a broader push among frontier labs to make AI useful in the physical world. But unlike LLMs, which trained on oceans of text already sitting online, robots don’t have a web-scale archive of useful motion, force, manipulation, and embodied interaction data.

XOOMAR analysis: XDOF is selling the shovel in a race where everyone wants to mine “physical AI,” but few want to build the warehouse, hire the operators, calibrate the machines, and label the footage.

XDOF starts with 20 customers and a $70 million war chest

XDOF, pronounced “ecks-doff,” has about 60 employees and is already working with 20 customers, including several frontier AI labs, co-founder and CEO Philippe Wu told TechCrunch. He declined to name them.

“All of the top labs are trying to pursue robotics,” Wu said. “We’ve already seen some of the downfalls of falling a little bit behind in the language model race … you don’t want to be in this type of situation where you pursue this technology too late, and everyone is in this boat where physical AI is the next frontier.”

That quote says more than a launch announcement. AI labs don’t want to repeat the LLM race from behind. If robot foundation models become strategically important, access to high-quality robot training data of the kind XDOF sells could become as important as access to GPUs or research talent.

Wu’s own path explains the company’s thesis. As a PhD student at UC Berkeley, he worked on enabling robots to learn skills from large-scale datasets. The blocker was not theory. It was supply.

“We didn’t have large-scale data to work with,” Wu told TechCrunch. “There was this chicken-and-egg problem — we first needed to actually collect data before we could even ask how to train a foundation model for robotics.”

That gap became the business.

130,000 trajectories show how scarce good robotics data still is

XDOF is partnering with UC Berkeley’s AI Research lab to release ABC, a dataset the company believes is the largest collection of high-quality robot training data ever assembled.

It includes:

  • 130,000 trajectories of robot manipulation data
  • 300 hours of simulation
  • 100 hours of evaluations

The team has already used the data to train robots on benchmark tasks such as folding T-shirts, flattening boxes, and loading AirPods into their cases.

That is meaningful scale for robotics. It also shows the brutal mismatch with language AI. Text and image models benefited from data that already existed in public or semi-public digital form. Robot data has to be produced through physical action. Every example may involve hardware, space, objects, cameras, human operators, maintenance, calibration, and later annotation.

Data source | Why it scales | Why robotics is harder
Text and images | Large stores already existed online | Physical interaction is not naturally archived at useful fidelity
YouTube and gig footage | Easy to collect in volume | TechCrunch says it can be low-fidelity and hard to reconcile with the physical world
Robot teleoperation | Produces targeted demonstrations | Requires robots, operators, calibration, and task setup
Simulation | Can generate variation | Still needs real-world grounding and evaluation

XOOMAR analysis: the numbers around ABC are not just a flex. They expose the supply chain problem. If 130,000 trajectories is release-worthy scale, then the industry is still early in building the equivalent of a serious physical data layer.

That echoes a broader pattern in AI infrastructure. Investors don’t only fund glamorous models. They also fund the systems that make models testable, auditable, and usable, a theme we covered in “$27M Bet Pushes Pramaana Labs to Make AI Prove Itself.”

Clean robot demos don’t solve messy physical work

Robotics demos are persuasive because they compress difficulty into a clean clip. But XDOF’s business exists because deployment requires more than a robot completing a task once under controlled conditions.

TechCrunch reports that Wu and XDOF co-founder and CTO Fred Shentu previously worked on GELLO, a low-cost teleoperation system that lets a human operator control a robotic arm to generate training data. Wu said the paper became influential because “a lot of people had similar needs and bottlenecks.”

The bottleneck is not just data volume. It’s data fit.

XDOF plans to operate across three tiers of a data pyramid:

  • Robot-specific teleoperation: data collected on the actual robot being deployed
  • General teleoperated robot data: systems like GELLO collecting broader manipulation examples
  • Egocentric human data: humans performing everyday tasks, captured through wearable sensors XDOF plans to build
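
As a rough illustration of how the three tiers above could be tracked in practice, the sketch below tags each recorded episode with its tier and totals the hours collected per tier. Everything in it is an assumption made for illustration: the `Tier` and `Episode` names, the fields, and the sample durations are hypothetical and do not describe XDOF's actual schema or data.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Hypothetical labels for the three tiers, ordered from most to
    # least specific to the robot being deployed.
    ROBOT_SPECIFIC_TELEOP = 1
    GENERAL_TELEOP = 2
    EGOCENTRIC_HUMAN = 3


@dataclass(frozen=True)
class Episode:
    # One recorded demonstration; field names are illustrative only.
    episode_id: str
    tier: Tier
    task: str
    duration_s: float


def hours_by_tier(episodes):
    """Sum recorded duration per tier, in hours."""
    totals = {tier: 0.0 for tier in Tier}
    for ep in episodes:
        totals[ep.tier] += ep.duration_s / 3600.0
    return totals


# Made-up sample episodes, not real dataset figures.
episodes = [
    Episode("ep-001", Tier.ROBOT_SPECIFIC_TELEOP, "fold_tshirt", 1800.0),
    Episode("ep-002", Tier.GENERAL_TELEOP, "flatten_box", 3600.0),
    Episode("ep-003", Tier.EGOCENTRIC_HUMAN, "load_earbuds", 5400.0),
    Episode("ep-004", Tier.EGOCENTRIC_HUMAN, "fold_tshirt", 1800.0),
]

totals = hours_by_tier(episodes)
for tier in Tier:
    print(f"{tier.name}: {totals[tier]:.2f} h")
```

Keeping the tier as an explicit field on every episode is one way a team could later test whether mixing tiers improves a model, since any training run can then be filtered or weighted by tier.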

Wu’s point on hardware choice is sharp because it cuts against the idea that any footage will do.

“Your camera choice is going to affect the quality of your data — which is going to affect how your hand-tracking algorithm performs,” Wu said. “If you don’t design the hardware well from the start, the data you collect might have very specific problems that you didn’t anticipate.”

XOOMAR analysis: this is where XDOF’s value moves beyond “data vendor.” If the company can shape collection tools, annotation systems, and evaluation workflows together, it can sell a feedback loop, not just a dataset.

Robotics wants a shared-dataset moment without an internet-sized shortcut

David McAllister, a Berkeley PhD student who helped organize the ABC release, framed the academic upside directly.

“We’ve seen in language, image generation, and other fields, that when models and data are released, the community achieves things that you wouldn’t necessarily have expected,” McAllister told TechCrunch.

That is the optimistic case for ABC. Shared data can create unexpected research gains. It can also make benchmarks less anecdotal and more comparable.

But robotics faces a harsher scaling curve than software-native AI. The web did not accidentally record enough high-quality robot manipulation data. Companies have to manufacture it. That means people operating robots, people wearing sensors, people maintaining equipment, and people deciding what counts as useful training material.

Wu is blunt about why major labs may outsource this work.

“You need a warehouse of hundreds of thousands of square feet with hundreds of robots,” Wu said. “You need to maintain these robots, calibrate their physical parameters, and properly train operators.”

That is not a typical research lab function. It sounds closer to logistics, workforce management, and data operations. The physical AI race may be won partly by whoever can turn that operational grind into repeatable infrastructure.

A related labor question is already visible in other automation pushes. As we reported in “500 Bowls an Hour Pits Wonder Robot Kitchen Against Labor,” robotics stories often become labor stories once machines leave the demo floor.


AI labs, robot makers, workers, and customers are pulling on the same data pipeline

For frontier labs, outsourced robot training data offers speed. They can pursue robotics without first building a giant physical data operation from scratch.

For robot companies, better data could make models less brittle across tasks. TechCrunch notes that XDOF is not focused only on data provision. It is also building data cleaning, tooling, and annotation systems, which are meant to create a self-reinforcing loop for robot trainers.

For workers, XDOF’s model points to a new labor layer in AI. The company plans to hire and train armies of teleoperators and egocentric data operators around the world. That work may be repetitive. It may also become essential, much like labeling work became essential to earlier AI systems.

For customers, the issue is trust. If outsourced datasets shape how robots behave, buyers will eventually care about where the data came from, how it was evaluated, and whether the model’s performance translates to their own environment. That is XOOMAR analysis, but it follows directly from XDOF’s focus on collection quality, hardware design, annotation, and evaluation.

The name XDOF captures the ambition. It plays on “degrees of freedom,” the robotics term for independent motions a robot can perform. TechCrunch notes that a human arm from shoulder to wrist has seven degrees of freedom, while Figure.AI’s latest robot has 30.

Wu said the “X” means: “Arbitrary degrees of freedom, unlimited degrees of freedom.”

Proprietary robot training data may become the moat

Hardware alone is unlikely to be enough if competitors can buy similar components and train similar models. XOOMAR analysis: the harder moat may be proprietary embodied data, specialized collection systems, and evaluation loops tied to real tasks.

XDOF appears built around that logic. It is not merely collecting footage. It wants to own the pipeline around data collection tools, data cleaning, annotation, and feedback for model trainers.

That matters because physical AI has less room for vague claims. A chatbot can fail softly. A robot fails in physical space, around objects, people, and equipment, and inside time-sensitive workflows. The source material does not give safety incident data, so we should not overstate the risk. But the operational stakes are plainly different when the model controls hardware.

The risk for smaller robotics teams is also clear. If the best data pipelines are expensive, labor-intensive, and tied up by frontier labs or large robotics companies, smaller players may face a harder path. They may depend more on limited datasets, simulation, or narrow in-house collection.

That does not mean XDOF wins by default. It means the market is moving toward a harder question: who controls the physical data layer?

XDOF-style data factories could decide which robots leave the demo floor

The next phase to watch is not whether robotics labs can produce better videos. It is whether they can build or buy repeatable data systems that improve models across real tasks.

Evidence that would strengthen XDOF’s thesis includes more named frontier lab customers, broader adoption of ABC, measurable gains on benchmark tasks, and proof that its three-tier data pyramid improves model performance beyond isolated demonstrations.

Evidence that would weaken it would be just as important: if simulation reduces the need for real-world collection faster than expected, if labs decide to build giant internal data operations, or if XDOF’s datasets fail to generalize beyond the environments where they were collected.

For now, the signal is clear. XDOF is turning unglamorous physical work into AI infrastructure. The companies that master that grind may shape embodied AI more than the ones with the slickest launch clips.

The Bottom Line

  • Robotics may be entering a new race where proprietary physical-world data becomes a key advantage.
  • XDOF’s early customer traction suggests major AI labs are outsourcing the hardest parts of robot training.
  • OpenAI’s robotics relaunch shows frontier labs increasingly see physical AI as the next major battleground.

LLM Training Data vs. Robot Training Data

Area | LLM Training | Robot Training
Data availability | Large volumes of text already existed online | Useful motion, force, manipulation, and embodied interaction data is scarce
Core workflow | Train on digital text datasets | Collect, clean, annotate, evaluate, and repeat physical interactions
Main bottleneck | Model and compute race | Real-world data collection at scale

XDOF Launch Metrics

  • Funding: $70 million
  • Employees: about 60
  • Customers: 20