AI copilot for field technicians.
Record one expert demo. Coach every junior technician forever — real-time voice + visual coaching through their smart glasses or phone. The AI watches the work and verifies each step before it advances.
- STEP 01 · 00:00 · Isolate line pressure
- STEP 02 · 00:01 · Loosen housing bolts
- STEP 03 · 00:02 · Lift the manifold cover
- STEP 04 · 00:03 · Inspect the old seal
- STEP 05 · 00:04 · Seat the new seal
- STEP 06 · 00:06 · Reassemble and pressure-test
75% FTFR INDUSTRY BASELINE · +1PP UPLIFT AT FLEET SCALE = MILLIONS / YR · $200–300 BURNED PER REPEAT TRUCK ROLL
Your senior technician walks out the door, and decades of judgment walk out with him.
Half of process knowledge is held in people's heads. Roughly 42% (McKinsey) is tribal: undocumented and lost the day they leave. The 600-page service manual that's supposed to replace them is exhaustive and unreadable in the moment.
Junior technicians inherit it anyway. About 25% (Aberdeen) of failed dispatches trace directly to a training gap, and the average repeat truck roll costs $200–300 (industry average) in fuel and time before you count the customer's downtime.
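The arithmetic behind a one-point first-time fix rate (FTFR) uplift can be sketched as below. The $200–300 repeat-roll cost comes from the figures above; the fleet size of 1,000,000 dispatches per year is a purely hypothetical assumption chosen for illustration, and the function name is made up for this sketch.

```python
def annual_repeat_roll_savings(dispatches_per_year: int,
                               ftfr_uplift_pp: float,
                               cost_per_repeat_roll: float) -> float:
    """Dollars saved per year when FTFR rises by `ftfr_uplift_pp` percentage points.

    Each percentage point of FTFR uplift means 1% of dispatches no longer
    need a repeat truck roll.
    """
    avoided_repeat_rolls = dispatches_per_year * ftfr_uplift_pp / 100
    return avoided_repeat_rolls * cost_per_repeat_roll


# Hypothetical fleet size (an assumption, not a figure from this page).
DISPATCHES_PER_YEAR = 1_000_000

low = annual_repeat_roll_savings(DISPATCHES_PER_YEAR, 1.0, 200)
high = annual_repeat_roll_savings(DISPATCHES_PER_YEAR, 1.0, 300)
print(f"${low:,.0f} to ${high:,.0f} per year")
# prints "$2,000,000 to $3,000,000 per year"
```

This counts only the direct cost of the avoided truck rolls; customer downtime would come on top of it.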
Hiring more people would be the obvious answer, except the people aren't there. The US manufacturing workforce is 44.8 years old on average (BLS), the oldest it has ever been, and faces 3.8M job openings by 2033 (Deloitte / MI), with roughly half expected to go unfilled. Germany alone will be short 5M skilled workers by 2030 (IW Köln).
One recording. Three modes. One architecture.
One expert.
One narrated demo.
Wear the glasses or hold up the iPhone. Talk through the task once. Gemini 2.5 Pro structures it into ordered steps with per-step completion criteria. ffmpeg clips each step automatically.
- 01 Isolate line pressure
- 02 Loosen housing bolts
- 03 Lift the manifold cover
- 04 Inspect the old seal
- 05 Seat the new seal
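As a rough illustration of what "ordered steps with per-step completion criteria" plus automatic clipping might look like, here is a minimal sketch. The `Step` fields, the timestamps, the criterion wording, and the `clips/` output layout are all assumptions made for this example, not the product's actual schema. The ffmpeg flags used (`-ss`, `-i`, `-t`, `-c copy`, `-y`) are standard ones.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One step of a structured procedure (hypothetical schema)."""
    index: int
    title: str
    completion_criterion: str  # what the camera should see when the step is done
    start_s: float             # where the step starts in the expert recording
    end_s: float               # where the step ends in the expert recording


def ffmpeg_clip_command(source: str, step: Step, out_dir: str = "clips") -> list[str]:
    """Build (but do not run) an ffmpeg command that cuts one step out of the demo."""
    duration = step.end_s - step.start_s
    if duration <= 0:
        raise ValueError(f"step {step.index} has a non-positive duration")
    out_path = f"{out_dir}/step_{step.index:02d}.mp4"
    return [
        "ffmpeg", "-y",               # overwrite an existing clip
        "-ss", f"{step.start_s:.3f}", # seek to the step start
        "-i", source,
        "-t", f"{duration:.3f}",      # keep only the step's duration
        "-c", "copy",                 # stream copy: fast, but cuts snap to keyframes
        out_path,
    ]


# Hypothetical timestamps and criterion for illustration only.
step = Step(2, "Loosen housing bolts", "All housing bolts are visibly backed out", 12.0, 30.5)
print(" ".join(ffmpeg_clip_command("expert_demo.mp4", step)))
```

The command list can be handed to `subprocess.run` once the output directory exists. Re-encoding instead of `-c copy` would give frame-accurate cuts at the cost of speed.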
Voice replies.
Vision verifies.
Phone or glasses hold a direct WebSocket to Gemini 3.1 Live — voice and camera at ~0.5 fps. The model watches for the completion criterion and fires advance_step the moment it sees it. No server hop.
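The client-side logic this implies can be sketched in two small pieces: a throttle that lets a camera frame through roughly every two seconds (~0.5 fps), and a session object that moves to the next step when an `advance_step` tool call arrives. This is a minimal sketch under stated assumptions: the class names and the `completed_step` argument are hypothetical, and nothing here reproduces the real Live API message format.

```python
from typing import Optional


class FrameThrottle:
    """Lets a frame through at most `fps` times per second."""

    def __init__(self, fps: float):
        self.min_interval_s = 1.0 / fps
        self._last_sent_s: Optional[float] = None

    def should_send(self, now_s: float) -> bool:
        if self._last_sent_s is None or now_s - self._last_sent_s >= self.min_interval_s:
            self._last_sent_s = now_s
            return True
        return False


class CoachingSession:
    """Tracks which step the learner is on and reacts to model tool calls."""

    def __init__(self, step_titles: list[str]):
        self.step_titles = list(step_titles)
        self.current = 0

    @property
    def finished(self) -> bool:
        return self.current >= len(self.step_titles)

    def handle_tool_call(self, name: str, args: dict) -> str:
        if name != "advance_step":
            return f"ignored unknown tool: {name}"
        if self.finished:
            return "procedure already complete"
        # Hypothetical guard: drop calls that refer to a step we already left.
        if args.get("completed_step", self.current) != self.current:
            return "ignored stale advance_step"
        self.current += 1
        if self.finished:
            return "procedure complete"
        return f"now on step {self.current + 1}: {self.step_titles[self.current]}"


throttle = FrameThrottle(fps=0.5)
print([throttle.should_send(t) for t in (0.0, 1.0, 2.0, 3.0, 4.0)])
# prints [True, False, True, False, True]

session = CoachingSession(["Isolate line pressure", "Loosen housing bolts"])
print(session.handle_tool_call("advance_step", {"completed_step": 0}))
# prints "now on step 2: Loosen housing bolts"
```

Keeping the step pointer on the device is one way to stay consistent with the "no server hop" design: the model only signals that the criterion was met, and the client decides what that means for the UI.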
No procedure?
Diagnose from scratch.
Describe the broken machine. The AI identifies the product, searches your library first, falls back to a web-grounded fix with cited sources, then hands off to a normal coaching session.
- Identify: 3-phase chiller
- Diagnose: Low refrigerant
- Fetch: 3 sources cited
- Coach: Handoff to learner
- ↗carrier.com / service / 30RAP / E-04
- ↗service.refrigeration / short-cycle-diagnosis
- ↗fieldnotes.io / topics / low-charge-symptoms
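The library-first, web-fallback order described above can be sketched as a single lookup function. Everything named here is an assumption for illustration: the `Fix` record, the dictionary-backed library, and the injected `web_search` callable stand in for whatever the real product uses, and the URLs are placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Fix:
    summary: str
    origin: str                                   # "library" or "web"
    sources: list[str] = field(default_factory=list)


def find_fix(product: str,
             symptom: str,
             library: dict[tuple[str, str], str],
             web_search: Callable[[str], list[tuple[str, str]]]) -> Optional[Fix]:
    """Return a library procedure if one exists, else a web-grounded fix with sources."""
    key = (product.lower(), symptom.lower())
    if key in library:
        return Fix(summary=library[key], origin="library")

    # Fallback: `web_search` returns (url, snippet) pairs, best match first.
    hits = web_search(f"{product} {symptom}")
    if not hits:
        return None
    return Fix(summary=hits[0][1],
               origin="web",
               sources=[url for url, _ in hits])


# Hypothetical library and stubbed search, for illustration only.
library = {("3-phase chiller", "short cycling"): "Check refrigerant charge."}

def stub_search(query: str) -> list[tuple[str, str]]:
    return [("https://example.com/a", "Inspect the low-pressure switch."),
            ("https://example.com/b", "Verify charge against the nameplate.")]

print(find_fix("3-Phase Chiller", "Short cycling", library, stub_search).origin)
# prints "library"
print(find_fix("Pump", "No prime", library, stub_search).sources)
# prints ['https://example.com/a', 'https://example.com/b']
```

Passing the search function in as a parameter keeps the fallback testable without network access, and makes the cited sources an explicit part of the result that the coaching handoff can display.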
One wedge.
Pick the pain you own.
Service ADVISOR and SIS lock diagnostics in the shop. Retrace puts them on the technician's face.
Real-time visual guidance through smart glasses, on the equipment, no swivel-chair. Embeds into the OEM's after-sales motion as a branded service layer — the technician's hands stay on the machine while the model watches.
Three unlocks landed in the same year.
Gemini 3.1 Flash Live ships real-time video + audio + tools + context compression + session resumption — the model can finally watch and coach indefinitely. Ray-Ban Meta sold 7M units in 2025, +210% YoY; the capture layer is already on technicians' faces. Meta's DAT SDK 0.6 shipped — third-party apps can finally stream from the glasses.
Hardware mature. Model mature. Distribution mature. Nobody had built the field-service coaching layer yet. Until now.
AI copilot for technicians today.
Training corpus for humanoids tomorrow.
Robotics labs aren't compute-starved — they're data-starved. The humanoid wave needs first-person video of humans doing useful work, paired with speech and task-completion ground truth. That's scarce. Most of what exists is consumer or household.
The public benchmarks tell the story. 3,670 hrs (Ego4D, Meta '22) of egocentric video across 9 countries. 1,422 hrs (Ego-Exo4D, Meta '23) of paired first-person + third-person footage. 76K demos (DROID, Stanford '24) of robot manipulation across 86 tasks. 1M+ episodes (Open X-Embodiment, DeepMind '23) spanning 22 different embodiments. All landmark releases. None indexed by industrial procedure, none paired with expert narration, none capturing the long tail of skilled-trade tasks that pay the labs' bills.
Combined Series-B+ funding across the five frontier humanoid programs: over $1.4B raised in 18 months on the promise of robots that can do useful physical work. The bottleneck is the same in every deck.
Five attributes the public corpora don't have together.
- 01 First-person POV.
Same camera the humanoids will use. Ego4D had to convince 855 strangers to wear cameras around — Retrace's data comes from technicians whose existing capture habit is on-glasses or in-phone.
- 02 Synchronized speech.
Narrated demos with timestamped transcripts. Most egocentric corpora are silent video — narration grounds intent and procedure.
- 03 Tool-call ground-truth.
Explicit advance_step segmentation on completion criteria — explicit task boundaries are rare in public datasets and gold for downstream RL.
- 04 Paired attempts.
Expert demo + every learner attempt of the same task. Positive and negative samples paired by procedure — the kind of contrastive data RT-2 and π0 wish they had.
- 05 Industrial verticals.
Field service, OEM equipment, skilled trades — exactly where Ego4D leans thin. The work humanoids will eventually be sold into.
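To make the "tool-call ground truth" and "paired attempts" attributes concrete, here is one way such a record could be shaped. This is a sketch under assumptions: the `Recording` and `StepSegment` types and their field names are invented for illustration, and it assumes each `advance_step` timestamp closes the step that was in progress.

```python
from dataclasses import dataclass


@dataclass
class StepSegment:
    step_index: int
    start_s: float
    end_s: float


@dataclass
class Recording:
    recording_id: str
    role: str                          # "expert" or "learner"
    advance_step_times_s: list[float]  # when each advance_step call fired


def segments(rec: Recording) -> list[StepSegment]:
    """Turn advance_step timestamps into explicit per-step time spans."""
    boundaries = [0.0] + sorted(rec.advance_step_times_s)
    return [StepSegment(i, boundaries[i], boundaries[i + 1])
            for i in range(len(boundaries) - 1)]


def paired_segments(expert: Recording,
                    attempts: list[Recording]) -> list[tuple[StepSegment, str, StepSegment]]:
    """Pair each expert step with the same step from every learner attempt."""
    pairs = []
    expert_segments = segments(expert)
    for attempt in attempts:
        # zip stops at the shorter list, so unfinished attempts pair only completed steps.
        for exp_seg, learner_seg in zip(expert_segments, segments(attempt)):
            pairs.append((exp_seg, attempt.recording_id, learner_seg))
    return pairs


# Hypothetical timestamps for illustration only.
expert = Recording("demo-1", "expert", [10.0, 25.0, 40.0])
learner = Recording("attempt-7", "learner", [18.0, 51.0])

for exp_seg, rec_id, learner_seg in paired_segments(expert, [learner]):
    print(exp_seg.step_index, rec_id, learner_seg.end_s - learner_seg.start_s)
```

Because the boundaries come from tool calls rather than manual annotation, per-step spans in expert and learner footage line up by step index without a separate labeling pass.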
Same data, two markets. AI copilot today. Training corpus tomorrow.
Full-stack, end-to-end.
Backend / AI + iOS / XR. One real shipped product in four weeks.
- Hyunseok Hwang · Backend · AI systems
FastAPI server, Gemini orchestration (2.5 Pro + 3.1 Live), structured-output extraction, ephemeral tokens, troubleshoot mode, web-grounded search.
- Jayden DeCambre · iOS · XR
SwiftUI app, Meta DAT SDK 0.6 integration, Gemini Live WebSocket client, Ray-Ban HUD design system, MediaPipe hand tracking.