*By leveraging Genie’s immense world knowledge, it can simulate exceedingly rare...

IMTDb · 2026-02-06T22:28:14 1770416894

The idea is that, over time, the quality and accuracy of world-model outputs will improve. That, in turn, lets autonomous driving systems train on a large amount of “realistic enough” synthetic data.

For example, we know from experience that Waymo is currently good enough to drive in San Francisco. We don’t yet trust it in more complex environments like dense European cities or Southeast Asian “hell roads.” Running the stack against world models can give a big head start in understanding what works, and which situations are harder, without putting any humans in harm’s way.

We don’t need perfect accuracy from the world model to get real value. And, as usual, the more we use and validate these models, the more we can improve them; creating a virtuous cycle.

tantalor · 2026-02-07T04:39:30 1770439170

It's a pareto principal.

You can get 80% of the way to "perfect" with 20% of the effort.

dyauspitr · 2026-02-07T05:19:19 1770441559

That’s just a platitude at this point. They for all intents and purposes solved the problem, atleast in the US.

jayd16 · 2026-02-06T23:39:32 1770421172

I don't think you say "ok now the car is ball bearing proof."

Think of it more like unit tests. "In this synthetic scenario does the car stop as expected, does it continue as expected." You might hit some false negatives but there isn't a downside to that.

If it turns out your model has a blind spot for albino cows in a snow storm eating marshmallows, you might be able to catch that synthetically and spend some extra effort to prevent it.

hnburnsy · 2026-02-07T04:21:48 1770438108

Looks like they need to blackouts and parades to that simulator...

https://www.yahoo.com/news/articles/waymo-paralyzed-parade-b...

disillusioned · 2026-02-07T07:58:13 1770451093

The blackouts circumstance was because they escalate blinking/out of service traffic lights to a human confirmed decision, and they experienced a bottleneck spike in those requests for how little they were staffed. The Waymo itself was fine and was prepared to make the correct decision, it just needed a human in the loop.

In the video from the parade... there's just... people in the road. Like, a lot of small children and actual people on this tiny, super narrow bridge. I think that erring on the side of "don't think you can make it but accidentally drag a small child instead" is probably the right call, though admittedly, these cases are a bit wonky.

sznio · 2026-02-07T20:28:04 1770496084

>The blackouts circumstance was because they escalate blinking/out of service traffic lights to a human confirmed decision

Which isn't really a scalable solution. In my city the majority of streetlights switch to blinking yellow at night, with priority/yield signs instead. I can't imagine a human having to approve 10 of these on any route.

xnx · 2026-02-08T16:08:46 1770566926

From their blog post they give the sense that they had the human review "just to be safe", but didn't anticipate this scenario. They've probably adjusted that manual review rule and will let the cars do what they would've done anyway without waiting for manual review/approval.

joshfee · 2026-02-06T22:11:57 1770415917

Isn't that true for any scenario previously unencountered, whether it is a digital simulation or a human? We can't optimize for the best possible outcome in reality (since we can't predict the future), but we can optimize for making the best decisions given our knowledge of the world (even if it is imperfect).

In other words it is a gradient from "my current prediction" to "best prediction given my imperfect knowledge" to "best prediction with perfect knowledge", and you can improve the outcome by shrinking the gap between 1&2 or shrinking the gap between 2&3 (or both)

notatoad · 2026-02-07T02:12:05 1770430325

seems like the obvious answer to that is you cover a patch of road with 5mm ball bearings, and send a waymo to drive across it. if the ball bearings behave the way the simulation says they would, and the car behaves the way the simulation said it would, then you've validated your simulation.

do that for enough different scenarios, and if the model is consistently accurate across every scenario you validate, then you can start believing that it will also be accurate for the scenarios you haven't (and can't) validate.

fooker · 2026-02-06T22:10:15 1770415815

> from a tornado to a casual encounter with an elephant

A sims style game with this technology will be pretty nice!

ses1984 · 2026-02-06T22:25:13 1770416713

You could train it in simulation and then test it in reality.

inkysigma · 2026-02-06T22:27:02 1770416822

Would it actually be a good idea to operate a car near an active tornado?

klysm · 2026-02-07T01:28:23 1770427703

It’s autonomous!

kylehotchkiss · 2026-02-10T02:25:01 1770690301

Kinda yeah, they tend to always travel northeast

bharrison · 2026-02-07T05:07:09 1770440829

The tornado?

gokuldas011011 · 2026-02-07T07:32:58 1770449578

ML models doesn't have fight or flight, so we'll have to show them tornado and teach to run away.

YeGoblynQueenne · 2026-02-07T18:53:23 1770490403

>> How do you know the generated outputs are correct? Especially for unusual circumstances?

You know the outputs are correct because the models have many billions of parameters and were trained on many years of video on many hectares of server farms. Of course they'll generate correct outputs!

I mean that's literally the justification. There aren't even any benchmarks that you can beat with video generation, not even any bollocks ones like for LLMs.

aaaalone · 2026-02-06T22:06:29 1770415589

They probably just look at the results of the generation.

I mean would I like a in-depth tour of this? Yes.

But it's a marketing blog article, what do you expect?

parliament32 · 2026-02-06T22:13:26 1770416006

> just look at the results of the generation

And? The entire hallucination problem with text generators is "plausible sounding yet incorrect", so how does a human eyeballing it help at all?

inkysigma · 2026-02-06T22:32:52 1770417172

I think because here there's no single correct answer that the model is allowed to be fuzzier. You still mix in real training data and maybe more physics based simulation of course but it does seem acceptable that you synthesize extremely tail evaluations since there isn't really a "better" way by definition and you can evaluate the end driving behavior after training.

You can also probably still use it for some kinds of evaluation as well since you can detect if two point clouds intersect presumably.

In much a similar way that LLMs are not perfect at translation but are widely used anyway for NMT.

aaaalone · 2026-02-07T19:37:16 1770493036

You should be able to see if it is generated wrong after you see a car driving in it.

I can spot Halluzination in LLM too