Yeah it's absurd. As a Tesla driver, I have to say the autopilot model really does feel like what someone who's never driven a car before thinks driving is like.
Using vision only ignores so much of what driving is about: sound, vibration, vision, heat, cold... these are all clues about road condition. If the car isn't sensing all of these as part of the model, you're handicapping it. Lidar is, in a brilliant way, the missing piece of information a car needs without relying on multiple sensors; it's probably superior to what a human can do, whereas vision only is clearly inferior.
Tesla went nothing-but-nets (making fusion easy) and Chinese LIDAR became cheap around 2023, but monocular depth estimation was spectacularly good by 2021. By the time unit cost and integration effort came down, LIDAR had very little to offer a vision stack that no longer struggled to perceive the 3D world around it.
Also, integration effort went down but it never disappeared. Meanwhile, opportunity cost skyrocketed when vision started working. Which layers would you carve resources away from to make room? How far back would you be willing to send the training + validation schedule to accommodate the change? If you saw your vision-only stack take off and blow past human performance on the march of 9s, would you land the plane just because red paint became available and you wanted to paint it red?
I wouldn't completely discount ego either, but IMO there's more ego in the "LIDAR is necessary" case than the "LIDAR isn't necessary" case at this point. FWIW, I used to be an outspoken LIDAR-head before 2021, when monocular depth estimation became a solved problem. It was funny watching everyone around me convert in the opposite direction at around the same time, probably driven by politics. I get it, I hate Elon's politics too, I just try very hard to keep his shitty behavior from influencing my opinions on machine learning.
> but monocular depth estimation was spectacularly good by 2021
It's still rather weak, and true monocular depth estimation really wasn't spectacularly anything in 2021. It's fundamentally ill-posed, and any priors you use to get around that will come back to bite you in the long tail of things some driver will encounter on the road.
The way it got good is by using camera overlap in space and over time while in motion to figure out metric depth over the entire image. Which is, humorously enough, sensor fusion.
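To make "camera overlap in space and over time" concrete, here's a minimal sketch of the two-view case using OpenCV. This is nobody's production stack: the focal length, baseline, and filenames are made-up illustration values, and in a moving car the second view would come from ego-motion between frames rather than a fixed rig, but the geometry (disparity to metric depth) is the same idea.

    import numpy as np
    import cv2

    # Hypothetical calibration values, purely for illustration.
    # In a car the "baseline" for temporal stereo would come from the
    # estimated ego-motion between frames, not a fixed rig.
    FOCAL_PX = 800.0     # focal length in pixels
    BASELINE_M = 0.3     # distance between the two viewpoints in meters

    def metric_depth_from_pair(left_gray: np.ndarray, right_gray: np.ndarray) -> np.ndarray:
        """Estimate per-pixel metric depth from two rectified, overlapping views."""
        matcher = cv2.StereoSGBM_create(
            minDisparity=0,
            numDisparities=128,  # must be divisible by 16
            blockSize=5,
        )
        # StereoSGBM returns fixed-point disparity scaled by 16.
        disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
        disparity[disparity <= 0] = np.nan  # mark invalid / occluded pixels

        # Classic pinhole relation: depth = focal_length * baseline / disparity.
        return FOCAL_PX * BASELINE_M / disparity

    if __name__ == "__main__":
        # Placeholder filenames; any rectified overlapping pair would do.
        left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
        right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
        depth_m = metric_depth_from_pair(left, right)
        print("median depth (m):", np.nanmedian(depth_m))

The point being: once you have two overlapping views with a known (or estimated) baseline, depth stops being an ill-posed single-image guess and becomes triangulation.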
It was spectacularly good before 2021; 2021 is just when I noticed that it had become spectacularly good. 7.5 billion miles later, this appears to have been the correct call.
What are the techniques (and the papers thereof) that you consider to be spectacularly good before 2021 for depth estimation, monocular or not?
I do some work tangential to this field, for applications in robotics, and I would consider (metric) depth estimation (and 3D reconstruction) to have only started being solved by 2025, thanks to a few select labs.
Car vision has some domain specificity (high-similarity images from adjacent timestamps, relatively simple priors, etc.) that helps, indeed.
Depth estimation is but one part of the problem: atmospheric and other conditions that blind visible-spectrum optical sensors, lack of ambient light (sunlight), and more. Lidar simply outperforms (performs at all?) in these conditions, and provides hardware-backed distance maps, not software-calculated estimates.
Lidar fails worse than cameras in nearly all of those conditions. There are plenty of videos of Tesla's vision-only approach seeing obstacles well before a human possibly could, in all of those conditions, on real customer cars. Many are on the old hardware with far worse cameras.
There's a misconception that what people see and what the camera sees is similar. Not true at all. One day when it's raining or foggy, have someone record the driving through the windshield. You'll be very surprised. Even what the camera displays on the screen isn't what it's actually "seeing".
Monocular depth estimation can be fooled by adversarial images, or just scenes outside of its distribution. It's a validation nightmare and a joke for high reliability.
It isn't monocular though. A Tesla has 2 front-facing cameras, narrow and wide-angle. Beyond that, it is only neural nets at this point, so depth estimation isn't directly used; it is likely part of the neural net, but only the useful distilled elements.
Always thought the case was for sensor redundancy and data variety - the stuff that throws off monocular depth estimation might not throw off a lidar or radar.
How many of the 70 human accidents would be adequately explained by controlling for speed, alcohol, wanton inattention, etc? (The first two alone reduce it by 70%)
No customer would turn on FSD on an icy road, or on country lanes in the UK which are one lane but run in both directions; it's much harder to have a passenger fatality in stop-start traffic jams in downtown US cities.
Even if those numbers are genuine (2 vs 70) I wouldn't consider it apples-for-apples.
Public information campaigns and proper policing have a role to play in car safety; if that's the stated goal, we don't necessarily need to sink billions into researching self-driving.
There are a sizeable number of deaths associated with the abuse of Tesla's adaptive cruise control with lane centering (publicly marketed as "autopilot"). Such features are commonplace on many new cars, and it is unclear whether Tesla is an outlier, because no one is interested in obsessively researching cruise control abuse among other brands.
Good ole Autopilot vs FSD post. You would think people on Hacker News would be better informed. Autopilot is just lane keep and adaptive cruise control. Basically what every other car has at this point.
"MacOS Tahoe has these cool features". "Yea but what about this wikipedia article on System 1. Look it has these issues."
Isn't there a great deal of gaming going on with the car disengaging FSD milliseconds before crashing? Voila, no "full" "self" driving accident; just another human failing [*]!
[*] Failing to solve the impossible situation FSD dropped them into, that is.
Seeing how it's by a lidar vendor, I don't think they're biased against it. It seems Lidar is not a panacea: it struggles with heavy rain and snow much more than cameras do, and is affected by cold weather or any contamination on the sensor.
So lidar will only get you so far. I'm far more interested in mmwave radar, which, while much worse in spatial resolution, isn't affected by light conditions or weather, and can directly measure things about whatever it's illuminating, like material properties, the speed it's moving at, its thickness.
Fun fact: mmWave-based presence sensors can measure your heartbeat, as the micro-movements show up as a frequency component. So I'd guess it would have a very good chance of detecting a human.
I'm pretty sure even with much more rudimentary processing, it'll be able to tell if it's looking at a living being.
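For the curious, the heartbeat trick is roughly this kind of signal processing. Everything below is synthetic and the numbers are made up for illustration; a real mmWave presence sensor would give you a phase time series for the range bin the person occupies, and the heart rate shows up as a spectral peak somewhere around 0.8-2 Hz.

    import numpy as np

    FS_HZ = 20.0        # hypothetical frame rate of the radar's phase samples
    DURATION_S = 30.0

    # Synthetic stand-in for the unwrapped phase of one range bin:
    # chest motion from breathing (~0.25 Hz) plus a tiny heartbeat ripple (~1.2 Hz).
    t = np.arange(0, DURATION_S, 1.0 / FS_HZ)
    phase = 1.0 * np.sin(2 * np.pi * 0.25 * t) + 0.05 * np.sin(2 * np.pi * 1.2 * t)
    phase += 0.01 * np.random.randn(t.size)  # measurement noise

    # Look for a spectral peak in the plausible heart-rate band (0.8-2.0 Hz).
    spectrum = np.abs(np.fft.rfft(phase * np.hanning(phase.size)))
    freqs = np.fft.rfftfreq(phase.size, d=1.0 / FS_HZ)
    band = (freqs >= 0.8) & (freqs <= 2.0)
    heart_hz = freqs[band][np.argmax(spectrum[band])]

    print(f"estimated heart rate: {heart_hz * 60:.0f} bpm")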
By the way: what happened to the idea that self-driving cars would be able to talk to each other and combine each other's sensor data, so if there are multiple ones looking at the same spot, you'd get a much improved chance of not making a mistake?
Lidar is a moot point. You can't drive with just Lidar, no matter what. That's what people don't understand. The most common objection I hear: "What if the camera gets mud on it?" OK, then you have to get out and clean it, or it needs an auto-cleaning system.
Maybe vision-only can work with much better cameras, with a wider spectrum (so they can see through fog, for example), and self-cleaning/zero upkeep (so you don't have to pull over to wipe a speck of mud from them). Nevertheless, LIDAR still seems like the best choice overall.