Human depth perception uses stereo out to only about 2 or 3 meters, after which the distance between your eyes is not a useful baseline. Beyond 3m we use context clues and depth from motion when available.
We do a lot more internal image processing than just stereo. For example, relative motion as seen by either eye (motion parallax) improves accuracy a great deal in the "medium" distance range.
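A toy sketch of that idea (made-up numbers, and obviously not what the brain literally computes): moving your viewpoint gives you an effective baseline equal to however far you moved, so the same pinhole geometry as stereo applies.

```python
# Toy motion-parallax depth estimate. A translating observer gets an
# effective "baseline" equal to the distance moved between two views,
# so the pinhole relation Z = f * B / d applies just like stereo.
# All numbers below are illustrative assumptions.

def depth_from_parallax(focal_px: float, translation_m: float, pixel_shift_px: float) -> float:
    """Depth of a static point from how far it shifts in the image."""
    return focal_px * translation_m / pixel_shift_px

focal_px = 1000.0   # assumed focal length in pixels
translation = 0.5   # viewpoint moved 0.5 m between the two views
shift = 10.0        # the static landmark shifted 10 px in the image

print(f"estimated depth ~ {depth_from_parallax(focal_px, translation, shift):.0f} m")
```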
And I'll add that in practice it's not even that much unless you've done some serious training, like a professional athlete. For most tasks, accurate depth perception from this fades at around arm's length.
ok, but the point being made is based on human depth perception, while a car's baseline is limited only by the width of the vehicle, so there's missing information if you're trying to figure out whether a car can use cameras to do what human eyes/brains do.
Humans are very good at processing the images that come into our brains. Each eye has a “blind spot” but we don’t notice. Our eyes adjust for color (fluorescent lights are weird) and for the amount of light coming in. We can look through a screen door or through rain and just ignore it, and if you look out the side of a moving vehicle you can ignore the foreground.
If you increase the distance between stereo cameras (the baseline), you can probably push useful depth perception further out.
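Rough numbers for that, using the standard pinhole stereo relation Z = f·B/d (the focal length, baselines, and matching error below are assumptions, not measurements from any real system):

```python
# Pinhole stereo: depth Z = f * B / d, and the depth error for a given
# disparity (matching) error grows as Z^2 / (f * B). So a wider baseline B
# directly buys better depth resolution at range.
# All numbers are illustrative assumptions.

def depth_error(focal_px: float, baseline_m: float, depth_m: float,
                disparity_error_px: float) -> float:
    """Approximate depth uncertainty: dZ ~= Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px

focal_px = 1000.0          # assumed focal length in pixels
human_baseline = 0.065     # ~65 mm between human eyes
car_baseline = 1.5         # cameras spread across a car roof (assumed)

for name, B in (("human eyes", human_baseline), ("car-width cameras", car_baseline)):
    err = depth_error(focal_px, B, depth_m=50.0, disparity_error_px=0.5)
    print(f"{name}: roughly +/- {err:.1f} m depth error at 50 m")
```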
But a lidar or radar sensor just senses distance directly.
Radar has a cool property that it can sense the relative velocity of objects along the beam axis too, from Doppler frequency shifting. It’s one sense that cars have that humans don’t.
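The relationship itself is simple: the Doppler shift is proportional to the closing speed along the beam. A minimal sketch (77 GHz is a common automotive radar band; the measured shift here is made up):

```python
# Radial velocity from a radar Doppler shift: v = f_d * c / (2 * f_c).
# The factor of 2 is because the wave travels out and back.
# Carrier is a common automotive band; the shift value is illustrative.

C = 299_792_458.0  # speed of light, m/s

def radial_velocity(doppler_shift_hz: float, carrier_hz: float) -> float:
    return doppler_shift_hz * C / (2.0 * carrier_hz)

carrier = 77e9      # 77 GHz automotive radar
doppler = 5_000.0   # 5 kHz measured Doppler shift (made up)

v = radial_velocity(doppler, carrier)
print(f"relative radial velocity ~ {v:.1f} m/s ({v * 3.6:.0f} km/h)")
```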
To this point, one of the coolest features Teslas _used_ to have was the ability to determine and integrate the speed of the car in front of you AND the speed of the car in front of THAT car, even if the second car was entirely visually occluded. They did this by bouncing the radar beam under the car in front and detecting that there were multiple targets. It could even act on this: I had my car AEB when the car two ahead slammed on THEIR brakes, before the car directly ahead had even reacted. Absolutely wild. Completely gone in vision-only.
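To be clear, this is just a toy sketch of why two returns along the same beam are useful - it is not Tesla's actual logic, and the thresholds and frame format are made up:

```python
# Toy sketch (NOT Tesla's actual logic): if radar sees two targets along
# the beam, watch the farther, visually occluded one. If it starts
# decelerating hard, you can react before the nearer car does.

from dataclasses import dataclass

@dataclass
class RadarReturn:
    range_m: float        # distance along the beam
    velocity_mps: float   # radial velocity (more negative = closing faster)

HARD_BRAKE_MPS2 = -4.0    # assumed "hard braking" threshold

def should_prebrake(prev, curr, dt):
    """Compare the farthest return between two radar frames dt seconds apart."""
    if len(prev) < 2 or len(curr) < 2:
        return False
    far_prev = max(prev, key=lambda r: r.range_m)
    far_curr = max(curr, key=lambda r: r.range_m)
    accel = (far_curr.velocity_mps - far_prev.velocity_mps) / dt
    return accel < HARD_BRAKE_MPS2

# Two frames 0.5 s apart: the nearer car hasn't reacted yet, but the
# occluded car two ahead is shedding speed fast.
frame_a = [RadarReturn(30.0, -1.0), RadarReturn(55.0, -1.0)]
frame_b = [RadarReturn(29.9, -1.0), RadarReturn(54.5, -3.5)]
print(should_prebrake(frame_a, frame_b, dt=0.5))  # -> True
```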
The company I used to work for was developing a self-driving car with stereo depth on a wide baseline.
It's not all sunshine and roses, to be honest - it was one of the weakest links in the perception system. The video had to run at much higher resolution than it otherwise would, and the whole thing was incredibly sensitive to calibration accuracy.
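For a feel for why calibration hurts so much on a wide baseline: a tiny relative yaw between the cameras shows up as a roughly constant disparity offset, which turns into a large depth error at range. Sketch with assumed numbers, not figures from that system:

```python
# How a small extrinsic calibration error corrupts wide-baseline stereo.
# A relative yaw between the cameras biases disparity by about f * tan(err),
# and that bias matters most where disparities are already small (far away).
# All numbers are illustrative assumptions.

import math

focal_px = 2000.0      # high-resolution sensor (assumed)
baseline_m = 1.2       # wide baseline across the vehicle (assumed)
yaw_error_deg = 0.05   # tiny relative rotation between the two cameras

disparity_offset_px = focal_px * math.tan(math.radians(yaw_error_deg))

for true_depth in (20.0, 50.0, 100.0):
    true_disparity = focal_px * baseline_m / true_depth
    measured = true_disparity - disparity_offset_px
    est_depth = focal_px * baseline_m / measured
    print(f"{true_depth:5.0f} m reads as {est_depth:6.1f} m "
          f"(disparity off by {disparity_offset_px:.2f} px)")
```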