What a Self-Driving Car “Sees” on a Public Road
The human eye is a marvel—capable of detecting light from as far as 2.5 million light-years away (hello, Andromeda Galaxy!). But even this evolutionarily advanced organ can’t compete with what the sensors in a self-driving car can “see.” Our eyes can’t see in 360 degrees. They can’t keep track of hundreds of objects at once. And they can’t calculate an object’s velocity and trajectory with anywhere near the same precision.
For that, you need what an autonomous vehicle has—namely 30 individual sensors taking in gigabytes of information about the world around it, every second. All of this data is fused together by the self-driving system (SDS) to create a three-dimensional model of the world, allowing the vehicle’s “brain” to perceive what’s going on and decide how to act next. Simple, right?
Not at all. In its raw format, the information streaming in from the vehicle’s sensors is incredibly complex: 2D images, coordinates in space, and lots and lots of numbers. While a self-driving system can process an immense amount of sensor data at once, it can be difficult for the software engineers working on the system to make sense of it all. They need tools that transform that raw data into visual outputs a person can interpret at a glance.
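One common flavor of such tooling is projecting 3D sensor returns onto a camera image so engineers can eyeball how well the data lines up with what the camera saw. The sketch below does this with a basic pinhole camera model; the intrinsics (fx, fy, cx, cy) and the sample point are invented values for illustration, not Argo’s actual calibration:

```python
import numpy as np

def project_points(points_xyz, fx, fy, cx, cy):
    """Project 3D points (in the camera frame) onto the image plane
    using a simple pinhole model with hypothetical intrinsics."""
    pts = np.asarray(points_xyz, dtype=float)
    z = pts[:, 2]
    in_front = z > 0  # keep only points ahead of the camera
    u = fx * pts[in_front, 0] / z[in_front] + cx
    v = fy * pts[in_front, 1] / z[in_front] + cy
    return np.stack([u, v], axis=1)

# A point 2 m right and 1 m down of the lens, 10 m ahead of the camera
pixels = project_points([[2.0, 1.0, 10.0]], fx=1000, fy=1000, cx=960, cy=600)
# lands at pixel u=1160, v=700 on the image
```

A real pipeline would also apply the extrinsic transform between each sensor and the camera and correct for lens distortion; this sketch skips both.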
The images below represent a single moment in time—an Argo test vehicle making a left-hand turn in an intersection in Midtown Miami—captured by a plethora of on-board sensors.
Cameras and Classification
The test vehicle is crowned by seven cameras, three of which capture the scene unfolding directly in front of the car (as seen via the front camera), and four of which track anything that might approach the vehicle from the left, right, or behind. Here, after stopping at a busy four-way intersection, the vehicle determines it has the right-of-way and safely enters the intersection. It stops to allow two pedestrians to cross the street. Each “actor” in this scene—including the FedEx truck creeping into the intersection, parked and moving vehicles, a bicycle parked against a street pole, and all nearby pedestrians—is detected and classified by the SDS. The color-coded “masks” match each pixel of the image to an object classification: orange masks indicate people, blue masks mean vehicles, and pink masks are for bicycles.
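The pixel-level masks described above are the output of semantic segmentation. Here is a minimal sketch of the colorizing step, assuming a network has already produced a per-pixel class-ID map; the class IDs and RGB values are hypothetical, chosen only to match the colors named above:

```python
import numpy as np

# Hypothetical class IDs and overlay colors (RGB)
COLORS = {
    1: (255, 140, 0),    # person  -> orange
    2: (0, 90, 255),     # vehicle -> blue
    3: (255, 105, 180),  # bicycle -> pink
}

def colorize_masks(class_ids):
    """Turn a per-pixel class-ID map (H x W) into an RGB mask image.
    Unlisted IDs (e.g. background) stay black."""
    class_ids = np.asarray(class_ids)
    rgb = np.zeros(class_ids.shape + (3,), dtype=np.uint8)
    for cid, color in COLORS.items():
        rgb[class_ids == cid] = color
    return rgb

seg = np.array([[0, 1], [2, 3]])  # tiny 2x2 "segmentation" for demo
mask = colorize_masks(seg)        # mask[0, 1] is an orange "person" pixel
```

In practice the colored mask is alpha-blended over the camera frame so engineers can spot misclassifications at a glance.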
Lidar Goes to Work
The same scene is captured by the vehicle’s lidar sensors. The visual waves represent pulses of laser light beamed from sensors housed in the vehicle’s rooftop sensor pod. Whenever the beams hit an object, they bounce back to the vehicle’s sensors as unique points of measurement. Cumulatively, these hundreds of thousands of points generate a “point cloud,” a visual representation of the surface area of all the objects detected. The point cloud can be visualized in a multitude of ways, in this case as color fields that indicate the distance of the object from the vehicle. Each lidar sensor collects 10 of these images every second, ensuring that if anything moves suddenly, the vehicle will notice it.
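Coloring a point cloud by range is straightforward once each return’s distance is known. The sketch below computes ranges from hypothetical (x, y, z) returns and maps them onto a simple near-red, far-blue ramp; the real visualization is far richer, but the principle is the same:

```python
import numpy as np

def range_colors(points_xyz, max_range=100.0):
    """Map each lidar return's distance from the sensor onto a
    near-red / far-blue color ramp (one simple way to render a
    point cloud by range)."""
    pts = np.asarray(points_xyz, dtype=float)
    ranges = np.linalg.norm(pts, axis=1)        # distance of each return
    t = np.clip(ranges / max_range, 0.0, 1.0)   # 0 = near, 1 = far
    # red channel fades out and blue fades in with distance
    colors = np.stack([1.0 - t, np.zeros_like(t), t], axis=1)
    return colors, ranges

# Two invented returns: one 5 m away, one 100 m away
colors, ranges = range_colors([[3.0, 4.0, 0.0], [60.0, 0.0, 80.0]])
```

At 10 sweeps per sensor per second, this rendering is regenerated continuously, which is what makes sudden movement stand out.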
Overlaying the 3D Map
It’s not enough to just “see” what’s happening around the vehicle; the SDS must also understand the parameters of the area where it’s operating. For this, it turns to a 3D model of the area, a meticulously constructed map of the city that features everything a driver (or, in this case, the Argo SDS) should know about the street, its infrastructure, and the laws that regulate it. Here, you will find lane markers that indicate all of the legal pathways for driving, street signs that dictate speed limits and rules of the road, and precisely rendered traffic lights that communicate which lanes they control and the yielding relationships of all road users.
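One way to picture what such a map stores is a record per lane. The heavily simplified structure below is hypothetical (the field names and IDs are invented for illustration, not Argo’s schema), but it captures the kinds of facts mentioned above: a speed limit, the traffic light that controls the lane, and who must yield to whom:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Lane:
    """A toy slice of an HD-map lane record; a production map also
    encodes geometry, signs, crosswalks, and much more."""
    lane_id: str
    speed_limit_mph: int
    controlled_by: Optional[str] = None          # traffic-light ID, if any
    must_yield_to: List[str] = field(default_factory=list)

# A left-turn lane governed by one signal, yielding to the oncoming lane
lane = Lane("lane_042", speed_limit_mph=30,
            controlled_by="signal_7",
            must_yield_to=["lane_108"])
```

Because these relationships are precomputed in the map, the SDS does not have to infer, say, a yield rule from camera pixels at drive time; it looks the rule up.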
Predicting the Future
Once the vehicle has perceived its surroundings, it must make predictions about the intentions of all actors on the road. Here, the SDS highlights potential pathways for each of the nearby vehicles, allowing the Argo vehicle to anticipate multiple possible actions so that it’s not taken by surprise. The blue line is the route that the Argo vehicle intends to take. The multicolored lines indicate likely routes for other vehicles in the scene.
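A production predictor scores many learned maneuver hypotheses per actor; the crudest possible stand-in for one predicted pathway is a constant-velocity rollout, sketched below with invented numbers (30 steps of 0.1 s, roughly a 3-second horizon):

```python
import numpy as np

def rollout(position, velocity, dt=0.1, steps=30):
    """Constant-velocity rollout: extrapolate an actor's current
    position and velocity forward in fixed time steps. A stand-in
    for one of the many candidate pathways a real predictor scores."""
    position = np.asarray(position, dtype=float)
    velocity = np.asarray(velocity, dtype=float)
    times = np.arange(1, steps + 1)[:, None] * dt  # column of timestamps
    return position + times * velocity             # (steps, 2) waypoints

# Hypothetical actor at the origin moving 5 m/s east
path = rollout([0.0, 0.0], [5.0, 0.0])  # ends 15 m east after 3 s
```

The SDS would draw one such line per hypothesis per actor, then plan its own blue-line route so that no likely pathway conflicts with it.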
The World on High
It may look like a satellite-eye view of the city below, but this visual is actually part of a 3D model of the city created from the ground up (and definitely not from outer space). Used by Argo engineers to help visualize all of the physical and legal features of its surroundings—from no-turn lanes to bus stops to view-blocking foliage—the 3D map can be “zoomed out” to see the vehicle in its context: Just one small participant in a rich and complex ecosystem.
All in a Blink
As these examples demonstrate, there’s no such thing as a routine left-hand turn for a self-driving vehicle. In the level of detail they observe and the volume of data that they convey, these visualizations underscore how computer vision plays such a vital role in enabling safe and predictable driving. And it all happens in one-tenth of a second—a literal blink of an eye.