A Carnegie Mellon Robotics Professor Untangles a Tesla, the Moon and a Streetlight
One of the fundamental requirements of a safe and capable self-driving system (SDS) is the ability to accurately perceive all objects in the car's environment.
Recently, a Tesla owner cruising down a California highway received a surprising notification on their car’s dashboard display. The vehicle’s Autopilot driver-assist system spotted the moon high in the sky, perfectly circular and tinted yellow in a haze of wildfire smoke, and made an incorrect assumption: yellow traffic signal ahead!
To identify and navigate traffic signals, Tesla's Autopilot relies solely on visual interpretation in the moment, an approach whose limits were spotlighted by its misperception of the moon. In contrast, self-driving system developers like Argo AI, Waymo, and Aurora combine data from multiple sensors with high-definition 3D maps to avoid mistaking streetlights for moonlight.
In order to understand the latter approach, and more about Tesla’s moon encounter, we spoke with Deva Ramanan, Professor in Carnegie Mellon University’s Robotics Institute and Principal Scientist at Argo AI.
Harry Spitzer: So, in a nutshell, what went wrong here?
Deva Ramanan: Only Tesla can answer that, but if I had to speculate what went wrong, it’s that Autopilot isn’t doing enough to place the visual that the car is seeing—a yellow orb—in context, to understand that it’s the moon and not a traffic light.
HS: How do you solve a problem like mistaking a traffic light for a moon?
DR: I would approach this technical problem from the perspective of how a human is able to distinguish between a traffic light and the moon. Even though a traffic light and the moon may resemble each other, a self-driving system should use a combination of contextual cues—including spatial, temporal, and prior knowledge—to tell them apart.
HS: How are these different contexts distinguished by an SDS like Argo’s?
DR: Spatial context means understanding the object in the context of its surroundings. For example, the night sky at dusk that surrounds the moon looks different from the standard black metal of a traffic-light fixture. By understanding what's around the yellow orb, we can make a more educated guess as to whether that orb is a traffic signal or the moon. We can also rely on geometry, like identifying whether what we perceive as a traffic light is housed in a box floating in the air 15 feet in front of us. Argo does this with proprietary lidar technology that gives us an accurate representation of the physical objects in our surroundings.
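The geometric side of this cue can be illustrated with a minimal sketch. This is not Argo's actual API; the `Detection` type, field names, and range threshold are all hypothetical. The idea is simply that a real traffic light is a physical fixture that lidar can range to, while the moon produces no lidar return at all.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Detection:
    label: str                       # e.g. "traffic_light" from a camera classifier
    lidar_range_m: Optional[float]   # range of a matching lidar return, None if no return

def is_plausible_traffic_light(det: Detection,
                               max_range_m: float = 120.0) -> bool:
    """A traffic light is a physical object at a bounded distance, so lidar
    should see a surface there. The moon returns no lidar points at all.
    The 120 m cutoff is an illustrative assumption, not a real system value."""
    if det.lidar_range_m is None:    # nothing physical where the camera saw a light
        return False
    return det.lidar_range_m <= max_range_m

print(is_plausible_traffic_light(Detection("traffic_light", 18.5)))  # True
print(is_plausible_traffic_light(Detection("traffic_light", None)))  # False
```

A production system would fuse this with many other signals rather than gate on a single range check, but it captures why a yellow orb with no lidar return is suspect.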
Temporal context means we don’t just identify the object in a fixed moment, we track that object over time. As a human, when you look at the moon and drive towards it for a matter of minutes, its position in the sky won’t change significantly relative to your eye line. A traffic signal, because of its relative closeness, will. In the same way, a car should be able to differentiate between the two because it knows that it’s moving closer to a traffic signal, and therefore that the signal’s position relative to the car ought to shift correspondingly.
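The parallax argument above can be sketched in a few lines. Everything here is illustrative: the car is assumed to drive along the x-axis, and the 0.5-degree drift threshold is an arbitrary stand-in. The point is that the bearing to a very distant object barely changes as the car moves, while the bearing to a nearby fixture swings noticeably.

```python
import math

def bearing_deg(car_x: float, obj_x: float, obj_y: float) -> float:
    """Bearing from the car (driving along the x-axis) to a point, in degrees."""
    return math.degrees(math.atan2(obj_y, obj_x - car_x))

def looks_distant(car_positions, obj_x, obj_y, max_drift_deg=0.5):
    """True if the bearing to the object stays nearly constant as the car moves,
    which is the signature of a far-away object like the moon."""
    bearings = [bearing_deg(x, obj_x, obj_y) for x in car_positions]
    return max(bearings) - min(bearings) <= max_drift_deg

car_path = [0.0, 20.0, 40.0]                # car positions over a few seconds, meters
print(looks_distant(car_path, 3.8e8, 2e8))  # moon-scale distance: True
print(looks_distant(car_path, 60.0, 3.0))   # traffic-light distance: False
```

For the nearby object, the bearing shifts by several degrees over 40 meters of travel; for the moon it shifts by an immeasurably small amount, which is exactly the cue a tracker can exploit.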
Finally, an autonomous vehicle should arguably rely heavily on prior context—or prior knowledge of where things ought to be. In the context of Argo's vehicles, that means relying on extensive 3D maps, which contain extremely detailed information not only about which traffic signals to expect at each intersection, but also about lane geometry, construction, and even the position of trees and statues in our surroundings.
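A minimal sketch of this map-prior check follows. The map format, coordinates, and 5-meter tolerance are all hypothetical; real HD maps store far richer data. The idea is just that a detection far from any mapped signal location deserves skepticism.

```python
from math import hypot

# Illustrative stand-in for an HD map layer: (x, y) positions, in meters,
# where traffic signals are known to exist.
MAPPED_SIGNALS = [(105.0, 42.0), (310.5, 88.0)]

def expected_by_map(det_x: float, det_y: float, tol_m: float = 5.0) -> bool:
    """True if the detection lies within tol_m of a mapped signal location."""
    return any(hypot(det_x - sx, det_y - sy) <= tol_m
               for sx, sy in MAPPED_SIGNALS)

print(expected_by_map(104.0, 41.0))   # near a mapped signal: True
print(expected_by_map(999.0, 999.0))  # nowhere near one (a "moon"): False
```

Because the moon never coincides with a mapped signal location, this prior alone would already veto the yellow-light interpretation, independently of the camera's pixel-level evidence.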
HS: How does an autonomous vehicle developer ensure that their system works, and won’t make similar mistakes?
DR: In addition to using all three types of context in our traffic light perception system, we do extensive closed-course testing of our system to make sure that it’s prepared to properly identify traffic lights of all sorts in various states of disrepair. For example, we can test on a malfunctioning traffic light that registers two lights at once—or traffic lights of various heights and hues.
HS: Any final steps that ensure that we don’t confuse the moon for a traffic signal?
DR: Safety comes from, among other factors, redundancy in the sensor types and in the methods applied to interpret the data they produce. Argo vehicles use cameras, lidar, and radar for a constant 360-degree view of the vehicle's surroundings. I've walked you through the main methods (spatial, temporal, and prior context, plus geometry from lidar) to illustrate that we rely on a combination of sensors. This way our vehicles are designed to respond safely to traffic signals—or any lunar manifestation—they encounter.
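The redundancy argument can be sketched as a simple conjunctive vote over the cues discussed in the interview. Real systems weigh evidence probabilistically rather than with a hard AND; this hypothetical function only illustrates that no single cue, such as a camera seeing a yellow circle, can decide the classification on its own.

```python
def classify_yellow_orb(has_lidar_return: bool,
                        bearing_shifts_with_motion: bool,
                        at_mapped_signal_location: bool) -> str:
    """Toy fusion rule: call the orb a traffic signal only if the geometric,
    temporal, and map-prior cues all agree. Any dissenting cue wins."""
    cues = [has_lidar_return, bearing_shifts_with_motion,
            at_mapped_signal_location]
    return "traffic_signal" if all(cues) else "not_a_signal"

# The moon: no lidar return, no parallax as the car moves, not on the map.
print(classify_yellow_orb(False, False, False))  # not_a_signal
# A real signal at a mapped intersection passes all three checks.
print(classify_yellow_orb(True, True, True))     # traffic_signal
```

The design point is that the cues fail independently: smoke can fool the camera's color estimate, but it cannot conjure a lidar return, parallax, and a map entry at the same time.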
As the Tesla moon predicament demonstrates, there is no end to the complex real-world situations that come into play when developing a self-driving system. This is why Level 4 autonomous vehicle companies, especially those which are successfully beginning to commercialize, take a multifaceted approach to testing and training an SDS.