How Does a Self-Driving Car Safely Detect Pedestrians, Bicycles, Scooters, and Other Vulnerable Road Users?
Even though there are more than 190 breeds of dog in the world, a young child who sees a four-legged animal with pointy ears and a wagging tail knows, regardless of its breed, that it is “a dog.” What’s more, children can usually tell dogs apart from other similarly sized animals, like cats or sheep.
Children can do this because, from very early on, they’ve had dogs pointed out to them in the street, seen them in books, and played with them at the park. Children have been taught how dogs move, how they behave, and (hopefully) how to approach them safely.
What does this have to do with autonomous vehicles? Well, plenty.
Just like a human child, an autonomous vehicle is taught to recognize the presence of something in its environment, identify it, predict its likely actions, and navigate safely around it. This matters for everything the vehicle encounters, but especially for vulnerable road users, or VRUs: cyclists, scooter riders, pedestrians, and other unprotected occupants of the road (like dogs!).
VRUs require special attention from self-driving cars for obvious reasons: they are not surrounded by a steel cage, and they lack the protection of safety devices such as seat belts and airbags. To put the stakes in perspective, the World Health Organization estimates that three types of VRUs (pedestrians, bicyclists, and motorcyclists) account for more than half of the 1.3 million annual global road-traffic deaths, with pedestrians and cyclists alone accounting for 26% of those fatalities.
Argo has put safety front and center in its corporate strategy. “The customer is actually the bike that’s riding next to us, the pedestrian that’s crossing in front of the vehicle,” CEO Bryan Salesky told Nilay Patel on the Decoder podcast. “The customer is really the environment around us in addition to whatever work the vehicle is doing at the time. It all matters, for a community to be okay with having self-driving cars.”
Teaching a car like you would a child
For the Argo Self-Driving System (SDS) to safely recognize a VRU—or even something in the background that it has never seen before, but that resembles or moves like a VRU—it is trained using data annotation, or labeling. “Labeling provides the ground truth—that is, the data sets required for machine learning,” explains Dave Chekan, a data engineer at Argo AI. “And machine learning is a major piece of the self-driving puzzle.”
Chekan describes his role as “teaching cars how to drive.” But compared to the child-and-dog analogy, “the car requires several orders of magnitude more examples.” These examples—images manually labeled, one by one, by a large team of annotators—allow the SDS to differentiate one object from another. It’s the job of Chekan’s labeling team to feed the SDS a steady stream of these images from which to learn. For any one object that the vehicle needs to detect, this means labeling literally millions of images to create a training data set.
Essentially, explains Nicolas Cebron, a senior computer vision manager at Argo AI, the team labels “things and stuff”: “‘Things’ are unitary and countable, such as pedestrians and cars. And ‘stuff’ is anything we cannot count, such as vegetation. After all, we’re not gonna label each tree, or shrub or blade of grass!”
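The things-versus-stuff distinction can be sketched as a tiny label schema: countable “things” get an instance ID so each pedestrian or car can be told apart and counted, while “stuff” such as vegetation only gets a class. This is the same split used in panoptic segmentation. The class lists, the `RegionLabel` type, and the `count_things` helper below are hypothetical and only illustrate the idea; they are not Argo’s taxonomy or tooling.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical class lists for illustration only; not Argo's actual taxonomy.
THING_CLASSES = {"pedestrian", "cyclist", "car", "scooter_rider"}
STUFF_CLASSES = {"vegetation", "road", "sidewalk", "sky"}


@dataclass(frozen=True)
class RegionLabel:
    """One labeled region in an image."""
    category: str
    instance_id: Optional[int] = None  # set only for countable "things"

    def __post_init__(self) -> None:
        if self.category not in THING_CLASSES | STUFF_CLASSES:
            raise ValueError(f"unknown category '{self.category}'")
        if self.category in THING_CLASSES and self.instance_id is None:
            raise ValueError(f"'{self.category}' is a thing and needs an instance_id")
        if self.category in STUFF_CLASSES and self.instance_id is not None:
            raise ValueError(f"'{self.category}' is stuff and cannot be counted")


def count_things(labels: list[RegionLabel]) -> dict[str, int]:
    """Count distinct instances per thing class; stuff is never counted."""
    seen: dict[str, set[int]] = {}
    for label in labels:
        if label.instance_id is not None:
            seen.setdefault(label.category, set()).add(label.instance_id)
    return {category: len(ids) for category, ids in seen.items()}


labels = [
    RegionLabel("pedestrian", 1),
    RegionLabel("pedestrian", 2),
    RegionLabel("cyclist", 3),
    RegionLabel("vegetation"),  # every tree and shrub shares one stuff label
]
print(count_things(labels))  # {'pedestrian': 2, 'cyclist': 1}
```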
Bicyclists are among the most common “things” that AVs must recognize on city streets. The labeling team may already have annotated countless images of cyclists on road bikes, but because the self-driving vehicle may just as easily encounter a cyclist hauling several bags of groceries in a basket, a bicycle fitted with a child seat, or one pulling a trailer, the system also needs to be fed images of those scenarios. This extends to more “anomalous” examples, too, says Chekan, like a cyclist riding a recumbent bicycle, or two people riding a tandem. “We look for anomalies,” he says, “and that’s what augments our ability to process millions of data samples with a relatively small team of people.”
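One way to picture “looking for anomalies” is to count how often each combination of labeled attributes appears in a data set and flag the rare combinations for extra collection or labeling attention. This is a minimal sketch under stated assumptions: the attribute names, the toy examples, the `min_count` threshold, and the `rare_combinations` helper are all hypothetical, and the source doesn’t say that Argo mines its data this way.

```python
from collections import Counter

# Hypothetical labeled cyclist examples, each described by a few attributes.
examples = [
    {"bike": "road", "cargo": "none"},
    {"bike": "road", "cargo": "none"},
    {"bike": "road", "cargo": "none"},
    {"bike": "road", "cargo": "basket"},
    {"bike": "road", "cargo": "child_seat"},
    {"bike": "road", "cargo": "trailer"},
    {"bike": "tandem", "cargo": "none"},
    {"bike": "recumbent", "cargo": "none"},
]


def rare_combinations(
    examples: list[dict[str, str]], min_count: int
) -> list[tuple[tuple[str, str], ...]]:
    """Return attribute combinations seen fewer than min_count times."""
    # Sorting the items makes each combination a hashable, order-independent key.
    counts = Counter(tuple(sorted(example.items())) for example in examples)
    return sorted(combo for combo, n in counts.items() if n < min_count)


rare = rare_combinations(examples, min_count=2)
for combo in rare:
    print(dict(combo))  # candidates for targeted data collection and labeling
```

In a real pipeline the counts would come from millions of samples, and the threshold would presumably be tuned per class; the point is only that rare cases can be found by the data itself rather than by reviewing every image by hand.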
Layering preparation for every encounter
While tandem bikes may be uncommon, they’re also relatively easy to recognize. But how does an autonomous vehicle deal with even more anomalous scenarios—for instance, a cyclist with a large guitar case strapped across their back?
Such rare sightings can’t be backed up by thousands of example images, so in these instances, says Chekan, the team takes a single image and “we layer attributes, or what we call meta features, on top of the objects that we are labeling.” The team might manually add one label for the cyclist, one for the bicycle, and one for the “static object” (in this case, the guitar case), and then group those “things” together as a single recognizable road user. This way, when the vehicle next encounters a cyclist lugging a guitar case, the system will detect the cyclist’s presence, recognize their current state, and predict their likely next actions.
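The grouping step can be pictured as a small data structure in which separately labeled components, each carrying its own attributes (“meta features”), are bundled into one road user. The class names, categories, and attribute keys below are hypothetical; the source doesn’t describe Argo’s actual label format.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentLabel:
    """A single manually labeled object in one image (hypothetical schema)."""
    label_id: int
    category: str  # e.g. "cyclist", "bicycle", "static_object"
    attributes: dict[str, str] = field(default_factory=dict)  # "meta features"


@dataclass
class RoadUser:
    """Several component labels grouped and treated as one road user."""
    user_type: str
    components: list[ComponentLabel]

    def has_component(self, category: str) -> bool:
        return any(c.category == category for c in self.components)

    def meta_features(self) -> dict[str, str]:
        """Merge component attributes, prefixed by category to avoid key clashes."""
        merged: dict[str, str] = {}
        for c in self.components:
            for key, value in c.attributes.items():
                merged[f"{c.category}.{key}"] = value
        return merged


rider = ComponentLabel(1, "cyclist", {"posture": "upright"})
bike = ComponentLabel(2, "bicycle", {"type": "road"})
case = ComponentLabel(3, "static_object", {"kind": "guitar_case", "carried": "on_back"})

user = RoadUser("cyclist_with_cargo", [rider, bike, case])
print(user.meta_features())
```

Keeping the components separate while grouping them lets the same rider and bicycle labels be reused, with the unusual part (the guitar case) added as one extra layer.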
Although labeled images are an important tool for identifying vulnerable road users and other actors and objects on the road, the Argo SDS has several other means of detection at its disposal. These include sensor fusion, the use of several sensors in partnership (typically lidar, radar, and camera) so that the strengths of each sensor compensate for the limitations of the others. And, as Chief Technology Officer Dr. Brett Browning explains, the Argo SDS also uses a technique called open world detection to detect geometric objects that it hasn’t seen before: “A self-driving system that has been trained using labeled data can detect and classify known objects, and this is known as closed world detection. But there is a strong chance the vehicle will encounter objects it has never seen before, and so our system also contains components for detection of unrecognized geometric objects, and that’s open world detection.”
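A textbook way to show how sensors can complement each other is inverse-variance weighting: each sensor’s estimate is weighted by how much it can be trusted, and the fused estimate is more certain than any single input. This is a minimal sketch for one scalar (distance to an object), assuming independent measurements and made-up variances; real systems typically fuse full object tracks with filters such as a Kalman filter, and nothing here describes Argo’s actual fusion stack.

```python
from dataclasses import dataclass


@dataclass
class RangeEstimate:
    sensor: str
    distance_m: float
    variance_m2: float  # smaller variance = more trusted measurement


def fuse(estimates: list[RangeEstimate]) -> tuple[float, float]:
    """Inverse-variance weighted average of independent estimates.

    Returns (fused distance, fused variance).
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    weights = [1.0 / e.variance_m2 for e in estimates]
    total = sum(weights)
    distance = sum(w * e.distance_m for w, e in zip(weights, estimates)) / total
    return distance, 1.0 / total


# Illustrative numbers: lidar is precise at range, the camera much less so.
estimates = [
    RangeEstimate("lidar", 20.0, 0.01),
    RangeEstimate("radar", 20.4, 0.25),
    RangeEstimate("camera", 21.0, 1.0),
]
distance, variance = fuse(estimates)
print(round(distance, 3), round(variance, 4))  # 20.025 0.0095
```

The fused variance (about 0.0095) is lower than even the lidar’s 0.01, which is the sense in which the sensors work “in partnership.”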
In such scenarios, the SDS classifies any unidentified object that is roughly the size of a VRU, or that moves like one, as a VRU and treats it accordingly. “By using these different detection techniques,” says Cebron, “we have multiple layers of redundancy to ensure the safety of everyone on the road.”
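The closed-world/open-world split and the VRU fallback rule can be sketched as a single decision function: use the trained classifier’s label when it is confident, and otherwise fall back to a conservative rule based on size and motion. The class sets, thresholds, and field names below are invented for illustration; the source only states that unidentified objects that are VRU-sized or move like a VRU are treated as VRUs.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical closed-world label set learned from annotated data.
KNOWN_CLASSES = {"pedestrian", "cyclist", "scooter_rider", "car"}
VRU_CLASSES = {"pedestrian", "cyclist", "scooter_rider"}

# Illustrative thresholds; not values used by any real system.
CONFIDENCE_THRESHOLD = 0.6
MAX_VRU_FOOTPRINT_M = 2.5  # longest horizontal side of a "VRU-sized" object
MAX_VRU_SPEED_MPS = 12.0   # rough upper bound for walking, cycling, scooting


@dataclass
class Detection:
    predicted_class: Optional[str]  # classifier output, or None if nothing matched
    confidence: float
    length_m: float
    width_m: float
    speed_mps: float


def assign_category(det: Detection) -> str:
    """Closed-world label when confident, otherwise a conservative open-world fallback."""
    # Closed world: the object matches a class from the labeled training data.
    if det.predicted_class in KNOWN_CLASSES and det.confidence >= CONFIDENCE_THRESHOLD:
        return det.predicted_class
    # Open world: geometry was detected, but no known class fits well enough.
    vru_sized = max(det.length_m, det.width_m) <= MAX_VRU_FOOTPRINT_M
    moves_like_vru = 0.0 < det.speed_mps <= MAX_VRU_SPEED_MPS
    if vru_sized or moves_like_vru:
        return "unknown_vru"
    return "unknown_obstacle"


def treat_as_vru(category: str) -> bool:
    """Known VRU classes and unidentified VRU-like objects get the same caution."""
    return category in VRU_CLASSES or category == "unknown_vru"


print(assign_category(Detection(None, 0.0, 1.9, 0.7, 4.0)))  # unknown_vru
```

Speed alone is a crude stand-in for “moves like a VRU”; a real system would presumably look at motion patterns over time. The rule deliberately errs toward caution: an object that can’t be identified is treated as vulnerable rather than ignored.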
Armed with the knowledge that what it has seen is indeed a person on a bicycle—or another type of VRU—the SDS guides the vehicle according to specifically defined principles designed to prioritize safe interactions.
The learning never stops
Like children learning the names of everything around them, the Argo SDS looks at countless images of everything that it might ever conceivably come across on the road. Testing in seven major cities is part of this strategy, increasing the variety and nuance of the objects and events it might encounter. And as with kids, the learning never really stops, because safety is paramount.
Once children learn about the existence of a dog, it’s on to naming the next thing, and the next thing, and the next thing. For kids and self-driving systems alike, the richer and more diverse their stockpile of knowledge, the smarter (and safer) they become.