How Open World Recognition Enables Autonomous Vehicles to Respond Safely to Anything They Encounter on the Road
Concrete jungles: a nickname given to urban areas, one that evokes the wildness of these environments, and the challenge of taming the seething masses of life within. For all of the order we impose on our urban spaces—the gridded streets, signs and traffic lights, pedestrian walkways and bike lanes—we will always encounter something unexpected. Maybe not every day, or even every month, but if you drive around long enough you’re bound to come across something you’ve never encountered before.
For decades, the minds behind autonomous vehicles have been hard at work designing systems capable of reacting appropriately to the things we’re certain we’ll encounter: people, cyclists, vehicles, road markings and so much more.
But what about everything else? What about those days when our vehicles encounter a three-headed monster costume on Halloween, a Dick Van Dyke-esque one-man band, or an escaped pet crocodile on the loose? As much as we may try to make order out of chaos, the world doesn’t always conform to the vocabulary we’ve developed for it. So how do we build systems that can identify and react properly to examples of ‘otherness’? And how can we use those encounters to improve our system for future trips?
These are the questions that Professor Deva Ramanan, head of the CMU-Argo AI Center for Autonomous Vehicle Research, and CMU postdoctoral researcher Shu Kong have set out to answer in their latest paper, OpenGAN: Open-Set Recognition via Open Data Generation, which recently received a David Marr Prize honorable mention at the International Conference on Computer Vision. They’ve put forth new methods by which vehicles can operate with something closer to open world recognition: the ability of a detection system to identify and react appropriately to objects outside of the subset of categories it’s been trained to recognize. Autonomous vehicle developer Argo AI will incorporate these methods to enhance its self-driving system’s ability to respond safely and appropriately to anything it may encounter on the road.
The first problem Ramanan and Kong are tackling is one of classification. At present, most detection systems ignore examples of otherness. They come across an image of a crocodile, know something is there, but discard the image once they fail to identify what it is. This is because the system assumes that the image is useless: since it has no model to identify crocodiles, what use could it possibly have for the image in the future?
Instead of ignoring the errant crocodile altogether just because it’s not in their system’s defined subset of objects, Ramanan and Kong propose labeling it anyway. “Let’s still classify it as ‘other.’ At the very least stating that, ‘Okay, I don’t know what this is, but I know it’s not a person. I know it’s not a car. I know it’s not a motorcycle.’ Even that is a useful output,” says Ramanan.
They perform this classification task by comparing this instance of otherness to examples of objects the system already knows. “We try to find the thing in our existing dataset that’s closest to what we’re looking at right now, and then we look at the distance,” explains Ramanan. “And if it looks really different from any other image in my training set, then we flag that something out-of-the-ordinary is happening.”
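To make the idea concrete, here is a minimal sketch of that nearest-neighbor check. It is an illustration only, not the authors’ implementation: the two-dimensional feature vectors, the names `known_features`, `nearest_distance` and `is_out_of_ordinary`, and the threshold value are all assumptions made for the example.

```python
import math

def nearest_distance(query, training_features):
    """Euclidean distance from the query to its closest training vector."""
    return min(math.dist(query, known) for known in training_features)

def is_out_of_ordinary(query, training_features, threshold):
    """Flag the query when even its nearest known example is far away."""
    return nearest_distance(query, training_features) > threshold

# Hypothetical feature vectors for objects the system already knows.
known_features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
THRESHOLD = 1.5  # assumed cut-off; in practice it would be tuned on real data

familiar = (0.9, 0.1)  # sits close to a known example
strange = (5.0, 5.0)   # sits far from every known example

print(is_out_of_ordinary(familiar, known_features, THRESHOLD))
print(is_out_of_ordinary(strange, known_features, THRESHOLD))
```

Note that every query is compared against every stored example, so the cost of a single check grows with the size of the training set.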
This, however, introduces a problem of speed. It would require far too much compute power to reference an image against an entire training set every single time we encounter something new on the road. So instead, Ramanan and Kong have developed a binary model. As Ramanan explains, “one simple approach is to treat this problem as a classification task, which can be made very fast. You give the model an image and it assigns a label, zero or one—a part of our defined subset or not,” he says. “That’s the heart of why this binary model is so fast, it doesn’t have to look through millions of examples on-the-fly.”
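A rough sketch of what such a binary model could look like follows. It uses a tiny logistic regression as a stand-in; the article does not say which classifier Ramanan and Kong use, so the model choice, the toy feature vectors and the function names are all assumptions. Here the label 1 means ‘other’ and 0 means part of the defined subset.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(model, x):
    """Linear score of a feature vector under the model (weights, bias)."""
    weights, bias = model
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def train_binary_model(samples, labels, epochs=1000, lr=0.1):
    """Fit a logistic regression with plain stochastic gradient descent."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = sigmoid(score((weights, bias), x)) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def predict(model, x):
    """Return 1 for 'other', 0 for 'part of the defined subset'."""
    return 1 if sigmoid(score(model, x)) >= 0.5 else 0

# Hypothetical training features: known objects (0) and 'other' examples (1).
known = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (0.4, 0.3)]
other = [(3.0, 3.1), (2.8, 3.3), (3.2, 2.9), (3.1, 3.0)]
model = train_binary_model(known + other, [0] * len(known) + [1] * len(other))

print(predict(model, (0.25, 0.25)))
print(predict(model, (3.0, 3.05)))
```

Once trained, each prediction costs one weighted sum regardless of how many examples were used in training, which is the speed advantage Ramanan describes.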
By putting every ‘other’ into a single category, you give your system a form of open world recognition, allowing engineers to define how that system will react to anything unknown, be it a stray reptile, a pop-up roadside circus, or an avant-garde statue. Unsurprisingly, Ramanan and Kong’s first recommendation would be to “proceed cautiously.”
The second problem Ramanan and Kong are solving is one of wasted training data. “Many other detection systems, when they come across something they’ve never seen before, [discard] these images because their system said, ‘it didn’t belong to our subset of examples,’” says Ramanan, “but we’re saying, ‘actually you’re told that this is an ‘other,’ so use [it] to improve your system.’” In this way, Kong and Ramanan’s system takes an ‘active learning’ approach, encouraging cars to catalogue new encounters for use in training datasets for future object detection pipelines.
However, even if their system catalogues every single crocodile it comes across, that likely won’t be enough to train a model. And then, even if they somehow managed to create a crocodile detection model, Ramanan explains, “it’s hard to test its accuracy because you’re so rarely coming across examples of this thing in the real world.”
In the meantime, to contend with the dearth of crocodile encounters in the wild—as well as an entire world’s worth of rare encounters—Ramanan and Kong have developed new methods by which to generate images of “otherness” to train their system.
Using a machine learning (ML) technology called Generative Adversarial Networks, or GANs, Ramanan and Kong are able to generate a large volume of images of ‘otherness’ that aren’t real, in the sense that the images are entirely fabricated by the ML engine. They then use these images both to train detection models, helping them become better at identifying otherness, and to test the accuracy of their classification system on ML-generated examples of otherness that they likely wouldn’t encounter organically in the real world. Put differently, you can’t know for certain how good your classification system is at correctly labeling a crocodile as ‘other’ until you encounter a crocodile in the wild. Since, fortunately, we don’t live in a world where streetside crocodile encounters are a regular occurrence, GAN-generated images take the place of real-world encounters, allowing the engineers to both train these systems and confirm how effective they actually are.
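The testing half of that idea can be sketched in a few lines. Everything here is a stand-in: `sample_synthetic_other` is a simple random sampler playing the role of a generator (it is not a GAN), `flags_as_other` is a toy detector, and the feature space and threshold are invented for illustration. The point is only the procedure: feed generated ‘other’ examples and held-out known examples through the detector and count how often each is flagged.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def sample_known():
    """Stand-in for the feature vector of a known object."""
    return (rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5))

def sample_synthetic_other():
    """Stand-in for a generated 'other' example. NOT a GAN: it simply
    draws points that lie far away from the known region."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    radius = rng.uniform(3.0, 5.0)
    return (radius * math.cos(angle), radius * math.sin(angle))

def flags_as_other(features, threshold=1.5):
    """Toy detector: flag anything far from the origin as 'other'."""
    return math.hypot(*features) > threshold

def flag_rate(samples):
    """Fraction of samples the detector labels as 'other'."""
    return sum(flags_as_other(s) for s in samples) / len(samples)

synthetic_others = [sample_synthetic_other() for _ in range(200)]
held_out_known = [sample_known() for _ in range(200)]

detection_rate = flag_rate(synthetic_others)   # share of 'others' caught
false_alarm_rate = flag_rate(held_out_known)   # share of knowns mislabeled
print(f"detection rate: {detection_rate:.2f}")
print(f"false alarm rate: {false_alarm_rate:.2f}")
```

In this toy setup the two groups are trivially separable; with realistic generated images the two rates would be the numbers engineers watch.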
“The hope is that generative machine learning architectures can be used to generate data of such rare open world things. And that’s how we could build something that can reliably tell us, this is an animal, but it’s not a dog or a cat,” explains Ramanan. “We really care about building models that can tell the difference between the known things that it’s seeing and these other open world things.”
As for those “known things” Ramanan mentions, Argo’s self-driving system (SDS) has an ontology that consists of numerous categories of defined objects, ranging from vehicle classes to signs to types of pedestrians and animals. This ontology works in parallel with open world recognition (the ability of the SDS to classify objects as ‘other’ or not), in part thanks to Ramanan and Kong’s research.
Ramanan and Kong’s new methods can be used to refine Argo’s open set recognition approach, which can identify otherness within a defined category by picking out traits it recognizes on an object or creature that it doesn’t recognize. For example, “you can have an ‘other’ inside the category of animal,” says Ramanan. The system may see four legs and a tail and understand that it’s likely seeing an animal that walks instead of flies, and can respond appropriately. “The goal of open world labeling is that even when you say that it’s an ‘other,’ you have more knowledge,” he continues. “An animal-other is different than a vehicle-other, so [autonomous cars] can respond by doing two very different things [to ensure the car navigates the object safely].”
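One way to picture an ‘other’ nested inside a category is the small sketch below. It is not Argo’s ontology or code: the category names, confidence scores, thresholds and the `RESPONSES` table are hypothetical, and the two category-specific response names are placeholders, since the article does not say what the car actually does in each case.

```python
def open_world_label(coarse_scores, fine_scores, coarse_min=0.5, fine_min=0.5):
    """Return a fine label, a '<category>-other' label, or plain 'other'."""
    category, category_conf = max(coarse_scores.items(), key=lambda kv: kv[1])
    if category_conf < coarse_min:
        return "other"  # not even the broad category is recognized
    candidates = fine_scores.get(category, {})
    if candidates:
        name, conf = max(candidates.items(), key=lambda kv: kv[1])
        if conf >= fine_min:
            return name  # a known object inside a known category
    return f"{category}-other"  # known category, unknown object

# Hypothetical response table; only "proceed_cautiously" echoes the article.
RESPONSES = {
    "animal-other": "hypothetical_animal_response",
    "vehicle-other": "hypothetical_vehicle_response",
    "other": "proceed_cautiously",
}

# Assumed scores: clearly an animal, but neither a dog nor a cat.
coarse = {"animal": 0.9, "vehicle": 0.1}
fine = {"animal": {"dog": 0.2, "cat": 0.1}, "vehicle": {"car": 0.05}}

label = open_world_label(coarse, fine)
print(label, "->", RESPONSES[label])
```

Keeping the category attached to the ‘other’ label is what lets the response table distinguish between the two cases Ramanan mentions.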
Cities will continue to evolve. Urban landscapes will always be populated with knowns and unknowns. As Argo continues to refine its system, it will keep looking to research like Ramanan and Kong’s, exploring the limits of open world recognition to ensure that its self-driving system can respond appropriately to anything it encounters in the wilds of the concrete jungles so many of us call home. (And eventually, everywhere else as well.)