Autonomous Vehicles Depend on Good Data: Here's How We Help

preparing training data sets for autonomous driving

One of the coolest things for everyone working at CloudFactory is the amazing technology our customers are creating. Their spirit of innovation inspires each of us to push our own boundaries and solve new problems by thinking differently. We get to support our customers as they quite literally create the future, and that is a novelty that never wears off.

One area we’re particularly excited about is the development of AI for autonomous vehicles. Whether it’s working with a customer like Embark, which is creating the first-ever autonomous trucks, or drive.ai, which is using deep learning to “build the brain of self-driving vehicles,” these are ambitious efforts that could lead to a safer, cleaner, and more productive future.

We help these customers by taking on the painstaking tasks involved in preparing the massive datasets that fuel their algorithms. Often that means processing thousands upon thousands of raw images: enriching them with labeled bounding boxes, annotating scenes for semantic understanding, or annotating 3-D point clouds. All of this work holds the promise of making machine vision algorithms safer, which in turn brings autonomous vehicles one step closer to reality.

Recently, the New York Times featured an article showcasing how the technology works, who the players are, and what role they’re playing. We thought it would be interesting to examine how data scientists and engineers are creating an autonomous driving future.

The fact is, the future is here.

“Autonomous cars have arrived. Uber has a fleet operating in Pittsburgh, Google’s parent company is closer to coming to market with its driverless project and the federal government has begun to issue guidelines on how the cars should work.” (From NYT Article)

Most animals (95%) use vision to navigate their environments, and people who are pushing the boundaries of AI believe their technology should do the same. We help our customers prepare accurate datasets for their computer vision algorithms.

For instance, notice the camera on top of the car in the image above. The raw images it captures contain objects like road signs, traffic lights, and moving objects such as people. To train both recognition and decision-making algorithms, we take our customers’ raw data and deliver it back with bounding boxes and labels that accurately categorize and identify those objects. These enriched images are then used to “teach” autonomous systems how to recognize the objects and how to decide on the appropriate response.
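
To make the idea of a labeled bounding box concrete, here is a minimal Python sketch. It assumes boxes are stored as pixel coordinates with a text label, and it uses intersection over union (IoU), a common overlap measure, to compare two boxes drawn around the same object. The class, field names, and labels are hypothetical and chosen for illustration; they do not describe any particular delivery format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """One labeled object in an image, in pixel coordinates (illustrative only)."""
    label: str    # hypothetical label names, e.g. "pedestrian" or "traffic_light"
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Clamp at zero so a malformed box never reports a negative area.
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


def iou(a: BoundingBox, b: BoundingBox) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    inter_w = min(a.x_max, b.x_max) - max(a.x_min, b.x_min)
    inter_h = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    intersection = inter_w * inter_h
    union = a.area() + b.area() - intersection
    return intersection / union if union > 0 else 0.0


# Two annotators draw a box around the same pedestrian.
first = BoundingBox("pedestrian", 0, 0, 10, 10)
second = BoundingBox("pedestrian", 5, 0, 15, 10)
overlap = iou(first, second)
```

An IoU near 1.0 suggests the two boxes agree closely, while a low value could flag the image for another look. Any threshold for what counts as "close enough" would be a project-specific choice.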

The car’s sensors gather data on nearby objects, like their size and rate of speed. It categorizes the objects — as cyclists, pedestrians or other cars and objects — based on how they are likely to behave. (From NYT Article)

For those using lidar, an active laser sensor system illuminates the car’s surroundings and produces what are known as point clouds. We annotate those point clouds in 3-D. The annotations provide accurate georeferenced coordinates that are used to replicate the reality of a car’s surroundings, which helps make the resulting AI safer and more reliable. This makes it possible for our customers to build better learning systems and, ultimately, safer autonomous vehicles.
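
As a rough illustration of a 3-D point cloud annotation, the Python sketch below models a labeled cuboid and selects the lidar points that fall inside it. It assumes points are plain (x, y, z) tuples in meters in a local sensor frame, and that the cuboid rotates only about the vertical axis. Georeferencing is left out, and all names here are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in meters, local sensor frame (assumed)


@dataclass(frozen=True)
class Cuboid:
    """A labeled 3-D box around an object in a point cloud (illustrative only)."""
    label: str
    center: Point
    size: Point   # length along x, width along y, height along z, before rotation
    yaw: float    # rotation about the vertical (z) axis, in radians

    def contains(self, point: Point) -> bool:
        dx = point[0] - self.center[0]
        dy = point[1] - self.center[1]
        dz = point[2] - self.center[2]
        # Rotate the offset by -yaw to express it in the cuboid's own frame.
        cos_y, sin_y = math.cos(self.yaw), math.sin(self.yaw)
        local_x = cos_y * dx + sin_y * dy
        local_y = -sin_y * dx + cos_y * dy
        length, width, height = self.size
        return (
            abs(local_x) <= length / 2
            and abs(local_y) <= width / 2
            and abs(dz) <= height / 2
        )


def points_in_cuboid(points: List[Point], box: Cuboid) -> List[Point]:
    """Return the points that lie inside the annotated cuboid."""
    return [p for p in points if box.contains(p)]


cloud = [(11.0, 0.5, 1.0), (13.0, 0.0, 1.0), (10.0, 1.5, 1.0)]
car = Cuboid("car", center=(10.0, 0.0, 1.0), size=(4.0, 2.0, 2.0), yaw=0.0)
inside = points_in_cuboid(cloud, car)
```

Counting the points inside a cuboid is one simple way to sanity-check an annotation: a box labeled "car" that contains no points probably needs review.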

Another way we help our customers is by providing enriched data for contextual situations, otherwise known as semantic understanding. We help take image understanding from low-level image features to high-level semantics by identifying objects and events, which provides the situational understanding our customers need to create more advanced learning systems.
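
One common way to represent scene-level annotation is a per-pixel label mask, where every pixel carries a class ID. The Python sketch below assumes that representation and summarizes how much of a scene each class covers. The class IDs and names are hypothetical; real projects define their own label sets.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical class IDs for illustration only.
CLASS_NAMES = {0: "road", 1: "sidewalk", 2: "vehicle", 3: "pedestrian"}


def class_coverage(mask: List[List[int]]) -> Dict[str, float]:
    """Return the fraction of pixels assigned to each class in a label mask."""
    counts = Counter(class_id for row in mask for class_id in row)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        CLASS_NAMES.get(class_id, f"unknown_{class_id}"): n / total
        for class_id, n in counts.items()
    }


# A tiny 2 x 3 "image" in which each number is the class of one pixel.
mask = [
    [0, 0, 2],
    [0, 1, 3],
]
coverage = class_coverage(mask)
```

A summary like this turns raw pixels into a scene-level statement (mostly road, with a vehicle and a pedestrian present), which is a small example of moving from low-level features toward high-level semantics.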

These are just a few of the ways we’re helping our customers define the future of autonomous vehicles. The ugly truth is that data scientists spend close to 80% of their time on data preparation. This is an expensive proposition when you consider that their time would be far better spent solving complex problems than processing thousands of images. We’re here to free them up by offering a dependable and elastic way to prepare accurate datasets so they can focus on building incredible technology.