AI training data operations are a lot like the assembly lines of yesterday’s factories. Data is your raw material, and you have to get it through multiple processing and review steps before it’s ready for machine learning. If you want to develop a high-performing ML model, you need smart people, tools, and operations. We hosted a webinar to discuss this topic with experts in workforce and tooling for machine learning. This is a transcript of that November 14, 2018 webinar. It includes minor edits for clarity.
“Houston, we’ve had a problem.” Astronaut Jack Swigert made the words famous when he communicated to NASA mission control that an explosion had rocked the Apollo 13 capsule that was transporting him and two other people to the moon in April 1970. To get the astronauts home safely, the engineers at Johnson Space Center in Houston, Texas, would have to do something they had never attempted before: use the lunar lander’s descent engine to send the crew home.
NASA estimated that it took 400,000 engineers, scientists, and technicians to send astronauts to the moon on the Apollo missions. The massive workforce was made up of people from four major enterprise companies and a host of subcontractors who worked for them.
Bringing artificial intelligence (AI) to life in the real world is a lot like the 20th-century “space race” for dominance in spaceflight capability. Few can fathom the level of innovation and sheer effort it takes. From model development and data prep to testing and deployment, AI requires a pioneering spirit, sharp minds, and a lot of hard work. AI innovators encounter countless challenges and frustrating defeats.
A Production Problem (Solved)
When Henry Ford attempted to produce the Model T at a rapid pace and with high quality, he ran into a problem. It was difficult to organize teams of specialized workers to assemble automobiles, and with so many workers needed to scale the process, it was highly inefficient. To make matters worse, late delivery of parts caused pile-ups of workers vying for space to work and delays in production.
Google’s three-day I/O’18 conference in Mountain View, Calif., last week brought together developers from around the globe for hands-on learning, discussion with experts, and a look at Google’s latest developer products. The conference also featured Google I/O Extended sessions held in technology hubs across the country, including a panel discussion that featured CloudFactory Chief Revenue Officer Mike Riegel.
Data is today’s gold for businesses, representing huge potential value. But there’s a catch: the data must be uncovered, cleaned, and structured.
For all of AI’s promises, we still need people to do a lot of work behind the scenes to make it all possible. People collect, enrich, clean, and prepare data for AI systems to operate accurately and optimally. In fact, data scientists spend countless hours cleaning and combining datasets, a process commonly referred to as “data wrangling.”
“[Video] data annotation is super labor-intensive. Each hour of data collected takes almost 800 human hours to annotate. How are you going to scale that?”
-Sameep Tandon, CEO of Drive.ai, an autonomous car startup in Silicon Valley and CloudFactory client