Scaling Quality Training Data - Choosing the People in Your AI Tech Stack

Bringing artificial intelligence (AI) to life in the real world is a lot like the 20th-century “space race” for dominance in spaceflight capability. Few can fathom the level of innovation and sheer effort it takes. From model development and data prep to testing and deployment, AI requires a pioneering spirit, sharp minds, and a lot of hard work. AI innovators encounter countless challenges and frustrating defeats.

One of those challenges is access to talent that is in short supply. More than half (54%) of leaders report a skills shortage as the biggest challenge facing their organizations, according to Gartner. Another is dirty data, which data scientists say is their number-one problem, according to a Kaggle survey. If you want to deploy your team strategically, you probably don't want your prized data scientists doing the tedious, time-consuming work of data labeling or annotation.

But they're likely mired in it. Machine learning (ML) requires a massive amount of data to be gathered, structured, and quality-checked. For example, to develop computer vision for a self-driving car, you'll need people in the loop to annotate, or label, countless frames of driving video to teach the algorithm to "see" objects such as people, signs, trees, and vehicles. For each hour of video, there are a staggering 800 hours of annotation work to do.

To process the big data that feeds your artificial intelligence, you need a reliable workforce with relevant domain expertise and high standards for quality. A growing number of innovators are using in-house staff, freelancers, contractors, and gig workers to get this massive amount of data work done, and Deloitte predicts that this trend will grow significantly over the next few years as ML models proliferate.

Use of Contractors on the Rise

New workforce models are emerging. Last year, the U.S. National Aeronautics and Space Administration (NASA) researched disruptors driving the future of work so it could evolve its talent strategies. The result was its Future of Work Framework, which encourages leaders to design for agility and focus on impact because “work today requires fluid talent to meet ever increasingly complex work, requiring multidisciplinary skills, delivered by teams of people, networked together that have overarching goals tied to organizational performance and productivity.”

In Deloitte’s 2018 Global Human Capital Trends report, half of respondents said they have a large number of contractors in their workforces. Deloitte maps the workforce ecosystem from traditional, full-time workers who have strong organizational context to open, crowd workforces who have little, if any, understanding of the organization’s overall strategy.

Deloitte Analysis - Workforce Ecosystem

And that's the challenge with open crowdsourcing for AI development: domain expertise and context. Workers need more than the ability to tag, label, or annotate your data. If you want them to return quality data to train your machine learning algorithms, they must understand the needs of your end user, the rules for your data, and the context for the tasks they are doing. And quality is king, because when the people who tag, label, or annotate your data provide low-quality work, your model struggles to learn and progress stalls. So while each task may be simple, how it fits into the larger picture of your end user's experience isn't easy to teach someone quickly. That's difficult to scale.

The People in Your AI Tech Stack

When you combine training and management challenges, your workforce choice might be the factor that determines your success. The right workforce gives you the flexibility to respond to changes in market conditions, product development, and business requirements. On the left side of Deloitte’s continuum, you’ll shoulder the burden of management with an in-house team. On the right side, quality work is likely to be a hurdle with crowds.

Here are your workforce options for cleaning and structuring data for AI:

  1. In-house employees can manage your data needs with reasonably good quality, and this approach works fine until it’s time to scale your model. Over time, these processes will grow more difficult and costly to manage, so you’re likely to join the growing list of companies that are turning to contractors, freelancers, and gig workers to structure data for AI development.
  2. Contractors and freelancers are another option, but be sure to factor in the time it will take you to source and manage your team. More than one-third of Deloitte's survey respondents said their human resources departments are not involved in sourcing (39%) or hiring (35%) decisions for contract employees, which "suggests that these workers are not subject to the cultural, skills, and other forms of assessments used for full-time employees." That can be a problem when it comes to quality work, so allocate additional time for sourcing, training, and management.
  3. Crowdsourcing leverages the cloud to send data tasks to a large number of people at once. Quality is established using consensus, which means several people complete the same task, and the answer provided by the majority of the workers is chosen as the correct one. Crowd workers are paid based on the number of tasks they complete on the platform provided by the workforce vendor, so you could spend, on average, double the time processing data with a crowd than you would with an in-house team. The burden is on you to manage workers’ data outputs at scale.
  4. Managed cloud workers aren’t reflected on Deloitte’s continuum. This option has emerged over the last decade, and it combines the quality of a trained, in-house team with the scalability of the crowd. It’s ideal for data work because dedicated teams are steeped in your business rules and they stick with projects long-term, enabling them to increase their throughput and accuracy while providing consistent labeling quality. This model also provides a team that is in direct communication with you, enabling agile process iterations necessary for creating high-quality datasets. To learn more, start by reviewing these five steps to sourcing great-fit cloud labor.
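The consensus approach described for crowdsourcing (option 3) boils down to a majority vote per task. The sketch below is a minimal illustration under stated assumptions, not any vendor's actual platform logic: the function name consensus_label is hypothetical, each task's answers arrive as a simple list of labels, and "majority" is taken to mean more than half of the workers on that task. It also makes one cost visible: every item is labeled several times, once per worker.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def consensus_label(labels: Sequence[Hashable]) -> Tuple[Optional[Hashable], float]:
    """Hypothetical helper: pick the answer a strict majority of workers gave.

    Returns (label, agreement), where agreement is the share of workers who
    gave the most common answer. label is None when no answer is given by
    more than half of the workers (for example, a three-way split).
    """
    if not labels:
        raise ValueError("at least one worker label is required")
    counts = Counter(labels)
    top_label, top_count = counts.most_common(1)[0]
    agreement = top_count / len(labels)
    # Assumption: "majority" means strictly more than half of all workers.
    if top_count * 2 > len(labels):
        return top_label, agreement
    # No majority: the task would need review or more workers.
    return None, agreement


if __name__ == "__main__":
    # Three crowd workers label the same video frame.
    print(consensus_label(["pedestrian", "pedestrian", "cyclist"]))
    print(consensus_label(["pedestrian", "cyclist", "sign"]))
```

With three workers, two matching answers win with about 67% agreement. A three-way split returns None, which is where someone still has to step in. Majority voting selects the most popular answer, not necessarily the correct one, so it cannot replace the domain context discussed earlier.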

The Bottom Line

From founders and data scientists to product owners and engineers, AI developers are fighting an uphill battle, and they need all the support they can get, including a dedicated team to process massive amounts of data with high accuracy. As with the race to space, the race to build AI that solves real-world problems holds untold promise, but victory won't come easily. In AI, progress is hard-won, and innovators who identify strong workforce partners today will have the tools and talent they need to test their models, fail faster, and get it right sooner than their competitors.

This article is the first in a series of three about scaling your training data operations for AI. The next article will explore the hidden costs of the crowd.
