John Snowden

Feb 15, 2016

How to Turn Unstructured Data Into Structured Data Using Microtasking

Your company is growing, and so are its processes. Along the way, data piles up with no structure, which makes it hard to analyze and use. This is how unstructured data begins.

Unstructured data is typically associated with text-heavy documents and may contain data such as dates, numbers, or facts. Structuring that data may require complex models and algorithms. One way to create value from unstructured data is to break the structuring process down into microtasks.

A microtask is a small, repeatable task. Each instance of the task has unique input values that directly affect its output. Workers will complete it hundreds of thousands of times (or millions of times!).

Each microtask contributes toward the improvement of a business process. Its value is determined by the business need it fills for that company. Business needs range from medical assessment evaluations, to simple transcription, to human judgments.

What makes a good microtask?

Whatever the business process, it will ultimately succeed or fail based on the accuracy of the microtask’s results en masse. That accuracy depends directly on how well the task is set up. To maintain accuracy, the task must be:

  • Simple
  • Clear
  • Intuitive

A simple task must happen in as few decision-steps as possible. A clear task should have no ambiguity in its direction. An intuitive task will be obvious to the human eye as quickly as possible - even if some up-front training is required.

Additionally, a task that has a black-and-white answer rather than several shades of gray will likely come back with better results. Often, when we’ve added a third “yellow flag” option to a task, workers tend to reach for that option - and that costs clients money. Without the middle option, the chips are down and the worker has to say “yes” or “no” - or simply skip the task and let another worker do it.

(When our data science algorithms deem it necessary, multiple workers review the answer and our system wrangles them toward consensus, with each worker being totally blind to the others’ answers.)
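As a rough illustration of that consensus step, here is a minimal sketch using a simple majority vote over independent yes/no answers. It is not our production algorithm: the function name `consensus`, the `min_agreement` threshold, and the use of `None` to mark a skipped task are all assumptions made for this example.

```python
from collections import Counter

def consensus(answers, min_agreement=0.6):
    """Return the majority answer, or None when agreement is too low.

    `answers` holds one answer per worker, collected independently.
    A value of None marks a worker who skipped the task (an assumption
    made for this sketch).
    """
    votes = [a for a in answers if a is not None]
    if not votes:
        return None  # everyone skipped, so there is nothing to agree on
    answer, count = Counter(votes).most_common(1)[0]
    # Accept the top answer only if enough workers chose it.
    return answer if count / len(votes) >= min_agreement else None

print(consensus(["yes", "yes", "no"]))   # two of three workers agree
print(consensus(["yes", "no"]))          # split vote: no consensus
```

When the function returns `None`, a workflow could send the task to more workers rather than guess.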

Processing unstructured data as microtasks allows the workflow to be broken down into diverse types of tasks: data gathering (example: web data), image data moderation, data transcription, and data or document digitization.

A microtask is only the tip of the crowdsourcing iceberg. In most cases, a microtask will be part of a larger workflow alongside several other microtasks. In this case, each microtask has its own inputs and outputs, while the workflow itself has its own inputs and outputs. One microtask’s output informs the next microtask’s input. Algorithms interact with them throughout the workflow, and a business process is enhanced.
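To make the chaining concrete, here is a small sketch in which each microtask is a function that takes a dictionary and returns one, and the workflow passes each output on as the next input. The two steps, `transcribe` and `extract_date`, are hypothetical stand-ins for work a human would do, and the record fields (`scan`, `text`, `date`) are invented for the example.

```python
import re

def transcribe(record):
    """Microtask 1 (stand-in for a worker): turn a scan into plain text."""
    return {"text": record["scan"].strip()}

def extract_date(record):
    """Microtask 2 (stand-in for a worker): pull the first ISO date from the text."""
    match = re.search(r"\d{4}-\d{2}-\d{2}", record["text"])
    return {"text": record["text"], "date": match.group(0) if match else None}

def run_workflow(workflow_input, steps):
    """Feed each microtask's output into the next microtask as its input."""
    data = workflow_input
    for step in steps:
        data = step(data)
    return data

result = run_workflow(
    {"scan": "  Invoice issued 2016-02-15  "},
    [transcribe, extract_date],
)
print(result)
```

The unstructured scan goes in at one end, and a record with a clean `text` field and a structured `date` field comes out at the other.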
