For crowdsourcing organizations, accuracy is what counts. At CloudFactory, most of our clients are very happy with our “normal” accuracy rate, which is still very high compared to traditional BPO approaches.
But what if a business demands off-the-charts accuracy? At CloudFactory, we can fine-tune our accuracy rate—in fact, it’s one of the key “levers” or “dials” that our clients can adjust to meet their specific business needs.
For instance, we’re currently delivering greater than 99% field-level accuracy to two clients, for whom we process medical documents and business-critical invoices.
Such accuracy costs some additional time and money, but our ability to deliver it ensures that CloudFactory isn’t just another face in the crowd.
How do we do it?
We Reduce Anonymity
Compared to Amazon’s Mechanical Turk, our version of the crowd model has a few built-in advantages. Most importantly, we manage our “crowd.” In fact, our crowd is hardly a crowd since it’s full of the familiar faces of our friends.
Each worker is managed by a full-time Core Team member called a CloudSeeder. Furthermore, each worker has accountability within a team that meets weekly face-to-face. Besides encouraging output that is more accurate than what an anonymous crowd can offer, this intra-team accountability also discourages people from trying to “game” the system. Cheating would hurt an entire team, not just the worker who gets caught.
Atomization of Work
At CloudFactory, we break down tasks into their smallest parts to avoid the drag that excessive business rules have on task completion.
When possible, we break each human decision into a separate task. In other words, a single worker does not classify a document (“Is this an invoice? Or a receipt?”) and then follow business rules to transcribe it. Rather, “Classify this document” is one task; “Digitize this receipt” and “Digitize this invoice” are entirely separate tasks.
In practice, we break down tasks even further.
Consider, for example, our ideal workflow for an invoice. The worker who handles rudimentary transcription doesn’t have to know the ins and outs of invoices. Rather, s/he is simply asked to “type this date.” Then another worker receives that invoice and draws a box around different fields like invoice number, invoice date, and invoice amount. Our image processing engine—incidentally, we call it Khukuri, after the amazing Nepali knives that Gurkha soldiers wield—uses the drawn boxes to slice the image. Finally, our system distributes the different types of data—say, invoice number, date, and amount—to three different workers who only need to be able to type what they see.
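The slice-and-distribute step above can be sketched in a few lines. The task names and data structures here are purely illustrative, not CloudFactory’s actual system:

```python
def atomize_invoice(doc_id, marked_fields):
    """Turn one marked-up invoice into independent micro-tasks.

    marked_fields maps a field name (e.g. "invoice_number") to the
    image slice a "marking" worker boxed for that field.
    """
    # Classification is its own task, handled by a separate worker.
    tasks = [{"kind": "classify", "doc": doc_id}]
    for field, image_slice in marked_fields.items():
        # Each slice becomes a standalone "type what you see" task,
        # so the typist never needs to understand invoices.
        tasks.append({"kind": "transcribe", "doc": doc_id,
                      "field": field, "image": image_slice})
    return tasks

tasks = atomize_invoice("inv-001", {
    "invoice_number": "slice_1.png",
    "invoice_date": "slice_2.png",
    "invoice_amount": "slice_3.png",
})
```

Because each transcription task carries only one slice, the three field values can go to three different workers in parallel.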
Because we atomize the work, we can also hyper-specialize it.
In fact, we hyper-specialize to best fit our workers’ own aptitudes. Is one worker really great at typing, but unsure when it comes to identifying categories of documents (Invoice? Receipt?) or particular categories of data (Shipping dates? Order dates?)? Naturally, s/he gets typing tasks. Does another worker struggle to type, but enjoy spatial-visual tasks? Does that worker also enjoy determining specific types of information on an invoice, for instance, which date is the invoice date (not the shipping date, due date, service date, or order date)? That worker gets “slicing” or “marking” tasks.
The key to successful hyper-specialization is reputation management. Our system tracks workers’ skills, so we know each worker’s qualifications to perform a given task. Our system is designed for customized (yet dynamic) precision-task dispatching.
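A rough sketch of that dispatching idea follows; the worker names, skill labels, and scores are invented for illustration. A dispatcher routes each task to the available worker with the strongest track record for the required skill:

```python
def dispatch(task_skill, workers):
    """Pick the worker with the best reputation for a given skill.

    workers maps a worker id to that worker's per-skill accuracy
    scores (0.0-1.0), built up from past reviewed work.
    """
    qualified = {worker: skills[task_skill]
                 for worker, skills in workers.items()
                 if task_skill in skills}
    if not qualified:
        return None  # no one is rated for this skill yet
    # Highest historical accuracy wins the task.
    return max(qualified, key=qualified.get)

workers = {
    "worker_a": {"typing": 0.99, "marking": 0.62},
    "worker_b": {"typing": 0.71, "marking": 0.97},
}
best_typist = dispatch("typing", workers)   # worker_a
best_marker = dispatch("marking", workers)  # worker_b
```

A real dispatcher would also weigh availability and recency, but the core idea is the same: reputation data decides who sees which task.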
Consensus Without Double Keying
To establish accurate results, traditional Business Process Outsourcing (BPO) uses a technique called “double keying,” in which two workers key (type) the same document. The results are compared, and any disagreements are reconciled.
Since we hyper-specialize and atomize, we’re not trying to establish consensus about an entire document. Rather, we’re dealing with byte-sized chunks (pun intended). One worker performs the “content” task (type, mark, categorize).
Even before the worker performs the task, we have assembled a number of indicators that help us judge whether the work is likely to be correct. Based on our confidence, we may have two other workers review the work as part of a process that we call “content-review.”
We have high confidence in results that meet two criteria: first, the two reviewers agree that the first worker was correct; second, the two reviewers are themselves satisfactorily reputable. Such results are now complete—end of story.
And if the results don’t meet those criteria? If the reviewers agree that the worker was wrong, we restart the task with another (better) worker. If the reviewers disagree, or they are not satisfactorily reputable, we may send the results to as many as five more review workers. And if consensus still eludes us, we have processes built into our workflow for escalation, tie-breaking, and super-worker overrides.
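The accept/redo/escalate logic above can be modeled simply. This sketch assumes a unanimity rule and a single reputation threshold, both standing in for whatever criteria a real production line would configure:

```python
def content_review(review_cycles, min_reputation=0.9):
    """Decide a task's fate from successive review cycles.

    review_cycles: a list of cycles, where each cycle is a list of
    (agrees_with_worker, reviewer_reputation) pairs.
    Returns "accept", "redo", or "escalate".
    """
    for cycle in review_cycles:
        reputable = all(rep >= min_reputation for _, rep in cycle)
        verdicts = {agrees for agrees, _ in cycle}
        if reputable and verdicts == {True}:
            return "accept"    # unanimous, trusted reviewers: done
        if reputable and verdicts == {False}:
            return "redo"      # unanimous wrong: restart with a better worker
        # Non-consensus or low reputation: fall through to the next cycle.
    return "escalate"          # cycles exhausted: tie-break / super-worker
```

For example, two reputable reviewers who both confirm the work yield `"accept"` in a single cycle, while reviewers with low reputation scores push the task toward escalation.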
Crucially, our content-review process is adaptable. We can configure your production line to have N reviewers per content-review cycle and to go through a maximum of Y content-review cycles. These settings ratchet the accuracy higher and higher. Naturally, increasing the number of reviewers or cycles also raises the cost, so your business needs will determine how far you push the accuracy needle.
Data Science
Data science isn’t just another step in our accuracy-building process; it’s an ethos that pervades our way of doing things. We use data science to analyze, improve, and demonstrate many of the elements we’ve discussed so far. Our team of four full-time data scientists identifies powerful trends in the data in order to make our product increasingly useful to your business-process workflow.
Our data science also drives our confidence in our results. Drawing on our reputation and content-review data, we can publish a composite confidence index built from those indicators: either they give us high confidence that the result is accurate, or they flag the task as an especially difficult one.
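One simple way to fold several indicators into a single number is a weighted average; the indicator names and weights below are purely illustrative, not CloudFactory’s actual index:

```python
def confidence_index(indicators, weights):
    """Combine 0-1 indicators (e.g. reviewer agreement, worker
    reputation, historical field difficulty) into one score."""
    total_weight = sum(weights.values())
    return sum(indicators[k] * w for k, w in weights.items()) / total_weight

score = confidence_index(
    {"reviewer_agreement": 1.0,   # both reviewers confirmed the result
     "worker_reputation": 0.98,   # strong track record on this skill
     "field_difficulty": 0.85},   # this field is historically easy
    {"reviewer_agreement": 0.5,
     "worker_reputation": 0.3,
     "field_difficulty": 0.2},
)
```

A low score here would not mean the result is wrong; it would mean the task deserves another review cycle or an escalation before the result ships.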
Pushing field-level accuracy from 95% to 96% takes a lot less effort than moving the needle from 98% to 99%. But in cases where accuracy is critical, say, electronic medical records or accounts-payable documents, CloudFactory is ready to fight the good fight for you.