When it comes to your data processing operations, how important is quality to you?
Each year, billions or trillions of finance-related documents and data points flow into your systems for processing. Are you confident that the quality of the processed output is high?
Technologies like optical character recognition (OCR), robotic process automation (RPA), and cognitive automation (think natural language processing and machine learning) can increase your confidence to a degree.
But what about exceptions—the documents and outliers that need a human touch before they enter your systems?
After all, whether you’re doing the work in-house or outsourcing, you never really know what was happening with your processing lead last month when, unbeknownst to you, she was in the midst of a painful personal situation. Was her mind really on her work? And who could blame her? Did errors and faulty data unintentionally slip through, muddying your metrics or models?
Across industries, the operations leaders we work with care deeply about quality. In the fintech and finserv spaces, ops leaders tend to speak of quality in terms of accuracy, which makes sense—when it comes to numbers, the accuracy of those trailing decimals matters.
Enter quality assurance (QA), quality control (QC), and continuous improvement.
For our clients in the fintech and finserv spaces, we rely heavily on all three methods to keep a tight grip on data quality. In this post—the first in a three-part series—you’ll peek behind the curtains at CloudFactory’s most common QA workflow methods.
First up—QA, and the three workflow models we most often use and recommend to our clients in the finance industry: Single Pass, Review and Improve, and Consensus.
The Single Pass QA Method
The American Society for Quality says that the goal of quality assurance is confidence—internal confidence for you, and external confidence for clients, regulators, and other stakeholders that quality requirements have been fulfilled. Quality assurance focuses on the right-here, right-now processes that ensure data quality in real time—complete a task, check it, rinse and repeat.
To fulfill your quality requirements and optimize your data processing operations, you need processes for delivering quality outputs. You also need pre-planned actions for evaluating and correcting quality issues.
The Single Pass method, which involves no formal quality assurance at all, maximizes throughput by having a data analyst complete each task once; all data outputs enter directly into your system or process.
Single Pass is best for simple, objective use cases where subject matter experts or experienced data analysts complete the tasks. Companies also use the Single Pass approach when they need a large quantity of data processed irrespective of quality.
Example financial services use cases for the Single Pass method
- Data classification: Is it a receipt or an invoice? Is the amount positive or negative?
- Data verification: Is a receipt or invoice available or not? Is the URL valid or not?
- Data enrichment: Given the company name, what is the URL?
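In code terms, Single Pass is a straight-through pipeline: each task is completed once and the output enters your system immediately. Here is a minimal Python sketch using a hypothetical keyword rule to stand in for the receipt-vs-invoice classification above (the `Task` type, rule, and function names are illustrative, not CloudFactory's implementation):

```python
from dataclasses import dataclass

@dataclass
class Task:
    doc_id: str
    text: str

def classify(task):
    # Hypothetical rule standing in for an analyst's judgment:
    # label the document an invoice or a receipt by keyword.
    return "invoice" if "invoice" in task.text.lower() else "receipt"

def single_pass(tasks):
    # Each task is completed exactly once; every output flows
    # straight into the downstream system with no review step.
    return [(t.doc_id, classify(t)) for t in tasks]

results = single_pass([
    Task("doc-1", "INVOICE #1042, net 30 days"),
    Task("doc-2", "Thanks for shopping with us! Total: $18.40"),
])
# results == [("doc-1", "invoice"), ("doc-2", "receipt")]
```

Note that nothing in the pipeline checks the analyst's work—which is exactly why Single Pass suits only simple, objective tasks.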
Quality, Edge Cases, and OCR Technology
Companies across the financial services and financial technology spaces use optical character recognition (OCR) to process and extract data from a wide range of images and documents:
- Bank statements
- Mortgage statements
- Insurance declarations (dec) pages
- Powers of attorney
- Court orders
- Trust certifications
- Business cards
- Photo IDs
- Property tax bills
- Pension award letters
- Insurance policies
- Tax returns
- Divorce decrees
- Lease agreements
- Purchase orders
Most of our finserv and fintech clients use OCR to optimize operational efficiency. But because OCR technology is notoriously prone to error, the human-powered work we do for those clients provides quality assurance and quality control on OCR outputs and handles the edge cases that do not fit the technology's criteria.
The Review and Improve QA Method
In the Review and Improve QA method, a senior data analyst or subject matter expert reviews the outputs of junior analysts before those outputs enter your system or process. Accurate outputs pass through. Inaccurate outputs flow to the reviewer for correction or back to the junior analyst with feedback. The junior analyst then corrects their errors and resubmits the data for another round of review. Iterating in this way can lead to any number of Review-Improve cycles for a single task or data output.
This approach leads to higher accuracy for most use cases. It’s appropriate for tasks that involve some ambiguity and subjectivity and when data quality is paramount. Review and Improve is also the method we recommend our clients consider first.
The Review and Improve method gives you multiple levers to pull, including time, throughput, and accuracy. Because at least two analysts must touch the data before a task is complete, throughput is lower and accuracy is higher than with Single Pass. Depending on the task complexity and type, the QA portion of the work can take more, less, or the same time as the initial task.
You can also limit how much work to review. According to our client delivery experts, reviewing 10% of work is common, especially as an analyst group improves. It’s also common to review the work of new junior analysts as they get up to speed.
Generally, expect QA to take less time than the initial task. For data enrichment, web scraping, and NLP use cases, expect QA time closer to the initial task time. And for more straightforward tasks, such as data verification or image transcription, expect QA time to be less than the initial task time.
Also, note that as the number of Review and Improve cycles increases, throughput decreases and quality increases.
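The Review and Improve loop can be sketched in a few lines of Python. This is an illustrative skeleton under assumed interfaces—`junior`, `review`, and `correct` callables are hypothetical, not CloudFactory's actual workflow engine:

```python
def review_and_improve(task, junior, review, correct, max_cycles=3):
    # A junior analyst completes the task; a senior analyst reviews it.
    # Accurate outputs pass through; inaccurate outputs go back to the
    # junior with feedback. After max_cycles, the reviewer corrects
    # the output directly instead of cycling again.
    feedback = None
    for _ in range(max_cycles):
        output = junior(task, feedback)
        accurate, feedback = review(task, output)
        if accurate:
            return output          # enters the downstream system
    return correct(task)           # reviewer's own correction

# Toy analysts: the junior misreads an OCR'd amount ("1O42") until
# feedback tells them to re-check the digits.
def junior(task, feedback):
    return "1042" if feedback else "1O42"

def review(task, output):
    if output.isdigit():
        return True, None
    return False, "Amount must contain digits only; re-check the scan."

result = review_and_improve("invoice-17", junior, review, correct=lambda t: "1042")
# result == "1042" after one Review-Improve cycle
```

The `max_cycles` cap reflects the trade-off described above: every extra cycle raises quality but lowers throughput, so real workflows bound the loop.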
Example financial services use cases for the Review and Improve method
- Receipt transcription: From any receipt, transcribe the store name, amount, credit card details, date, and store location.
- Data enrichment: Complete the location-based matrix by determining whether a business belongs to a specific site.
- Image transcription: From scanned images of invoices, extract data for given fields.
The Consensus QA Method
In the Consensus method, two or more data analysts complete the same task, and the final result flows into your system or process only after the analysts reach consensus. Discrepancies are placed back into the work pool and resolved by a subject matter expert or senior data analyst.
This “wisdom of the crowd” approach, known as inter-analyst or inter-annotator agreement, typically yields higher accuracy than any single analyst working alone.
You can use this agreement to:
- Identify the least and most reliable analysts in a pool.
- Calculate an analyst’s consistency over time.
- Create reference standards, sometimes called gold standards, benchmarks, or ground-truth answers.
- Escalate complex or confusing tasks.
Although multiple variations of the Consensus QA method exist, all involve the same input being processed more than once by two or more analysts.
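One common variation can be sketched as a simple majority vote with a two-analyst quorum. The function names and quorum rule below are illustrative; real deployments may weight analysts by track record or use chance-corrected agreement statistics such as Cohen's kappa:

```python
from collections import Counter

def consensus(labels, quorum=2):
    # Two or more analysts label the same task. The majority answer
    # passes through only if at least `quorum` analysts agree;
    # otherwise the task returns to the work pool for a senior analyst.
    answer, votes = Counter(labels).most_common(1)[0]
    if votes >= quorum:
        return answer, False       # (final answer, escalated?)
    return None, True              # discrepancy: escalate

def agreement_rate(a, b):
    # Raw inter-analyst agreement over a shared batch of tasks,
    # one ingredient in identifying the most reliable analysts.
    return sum(x == y for x, y in zip(a, b)) / len(a)

consensus(["positive", "positive", "neutral"])   # ("positive", False)
consensus(["positive", "neutral"])               # (None, True)
agreement_rate(["pos", "neg", "pos"], ["pos", "neg", "neg"])  # 2/3
```

Tracking `agreement_rate` per analyst over time supports the uses listed above: spotting unreliable analysts, measuring consistency, and building gold-standard reference answers.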
Example financial services use cases for the Consensus method
- Invoice transcription: Split, sort, and transcribe data for a niche invoice automation platform.
- Data categorization: Extract and categorize a set of information from varying document types.
- Sentiment analysis: Identify whether a sentiment is neutral, positive, or negative.
Side-by-side comparison of the three QA methods
Would you like to see a side-by-side comparison of the three QA methods? Download the Outsourced Data Processing white paper, which includes a comparison chart, plus a detailed discussion of quality control and continuous improvement.
If you found this post helpful, you might want to read the next post in this series—Quality Control for Fintech and Finserv Data Processing: Sustaining High Accuracy. That post talks about quality control—the future-focused, scientific approach for predicting where quality is headed.
CloudFactory is driving innovation in the financial services and financial technology industries by processing hundreds of millions of documents each year for visionary companies across the finance industry. We bring proven approaches to performance management, scaling, and elasticity, moving volume up and down while maintaining high speed and accuracy. Learn more about our data processing work here.