4 Data Hurdles in Medical AI Development

As healthcare and medicine enter the fourth industrial revolution of ongoing automation and smart technologies, artificial intelligence (AI) shows great promise in areas like drug discovery, patient diagnosis, and treatment. Yet, despite rapidly growing demand, challenges remain for model development. These include challenges intrinsic to machine learning, such as the need to address edge cases, prevent AI bias and model drift, and supply the enormous amount of human effort required to continuously prepare training data and to validate and optimize machine learning models.

Overcoming these challenges is essential for delivering a meaningful and measurable impact on patient outcomes and medical research. Moreover, the healthcare sector comes with challenges of its own, and the ability to address these will be instrumental in ensuring the safe and timely transition of AI into real-world use. After all, there is minimal room for error when a mistake can mean the difference between sickness and health.

1. Privacy and Regulatory Compliance

All healthcare data is subject to stringent regulations such as HIPAA and HITECH in the U.S. and GDPR in Europe. These regulations were introduced to protect patient privacy and to govern the collection, management, and dissemination of personally identifiable information (PII), including healthcare records and medical imagery. Compliance in medical AI development is complicated by the fact that these data privacy rules predate AI and many of the other technologies that healthcare relies on today.

For medical AI to deliver on its promise, models require huge amounts of data, such as MRI scans and X-rays, to build appropriate training data sets. As such, there must be a transparent way for people to opt in to contributing their data to development. Fortunately, in most cases, healthcare data can be anonymized for use in AI model development without compromising its usefulness.

For example, V7 Labs recently worked with CloudFactory to release an annotated X-ray dataset to aid in COVID-19 research. V7 collected 6,000 lung images from multiple open-source datasets—a mix that included patients with and without COVID-19—and helped train CloudFactory’s managed workforce to combine AI-driven auto-labeling and precise human-led image annotation to optimize the data for machine learning.
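For teams building their own data pipelines, it can help to see how small the core of the de-identification step can be. The following is a minimal Python sketch under simple assumptions; the record format, field names, and salted-hash pseudonymization are illustrative only, and a full compliance program (for example, HIPAA Safe Harbor or expert determination) covers far more than this.

```python
import hashlib

# Hypothetical direct identifiers to strip before records enter a training set.
# Real de-identification covers many more fields and also addresses
# quasi-identifiers such as dates and ZIP codes.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "email", "ssn"}

def deidentify(record: dict, salt: str) -> dict:
    """Return a copy of a patient record with direct identifiers removed and
    the patient ID replaced by a salted one-way hash, so records can still be
    linked across sources without exposing the original ID."""
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    clean["patient_id"] = hashlib.sha256(
        (salt + str(record["patient_id"])).encode()
    ).hexdigest()
    return clean

record = {
    "patient_id": "MRN-00123",
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "age": 54,
    "diagnosis_code": "J18.9",
}
print(deidentify(record, salt="project-specific-secret"))
```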

2. Data Availability and Collection

The enormous amount of data required for medical AI training presents logistical challenges as well. Depending on the purpose of the model, this may involve collecting data from a wide range of sources, such as electronic health records, insurance claims, pharmacy records, and consumer-generated data from devices like fitness trackers and wearable tech. Because this data is often fragmented across many different systems, curating comprehensive, high-quality data sets can require a major effort.
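As a rough illustration of what that curation can involve, the sketch below joins hypothetical, already de-identified extracts from three systems on a shared pseudonymous patient ID; the tables, columns, and values are invented for the example.

```python
import pandas as pd

# Hypothetical extracts from three separate systems, already de-identified
# and keyed by the same pseudonymous patient ID.
ehr = pd.DataFrame({"patient_id": ["a1", "b2"], "diagnosis_code": ["J18.9", "E11.9"]})
claims = pd.DataFrame({"patient_id": ["a1", "b2"], "claim_amount": [1250.0, 310.0]})
wearables = pd.DataFrame({"patient_id": ["a1"], "avg_daily_steps": [4300]})

# Outer joins keep patients who appear in only some sources, which is common
# when data is fragmented; later steps decide how to handle the resulting gaps.
merged = (
    ehr.merge(claims, on="patient_id", how="outer")
       .merge(wearables, on="patient_id", how="outer")
)
print(merged)
```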

The continued digitization of healthcare records, combined with an easy and transparent way for patients to contribute to AI development, is critical for making this happen. Fortunately, new frameworks are being developed to treat clinical data as a public good, while protecting patient privacy, making it easier for research and development teams to collect the data they need to build viable models.

3. AI Bias

Another challenge that factors in during the data collection stage is the need for representative data sets. Without a high degree of diversity, AI bias can become a serious issue to the point of being detrimental to patient care. For example, AI designed to detect sickle cell disease must take into consideration the fact that the sickle cell trait is far more prevalent among African American communities and those with Central and South American ancestry. Thus, there needs to be an accurate, corresponding representation of those populations in the data sets used to train the model.
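One practical way to catch this kind of under-representation early is to compare the demographic mix of a training set against the population the model is meant to serve. The sketch below does exactly that; the group names and target shares are placeholders, and real targets would come from clinical or epidemiological data for the intended patient population.

```python
from collections import Counter

def representation_gap(labels, target_shares):
    """Compare the share of each demographic group in a training set against
    a target distribution and report the gap for each group."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        group: counts.get(group, 0) / total - share
        for group, share in target_shares.items()
    }

# Placeholder group names and target shares, chosen for illustration only.
training_groups = ["group_a"] * 800 + ["group_b"] * 150 + ["group_c"] * 50
targets = {"group_a": 0.6, "group_b": 0.3, "group_c": 0.1}

for group, gap in representation_gap(training_groups, targets).items():
    print(f"{group}: {gap:+.2%} vs. target")
```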

A related source of AI bias is the model itself, which is built by humans who may introduce their own biases. As Michael Li noted in his recent Harvard Business Review article, “The bias in our human-built AI likely owes something to the lack of diversity in the humans that built them.” Diversity within the engineering or data science team creating the model can therefore be critical in reducing AI bias during development, before training data is even introduced.

4. Industry Expertise and Quality Assurance

Quality control and quality assurance are paramount in medical AI development. After all, AI in healthcare is no longer purely passive in nature. In addition to supporting practitioners in the decision-making process, AI may, in certain applications, act in their place. Because of this, there is a very low tolerance for errors. To ensure patient and provider trust, medical AI models must undergo extensive training and optimization. Model specialization is often a necessity during these stages.

For example, if AI is to interpret X-rays and MRIs to recommend a diagnosis and directly impact patient care, then the images used for training should go through an extensive QA/QC process to ensure labeling accuracy for each training image. On the other hand, if the project involves labeling data for initial, exploratory medical research, where speed and direction setting are prioritized, then a less rigorous QA process may be more appropriate. The key is to have quality assurance methods in place that match the acceptable error rate for your project. Even then, it is still essential to have a trained workforce that can easily scale and bring the aptitude, ability, and patience needed to learn and work efficiently without compromising quality.
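As an illustration of the more rigorous end of that spectrum, one common QA pattern is to audit annotator labels against a gold-standard sample and compare the observed error rate with the project's tolerance. The sketch below uses hypothetical image names and labels; it is a generic example, not a description of any particular vendor's process.

```python
def audit_labels(annotations, gold, max_error_rate):
    """Compare annotator labels against a gold-standard sample and flag the
    batch if the observed error rate exceeds the project's tolerance."""
    audited = [img for img in gold if img in annotations]
    errors = sum(annotations[img] != gold[img] for img in audited)
    error_rate = errors / len(audited)
    return error_rate, error_rate <= max_error_rate

# Hypothetical labels for a small gold-standard sample of X-ray images.
annotations = {"xray_001": "opacity", "xray_002": "normal", "xray_003": "opacity"}
gold = {"xray_001": "opacity", "xray_002": "opacity", "xray_003": "opacity"}

rate, passed = audit_labels(annotations, gold, max_error_rate=0.05)
print(f"error rate {rate:.0%} -> {'accept' if passed else 'send back for review'}")
```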

Scaling Medical AI with a Managed Workforce

It might seem as though medical data preparation and labeling can only be handled by medical professionals, but that is not necessarily the case. Relying solely on clinicians is difficult to scale, and it can be detrimental to patient outcomes if it takes medical professionals away from direct patient care or their research.

Due to the demands of quality control and assurance, traditional business process outsourcing and crowdsourcing are rarely suitable for medical data preparation. It can often feel as though the only option is to have specialists in the relevant field do this highly technical labeling, but that approach quickly becomes a bottleneck and a cost burden at scale. Partnering with a trained, managed workforce experienced in annotating highly specialized medical imagery, however, can alleviate the burden without sacrificing quality or losing oversight. With a robust approach built on extensive vetting, skills development, and transparency, the challenge of scale can be overcome effectively.

Learn more about how our managed workforce can help you meet the challenges of scale in medical data annotation.
