As healthcare enters the fourth industrial revolution of ongoing automation and smart technologies, artificial intelligence (AI) shows great promise in areas like drug discovery, patient activity tracking, and robot-assisted surgery. In fact, the National Library of Medicine predicts that AI applications could cut annual U.S. healthcare costs by $150 billion by 2026.

Yet, challenges remain for model development despite rapidly growing demand. These include challenges intrinsic to machine learning, such as the need to address edge cases, prevent AI bias and drift, and invest an enormous amount of human work to build safe, accurate AI models from precise, secure data.

Overcoming these challenges is essential for delivering a meaningful and measurable impact on patient outcomes and medical research. Moreover, the healthcare sector comes with challenges of its own, and the ability to address these will be instrumental in ensuring the safe and timely transition of AI into real-world scenarios. After all, there is minimal room for error when it can mean the difference between sickness and health.

Let’s dig deeper into the four data hurdles in healthcare AI development:

  • Privacy and Regulatory Compliance
  • Data Availability and Collection
  • AI Bias
  • Industry Expertise and Quality Assurance

1. Privacy and Regulatory Compliance

All healthcare data is subject to stringent regulations such as HIPAA and HITECH in the U.S. and GDPR in Europe. These regulations are designed to ensure patient privacy and to govern the collection, management, and dissemination of protected health information (PHI), including healthcare records and medical imagery. Compliance in healthcare AI development is complicated because these data privacy regulations predate AI and many of the other technologies that healthcare relies on today.

For healthcare AI to deliver on its promise, developers need accurate training sets, which require huge amounts of data, such as microscopy images for drug discovery, radiology images for object recognition, and imagery from medical devices.

For example, biotech innovator Sartorius came to CloudFactory for help with the segmentation and annotation of complex cell imagery to create training data for AI cell identification. Sartorius’ Incucyte® Live-Cell Analysis System automates live-cell imaging, producing far more microscopic images than any human could realistically analyze. Sartorius developed an open-source dataset of more than 5,000 images (1.6 million individually annotated cells) and used CloudFactory’s managed workforce to manually annotate the dataset.

In addition, there must be a transparent way for people to opt in to contributing their data to AI development. Fortunately, in most cases, it is possible to anonymize healthcare data for use in AI model development without compromising its usefulness.
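To make the idea of anonymization concrete, here is a minimal sketch of de-identifying a patient record before it enters a training corpus: direct identifiers are stripped and the patient ID is replaced by a salted one-way hash. The field names and the set of identifiers are illustrative, not drawn from any specific compliance standard.

```python
import hashlib

# Direct identifiers to strip before records enter a training corpus
# (illustrative field names, not an exhaustive or standard list).
DIRECT_IDENTIFIERS = {"name", "address", "phone", "email", "ssn"}

def anonymize_record(record: dict, salt: str) -> dict:
    """Return a copy of the record with direct identifiers removed
    and the patient ID replaced by a salted one-way hash."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    raw_id = str(cleaned.pop("patient_id"))
    cleaned["pseudonym"] = hashlib.sha256((salt + raw_id).encode()).hexdigest()[:16]
    return cleaned

record = {
    "patient_id": 1042,
    "name": "Jane Doe",
    "email": "jane@example.com",
    "diagnosis": "J45.909",   # clinically useful fields are retained
    "age_band": "40-49",
}
print(anonymize_record(record, salt="site-secret"))
```

A real pipeline would go further (for example, generalizing dates and locations), but the principle is the same: remove what identifies the patient, keep what trains the model.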

2. Data Availability and Collection

The enormous amounts of data required in healthcare AI training present logistical challenges as well. Depending on the purpose of the model, this may involve collecting data from a wide range of sources, such as electronic health records, insurance claims, pharmacy records, and consumer-generated data from devices like fitness trackers and wearable tech. Because this data is often fragmented across many different systems, it can require a major effort to curate comprehensive, high-quality data sets.

The continued digitization of healthcare records, combined with an easy and transparent way for patients to contribute to AI development, is critical for making this happen. Fortunately, new frameworks exist to treat clinical data as a public good while protecting patient privacy, making it easier for research and development teams to collect the data they need to build viable models.

3. AI Bias

Another challenge that arises during the data collection stage is the need for representative data sets. Without sufficient diversity, AI bias can become serious enough to harm patient care and even cause incorrect diagnoses. For example, a study in the Proceedings of the National Academy of Sciences (PNAS) shed light on the importance of gender balance in the data sets used to train AI systems for computer-assisted diagnosis.

When creating a model to diagnose 14 common thoracic diseases using X-ray images, the PNAS study found a consistent decrease in performance for underrepresented genders when a minimum gender balance threshold was not met.
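A simple way to catch this kind of imbalance before training is to audit the demographic makeup of the data set against a minimum-representation threshold. The sketch below does exactly that; the 40% threshold is illustrative, not the one used in the PNAS study.

```python
from collections import Counter

def gender_balance_ok(labels, min_share=0.40):
    """Check whether every gender in the dataset meets a minimum share.
    `min_share` is an illustrative threshold, not from the PNAS study."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {g: n / total for g, n in counts.items()}
    return all(s >= min_share for s in shares.values()), shares

# A 70/30 split fails a 40% minimum-representation check
ok, shares = gender_balance_ok(["F"] * 70 + ["M"] * 30)
print(ok, shares)  # False, {'F': 0.7, 'M': 0.3}
```

The same audit extends naturally to age bands, ethnicity, or any other attribute where underrepresentation could degrade model performance for a patient group.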

A related issue in AI bias is the model itself, which is created by humans who may introduce bias of their own. As Steve Nouri noted in the Forbes article “The Role of Bias in Artificial Intelligence,” “Over the last few years, society has begun to grapple with exactly how much these human prejudices, with devastating consequences, can find their way through AI systems. Being profoundly aware of these threats and seeking to minimize them is an urgent priority when many firms are looking to deploy AI solutions.”

Nouri also references a Columbia University study that found “the more homogenous the [engineering] team is, the more likely it is that a given prediction error will appear.” This highlights that diversity within the engineering or data science team creating the model can be critical in reducing AI bias during development—before training data is even introduced.

4. Industry Expertise and Quality Assurance

Quality control and quality assurance are paramount in healthcare AI development. After all, not all AI is passive. When it comes to sorting through data for medical research, medical device outputs, and electronic health records, AI may be acting in the place of a human, looking for patterns to support clinicians and researchers. Because of this, there is a very low tolerance for errors, and healthcare AI models must undergo extensive training and optimization. Model specialization is often a necessity during these stages.

For example, when using AI in robotic surgery to complete precise, repetitive tasks, the images used for training require an extensive QA/QC process to ensure labeling accuracy for each training image. On the other hand, if the project involves labeling data for initial, exploratory medical research, where teams prioritize speed and direction setting, then a less rigorous QA process may be more appropriate.
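One common way such a QA/QC gate works in practice is to compare each annotator's bounding box against a gold-standard reference using intersection-over-union (IoU), accepting the label only above a threshold. The sketch below assumes boxes in `(x1, y1, x2, y2)` form, and the 0.9 threshold is illustrative of a strict, high-stakes gate.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def passes_qa(label_box, gold_box, threshold=0.9):
    """Strict QA gate for high-stakes labeling; the threshold is illustrative."""
    return iou(label_box, gold_box) >= threshold

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # identical boxes -> 1.0
```

For exploratory research work, the same gate can simply be run with a looser threshold, which is one way the rigor of QA can be dialed to match the project.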

The key is to have quality assurance methods in place that match the acceptable margin of error for your project. That said, it's still essential to have a trained workforce that can scale easily and bring the aptitude, ability, and patience needed to learn and work efficiently without compromising quality. It also helps to partner with an organization and workforce that understands the need for ethically designed AI systems.

Scaling Healthcare AI with a Managed Workforce

It might seem as though the challenges of medical data preparation and labeling can only be addressed by medical professionals, but that is not necessarily the case. Such an approach is difficult to scale, and it can harm crucial factors like patient outcomes if it pulls medical professionals away from direct patient care or their research.

Due to the demands of quality control and assurance, traditional business process outsourcing and crowdsourcing are rarely suitable for medical data preparation. While it can often feel like the only option is to have people trained in the specialization area do this highly technical labeling, that approach becomes a huge bottleneck and cost burden at scale.

However, partnering with a managed workforce experienced in annotation for applications like radiology imaging, cell identification, and robot-assisted surgery means that workforce can quickly learn and meet your data standards for safe, effective AI models. With a robust approach consisting of extensive vetting, skills development, and transparency, the challenge of scale can be overcome effectively.

Learn more about how CloudFactory’s managed workforce can help you meet the challenges of scale in clinical imaging and medical AI diagnostics.


