In recent years, we have witnessed impressive leaps in artificial intelligence, with computers that are able to interpret the world more clearly than ever. But it’s also important not to lose sight of the fact that these machine learning algorithms are ultimately trained by people, whose innate biases can end up encoded in the systems they build.
The fact that these biases can make their way into AI systems has worrying and potentially dangerous implications. In 2019, researchers at the Georgia Institute of Technology found that computer vision systems in self-driving cars were less likely to detect pedestrians with darker skin tones.
That kind of AI bias can enter the process during the early stages of AI development, typically during data collection and annotation. Taking every possible step to mitigate it is essential for the success of the project. In fact, the Georgia Tech team found that making slight changes to the autonomous driving algorithms during the training phase helped reduce racial biases.
How AI bias is similar to the HR hiring process
Human resources teams have a moral obligation to treat all candidates equally and fairly, regardless of factors such as ethnicity and age. However, the results of implicit association tests show us that people have unconscious biases that affect the decisions they make. For decades there have been important efforts within HR to build processes and educate everyone to avoid these biases throughout the hiring process.
The problem has been around far longer than many people realize. In 1988, a U.K. medical school was found guilty of discrimination after using a program that was biased against women and people with non-European names. The program was developed to determine which applicants would be invited to job interviews, and it was designed to mimic and speed up human decision-making. Instead, it reflected the biases of the people who created the system. Over three decades later, we face similar problems on a bigger scale as we delegate more work to AI algorithms without applying what we have learned, and without building the rigor needed to avoid bias into our processes.
Mitigating AI bias during data preparation
AI bias typically creeps into models during the data collection and data labeling stages, either in the dataset itself or in how it is processed. Datasets often lack diversity and reflect biased decision-making and pre-existing inequalities. For example, datasets used to train self-driving algorithms often don’t take into account driving conditions in other regions and countries. In another example, an AI trained to recognize human faces may be less accurate at recognizing darker skin tones if most of the images used to train it were photos of people with lighter skin tones.
While sensitive variables like ethnicity and gender can be removed to mitigate bias in some cases, those same differences may be essential in others. Sickle cell disease, for example, affects one out of every 365 African Americans, a far higher rate than among other ethnic groups. If a medical AI designed to detect the disease hasn’t been trained on sufficiently diverse datasets, it simply won’t deliver the desired outcomes. Make sure you are using a sample of data that is representative for the outcome you need to achieve.
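One practical way to catch representation gaps like these before training is a simple audit of how each subgroup is distributed in the dataset. The sketch below is illustrative only: the field name `skin_tone`, the toy data, and the 25% threshold are assumptions for the example, not part of any specific pipeline.

```python
from collections import Counter

def audit_representation(records, group_key, threshold=0.10):
    """Flag subgroups whose share of the dataset falls below `threshold`.

    `records` is a list of dicts; `group_key` names the sensitive
    attribute to audit. Returns {group: share} for underrepresented groups.
    """
    counts = Counter(r[group_key] for r in records)
    total = sum(counts.values())
    return {group: count / total
            for group, count in counts.items()
            if count / total < threshold}

# Toy dataset deliberately skewed toward one category (illustrative only).
samples = (
    [{"skin_tone": "lighter"}] * 90 +
    [{"skin_tone": "darker"}] * 10
)

underrepresented = audit_representation(samples, "skin_tone", threshold=0.25)
# "darker" makes up only 10% of samples, below the 25% threshold,
# so it is flagged for additional data collection before training.
```

A check like this is cheap to run at the data collection stage, which is exactly where the biases described above tend to enter.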
Applying ethical HR practices to AI training
Data preparation teams can learn from the approaches HR teams use to mitigate bias. In HR, mitigating bias may include diversified candidate sources, redacted information on resumes, standardized interview processes, or validated assessments. Using these and other approaches, HR teams can reduce unconscious bias and make better-informed decisions based on competence, work ethic, and character.
The same can and should apply when training an AI model, including proper training for the people who collect and label the data. Human factors that can bias a model must be actively guarded against by everyone in the loop, from the people who gather and label the data to the people who build, train, tune, and test machine learning models. This matters throughout the AI lifecycle, from model development to maintaining models in production. Because bias can creep in at any stage, including when models are refreshed to reflect changing real-world conditions, everyone involved must carefully consider their own biases and work to keep them out of the process.
As such, it’s crucial to retain the right humans in the loop throughout every stage of the lifecycle. That’s only possible with a careful approach and a data annotation team that adheres to your unique requirements for annotating data. To learn about how we provide managed teams of data analysts who can scale your data annotation operations, contact us.
Or, to learn more about ethical data collection and use, AI bias, and mitigation tactics, check out my panel discussion from Cognilytica’s 2020 Data for AI Conference.