Artificial intelligence promises to expertly diagnose disease in medical images and scans. However, a close look at the data used to train algorithms for diagnosing eye conditions suggests these powerful new tools may perpetuate health inequalities.
A team of researchers in the UK analyzed 94 datasets—with more than 500,000 images—commonly used to train AI algorithms to spot eye diseases. They found that almost all of the data came from patients in North America, Europe, and China. Just four datasets came from South Asia, two from South America, and one from Africa; none came from Oceania.
The disparity in the source of these eye images means AI eye-exam algorithms are less likely to work well for racial groups from underrepresented countries, says Xiaoxuan Liu, an ophthalmologist and researcher at the University of Birmingham who was involved in the study. “Even if there are very subtle changes in the disease in certain populations, AI can fail quite badly,” she says.
The American Academy of Ophthalmology has shown enthusiasm for AI tools, which it says promise to help improve standards of care. But Liu says doctors may be reluctant to use such tools on racial minorities if they learn the tools were built by studying predominantly white patients. She notes that the algorithms might fail because of differences that are too subtle for doctors themselves to notice.
The researchers found other problems in the data, too. Many datasets did not include key demographic information, such as age, gender, and race, making it difficult to gauge whether they are biased in other ways. The datasets also tended to focus on just a handful of diseases: glaucoma, diabetic retinopathy, and age-related macular degeneration. Forty-six of the datasets used to train algorithms did not make their data available at all.
The US Food and Drug Administration has approved several AI imaging products in recent years, including two AI tools for ophthalmology. Liu says the companies behind these algorithms do not typically provide details of how they were trained. She and her co-authors call for regulators to consider the diversity of training data when examining AI tools.
The bias found in eye image datasets means algorithms trained on that data are less likely to work properly in Africa, Latin America, or Southeast Asia. This would undermine one of the big supposed benefits of AI diagnosis: its potential to bring automated medical expertise to poorer regions where it is lacking.
“You’re getting an innovation that only benefits certain parts of certain groups of people,” Liu says. “It’s like having a Google Maps that doesn’t go into certain postcodes.”
The lack of diversity found in the eye images, which the researchers dub “data poverty,” likely affects many medical AI algorithms.
Amit Kaushal, an assistant professor of medicine at Stanford University, was part of a team that analyzed 74 studies involving medical uses of AI, 56 of which used data from US patients. They found that most of the US data came from just three states: California (22 studies), New York (15), and Massachusetts (14).
“When subgroups of the population are systematically excluded from AI training data, AI algorithms will tend to perform worse for those excluded groups,” Kaushal says. “Problems facing underrepresented populations may not even be studied by AI researchers due to lack of available data.”
He says the solution is to make AI researchers and doctors aware of the problem, so that they seek out more diverse datasets. “We need to create a technical infrastructure that allows access to diverse data for AI research, and a regulatory environment that supports and protects research use of this data,” he says.
Vikash Gupta, a research scientist at Mayo Clinic in Florida who works on the use of AI in radiology, says simply adding more diverse data might not eliminate bias. “It’s difficult to say how to solve this issue at the moment,” he says.