Artificial intelligence is a big buzzword in discussions about the future of healthcare, as well as other industries. In fact, just this month, the Food and Drug Administration approved the first AI software that can interpret images and determine whether a patient has diabetic retinopathy without the assistance of a physician.
While AI has the potential to revolutionize modern medicine, there are challenges as well. A 2017 Stat investigation found that Watson for Oncology fell far short of the expectations IBM set for it.
At the Association of Health Care Journalists' 25th conference in Phoenix last month, Barrow Artificial Intelligence Center director Igor Barani, law and bioethics professor Pilar Ossorio, and Stat national correspondent Casey Ross offered some key takeaways for journalists in a panel moderated by Stat managing editor Gideon Gil.
Pay attention to how algorithms are trained and what kind of data sets they’re trained on
Getting health care data out of a system isn't easy, for a variety of reasons. Hospitals are concerned about following regulations and protecting the privacy and confidentiality of their patients (and rightly so). There are technical hurdles as well: different hospital systems customize software for their own use and may use electronic health records in different ways. Data siloing is a problem, but combining different data sets brings its own issues.
An algorithm can also struggle with overfitting: it learns the quirks of its training data so well that it performs poorly on new data. And when the algorithms that artificial intelligence relies on are trained on only one data set, or on limited data sets, that can lead to racial or gender disparities.
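What overfitting looks like is easy to demonstrate. Here is a minimal sketch, using scikit-learn and purely synthetic data (nothing from the panel or from real patient records): an unconstrained model scores almost perfectly on the data it was trained on and noticeably worse on data it has never seen.

```python
# Minimal overfitting demo on synthetic data standing in for patient records.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# An unconstrained decision tree can effectively memorize its training set.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("accuracy on training data:", model.score(X_train, y_train))  # close to 1.0
print("accuracy on new data:     ", model.score(X_test, y_test))    # noticeably lower
```

The gap between the two numbers is the warning sign: the model has learned that particular data set, not the underlying problem.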
One example of such a disparity involves machine learning models that predict the starting dose of warfarin, a medication used as an anticoagulant. Warfarin has a narrow window of effectiveness: too little is ineffective, and too much can cause patients to bleed out. Research found that machine learning not only failed to predict the starting dose any better than a simple clinical algorithm, it actually predicted it worse for African American patients. That's because the algorithm was trained on data sets from the Midwest that contained very little information on African Americans.
A 2015 study in Blood, the peer-reviewed medical journal published by the American Society of Hematology, shows that "the influence of known genetic variants on warfarin dose differs by race." The racial makeup of the population a machine is trained on matters.
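One way to probe this kind of problem is to ask whether a model's error was ever broken out by subgroup rather than reported as a single overall number. The sketch below is hypothetical (the column names and dosing figures are invented for illustration, not taken from the warfarin studies), but it shows the shape of that check.

```python
# Hypothetical check: compare a dosing model's error separately for each group
# instead of reporting only one overall figure.
import pandas as pd
from sklearn.metrics import mean_absolute_error

results = pd.DataFrame({
    "race":              ["White", "White", "White", "Black", "Black", "Black"],
    "actual_dose_mg":    [35.0, 28.0, 31.0, 49.0, 52.0, 45.0],
    "predicted_dose_mg": [33.0, 30.0, 32.0, 38.0, 40.0, 36.0],
})

# The overall number can look respectable while hiding a much larger error
# for one group.
print("overall MAE:",
      mean_absolute_error(results.actual_dose_mg, results.predicted_dose_mg))
for group, rows in results.groupby("race"):
    print(group, "MAE:",
          mean_absolute_error(rows.actual_dose_mg, rows.predicted_dose_mg))
```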
Cleaning data can introduce bias
Ossorio pointed out that problems can arise when data are cleaned prior to training the algorithm. For example, sometimes records of people who don’t have enough touches within the healthcare system are dropped to speed up the algorithm.
But because poor people have fewer consistent touches with the health care system, this seemingly innocuous decision can systematically drop their data and exacerbate existing patterns in the data used to train algorithms. It matters what data an algorithm is trained on and what decisions are made to clean that data, and those are questions business journalists should be asking.
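As an illustration of how that can happen, here is a toy sketch with made-up data (the visit counts and income flags are hypothetical): a single filter that drops patients with few recorded visits also happens to remove every low-income patient from the training set.

```python
# Toy example: a "cleaning" rule that drops sparse records also changes who is
# represented in the training data.
import pandas as pd

records = pd.DataFrame({
    "patient_id":  [1, 2, 3, 4, 5, 6],
    "visit_count": [12, 2, 9, 1, 8, 2],
    "low_income":  [False, True, False, True, False, True],
})

# The seemingly innocuous filter: keep only patients with enough "touches".
cleaned = records[records.visit_count >= 3]

print("low-income share before cleaning:", records.low_income.mean())  # 0.5
print("low-income share after cleaning: ", cleaned.low_income.mean())  # 0.0
```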
Algorithms can find misleading information
Even if algorithms are trained soundly from a technical standpoint, they can still surface information that is misleading without further analysis. For example, an algorithm found that pneumonia patients who have asthma are less likely to die. Research shows, however, that people with asthma and other lung problems actually have a higher risk of dying. The algorithm did not account for the fact that doctors are more likely to give intensive treatment to patients with asthma who are having breathing problems, thus lowering their risk of death. That level of granular detail and causal information simply isn't captured by the algorithm on its own.
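A toy simulation makes the trap concrete. The numbers below are invented, not real clinical data: asthma is given a genuinely higher underlying risk, but because asthma patients are routed to intensive care far more often, the raw comparison makes asthma look protective.

```python
# Simulated confounding: intensive care lowers mortality and is given mostly to
# asthma patients, so asthma appears protective in the raw data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000
asthma = rng.random(n) < 0.2
# Doctors send most asthma patients, but far fewer others, to intensive care.
intensive_care = np.where(asthma, rng.random(n) < 0.9, rng.random(n) < 0.3)
# True underlying risk: asthma raises mortality, intensive care lowers it.
p_death = 0.06 + 0.01 * asthma - 0.05 * intensive_care
died = rng.random(n) < p_death

df = pd.DataFrame({"asthma": asthma, "intensive_care": intensive_care, "died": died})
print(df.groupby("asthma").died.mean())                      # asthma looks safer...
print(df.groupby(["intensive_care", "asthma"]).died.mean())  # ...until care is accounted for
```

Within each level of care, the simulated asthma patients die more often; only the lopsided allocation of intensive treatment makes the overall numbers point the other way.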
In addition to asking how algorithms are trained, what kind of data sets they're trained on, and how the cleaning process changes the data set from the original, Ossorio recommends asking other questions that get at these details: How many times did you train the algorithm? What kinds of variability did you find? What unexpected things did you find? How do you envision the intended user? How do you plan to communicate the intended uses to the intended user?
The FDA struggles to keep up with emerging tech
The FDA needs to be aware of how algorithms are trained, too. Ossorio and Ross both raised questions about whether the FDA has the staff or capacity to grasp what it needs to understand in order to guide or oversee the marketing of healthcare algorithms. If the FDA can't distinguish between safer and more dangerous AI algorithms, it will be difficult for it to decide where to focus its oversight, said Ossorio. And Ross agreed that recent FDA guidance failed to answer big questions about whether artificial intelligence is safe and effective.
The business questions
Barani pointed out that finding operational efficiencies in healthcare systems is the most fertile ground for artificial intelligence: it's the most likely application and poses less risk to patients. Diagnosis, he said, will be more challenging.
Ross found that it's often difficult to learn how much hospitals are paying for artificial intelligence, which is always worth digging into, especially for business journalists. Whether the tools are valuable is another open question.
When hearing about the value of new AI technology, it's worth asking how the machines are better than the old system. While Ross found that providers are often more than willing to discuss glitches in the systems and how difficult they are to use, he recommends being open to the benefits of the technology as well.