by Radiological Society of North America
Chest radiograph (CXR) examples of (A, C) local (feature-based) AI explanations and (B, D) global (prototype-based) AI explanations from a simulated AI tool, ChestAId, presented to physicians in the study. In all examples, the correct diagnostic impression for the radiograph case in question is “right upper lobe pneumonia,” and the corresponding AI advice is correct. The patient clinical information associated with this chest radiograph was “a 63-year-old male presenting to the Emergency Department with cough.” To better simulate a realistic AI system, explanation specificity was changed according to high (ie, 80%–94%) or low (ie, 65%–79%) AI confidence level: bounding boxes in high-confidence local AI explanations (example in A) were more precise than those in low-confidence ones (example in C); high-confidence global AI explanations (example in B) had more classic exemplar images than low-confidence ones (example in D), for which the exemplar images were more subtle. Credit: Radiological Society of North America (RSNA)
When making diagnostic decisions, radiologists and other physicians may rely too much on artificial intelligence (AI) when it points out a specific area of interest in an X-ray, according to a study published today in Radiology.
“As of 2022, 190 radiology AI software programs were approved by the U.S. Food and Drug Administration,” said one of the study’s senior authors, Paul H. Yi, M.D., director of intelligent imaging informatics and associate member in the Department of Radiology at St. Jude Children’s Research Hospital in Memphis, Tennessee.
“However, a gap between AI proof-of-concept and its real-world clinical use has emerged. To bridge this gap, fostering appropriate trust in AI advice is paramount.”
In the multi-site, prospective study, 220 physicians (132 radiologists and 88 internal medicine/emergency medicine physicians) read chest X-rays alongside AI advice. Each physician was tasked with evaluating eight chest X-ray cases alongside suggestions from a simulated AI assistant with diagnostic performance comparable to that of experts in the field.
The clinical vignettes offered frontal and, if available, corresponding lateral chest X-ray images obtained from Beth Israel Deaconess Medical Center in Boston via the open-source MIMIC Chest X-Ray Database. A panel of radiologists selected the set of cases that simulated real-world clinical practice.
For each case, participants were presented with the patient’s clinical history, the AI advice and X-ray images. The AI provided either a correct or an incorrect diagnosis, accompanied by a local or a global explanation. In a local explanation, the AI highlights the parts of the image it deems most important. In a global explanation, the AI provides similar images from previous cases to show how it arrived at its diagnosis.
“These local explanations directly guide the physician to the area of concern in real-time,” Dr. Yi said. “In our study, the AI literally put a box around areas of pneumonia or other abnormalities.”
The reviewers could accept, modify or reject the AI suggestions. They were also asked to report their confidence level in the findings and impressions and to rank the usefulness of the AI advice.
Using mixed-effects models, study co-first authors Drew Prinster, M.S., and Amama Mahmood, M.S., computer science Ph.D. students at Johns Hopkins University in Baltimore, led the researchers in analyzing the effects of the experimental variables on diagnostic accuracy, efficiency, physician perception of AI usefulness, and “simple trust” (how quickly a user agreed or disagreed with AI advice). The researchers controlled for factors like user demographics and professional experience.
The results showed that when the AI provided local explanations, reviewers were more likely to align their diagnostic decision with the AI advice and took less time to reach that decision.
“Compared with global AI explanations, local explanations yielded better physician diagnostic accuracy when the AI advice was correct,” Dr. Yi said. “They also increased diagnostic efficiency overall by reducing the time spent considering AI advice.”
When the AI advice was correct, the average diagnostic accuracy among reviewers was 92.8% with local explanations and 85.3% with global explanations. When AI advice was incorrect, physician accuracy was 23.6% with local and 26.1% with global explanations.
“When provided local explanations, both radiologists and non-radiologists in the study tended to trust the AI diagnosis more quickly, regardless of the accuracy of AI advice,” Dr. Yi said.
Study co-senior author Chien-Ming Huang, Ph.D., John C. Malone Assistant Professor in the Department of Computer Science at Johns Hopkins University, pointed out that this trust in AI could be a double-edged sword because it risks over-reliance or automation bias.
“When we rely too much on whatever the computer tells us, that’s a problem, because AI is not always right,” Dr. Yi said. “I think as radiologists using AI, we need to be aware of these pitfalls and stay mindful of our diagnostic patterns and training.”
Based on the study, Dr. Yi said AI system developers should carefully consider how different forms of AI explanation might impact reliance on AI advice.
“I really think collaboration between industry and health care researchers is key,” he said. “I hope this paper starts a dialog and fruitful future research collaborations.”
More information: Drew Prinster et al. Care to Explain? AI Explanation Types Differentially Impact Chest Radiograph Diagnostic Performance and Physician Trust in AI, Radiology (2024). DOI: 10.1148/radiol.233261, pubs.rsna.org/doi/10.1148/radiol.233261
Journal information: Radiology
Provided by the Radiological Society of North America