The Medical Futurist | 3 min | 17 January 2023
Large language models (LLMs) – excitingly versatile algorithms – became a topic of general conversation in December 2022, shortly after OpenAI released ChatGPT, a chatbot built on its GPT-3.5 model. LLMs are developed to carry on conversations in human-like ways: they are designed to understand complex queries and respond in a nuanced manner. We introduced potential medical use cases in this article.
By now we all have a track record with some kind of chatbot – if nothing else, a not-very-bright algorithm at a service provider. These interactions rarely leave anyone particularly impressed: they sometimes contribute to solving our problems, but more often than not they just leave a frustrated user with a promise of contact from a human support staff member that may or may not ever happen.
Large language models have undoubtedly changed this field forever: they are capable of a quality of assistance never seen before. Only a few short weeks after the release of ChatGPT, Google and DeepMind announced MedPaLM, a large language model specifically designed to answer healthcare-related questions, based on their 540-billion-parameter PaLM model.
The model was trained on six existing medical Q&A datasets (MedQA, MedMCQA, PubMedQA, LiveQA, MedicationQA, and MMLU), and the developer teams also created their own dataset, HealthSearchQA, from questions about medical conditions and their associated symptoms.
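If you want a feel for what these benchmarks contain, the minimal sketch below pulls up one of them, PubMedQA, with the Hugging Face datasets library – the tooling is our assumption, as the paper does not prescribe how the data should be loaded.

```python
# A minimal sketch: inspecting one of the cited benchmarks, PubMedQA,
# via the Hugging Face `datasets` library. The tooling is our assumption;
# the MedPaLM paper does not prescribe how the data should be loaded.
from datasets import load_dataset

# "pqa_labeled" is the expert-annotated PubMedQA subset
# (~1,000 research questions with yes/no/maybe gold answers).
pubmed_qa = load_dataset("pubmed_qa", "pqa_labeled")

example = pubmed_qa["train"][0]
print("Question:   ", example["question"])
print("Gold answer:", example["final_decision"])  # "yes", "no" or "maybe"
```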
At the moment MedPaLM can’t be tested by the general public, but you can read the researchers’ paper here.
It “performs encouragingly but remains inferior to clinicians”
The paper lists a number of possible medical applications – knowledge retrieval, clinical decision support, summarisation of key findings in studies, and triaging patients’ primary care concerns, among others – but also notes that MedPaLM “performs encouragingly, but remains inferior to clinicians.”
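Since MedPaLM itself can’t be tried out, here is a rough sketch of what the triage use case could look like with a general-purpose LLM API standing in – the model name and prompt wording are our own placeholders, not anything taken from the paper.

```python
# Illustrative sketch only: MedPaLM is not public, so a general-purpose
# LLM (here, OpenAI's chat API) stands in to show what a triage-style
# prompt might look like. Model choice and prompt are our assumptions,
# not anything described in the MedPaLM paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

patient_message = (
    "I've had a dull headache for three days and mild nausea since this morning."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder; any chat-capable model works
    messages=[
        {
            "role": "system",
            "content": (
                "You are a triage assistant. Summarise the patient's concern "
                "and suggest an urgency level (emergency / see a GP / "
                "self-care). Always note that this is not medical advice."
            ),
        },
        {"role": "user", "content": patient_message},
    ],
)

print(response.choices[0].message.content)
```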
The attached diagrams show that MedPaLM was still underperforming human clinicians in several areas (the gaps are worked out in the sketch after this list):
- incorrect retrieval of information: 16.9% of MedPaLM answers vs. 3.6% of clinician answers
- incorrect reasoning: 10.1% of MedPaLM answers vs. 2.1% of clinician answers
- incorrect comprehension: 18.7% of MedPaLM answers vs. 2.2% of clinician answers
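To put those gaps in perspective, the short calculation below uses only the percentages quoted above: depending on the category, MedPaLM errs roughly five to eight and a half times as often as the clinicians.

```python
# Back-of-the-envelope comparison based only on the percentages above.
error_rates = {
    # category: (MedPaLM %, clinicians %)
    "incorrect retrieval":     (16.9, 3.6),
    "incorrect reasoning":     (10.1, 2.1),
    "incorrect comprehension": (18.7, 2.2),
}

for category, (model_pct, clinician_pct) in error_rates.items():
    ratio = model_pct / clinician_pct
    print(f"{category}: MedPaLM errs ~{ratio:.1f}x as often as clinicians")
# -> ~4.7x (retrieval), ~4.8x (reasoning), ~8.5x (comprehension)
```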
(Image: some examples of answers crafted by MedPaLM)
Large language models may well be the best option we’ll have for medical consultations
Although the model is obviously not perfect, it does significantly better than any previous algorithm, and the field is improving fast. LLM-based chatbots offer a quality of human-AI interaction that no earlier system came close to.
This is what I think will happen: as these models get better and better, the risk of missing out on care due to capacity shortages in healthcare will soon outweigh the risk of the algorithms being wrong. We will be better off familiarising ourselves with communicating with such LLM algorithms – purely because long waits for medical answers due to the lack of healthcare personnel will pose the greater threat.
Live consultation with a doctor will become a luxury in the 21st century, and any solution addressing this issue (from asynchronous telemedicine to medical chatbots) will actually improve our health prospects.