AI can predict neuroscience study results better than human experts, study finds

by University College London

Credit: Pixabay/CC0 Public Domain

Large language models, a type of AI that analyzes text, can predict the results of proposed neuroscience studies more accurately than human experts, finds a study led by UCL (University College London) researchers.

The findings, published in Nature Human Behaviour, demonstrate that large language models (LLMs) trained on vast datasets of text can distill patterns from scientific literature, enabling them to forecast scientific outcomes with superhuman accuracy.

The researchers say this highlights their potential as powerful tools for accelerating research, going far beyond just knowledge retrieval.

Lead author Dr. Ken Luo (UCL Psychology & Language Sciences) said, “Since the advent of generative AI like ChatGPT, much research has focused on LLMs’ question-answering capabilities, showcasing their remarkable skill in summarizing knowledge from extensive training data. However, rather than emphasizing their backward-looking ability to retrieve past information, we explored whether LLMs could synthesize knowledge to predict future outcomes.

“Scientific progress often relies on trial and error, but each meticulous experiment demands time and resources. Even the most skilled researchers may overlook critical insights from the literature. Our work investigates whether LLMs can identify patterns across vast scientific texts and forecast outcomes of experiments.”

The international research team began their study by developing BrainBench, a tool to evaluate how well large language models (LLMs) can predict neuroscience results.

BrainBench consists of numerous pairs of neuroscience study abstracts. In each pair, one version is the real abstract, briefly describing the background of the research, the methods used, and the results. In the other version, the background and methods are identical, but the results have been altered by experts in the relevant neuroscience domain to describe a plausible but incorrect outcome.
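One natural way to score this kind of two-alternative forced choice with a language model is to compare how "surprised" the model is by each variant and treat the lower-perplexity abstract as its choice. The sketch below is illustrative rather than the study's actual code: it assumes a Hugging Face causal language model, and the helper names (`perplexity`, `choose_real`) are ours.

```python
# Minimal sketch of a BrainBench-style forced choice: score each abstract
# variant by perplexity under a causal language model and pick the one
# the model finds more plausible (lower perplexity).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "mistralai/Mistral-7B-v0.1"  # any causal LM would work here
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def perplexity(text: str) -> float:
    """Exponentiated mean negative log-likelihood per token."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy over tokens
    return torch.exp(loss).item()

def choose_real(abstract_a: str, abstract_b: str) -> str:
    """Return whichever variant the model judges more likely."""
    return abstract_a if perplexity(abstract_a) < perplexity(abstract_b) else abstract_b
```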

The researchers tested 15 different general-purpose LLMs and 171 human neuroscience experts (all of whom had passed a screening test to confirm their expertise) to see whether the AI or the humans could correctly determine which of the two paired abstracts was the real one containing the actual study results.

All of the LLMs outperformed the neuroscientists, with the LLMs averaging 81% accuracy and the humans averaging 63% accuracy. Even when the analysis was restricted to responses from the humans with the highest self-reported expertise in a given neuroscience domain, their accuracy, at 66%, still fell short of the LLMs'.

Additionally, the researchers found that when LLMs were more confident in their decisions, they were more likely to be correct. The researchers say this finding paves the way for a future where human experts could collaborate with well-calibrated models.
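One illustrative way to test calibration of this kind is to treat the perplexity gap between the two variants as a confidence score, bin items by confidence, and check that accuracy climbs in the higher-confidence bins. The helper below is a hypothetical sketch, not the study's analysis; the `confidences` and `correct` arrays are assumed to come from an evaluation like the one sketched above.

```python
# Hedged sketch of a calibration check: bin items by model confidence
# (e.g. the perplexity gap between the two abstract variants) and report
# accuracy per bin; a well-calibrated model is more accurate when confident.
import numpy as np

def calibration_table(confidences, correct, n_bins=5):
    """confidences: per-item confidence scores; correct: 0/1 outcomes."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    order = np.argsort(confidences)
    for chunk in np.array_split(order, n_bins):  # equal-sized bins, low to high
        print(f"mean confidence {confidences[chunk].mean():.3f} "
              f"-> accuracy {correct[chunk].mean():.3f}")
```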

The researchers then adapted an existing LLM (a version of Mistral, an open-source LLM) by training it on neuroscience literature specifically. The new LLM specializing in neuroscience, which they dubbed BrainGPT, was even better at predicting study results, attaining 86% accuracy (an improvement on the general-purpose version of Mistral, which was 83% accurate).
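Domain adaptation of this sort is commonly done with parameter-efficient methods such as LoRA, which train small low-rank adapter matrices while leaving the base model largely frozen. The sketch below illustrates that general recipe with the Hugging Face `peft` library; the corpus file `neuro_abstracts.txt` and the hyperparameters are placeholders, not the study's actual setup.

```python
# Sketch of domain-adaptive fine-tuning in the spirit of BrainGPT:
# adapt a general-purpose causal LM to neuroscience text with LoRA,
# so only a small set of adapter weights is trained.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16,
                                         target_modules=["q_proj", "v_proj"]))

# "neuro_abstracts.txt" is a placeholder corpus of domain text.
data = load_dataset("text", data_files="neuro_abstracts.txt")["train"]
data = data.map(lambda x: tokenizer(x["text"], truncation=True, max_length=512),
                batched=True, remove_columns=["text"])

Trainer(model=model,
        args=TrainingArguments("braingpt-lora", per_device_train_batch_size=1,
                               num_train_epochs=1),
        train_dataset=data,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)).train()
```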

Senior author Professor Bradley Love (UCL Psychology & Language Sciences) said, “In light of our results, we suspect it won’t be long before scientists are using AI tools to design the most effective experiment for their question. While our study focused on neuroscience, our approach was universal and should successfully apply across all of science.

“What is remarkable is how well LLMs can predict the neuroscience literature. This success suggests that a great deal of science is not truly novel, but conforms to existing patterns of results in the literature. We wonder whether scientists are being sufficiently innovative and exploratory.”

Dr. Luo added, “Building on our results, we are developing AI tools to assist researchers. We envision a future where researchers can input their proposed experiment designs and anticipated findings, with AI offering predictions on the likelihood of various outcomes. This would enable faster iteration and more informed decision-making in experiment design.”

The study involved researchers at UCL, University of Cambridge, University of Oxford, Max Planck Institute for Neurobiology of Behavior (Germany), Bilkent University (Turkey) and other institutions in the UK, US, Switzerland, Russia, Germany, Belgium, Denmark, Canada, Spain and Australia.

More information: Large language models surpass human experts in predicting neuroscience results, Nature Human Behaviour (2024). DOI: 10.1038/s41562-024-02046-9

Journal information: Nature Human Behaviour

Provided by University College London




