Ewen Callaway
A scan using functional magnetic resonance imaging, or fMRI, shows areas of the brain active during speech. Credit: Zephyr/Science Photo Library
In 2019, neuroscientist Scott Marek was asked to contribute a paper to a journal that focuses on child development. Previous studies had shown that differences in brain function between children were linked with performance in intelligence tests. So Marek decided to examine this trend in 2,000 kids.
Brain-imaging data sets had been swelling in size. To show that this growth was making studies more reliable, Marek, based at Washington University in St. Louis, Missouri (WashU), and his colleagues split the data in two and ran the same analysis on each subset, expecting the results to match. Instead, they found the opposite. “I was shocked. I thought it was going to look exactly the same in both sets,” says Marek. “I stared out of my apartment window in depression, taking in what it meant for the field.”
Now, in a bombshell 16 March Nature study¹, Marek and his colleagues show that even large brain-imaging studies, such as his, are still too small to reliably detect most links between brain function and behaviour.
As a result, the conclusions of most published ‘brain-wide association studies’ — typically involving dozens to hundreds of participants — might be wrong. Such studies link variations in brain structure and activity to differences in cognitive ability, mental health and other behavioural traits. For instance, numerous studies have identified brain anatomy or activity patterns that, the studies say, can distinguish people who have been diagnosed with depression from those who have not. Studies also often seek biomarkers for behavioural traits.
“There’s a lot of investigators who have committed their careers to doing the kind of science that this paper says is basically junk,” says Russell Poldrack, a cognitive neuroscientist at Stanford University in California, who was one of the paper’s peer reviewers. “It really forces a rethink.”
The authors emphasize that their critique applies only to the subset of research that seeks to explain differences in people’s behaviour through brain imaging. But some scientists think that the critique tars this field with too broad a brush. Smaller, more detailed studies of brain–behaviour links can produce robust findings, they say.
Weak correlations
After his failed replication attempt, Marek set out to understand the reasons for the failure together with Nico Dosenbach, a neuroscientist at WashU, and their colleagues. That work resulted in the latest study, in which they analysed magnetic resonance imaging (MRI) brain scans and behavioural data from 50,000 participants in several large brain-imaging efforts, such as the UK Biobank’s collection of brain scans.
Some of these scans gauged aspects of brain structure, for instance the size of a particular region. Others used a method called functional MRI (fMRI) — the measurement of brain activity while people do a task, such as memory recall, or while at rest — to reveal how brain regions communicate.
The researchers then used subsets drawn from these large databases to simulate billions of smaller studies. These analyses looked for associations between MRI scans and various cognitive, behavioural and demographic traits, in samples ranging from 25 people to more than 32,000.
In simulated studies involving thousands of people, the researchers identified reliable correlations between brain structure and activity in particular regions and different behavioural traits — associations that they could replicate in different subsets of the data. However, these links tended to be much weaker than those typically reported by most other studies.
Researchers measure correlation strength using a metric called r, for which a value of 1 means a perfect correlation and 0 none at all. The strongest reliable correlations Marek and Dosenbach’s team found had an r of 0.16, and the median was 0.01. In published studies, r values above 0.2 are not uncommon.
To understand this disconnect, the researchers simulated smaller studies and found that these identified much stronger associations, with high r values, but also that these findings did not replicate in other samples, large or small. Even associations identified in a study of 2,000 participants — large by current standards — had only a 25% chance of being replicated. More typical studies, with 500 or fewer participants, produced reliable associations around just 5% of the time.
Even larger studies
The study did not attempt to replicate other published brain-wide association studies. But it suggests that the high r values common in the literature are almost certainly flukes, unlikely to be replicated. Factors that hinder reproducibility in other fields, such as the tendency to publish only statistically significant results with large effect sizes, mean that these spurious brain–behaviour associations fill the literature, says Dosenbach. “People are only publishing things that have a strong enough effect size. You can find those, but those are the ones that are most wrong.”
To become more reliable, brain-imaging studies need to get much bigger, Marek, Dosenbach and their colleagues argue. They point out that genetics research was plagued by false positives until researchers, and their funders, started looking for associations in very large numbers of people. The largest genome-wide association studies (GWAS) now involve millions of participants. The team coined the term brain-wide association study, or BWAS, to draw parallels with genetics.
For brain imaging, Marek says, “I don’t know if we need hundreds of thousands or millions. But thousands is a safe bet.”
“What the Marek paper suggests is that a lot of the time, if you don’t have these really large samples, you are most likely wrong or lucky in finding a good brain–behaviour correlation,” says Caterina Gratton, a cognitive neuroscientist at Northwestern University in Evanston, Illinois. The paper appeared as a preprint in 2020, and Gratton says she has sat on grant-review panels that have cited it when raising scepticism over relatively small BWASs. “This is an important paper for the field,” she adds.
But some researchers argue that smaller BWASs still have value. Peter Bandettini, a neuroscientist at the National Institute of Mental Health in Bethesda, Maryland, says that studies such as the ones Marek’s team simulated looked for correlations between crude measurements of behaviour or mental health (self-reported surveys, for example) and brain scans whose conditions might vary from participant to participant, diluting bona fide associations.
By selecting participants carefully and analysing brain-imaging data using sophisticated approaches, it might be possible to find associations between brain scans and behaviour that are stronger than those identified in the study, says Stephen Smith, a neuroscientist at the University of Oxford, UK, who leads the UK Biobank’s brain-imaging efforts. “I fear this paper may be overestimating unreliability.”