The new algorithm was trained on dozens of complete human genomes and has a 95% accuracy rate at identifying complex structural variants, which can consist of long stretches of DNA. The chromosome on the right has a complex structural variant—a missing segment of DNA (B, orange) and a section of DNA that flipped around backwards (C, purple)—that the chromosome on the left does not have. Credit: Emily Moskal/Stanford Medicine
The 3 billion base pairs that constitute the human genome—the matching jigsaw puzzle pieces of adenine pairing with thymine and cytosine pairing with guanine—are not just the body’s instruction manual. Rearrangements in the order of those base pairs are markers of the origins of disease and of our evolutionary history. They can be simple, when a handful of base pairs switch places. They can also be complex, such as when a stretch of tens of thousands of base pairs inverts and is missing multiple sections.
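To make the idea of a complex rearrangement concrete, here is a minimal toy sketch of the kind of variant shown in the image caption: one segment goes missing and another flips around backwards. It is not ARC-SV and says nothing about how the algorithm detects variants. It only shows what such a variant does to a sequence. The four-letter segments and the labels A to D are made-up stand-ins for stretches that are really hundreds to tens of thousands of base pairs long. It also assumes the standard convention that an inverted segment, read along the same strand, appears as its reverse complement.

```python
# Toy illustration only: the segment labels A-D and their short sequences
# are hypothetical stand-ins for real, much longer stretches of DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return how a segment reads on the same strand after it is inverted."""
    return seq.translate(COMPLEMENT)[::-1]


def apply_complex_variant(segments: dict[str, str], deleted: str, inverted: str) -> str:
    """Rebuild a chromosome with one segment deleted and another inverted."""
    pieces = []
    for name, seq in segments.items():  # dicts preserve insertion order (Python 3.7+)
        if name == deleted:
            continue  # deletion: the segment is simply absent
        pieces.append(reverse_complement(seq) if name == inverted else seq)
    return "".join(pieces)


reference = {"A": "AAAC", "B": "GGGT", "C": "ACGG", "D": "TTTA"}
print("reference:", "".join(reference.values()))
print("variant:  ", apply_complex_variant(reference, deleted="B", inverted="C"))
```

A simple variant would change only a few letters. Here, two different kinds of change overlap in one region, which is what makes such variants harder to pick out from sequencing data.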
Current state-of-the-art methods for reading out the genome, known as whole-genome sequencing, are well suited to finding simple variations, but they fall short when it comes to finding complex structural variants. Now a new Stanford Medicine-led study reports an artificial intelligence-based method capable of identifying complex structural variants from whole-genome sequencing data.
The study, which was published Sept. 30 in Cell, created a catalog of complex structural variants using more than 4,000 human genomes from around the globe. These variants often occur in genes governing the brain and were found in regions of the genome linked to human evolution.
The researchers also showed that some of the complex structural variants affected how the instructions contained in brain-related genes were read out in the brains of people who had been diagnosed with schizophrenia or bipolar disorder.
“This work is a major step forward in figuring out the genetic and molecular basis for psychiatric disorders and suggests that brain-related diseases and in general disorders that have a strong genetic component should have a complex structural variant analysis,” said senior author of the study Alexander Urban, Ph.D., associate professor of psychiatry and behavioral sciences, and of genetics.
“Any whole genome sequence should be run through this new algorithm; this will allow us to unearth important answers in the data that are currently ignored.”
Urban and Wing Wong, Ph.D., the Stephen R. Pierce Family Goldman Sachs Professor of Science and Human Health and Professor of Statistics and of Biomedical Data Science, were co-senior authors.
The genome in wide angle
Almost all the variations that have been discovered in the human genome so far are simple. But the new algorithm’s output showed that each genome also has between 80 and 100 complex structural variations.
“Looking for only simple variations is like proofreading a book manuscript and searching exclusively for typos that change single letters,” Urban said. “You are overlooking words that are scrambled or duplicated, or in the wrong order—you might even miss that half a chapter is gone. All these things should be caught before the manuscript is sent to the print shop.”
Credit: Cell (2024). DOI: 10.1016/j.cell.2024.09.014
The Automated Reconstruction of Complex Structural Variants algorithm, ARC-SV for short, detects a wide range of DNA rearrangements and identifies complex structural variants with 95% accuracy. The algorithm uses an AI model trained on dozens of complete human genomes from people with diverse ancestry, a collection known as a pangenome.
The algorithm found more than 8,000 distinct complex structural variants, ranging in length from 200 to 100,000 base pairs. Many variants were located in regions of the genome that regulate brain development and function. The researchers then looked more closely at whether these variants were associated with psychiatric disease.
Genetics and psychiatric disease
The ability to easily find and study complex structural variants could help explain which alterations in the genome lead to heritable psychiatric diseases. The study examined two such diseases, schizophrenia and bipolar disorder. Genome-wide association studies (GWAS) have identified many locations in the genome associated with an increased risk of being diagnosed with a psychiatric disease. But GWAS results do not explain that genetic risk in enough detail to act on it.
“We have made amazing progress in identifying genetic components of psychiatric diseases, but there is still something important missing,” Urban said. “GWAS results tell us where in the genome some DNA change related to a disorder is located. But the information from GWAS is somewhat vague. It is like knowing that there are errors somewhere on pages 118, 237, and 304 in a book. But we do not know what kind of errors they are or which words are involved.”
Urban explained that while GWAS results might direct researchers to look for something wrong on page 118, knowing the sequence of complex structural variants is like having the actual 10-word sentence on that page highlighted in yellow, showing one scrambled word and another that is duplicated.
“It’s that exact,” he said.
The researchers put the output of the ARC-SV algorithm to the test. To investigate what complex structural variants might be doing, they combined whole-genome sequences with gene expression measurements from more than 100 postmortem brain tissue samples, taken from healthy individuals and from people who had been diagnosed with schizophrenia or bipolar disorder.
The variants tended to lie near or overlap with GWAS locations known to be associated with the risk of developing schizophrenia or bipolar disorder. The complex structural variants also affected how nearby genes were expressed, changing the readout of the instructions contained in DNA, which suggests the variants could be contributing to disease.
“Identifying and studying complex structural variants will give us more understanding of the ways DNA can vary and will provide molecular clues that will allow mapping of the trajectory of biological function that leads to disease and to the treatment of disease,” said Bo Zhou, Ph.D., an instructor in psychiatry and behavioral sciences and a first author on the study.