New Paper Outlines the Purpose and Progress of GREGoR Consortium

‘Long-read sequencing has ushered in new diagnostic opportunities in rare diseases’


Danny Miller, M.D., Ph.D.: 'Collaboration has been the key to the consortium’s success. It’s been helpful to see how researchers at the other sites are analyzing really complex cases of these rare diseases.'

Scientists at BBI and the University of Washington, along with four other universities and a research institute, have been collaborating since 2021 to study the genomics of rare diseases and ways to treat them, working with about 7,000 patients from 3,000 families.

Their collaborations are part of the Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) Consortium. It is composed of the Broad Institute, Stanford, the Baylor College of Medicine and the University of California, Irvine. In addition, the Genetic Analysis Center in UW’s Department of Biostatistics served as the consortium’s Data Coordinating Center.

“Collaboration has been the key to the consortium’s success,” said Danny Miller, M.D., Ph.D., who leads BBI’s long-read sequencing program. “It’s been helpful to see how researchers at the other sites are analyzing really complex cases of these rare diseases.”

Miller is also one of several authors on a new paper, “GREGoR: Accelerating Genomics for Rare Diseases,” published in Nature.

“The past decade has seen rapid progress in clinical genetics, due to increased discovery of genes and variants involved in Mendelian diseases and ongoing advances in sequencing, variant analysis, and data sharing,” the authors state. “Despite this progress, most individuals who undergo clinical genetic testing for a suspected Mendelian condition remain undiagnosed.”

Miller’s specialty, long-read sequencing, “has recently ushered in new diagnostic opportunities in rare diseases,” according to the paper. “GREGoR and others have demonstrated that use of targeted long-read sequencing can reveal variants, particularly structural variants spanning repetitive sequences, in both known and novel disease genes that are missed or difficult to detect by short-read sequencing.” Despite such progress, “one of the major challenges with use of long-read sequencing in rare diseases remains the absence of control datasets for filtering and prioritizing variants,” the authors state.

Miller does not disagree. In fact, he said helping develop control datasets has been “a key part of our team’s contribution to the work.”

“When you sequence someone using long-read sequencing, you find a lot more variants than with short-read sequencing,” Miller said. “The challenge is answering the question, ‘Are those variants common or rare and should you pay attention to them or not?’”

The paper concludes by looking to the future of, and the further challenges to address in, the study and treatment of genetic variants:

“Further, even when all genes for Mendelian conditions are identified, causal variant discovery will remain far from saturated. This is particularly true for missense variants, which often require functional validation, and for noncoding variants, where the regulatory mechanisms are complex and poorly characterized. Far from being a task of ‘wrapping up the edges,’ these challenges represent a vast forefront in genomic research, demanding both innovative methodologies and sustained collaboration to make meaningful progress.”

Challenges in Diagnosing Rare Genetic Diseases:

A) The pathogenic variant(s) may be located in a gene yet to be implicated in disease. To date, more than 5,000 protein-coding genes have been implicated in at least one disease, but an estimated 10,000+ disease-gene relationships remain undiscovered in just the remaining protein-coding genes.

B) The pathogenic variant(s) may be located in the noncoding genome, where the mechanisms by which a variant manifests a clinical phenotype are not well understood, leading to challenges in identifying candidates.

C) The variant may be difficult to detect with short-read exome or genome sequencing alone, as is the case for long repeats, inversions, and complex genomic rearrangements. The variant may be detectable, but bioinformatic algorithms may struggle to call it correctly, as with multi-nucleotide and mosaic variants. The variant may be detectable and called correctly, but asserting its functionality or pathogenicity may require orthogonal lines of evidence that are unavailable.

D) More complex inheritance patterns, such as multi-locus pathogenic variation, oligogenic or polygenic inheritance, variable expressivity, incomplete penetrance, imprinting, and/or mosaicism, may also confound a diagnosis and necessitate a broader approach to understanding the mechanism of disease.

E) A gene-disease relationship may have been published or submitted to a genetic database but not yet reviewed and incorporated into clinical testing. This is compounded by the rapidly increasing number of Variants of Uncertain Significance (VUS).

F) Many candidate variants and genes are n=1 regardless of the best data sharing practices.

G) Because of the nature of novel discovery, there is not always a functional assay available to provide orthogonal evidence for or against a candidate variant or gene. Often, even when a candidate does meet the inclusion criteria for an existing assay, the molecular phenotype measured in the assay may not match the potential mechanism of disease or fully recapitulate the pathophysiological impact, resulting in ambiguous results or an incorrect prediction of pathogenicity.

H) Databases predominantly capture genetic information from individuals of European-like genetic ancestry, potentially propagating biases in tools and reference data used to classify variants in individuals of non-European-like genetic ancestry.

I) Newer genomic technologies may offer advantages over short-read DNA sequencing, but effective and widespread use of these technologies requires clear guidance and broad demonstration of efficacy.
