New BBI Study Reveals Need for Gene-Specific Calibration in AI-Predicted Variant Effect Data

‘Output from these AI models requires careful calibration before it can be used in clinical genetics…. Data must be tested against control variants and adjusted as needed.’


Co-lead author Malvika Tejura: 'It is important to recognize that the output from AI models requires careful calibration before it can be used in clinical genetics.'

A new AI-based study by researchers at the Brotman Baty Institute (BBI) and the University of Washington concludes that gene-specific calibration is needed for more accurate use of data from models that predict whether genetic variants are pathogenic or benign.

The study, “Calibration of variant effect predictors on genome-wide data masks heterogeneous performance across genes,” was published online last month in the American Journal of Human Genetics.

“Our paper shows that aggregating genomic data on a whole genome basis – and then ‘calibrating’ that data for variant classification – masks genes that do not perform well with that calibration,” said Malvika Tejura, a Ph.D. student in the Fowler and Starita labs and the paper’s co-first author. “It is important to recognize that the output from AI models requires careful calibration before it can be used in clinical genetics. Data generated by these models is not immediately ready for practical application. It must be tested against control variants and adjusted as needed.”

Tejura and her colleagues devoted two years to this project – one and a half years of research, followed by six months to perfect the figures and complete the draft manuscript.

Genetic variant analysis represents a significant research priority of the BBI.

As the paper explains, guidelines set forth by the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) define how multiple pieces of evidence can be combined to classify genetic variants as pathogenic, likely pathogenic, likely benign, or benign. Rare missense variants are particularly challenging to interpret because many key pieces of evidence are missing or uninformative. As a result, those variants are classified as variants of uncertain significance (VUSs).

“Increased genetic testing and the expansion of clinical tests to more genes have resulted in an explosion of VUSs, and currently, approximately 86 percent of missense variants in the ClinVar database are VUSs,” the paper states. “Timely resolution of this large and rapidly growing VUS problem requires high-quality evidence that can be generated for all variants. Variant effect predictions from various tools are available for nearly all possible single-nucleotide variants in most genes. Thus, variant effect predictors could have a profound impact in reducing VUSs. However, predictor evidence was limited to supporting strength in the 2015 ACMG/AMP guidelines because of the relatively low sensitivity and specificity of predictors available at the time.”

Tejura said “a central issue” addressed by the ClinGen Sequence Variant Interpretation (SVI) Working Group was the insufficient number of control variants (variants classified as either pathogenic or benign) available for calibrating variant effect predictors (VEPs). However, she noted, VEP data calibrated on a genome-wide basis obscures variation in predictor performance across individual genes.
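To make the masking effect concrete, here is a minimal sketch in Python. It is not the paper's calibration method, which is considerably more involved. It simply compares, at one score cutoff, how often high-scoring control variants are truly pathogenic when all genes are pooled versus when each gene is examined separately. The gene names, scores, labels, and the 0.8 cutoff are all invented for illustration.

```python
from collections import defaultdict

# Toy control variants: (gene, predictor score, is_pathogenic).
# Every value here is made up for illustration; none comes from the study.
CONTROLS = [
    ("GENE_A", 0.97, True), ("GENE_A", 0.95, True), ("GENE_A", 0.91, True),
    ("GENE_A", 0.88, True), ("GENE_A", 0.84, True),
    ("GENE_A", 0.20, False), ("GENE_A", 0.10, False),
    ("GENE_B", 0.92, False), ("GENE_B", 0.88, False), ("GENE_B", 0.81, True),
    ("GENE_B", 0.15, False),
]

THRESHOLD = 0.8  # hypothetical cutoff for calling a variant "predicted pathogenic"


def precision_above(controls, threshold):
    """Fraction of controls scoring >= threshold that are truly pathogenic."""
    flagged = [is_path for _, score, is_path in controls if score >= threshold]
    return sum(flagged) / len(flagged) if flagged else float("nan")


def precision_by_gene(controls, threshold):
    """Same measure, computed separately for each gene's controls."""
    by_gene = defaultdict(list)
    for row in controls:
        by_gene[row[0]].append(row)
    return {gene: precision_above(rows, threshold) for gene, rows in by_gene.items()}


pooled = precision_above(CONTROLS, THRESHOLD)
per_gene = precision_by_gene(CONTROLS, THRESHOLD)

print(f"pooled: {pooled:.2f}")
for gene, value in sorted(per_gene.items()):
    print(f"{gene}: {value:.2f}")
```

In this toy data the pooled figure looks acceptable because the well-behaved gene contributes most of the high-scoring controls, while the second gene's high scores are mostly wrong. That is the kind of gene-level difference a single genome-wide calibration can hide.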

“The ClinGen SVI showing that we can use predictors as more than weak evidence was a massive step forward,” Tejura said. “Although we understood the rationale behind using genome-wide data for calibration, we aimed to present the heterogeneous performance of genes with these predictors in a mathematically elegant way. Developing a model that effectively illustrated these variations took considerable time and effort.”

Tejura explained some of the more challenging aspects of the research.

“This project turned out to be more bioinformatically demanding than we initially anticipated,” she said. “While we had a solid grasp of the project's requirements, translating this into bioinformatics posed significant challenges, especially given that it was not our primary area of expertise. To overcome this, we sought help from researchers with stronger bioinformatics backgrounds and committed ourselves to extensive self-learning through online resources and training.”

Tejura and her colleagues have also developed a web-based resource that enables clinicians and researchers to visualize the distribution of variant effect predictions for 3,668 disease-relevant genes. The resource, she said, “will assist clinicians and researchers in identifying which genes require gene-specific calibration and hopefully help them determine the appropriate strength of evidence to assign the predictor when classifying variants in a gene.”

In addition to Tejura, the authors from BBI and the University of Washington are: Shawn Fayer, Ph.D. student; Abbye E. McEwen, M.D., Ph.D.; Jake Flynn, M.S.; Lea M. Starita, Ph.D.; and Doug Fowler, Ph.D.
