Drs. Marsh (left) and Starita: 'Simple, practical guidelines could help make these tools more useful, transparent, and impactful for both research and clinical use.'
Researchers from BBI, the Atlas of Variant Effects Alliance, the University of Edinburgh, and other institutions have collaborated on a paper providing guidelines for computational methods that assess the potential impacts of genetic mutations, commonly referred to as variant effect predictors (VEPs).
The paper was published in Genome Biology as part of a collection of articles entitled “Towards an atlas of variant effects.” The corresponding author is Professor Joe Marsh, Ph.D., of the MRC Human Genetics Unit, Institute of Genetics and Cancer, at the University of Edinburgh. He is also a member of the AVE Alliance.
“We wrote this paper because the field of variant effect prediction has grown rapidly, but without clear standards for how new tools should be shared,” said Marsh. “Many predictors are difficult to access, poorly documented, or can’t be fairly evaluated. As many of us within the Atlas of Variant Effects Alliance have worked extensively with these tools, we realised that simple, practical guidelines could help make these tools more useful, transparent, and impactful for both research and clinical use.”
BBI’s Lea Starita, Ph.D., one of the authors, credits Marsh for his thorough analysis of VEPs and determination to create transparency.
“Joe led the efforts to pull back the curtain on what each of these variant effects predictors is doing,” Starita said.
The paper, “Guidelines for releasing a variant effect predictor,” explains the complications surrounding variant effect predictors.
“VEPs vary widely in their algorithms, training data, prediction interpretation, output format, and accessibility,” the authors write. “Despite progress in the field, this diversity complicates end users’ ability to select the most suitable VEP and poses challenges for unbiased assessment, as new predictors often claim superiority over others…. Fair assessment also demands clear knowledge of training data, which is often poorly detailed in publications.”
Marsh is one of more than 700 members of the AVE Alliance and serves as the chair of its “Analysis, Modeling, and Prediction” workstream group, focusing on computational methods for variant effect prediction and analyzing multiplexed assays of variant effect data.
“By setting out clear expectations for how to release a new variant effect predictor, the paper makes it easier for others to use, test, and compare these tools,” Marsh said. “This will improve trust in the predictions, help avoid common pitfalls, and ultimately allow better interpretation of genetic variation. This matters for everything from basic biology to diagnosing genetic disease.”
Among the guidelines explained in the paper are:
- Making variant effect prediction methods freely available and open source, with a clear Open Source Initiative-approved license. By making VEP methods and their corresponding codebases accessible and clearly documented, developers empower researchers across the globe to contribute to the evolution of these tools, enhancing their accuracy, efficiency, and utility.
- Hosting the code for a VEP on a public platform such as GitHub or Hugging Face, thereby providing high visibility, version control, and the opportunity to integrate documentation.
- Clearly documenting the methodology underlying a novel VEP, including a list of all features in the final model, with links to sources and to code or a replicable methodology that can be used to engineer those features if necessary. (A sketch of what such a release might bundle together appears below.)
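To illustrate the kind of transparency the guidelines describe, here is a minimal sketch of how a predictor’s release metadata might be packaged alongside its code. The tool name, repository URL, feature names, threshold, and file names are invented for illustration; the paper does not prescribe any particular format.

```python
import json

# Hypothetical release metadata for an imagined predictor ("ExampleVEP");
# every field and value here is illustrative, not prescribed by the paper.
release_metadata = {
    "name": "ExampleVEP",
    "version": "1.0.0",
    "license": "MIT",  # an OSI-approved license, as the guidelines recommend
    "code_repository": "https://github.com/example-lab/example-vep",  # public hosting (invented URL)
    "features": [
        # every feature in the final model, with a pointer to how it can be regenerated
        {"feature": "conservation_score", "source": "multiple sequence alignment pipeline documented in the repository"},
        {"feature": "structural_context", "source": "predicted protein structure scripts in the repository"},
    ],
    "recommended_pathogenicity_threshold": 0.5,  # author-recommended cutoff, if one exists
    "training_data": "training_variants.tsv",  # released alongside the scores (see below)
}

# Shipping this file with the code lets users and assessors see exactly
# what the predictor uses and where its inputs come from.
with open("release_metadata.json", "w") as handle:
    json.dump(release_metadata, handle, indent=2)
```

The specifics would differ from tool to tool; the point of the guidelines is that the license, code location, model features, recommended thresholds, and training data are all stated explicitly in one discoverable place.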
The paper states: “It is crucial for the integrity and transparency of a VEP that all variants employed in its training are disclosed upon release. Ideally, these should be shared in the same format as the variant effect scores themselves, rather than merely referencing the databases, due to the dynamic nature of these resources and the potential variability in mapping methods to different sequence identifiers.”
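As a minimal sketch of what that might look like in practice, assuming a simple tab-separated layout with invented column names and placeholder variants, the training set can be written in exactly the same format as the released scores:

```python
import csv

# Placeholder rows for illustration only: the released predictions and the
# disclosed training variants share one layout, so they can be compared directly.
columns = ["gene", "hgvs_protein", "score"]

predictions = [
    ("GENE1", "p.Ala10Val", 0.92),
    ("GENE1", "p.Gly11Ser", 0.13),
]
training_variants = [
    ("GENE2", "p.Arg20His", 0.97),
    ("GENE2", "p.Leu21Pro", 0.05),
]

for path, rows in [("variant_effect_scores.tsv", predictions),
                   ("training_variants.tsv", training_variants)]:
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(columns)
        writer.writerows(rows)
```

Because both files share one layout, assessors can immediately check which variants a predictor has already seen, the kind of clear knowledge of training data the authors say fair assessment demands.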
Moreover, to increase the visibility and discoverability of new VEPs, Marsh and colleagues compiled an extensive list of tools, including: classifications of VEPs’ underlying methodologies and features; details on author-recommended pathogenicity prediction thresholds; and links to web servers and variant effect scores.
While these guidelines are important, they raise an equally important question: What will it take for researchers to adopt them?
“That is on editors and reviewers of papers describing new predictors,” Starita said. “Authors should be required to follow these or their data set won’t be complete.”
Marsh added: “Adoption will depend on journals, funders, and tool developers recognizing the value of transparency and usability. Success would mean that most new predictors are openly shared, well-documented, and come with clearly explained training data and outputs, making it far easier for researchers and clinicians to use them confidently and effectively.”