Dr. Sudarshan Pinglay: 'Genome-Shuffle-seq enables precise identification of SV breakpoints at base-pair resolution without requiring whole-genome sequencing'
[Editor’s note: Dr. Sudarshan Pinglay and researchers in the UW Department of Genome Sciences and other institutions published the paper, “Multiplex generation and single cell analysis of structural variants in mammalian genomes” in the journal Science. That study, in tandem with another paper by the Wellcome Sanger Institute in the same edition of Science, advances the study of structural variants. Here, Dr. Pinglay discusses his paper and offers insights into those advances.]
What is the premise of this paper?
The fundamental premise is that all of us carry structural variants (SVs), a form of genetic variation that, pound for pound, impacts more base pairs in any human genome than other forms of genetic variation. Consider single nucleotide variants: there are a few million of them in each of our genomes, and each one impacts one base pair. Structural variants, by contrast, each affect more than 50 base pairs, and an individual SV can sometimes affect millions of base pairs, even though there are numerically fewer of them. However, understanding their impact has been challenging.
Why are structural variants, more than single nucleotide variants, so difficult to interpret?
We have two general paradigms for studying genetic variation. In the first, we sequence people and see if we can associate the existence of certain variants with certain phenotypes. But each individual structural variant is less frequent in the population than single nucleotide variants. And because they impact so many base pairs, it is hard to sequence enough people with the exact same structural variant to associate it with phenotypes in a statistically confident way.
The other paradigm is engineering these structural variants into cellular models in the lab, that is, creating them synthetically to see what effect they might have. We’ve done this for single nucleotide variants for a long time, either randomly or with CRISPR screens. But because of the size of structural variants and inefficiencies in the technologies for making genetic variants, it is hard to generate them at scale. We can do one or two variants, but we have not been able to do hundreds or even thousands of these structural variants and assess their impacts.
How does Genome-Shuffle-seq, the new technique you developed, address this problem?
Genome-Shuffle-seq is a straightforward method for generating and mapping large numbers of SVs across a mammalian genome. It enables precise identification of SV breakpoints at base-pair resolution without requiring whole-genome sequencing.
My colleagues and I applied Genome-Shuffle-seq to induce and map thousands of genomic SVs of several major forms, including deletions, inversions, chromosomal translocations, and others. Additionally, we demonstrated that this method can capture SV identities alongside single-cell transcriptomic data, making it possible to conduct large-scale pooled screens that link SVs to their effects on gene expression.
We anticipate that Genome-Shuffle-seq will be widely useful for investigating how SVs influence gene regulation, chromatin organization, and 3-D nuclear architecture. It may also facilitate in vitro modeling of extrachromosomal DNA in disease progression and contribute to efforts toward designing a minimal mammalian genome.
What is the backstory on how this paper came about?
During my Ph.D. at New York University, I worked on engineering mammalian genomes with large synthetic DNA constructs. However, the major limitation here was throughput; we could not build many variants of these DNA constructs outside of cells and then deliver them to cells. To address this problem, I was exploring ways to build and engineer genomic variants inside of cells, such that we would only have to build and deliver one DNA construct, but that sequence would have the ability to generate many different structural variants of itself. I realized that while this was powerful at the scale of one genomic locus, it could also be applied to the entire genome.
When I arrived at the University of Washington and the Shendure Lab in January of 2023, I realized it was the perfect environment to execute on this vision. Some components of the system I had imagined had already been reduced to practice in the Shendure Lab for other applications. In addition, we were able to partner with colleagues at the Wellcome Sanger Institute who were also interested in “scrambling” mammalian genomes to study SVs. So, we decided to coordinate the publication of both papers together in Science.