With the widespread use of low-cost genome-wide diagnostic screening tests, unanticipated but possibly pathogenic dosage changes affecting single genes are discovered with increasing frequency. Clinical management demands facile validation of such incidental findings, often necessitating the design of custom variant-specific assays. Although deletion variants can be readily confirmed using a range of next-generation sequencing strategies, characterising duplication variants at nucleotide resolution remains demanding. We have addressed this challenge by deploying a novel Cas9 enrichment strategy combined with long-read sequencing on the Oxford Nanopore MinION. We used bulk genomic DNA without the need for PCR amplification. We present the diagnostic resolution of two problematic cases in which incompletely characterised duplication variants had been identified by array CGH. The first patient presented with learning difficulties and autism spectrum disorder but had been found to have an incidental 1.7-kb imbalance which included a partial duplication of VHL exon 3. This was inherited from the patient’s father, who had renal cancer at 38 years of age. In the second case, we identified an incidental 200-kb duplication which included DMD exons 30-44. Parental testing was consistent with this variant having arisen de novo. In both cases, the single-molecule sequencing yielded sufficient information to define precisely the architecture of the rearranged region, enabling Sanger sequencing assays across the integration sites and the surrounding homologous regions that likely gave rise to the duplicated sequences. Adoption of this approach by diagnostic laboratories promises to enable rapid and cost-effective characterisation of challenging duplication-containing alleles.
Clive is Chief Technology Officer at Oxford Nanopore Technologies. On the Executive team, he is responsible for all of the Company’s product-development activities. Clive leads the specification and design of the Company’s nanopore-based sensing platform, including strand DNA/RNA sequencing and protein-sensing applications with a strong focus on scientific excellence and successful adoption by the scientific community.
Clive joined Oxford Nanopore Technologies from the Wellcome Trust Sanger Institute (Cambridge, UK) where he played a key role in the adoption and exploitation of next-generation DNA sequencing platforms. This involved helping to set up the world’s largest single installation of Illumina (formerly Solexa) Genome Analyzers in a production sequencing environment, initially used to pioneer the 1000 Genomes Project. From early 2003 he was Director of Computational Biology and IT at Solexa Ltd, where he was central to the development and commercialisation of the Genome Analyzer (GA). Solexa was sold to Illumina for $650m in early 2007 after the successful placement and adoption of 12 instruments. The Solexa technology, now commercialised by Illumina, is the market-leading DNA sequencing technology driving the renaissance in DNA-based discovery.
He has a strong background in computer science and genetics/molecular biology and manages interdisciplinary teams including mechanical engineering, electronics, physics, surface chemistry, electrophysiology, software engineering and applications (of the technology). Clive applies modern agile management techniques to the entire product-development lifecycle. Clive has also held various management and consulting positions at GlaxoWellcome, Oxford Glycosciences and other EU- and US-based organisations. He has worked at the interface between computing and science, ranging from genetics to proteomics. He holds degrees in Genetics and Computational Biology from the University of York.
Complex environmental matrices, such as soil, sediment and excreta, are often synonymous with diverse microbial communities. Long-read sequencing of DNA extracted from such communities can yield highly contiguous genomic data and provide information on both genetic composition and structure. However, DNA extracted from such matrices is often impure and fragmented, and can lack complete representation. Therefore, techniques such as isolation, enrichment and metagenomic assembly are used to answer questions on function and diversity. Here we present the sequencing and assembly of two environmental AMR-harboring plasmids and one novel GC-rich genome isolated from the environment. Furthermore, we describe our exploration into the sequencing and analysis of long-range amplicon-based enrichment for AMR-associated mobile genetic elements, and undertake metagenomic analysis of the community composition of two fractions of industrial anaerobic digesters. This has permitted us to investigate the evolution and selective drivers of AMR in the environment.
The human transcriptome is highly diverse and complex, as evidenced by cell-type specific expression of unique transcript isoforms. In particular, transcripts derived from non-coding regions are the most qualitatively diverse class of genetic elements, encompassing over 70% of the genome. Notably, about 80% of GWAS SNPs reside in non-coding regions, which suggests long non-coding RNAs may be the missing link, at least to some degree. I will present our recent work on high-resolution cDNA sequencing of non-coding regions associated with neuropsychiatric disorders using targeted sequencing on the PromethION. I will detail the discovery of new non-coding RNAs, mRNA isoforms and long-range exon dependencies, and how these relate to mental health and neurodegeneration. Given the apparent involvement of RNA modifications in neurological diseases, I will then segue into direct RNA sequencing and describe our in vitro strategy to train RNA base callers and detect modified RNA bases. This will include data from our recent preprint on detecting m6A in RNA with 90% accuracy. Finally, I will present how we maximise flowcell output by combining an innovative RNA barcoding strategy with 'AI'.
Nonsense-mediated mRNA decay (NMD) is a translation-dependent RNA degradation pathway that targets mRNAs with premature termination codons, as well as some endogenous mRNAs that encode full-length proteins. The features that render an mRNA sensitive to NMD are still poorly understood, except for the presence of an exon junction complex (EJC) >55 nt downstream of the termination codon. Obscuring the identification of NMD-inducing features is the fact that previous transcriptome-wide analyses of endogenous NMD targets did not reveal which specific splice isoforms are degraded by NMD. This is mostly attributed to the insufficient coverage of splice junction sites and the lack of information regarding non-annotated mRNA isoforms that are enriched upon NMD inhibition. A recent comparative transcriptome analysis from our lab, using cells in which three essential NMD factors were knocked down and then rescued, identified a high-confidence set of genes whose transcripts react to NMD (Colombo et al., RNA, 2016). However, because the analysis was based on short reads only, we could not obtain reliable isoform-specific information. For an isoform-specific analysis, we now use cDNA nanopore sequencing, which allows us to identify full-length mRNAs that are stabilized upon NMD inhibition. Our approach can detect full-length isoforms that are enriched, or even appear, when NMD is inactivated, and we have experimentally verified several examples. We integrate long- and short-read sequencing to accurately quantify the expression of individual isoforms and thereby identify those that are targeted by NMD. We aspire to reveal the regulatory role of NMD at the isoform-specific level and generate a resource that will enable the study of features that render a specific mRNA sensitive to NMD.
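The one well-established feature mentioned above, an EJC more than 55 nt downstream of the termination codon, can be illustrated with a minimal sketch. It is not the analysis used in this work. The function name, its parameters and the transcript-coordinate convention are hypothetical, and the position of each exon-exon junction is used as a stand-in for the position of the EJC, which is a simplification.

```python
def has_downstream_ejc(stop_codon_end, junction_positions, min_distance=55):
    """Return True if any exon-exon junction lies more than `min_distance`
    nucleotides downstream of the termination codon.

    All positions are transcript (mRNA) coordinates. `stop_codon_end` is the
    position of the last nucleotide of the stop codon; `junction_positions`
    lists the exon-exon junctions of one isoform. These names are
    illustrative assumptions, not part of any published tool.
    """
    return any(
        junction - stop_codon_end > min_distance
        for junction in junction_positions
    )


# Hypothetical isoform: stop codon ends at position 300,
# with one junction 100 nt further downstream.
print(has_downstream_ejc(300, [120, 250, 400]))
```

Because the check is applied per isoform, it only becomes informative once full-length isoform structures are known, which is the gap the long-read data are meant to fill.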
Evangelos is a postdoc in Oliver Mühlemann’s group at the Department of Chemistry and Biochemistry, University of Bern, Switzerland. He is interested in post-transcriptional mRNA regulation in mammalian cells and applies nanopore sequencing to identify endogenous mRNAs that are sensitive to nonsense-mediated mRNA decay. He is a biochemist from Greece with a background in transcriptomics, translation termination and RNA decay.
Release of the first human genome assembly was a landmark achievement, and after nearly two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no chromosome has yet been finished end to end, and hundreds of gaps persist across the genome. These unresolved regions include segmental duplications, ribosomal RNA gene arrays, and satellite arrays that harbor unexplored variation of unknown consequence. We aim to finish these remaining regions and generate the first truly complete assembly of a human genome.
Here we announce a whole-genome de novo assembly that surpasses the continuity of GRCh38, along with the first complete, telomere-to-telomere assembly of a human X chromosome. In total, we collected 40X coverage of ultra-long Oxford Nanopore sequencing for the CHM13hTERT cell line, including 44 Gb of sequence in reads >100 kb and a maximum read length exceeding 1 Mb. This unprecedented coverage of ultra-long reads enabled the resolution of most repeats in the genome, including large fractions of the centromeric satellite arrays and short arms of the acrocentrics. A de novo assembly combining this nanopore data with 70X of existing PacBio data achieved an NG50 contig size of 75 Mb (compared to 56 Mb for GRCh38), with some chromosomes broken only at the centromere. Using this assembly as a basis, we chose to manually finish the X chromosome. The few unresolved segmental duplications were assembled using ultra-long reads spanning the individual copies, and the ~2.3 Mbp X centromere was assembled by identifying unique variants within the array and using these to anchor overlapping ultra-long reads. These results demonstrate that it is now possible to finish entire human chromosomes without gaps, and our future work will focus on completing and validating the remainder of the genome.
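For readers unfamiliar with the contiguity metric quoted above, the following is a minimal sketch of how an NG50 value is computed from a list of contig lengths and an assumed genome size. It is an illustration only, not the assembly evaluation pipeline used in this work, and the function name and toy numbers are hypothetical.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly.

    NG50 is the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half of the genome size.
    Unlike N50, the threshold is based on the genome size rather than on
    the total assembly length. Returns None if the contigs cover less
    than half of the genome.
    """
    half_genome = genome_size / 2
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= half_genome:
            return length
    return None


# Toy example with made-up lengths and a genome size of 300.
print(ng50([80, 70, 50, 30, 20], 300))
```

Using the genome size as the denominator makes the values comparable between assemblies of the same genome, even when their total lengths differ.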
Karen H. Miga, PhD, is an Assistant Research Scientist at UCSC. Dr. Miga’s research program combines innovative computational and experimental approaches to produce high-resolution sequence maps of human centromeric and pericentromeric DNAs.
In light of the widespread resurgence of the respiratory disease whooping cough, ongoing research aims to identify changes to the causative bacterium, Bordetella pertussis. B. pertussis is traditionally described as a highly clonal species at the single-base level, hence our research largely focusses on identifying differences between strains on a whole-genome scale. Long-read sequencing has enabled us to produce closed genome sequences for B. pertussis isolates on an unprecedented scale, allowing visualisation of extensive inter-strain genomic rearrangements. This work also led to the unexpected discovery of a second phenomenon: large duplications which are present in some recent isolates but not in the B. pertussis reference genome. Intriguingly, these duplications may be present in only a fraction of the cells of duplication-carrying strains. At London Calling 2019, I will discuss this developing story, including the essential role of long and ultra-long nanopore sequencing in proving the existence of the duplications and characterising variable populations, alongside continuing work to quantify the phenotypic effects of the duplications.
Natalie Ring graduated from the University of Bath with a BSc in Biochemistry in 2012. She then spent four years working at MRC Harwell as a data wrangler for the International Mouse Phenotyping Consortium, as well as completing a post-graduate qualification in Science Communication from the University of Edinburgh. She is currently a PhD student at the University of Bath in the Bagby and Preston groups, studying the genome of Bordetella pertussis, the bacterium responsible for whooping cough.
Transposable elements (TEs) have long been known to be expressed in different cells during early mammalian development. However, the role of TEs in cellular differentiation has remained elusive. We have developed new experimental and computational methods to understand the role of TEs in cellular differentiation at single-cell resolution.
Single-cell (sc) RNA sequencing (RNA-seq) has been developed extensively in recent years to study cell-to-cell variability of gene expression. These methods, however, have exclusively used short-read sequencing technologies, which do not allow for TE mapping. We have developed a novel plate-based long-read scRNA-seq protocol that overcomes this limitation. Full-length transcripts are tagged with unique molecular identifiers (UMIs) prior to amplification, permitting accurate transcript counting. We also introduced PCR barcoding, which allows samples to be pooled and further reduces the amount of PCR amplification required.
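The idea that UMIs permit accurate transcript counting can be sketched as follows: reads sharing the same cell, gene and UMI are treated as PCR copies of one original molecule and counted once. This is a minimal illustration under stated assumptions, not the protocol's actual software. The function name and the `(cell, gene, umi)` tuple layout are hypothetical, and UMIs are compared by exact match, whereas nanopore reads would in practice need some tolerance for sequencing errors in the UMI itself.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per (cell, gene) pair.

    `reads` is an iterable of (cell, gene, umi) tuples, one per read.
    Reads with an identical cell, gene and UMI are assumed to be PCR
    duplicates of a single original transcript.
    """
    umis_seen = defaultdict(set)
    for cell, gene, umi in reads:
        umis_seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in umis_seen.items()}


# Four reads, but only three original molecules.
example_reads = [
    ("cell1", "geneA", "AAAC"),
    ("cell1", "geneA", "AAAC"),  # PCR duplicate of the read above
    ("cell1", "geneA", "GGTT"),
    ("cell2", "geneA", "AAAC"),  # same UMI, different cell
]
print(count_molecules(example_reads))
```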
We have devised a computational method to error-correct reads using UMIs, by calculating a consensus from multiple sequence alignments of all reads flagged as PCR duplicates.
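The consensus step described above can be sketched as a column-wise majority vote over reads that share a UMI. This is a simplified illustration, not the authors' implementation. It assumes the reads in each UMI group have already been aligned to one another by an external multiple sequence aligner, so that every row has the same length and gaps are written as `-`. The function names and the `(umi, aligned_read)` input layout are hypothetical.

```python
from collections import Counter, defaultdict


def column_consensus(aligned_reads):
    """Majority-vote consensus of equal-length, gap-padded aligned reads."""
    if not aligned_reads:
        raise ValueError("need at least one aligned read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("aligned reads must all have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":  # a majority gap means the column is dropped
            consensus.append(base)
    return "".join(consensus)


def consensus_by_umi(records):
    """Group (umi, aligned_read) pairs by UMI and build one consensus each."""
    groups = defaultdict(list)
    for umi, aligned_read in records:
        groups[umi].append(aligned_read)
    return {umi: column_consensus(reads) for umi, reads in groups.items()}


# Three PCR duplicates per UMI, each carrying a different random error.
example_records = [
    ("AAGT", "ACGT-A"),  # deletion error
    ("AAGT", "ACGTTA"),
    ("AAGT", "ACCTTA"),  # substitution error
    ("CCTA", "GG-A"),
    ("CCTA", "GG-A"),
    ("CCTA", "GGTA"),    # insertion error
]
print(consensus_by_umi(example_records))
```

A majority vote only corrects errors that are random across the duplicates, so the benefit grows with the number of reads per UMI. Columns without a clear majority would need an explicit tie-breaking rule in a real implementation.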
This protocol enabled, for the first time, long-read sequencing of scRNA-seq libraries with error correction of Oxford Nanopore reads. We used this method to study transposable element expression in single cells at single-molecule resolution in Dicer KO mouse embryonic stem cells.