Measuring transcriptomic diversity induced by genome SCRaMbLEing with nanopore direct RNA sequencing
Eukaryotic genomes are organized non-randomly. We are studying how abrupt changes to genome organization shape the transcriptional landscape using Sc2.0, a yeast strain composed entirely of designer, synthetic DNA. Synthetic chromosome rearrangement and modification by loxP-mediated evolution (SCRaMbLE) is a key design feature of the synthetic yeast genome project. SCRaMbLE generates stochastic and complex rearrangements on demand at engineered recombinase sites throughout the genome using Cre-mediated recombination. We have characterized extensive transcriptional diversity in >60 independent SCRaMbLE strains containing a circular synthetic chromosome, synIXR. These transcriptional alterations appear to be isolated to the SCRaMbLEd chromosome and dependent on rearrangements specific to each SCRaMbLE strain. Using Oxford Nanopore’s direct RNA sequencing, we have observed novel transcriptional events in SCRaMbLEd genomes, including alterations to transcript start and termination sites. Our results suggest an inextricable link between physical organization of the genome and transcript isoform expression.
Dr Aaron Brooks earned his PhD from the University of Washington and is currently an EMBL Interdisciplinary Postdoc (EIPOD) working with Dr Lars Steinmetz in Heidelberg, Germany. Aaron’s research harnesses synthetic and evolutionary biology to understand how the physical layout of the genome shapes its function. Aaron and his team have relied on nanopore sequencing to detect abrupt reorganization events in synthetic genomes and measure their consequences.
From ancient tomb to animal viruses: mobile suitcase lab for nanopore sequencing at field setting
Nanopore sequencing technology can be applied to identify the pathogen responsible for an outbreak by sequencing all nucleic acids present in the collected sample in a single run. In addition, it gives insight into the origin and variant of the causative agent. We have established a novel protocol relying on nanopore sequencing and offline BLAST search, alongside a microbiome screening of an ancient tomb. The whole procedure was conducted in a solar-powered mobile suitcase laboratory, which is easy to use at the point of need. The procedure was completed in 5 hours, including extraction, barcoding, sequencing and data analysis, and did not require a bioinformatician. Our protocol enables rapid and reliable foot-and-mouth disease virus serotyping and the differentiation of the capripoxviruses (sheeppox virus, goatpox virus and lumpy skin disease virus). The microbiome composition of the ancient tomb revealed a potential threat of respiratory illness due to bacteria from the family Bacillaceae. Furthermore, bacteria from the family Pseudomonadaceae hinted at the former use of the tomb as a byre.
Dr Abd El Wahed studied veterinary medicine at Mansoura University in Egypt, and received his PhD in biology from Göttingen University, Germany in 2011. He has participated in the development of 30 point-of-care assays for the detection of infectious agents. In 2013, he was awarded the Young Investigator award from the ASTMH for the establishment of a mobile laboratory for rapid detection of haemorrhagic fever viruses in low-resource settings. Recently, he established a mobile suitcase laboratory for rapid detection of viruses, bacteria and parasites. The mobile setup has been used in field trials in Guinea, Sri Lanka, Nepal, Senegal, Egypt, Bangladesh and Brazil.
Going full circle: Assembly of high-quality, single-contig microbial genomes from the rumen microbiome using long-read sequencing
Ruminants such as cows and sheep are important livestock species. They convert plant matter of low nutritional value into high-quality meat and dairy products. Within a specialised stomach called the rumen, microbes ferment this difficult-to-digest plant matter, producing short-chain fatty acids. The composition of the rumen microbial community can affect the animal’s health, feed efficiency and level of methane production. Species in the rumen are typically difficult to culture and, despite its importance, the rumen remains an underexplored environment. DNA sequencing of the contents of the rumen offers the potential to identify microbial species without culture techniques. Here we sequence cow rumen fluid using Oxford Nanopore sequencing. We show that, despite these data coming from a highly complex microbial sample, we can assemble high-quality, single-contig whole genomes and plasmids of known and novel species, including numerous circular contigs. Additionally, we compare and validate the assemblies of these genomes with binned genomes generated from short-read Illumina assemblies. We show that the long-read assembly outperforms the short-read assembly in contiguity and in incorporation of important features such as AMR genes and marker genes.
Amanda Warr recently completed her PhD at The Roslin Institute in Edinburgh, UK. Her PhD research involved using genomics to investigate reproductive traits in pigs and reassembling the pig genome using long-read sequencing. Although this work was primarily in bioinformatics, she also spent time in the lab using the MinION and training others to use the sequencer. She has accumulated a number of MinION-related side projects and collaborations, including work in a variety of species on anti-microbial resistance, viral epidemiology, genome assembly in mammals and microbiomes, and diagnostics. Currently she is employed as a Postdoctoral Research Fellow at The Roslin Institute with Mick Watson and Christine Tait-Burkard, with main projects focussing on the rumen microbiome, functional genomics in chickens and tracking the spread of porcine reproductive and respiratory syndrome virus in the Philippines.
Genomic profiling in acute myeloid leukemia with complex karyotype
Acute myeloid leukemia (AML) represents clonal expansion of malignant cells. Stratification of patients into risk groups is based on cytogenetics and molecular markers, enabling a genotype-based treatment strategy. Conventional karyotyping, which is necessary for classification of “high-risk” AML, is available after 5 to 7 days. Using Oxford Nanopore sequencing, we established karyotyping based on shallow genome sequencing within 24 hours. The throughput of one flow cell was sufficient to achieve 3-fold genome coverage and reproduce the results of conventional karyotyping in 20 AML patients. To discover structural variations, we applied direct RNA sequencing and analysed fusion genes based on 1.2 million reads. A single run is sufficient to detect the balanced translocation t(9;22), which produces the fusion gene BCR-ABL1, in the cell line K-562. While a study of a larger AML patient cohort is ongoing, parallel low-coverage genome and transcriptome analysis allows identification of high-risk AML within the 24 hours of the initial diagnostic work-up.
Anna Dolnik is a postdoc at the Charité – University Medicine Berlin, Campus Virchow in Germany. Trained as a biologist, she switched to processing Illumina shotgun sequencing data in 2012, working at the interface of biology and bioinformatics. In 2016 Anna first worked with Oxford Nanopore technology, resequencing novel fusion genes found in AML with complex karyotype, and she now routinely uses a GridION for better characterization of high-risk AML. Her research focusses on clonal evolution in blood cancer (acute myeloid leukemia), identification of cancer-driving genes and characterization of complex changes in the cancer genome by whole genome sequencing.
Long-read NGS guided preimplantation genetic testing for chromosomal structural rearrangement
It is well known that patients diagnosed with a chromosomal structural abnormality have an increased miscarriage rate. Preimplantation genetic testing for chromosomal structural rearrangement (PGT-SR) can increase the pregnancy rate and, if handled appropriately, provides a chance to avoid passing the genetic defect to the next generation. Identifying the DNA breakpoints has always been a difficult task in terms of both effectiveness and cost. In the past, we performed haplotype analysis to phase the disease-associated and wild-type chromosomes. Although this is feasible, a double crossover between the breakpoints and the markers can generate a false-negative result. In this session, I will demonstrate an approach to breakpoint determination that uses the latest technological advances to maximize efficiency and minimize the economic burden. Precise identification of breakpoints simplifies PGT-SR to an approach similar to PGT-M (preimplantation genetic testing for monogenic disease). This strategy requires only a couple of PCR reactions to distinguish the derivative from the wild-type chromosomes accurately. Moreover, such breakpoint information is applicable to all family members affected by the same chromosomal aberration, whereas the haplotype approach requires reassessment of microsatellite markers for every couple, even those sharing the same chromosomal aberration.
Dr Chan graduated from the University of Newcastle Upon Tyne in the UK and gained his PhD at the University of Hong Kong. His research interest is in genetics and epigenetics of hereditary cancers as well as clinical genetic screening. Dr Chan has long been using cutting-edge technologies in his research, including the early application of pyrosequencing in quantification of DNA methylation, leading to the discovery of the mechanism of transcriptional read-through as the cause of Lynch Syndrome. He has been using an NGS approach in preimplantation genetic testing (PGT) since 2015 and his latest work involves the use of long-read NGS for structural variants.
Clinical application of long-read sequencing
With the widespread use of low-cost genome-wide diagnostic screening tests, unanticipated but possibly pathogenic dosage changes affecting single genes are discovered with increasing frequency. Clinical management demands facile validation of such incidental findings, often necessitating the design of custom variant-specific assays. Although deletion variants can be readily confirmed using a range of next-generation sequencing strategies, characterising duplication variants at nucleotide resolution remains demanding. We have addressed this challenge by deploying a novel Cas9 enrichment strategy combined with long-read sequencing on the Oxford Nanopore MinION. We used bulk genomic DNA without the need for PCR amplification. We present the diagnostic resolution of two problematic cases in which incompletely characterised duplication variants had been identified by array CGH. The first patient presented with learning difficulties and autism spectrum disorder but had been found to have an incidental 1.7-kb imbalance which included a partial duplication of VHL exon 3. This was inherited from the patient’s father, who had renal cancer aged 38 years. In the second case, we identified an incidental 200-kb duplication which included DMD exons 30-44. Parental testing was consistent with this variant having arisen de novo. In both cases, the single-molecule sequencing yielded sufficient information to define precisely the architecture of the rearranged region, enabling Sanger sequencing assays across the integration sites and the surrounding homologous regions that likely gave rise to the duplicated sequences. Adoption of this approach by diagnostic laboratories promises to enable rapid and cost-effective characterisation of challenging duplication-containing alleles.
Christopher studied molecular biology and human genetics at the University of Manchester and Mayo Clinic in Florida, USA. He subsequently moved to Leeds to undertake clinical scientist training and attained HCPC registration in 2013. As the NHS lead in the Translational Genomics Unit, he has overseen the clinical implementation of numerous short-read sequencing instruments and next-generation sequencing assays. He is currently a visiting research fellow at the University of Leeds where he is focussed on understanding the clinical utility of long-read sequencing, particularly for the diagnosis of rare Mendelian disease.
Redefining the transcriptional complexity of viral pathogens using direct RNA sequencing
Viral genomes typically exhibit a higher gene density and more diversified transcriptome than the host cell, despite their smaller, more constrained size. Coding potential is maximized by the use of overlapping open reading frames, alternative polyadenylation, and/or the deployment of complex splicing patterns. These properties make it more challenging to accurately dissect viral transcriptomes and there is a need for new tools and approaches that either complement or supplant conventional approaches. Here, we demonstrate how direct RNA sequencing of polyadenylated RNAs collected from primary cells infected with herpesviruses enables us to examine splicing patterns, transcription initiation and termination, RNA editing, and the placement of RNA modifications for both host and viral transcripts – all performed without inherent bias and at the single molecule level. We examine the pros and cons of using Illumina sequencing to perform error-correction and how ‘pseudo-transcripts’ can be used to examine the protein coding potential of individual RNAs. In addition to cataloguing novel splice variants of canonical transcripts, we have also discovered a number of novel non-coding RNAs that regulate viral infections. Most surprisingly, we have uncovered a novel class of transcript that results from read-through transcription and back-splicing to produce viral fusion proteins. Taken together, direct RNA sequencing offers an invaluable tool for dissecting complex viral transcriptomes leading to new biological understanding.
Daniel is an Assistant Professor at the New York University School of Medicine, currently setting up his first group. His research interests encompass transcriptional regulation, host-pathogen interactions, systems virology, and evolution. The primary focus of his lab is on understanding the regulation of transcription in herpesviruses during lytic, latent, and reactivating infections – as viewed through both wet- and dry-lab approaches. He received a PhD in molecular parasitology from the University of York in 2009 and subsequently defected to virology, gaining postdoctoral experience at the Wellcome Trust Sanger Institute, University College London, and the NYU School of Medicine.
Comparison of single nucleotide variants identified by Illumina and Oxford Nanopore technologies in the context of a potential outbreak of Shiga toxin-producing E. coli
Short-read sequencing platforms have been adopted by public health agencies for infectious disease surveillance worldwide and have proved to be a robust and accurate method for quantifying relatedness between bacterial genomes. However, this approach offers less flexibility for the urgent, small-scale sequencing that is often required during public health emergencies. In contrast, Oxford Nanopore Technologies offers a range of rapid real-time sequencing platforms, although it has been suggested that their currently lower read accuracy compared to other sequencing technologies might be problematic for variant identification. We compared Illumina and Oxford Nanopore sequencing data of two isolates of Shiga toxin-producing Escherichia coli to assess the utility of nanopore technologies for urgent, small-scale sequencing. We investigated whether the same single nucleotide variants were identified by the two sequencing technologies and whether inference of relatedness was consistent. We show that, with optimised variant calling using nanopore sequencing data alone, it is possible to rapidly determine whether or not two cases were likely to be epidemiologically linked.
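The two comparisons described above (whether both platforms report the same variants, and whether the inferred relatedness agrees) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the representation of a call set as a dictionary mapping reference position to alternate base, and the toy positions are all assumptions made for illustration.

```python
def snv_concordance(illumina_calls, nanopore_calls):
    """Compare two SNV call sets for the same isolate.

    Each call set is a dict {reference_position: alternate_base}
    (a hypothetical, simplified representation of a filtered VCF).
    """
    illumina = set(illumina_calls.items())
    nanopore = set(nanopore_calls.items())
    return {
        "shared": len(illumina & nanopore),
        "illumina_only": len(illumina - nanopore),
        "nanopore_only": len(nanopore - illumina),
    }


def snv_distance(calls_a, calls_b):
    """Count reference positions at which two isolates carry different alleles."""
    positions = set(calls_a) | set(calls_b)
    return sum(1 for pos in positions if calls_a.get(pos) != calls_b.get(pos))


# Toy data (invented): two isolates called against the same reference.
isolate1_illumina = {100: "A", 250: "T", 900: "G"}
isolate2_illumina = {100: "A", 250: "T", 1200: "C"}
isolate1_nanopore = {100: "A", 250: "T", 900: "G", 5000: "T"}

print(snv_concordance(isolate1_illumina, isolate1_nanopore))
print(snv_distance(isolate1_illumina, isolate2_illumina))
```

In this toy example the nanopore call set contains one call absent from the Illumina set; whether such a discrepancy changes the inferred relatedness depends on the distance threshold a surveillance scheme applies, which is not specified here.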
David graduated from the University of Bedfordshire with a BSc in Biomedical Science in 2014 before completing an MSc in Biomedical Science, specialising in Medical Microbiology, at Ulster University in 2015. He then joined the Gastrointestinal Bacteria Reference Unit (GBRU) at Public Health England in London, working on the laboratory typing of gastrointestinal pathogens, before moving to the bioinformatics team, where he analysed whole genome sequencing data. David is currently working as a bioinformatician at Public Health England and is a part-time PhD student at the Roslin Institute, University of Edinburgh, studying the use of Oxford Nanopore sequencing technologies for the investigation of outbreaks of Shiga toxin-producing Escherichia coli in humans.
Pore-C: a method for genome-wide, multi-contact chromosome conformation capture
The DNA within the nucleus of an interphase cell is organised into a complex hierarchy of folds and loops known as the 3D genome. The development of various chromatin conformation capture methods has enabled the detection of the structures that define each level of this hierarchy, e.g. chromosome territories, A/B compartments, topologically associating domains (TADs) and promoter-enhancer loops. This in turn has facilitated functional studies which have uncovered some of the mechanisms behind the formation and maintenance of these structures, as well as their effect on gene expression. However, most of these studies relied on methods that could only capture interactions between two points on the genome, and thus lacked the ability to resolve higher-order interactions. We will share our progress on Pore-C, a method to generate genome-wide, multi-contact chromatin conformation maps. We will also demonstrate how it can be used to improve whole genome assemblies and help resolve complex structural variants in cancer.
Eoghan Harrington is the Associate Director of Genomic Applications Bioinformatics, working out of Oxford Nanopore’s New York office. He brings over a decade's worth of experience in genome sequencing to bear on his role in the Genomic Applications Group, a multi-disciplinary team tasked with finding novel uses for Oxford Nanopore devices and communicating them to a wide audience. To achieve this goal, Eoghan works closely with internal and external collaborators to identify and develop high-impact applications and publicise the results in posters, presentations and scientific publications. After graduating from Trinity College Dublin with a BA in Human Genetics and an MSc in High Performance Computing, Eoghan went to EMBL Heidelberg to carry out his doctoral research. While there he used comparative genomics to study alternative splicing, and worked with some of the first shotgun metagenomic datasets. He went on to do postdoctoral research in single-cell microbial genomics at Stanford University. Prior to joining Oxford Nanopore Technologies, he worked at two start-ups: a leading personal genomics company and an oncology-focused electronic healthcare record and analytics company.
Splice isoform-specific analysis of endogenous NMD targets in human cells
Nonsense-mediated mRNA decay (NMD) is a translation-dependent RNA degradation pathway that targets mRNAs with premature termination codons, as well as some endogenous mRNAs that encode full-length proteins. The features that render an mRNA sensitive to NMD are still poorly understood, except for the presence of an exon junction complex (EJC) >55 nts downstream of the termination codon. Obscuring the identification of NMD-inducing features is the fact that previous transcriptome-wide analyses of endogenous NMD targets did not reveal which specific splice isoforms are degraded by NMD. This is mostly attributed to insufficient coverage of splice junction sites and the lack of information regarding non-annotated mRNA isoforms that are enriched upon NMD inhibition. A recent comparative transcriptome analysis from our lab, using cells in which three essential NMD factors were knocked down and then rescued, identified a high-confidence set of genes whose transcripts react to NMD (Colombo et al., RNA, 2016). However, because the analysis was based on short reads only, we could not obtain reliable isoform-specific information. For an isoform-specific analysis, we now use cDNA nanopore sequencing, which allows us to identify full-length mRNAs that are stabilized upon NMD inhibition. Our approach can detect full-length isoforms that are enriched, or even appear, when NMD is inactivated, and we have experimentally verified several examples. We integrate long- and short-read sequencing to accurately quantify the expression of individual isoforms and thereby identify those that are targeted by NMD. We aspire to reveal the regulatory role of NMD at the isoform-specific level and generate a resource that will enable the study of features that render a specific mRNA sensitive to NMD.
Evangelos is a postdoc in Oliver Mühlemann’s group at the Department of Chemistry and Biochemistry, University of Bern, Switzerland. He is interested in post-transcriptional mRNA regulation in mammalian cells and applies nanopore sequencing to identify endogenous mRNAs that are sensitive to nonsense-mediated mRNA decay. He is a biochemist from Greece with a background in transcriptomics, translation termination and RNA decay.
Genomics from roadkill - high quality mammalian genomes using hybrid assembly with MinION long reads
With thousands of fatalities due to car collisions with wildlife reported each year, roadkill are an underexploited resource in genomics. Here we show that mammalian roadkill samples can be used as a suitable source of DNA for long-read sequencing using the MinION device for two carnivoran species frequently encountered along South African roads: the bat-eared fox (Otocyon megalotis) and the aardwolf (Proteles cristatus). For both species, hybrid assembly of 150 bp paired-end (150PE) Illumina reads at ~85X coverage (~215 Gb) and MinION long reads at ~12X coverage (~30 Gb) using the MaSuRCA assembler provided genomes with high contiguity (~10,000 contigs with N50 of ~700 Kb) and completeness (>90% of complete BUSCOs). We further demonstrate that about 90% of the 14,509 single-copy orthologous genes of the OrthoMaM database could be successfully retrieved from these assemblies. These figures compare favourably with current mammalian genome assemblies and place our genomes among the best carnivore genomes currently available. This cost-effective strategy to obtain high-quality reference mammalian genomes opens the way for large-scale population genomic studies of mammalian wildlife using resequencing of samples collected from roadkill. We illustrate the potential of the approach for genome-scale species delimitation in both species, for which subspecies have been defined based on disjunct distributions and morphological differences.
Frédéric Delsuc is Research Director at the French National Centre for Scientific Research (CNRS), working in the Institute of Evolutionary Sciences at the University of Montpellier. He received his PhD in molecular phylogeny from the University of Montpellier, then worked on mammalian and tunicate phylogenomics during post-doctoral positions in New Zealand and Canada before coming back to Montpellier as a permanent CNRS researcher. He is currently directing the ERC ConvergeAnt project aimed at understanding convergent evolution in ant-eating mammals through an integrative approach combining morphology, genomes, and microbiomes. The project team has adopted nanopore sequencing technology, using the MinION to produce long reads that are combined with Illumina short reads to assemble mammalian genomes, mostly from roadkill animals.
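The figures quoted above can be related to one another with a short Python sketch. It shows how an N50 is computed from a list of contig lengths and how sequencing yield and coverage imply a genome size. The function names and the toy contig lengths are illustrative assumptions; only the yield and coverage values come from the abstract.

```python
def n50(contig_lengths):
    """Return the N50: the length of the shortest contig in the smallest
    set of longest contigs that together cover at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


def implied_genome_size(sequenced_bases, coverage):
    """Genome size implied by a sequencing yield at a stated fold coverage."""
    return sequenced_bases / coverage


# Toy contig lengths (invented) to show the N50 calculation.
print(n50([10, 8, 6, 4, 2]))

# Yields and coverages as quoted in the abstract (approximate values).
print(implied_genome_size(215e9, 85))  # Illumina: ~215 Gb at ~85X
print(implied_genome_size(30e9, 12))   # MinION:   ~30 Gb at ~12X
```

Both data sets imply a genome of roughly 2.5 Gb, so the quoted yields and coverages are mutually consistent.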
Long reads reveal small scale genome structural variations in Brassica napus
In this era of climate change and global warming, it is our responsibility as the scientific community to find sustainable ways of meeting our energy and fuel requirements. Biodiesel based on canola, Brassica napus, provides a perfect alternative to fossil fuels and can help us cut our greenhouse gas emissions by up to 90%. To keep pace with the ever-increasing demand for fuel and energy, it is crucial to maintain a high yield for this crop without generating a huge environmental footprint. However, B. napus has a very complex genome originating from an inter-specific hybridization event between Brassica oleracea (Mediterranean cabbage) and Brassica rapa (Asian cabbage or turnip). Due to the high levels of homology between the two sub-genomes making up the canola genome, it is extremely difficult to identify the novel genome polymorphisms underlying important traits such as yield, disease resistance and abiotic stress tolerance. Next-generation genome sequencing has been a game changer when it comes to deciphering complex, quantitatively inherited traits in B. napus. However, the resolution offered by second-generation sequencing technologies, such as Illumina sequencing, was severely limited by the small size of the sequencing reads. With Oxford Nanopore technology it is now possible to zoom into the canola genome to identify gene-level structural variation associated with key traits such as yield. We have sequenced four canola genotypes using nanopore technology and identified insertions and deletions ranging from 50 bp to 10,000 bp in genes involved in a range of important traits such as disease resistance and flowering time. This knowledge will enable us to engineer a future-ready canola plant.
Harmeet Singh Chawla is a PhD student in the Department of Plant Breeding at the Justus Liebig University Giessen. Harmeet completed an MSc in Agro-biotechnology at JLU Giessen, and is interested in studying the impact of genome structural variations on eco-geographical adaptation and various other agronomically important traits in canola, Brassica napus.
Direct sequencing of nascent RNA exposes splicing kinetics and order
Human genes contain many long introns with degenerate sequence information at splice sites, requiring sophisticated mechanisms to locate and coordinate the excision of multiple introns within the same pre-mRNA transcript. Fundamental aspects of this process remain unexplored due to a lack of quantitative approaches that monitor RNA processing as transcripts are produced. Here we performed nanopore sequencing of nascent, or newly synthesized, RNA to directly probe the timing and patterns of mRNA splicing. Direct RNA sequencing by the Oxford Nanopore Technologies MinION reveals the native context of long RNA molecules from 3’ to 5’ without amplification-associated biases. By combining direct RNA nanopore sequencing with stringent purification of nascent RNAs, we measure both the active transcription site (nascent RNA 3’ ends) and the splice isoform of single RNA molecules as they are transcribed. Application to human K562 cells reveals that co-transcriptional splicing occurs after RNA Polymerase II has transcribed several kilobases past the 3’ splice site of most introns. We also observe that the order of intron removal is not influenced by transcription direction in human cells. By contrast, we analyzed nascent RNA from Drosophila S2 cells, which have a different gene structure, and found that co-transcriptional splicing occurs more rapidly and in the order of transcription. Treating cells with the splicing inhibitor Pladienolide B abolishes co-transcriptional splicing in both species. Altogether, directly sequencing nascent RNAs through nanopores exposes critical molecular processes that occur during transcription in living cells.
Heather received her bachelor’s degree in Molecular Biology from Princeton University where she worked in Laura Landweber’s lab studying ciliate genome rearrangement. She then spent a year in Bob Langer’s lab at MIT developing a device to predict drug sensitivity in solid tumors. She is currently completing her PhD in Genetics at Harvard University in the lab of Stirling Churchman, working with nascent RNA and nanopore sequencing. She is interested in all aspects of RNA biology and tools to measure co-transcriptional processing.
Mobile Malaria Project
Despite reductions in malaria prevalence in the last two decades, the World Health Organization still reported an estimated 435,000 deaths in 2017, the majority occurring in children under the age of five. Moreover, continued progress is threatened by emerging drug and insecticide resistance. Our team won the 2019 Land Rover Bursary, supported by the Royal Geographical Society, with a proposal to convert a 2019 Land Rover Discovery into a mobile sequencing lab and drive it 6,300 km across Africa, from the Atlantic to the Indian Ocean. During our journey, we met with local research teams and policy makers striving to combat malaria, and produced materials aiming to raise public awareness and keep malaria on the global development agenda. My role in the project was to develop and pilot the mobile lab which, with local collaborators, we used to sequence antimalarial resistance genes in Zambia and whole mosquito genomes in Kenya. We hope our project demonstrates the feasibility of a decentralized approach to pathogen and vector sequencing and marks the beginning of long-term collaborations incorporating in-country nanopore sequencing with policy-directed malaria research.
Jason is reading for his DPhil in statistical genetics at the University of Oxford, focusing on how malaria genetic data can be leveraged to support malaria control. He is enrolled as part of the four-year Genomic Medicine and Statistics Programme funded by the Wellcome Trust. Prior to Oxford, he studied biochemistry at the University of Toronto, graduating with an H.BSc. and an M.Sc. He joined the Mobile Malaria Project in June 2018 to develop the lab and bioinformatic pipelines that were deployed in field settings during their trip across Africa.
Tapestry: assessing small eukaryotic genome assemblies with long reads
Assemblies of small eukaryotic genomes using long reads are often close to complete. However, these assemblies remain difficult to validate, especially when genomes have complex features such as large inversions, translocations and ploidy variations, and where the chromosome number may not be known. While many tools for assessing assemblies with short reads exist, long reads have far greater power for confirming the accuracy and completeness of contigs. I will present Tapestry, a tool for automatically validating the contigs of a small assembly and visualising them so that the structure of the assembly can be refined before polishing. I will show how Tapestry has helped us to resolve the complex genomes of several small eukaryotes.
John Davey is a bioinformatician at the University of York, working in the Department of Biology Technology Facility. He received his PhD from the University of Edinburgh and then worked with Mark Blaxter and Edinburgh Genomics during the development of Illumina sequencing, developing methods for analysing Restriction-site Associated DNA (RAD) Sequencing data, among many other things. He then held a fellowship at the University of Cambridge, working with Chris Jiggins on speciation of Heliconius butterflies, completing a chromosomal genome assembly of H. melpomene. He now works on a wide range of genomes and metagenomes at York, mostly trying to figure out how to turn raw nanopore sequence into completed genome assemblies.
The Three Peaks Challenge and developing extraction methods suitable for long-read, ultra-deep stool metagenomics on the PromethION
At present, most metagenomic surveys are performed using short-read sequencing. This approach limits the specificity of taxonomic assignment and result in highly fragmented assemblies. Single molecule sequencing platforms are able to sequence much longer molecules and the output of these platforms, particular the PromethION from Oxford Nanopore, now supports the study of complex microbial communities using shotgun metagenomics. We assessed a variety of commercially available and manual extraction methods using both a ten-species mock community and clinical samples of stool to find a method capable of generating ultra-long reads (>100 kb). Neither bead-beating or column-based extraction methods were found to support reads of the desired length and moving to magnetic bead and manual extraction methods allowed significant improvements in read-length. We also demonstrated the power of using solely chemical and enzymatic cell lysis methods for extracting high-molecular weight DNA from recalcitrant organisms, such as Gram-positive bacteria and fungi, over popular physical disruption methods. Development of these methods is critical to support the growing field of clinical microbiome research, including the ability to perform strain tracking and produce high-quality metagenome assembled genomes (MAGs) from metagenomic samples.
Joshua is a molecular biologist specialising in sample preparation for nanopore sequencing. He is a post-doc in Nick Loman's lab at the University of Birmingham, which explores the use of cutting-edge genomics and metagenomics approaches to the diagnosis, treatment and surveillance of infectious disease. In March 2015 he travelled to Guinea in West Africa with the MinION to establish the first mobile laboratory performing viral surveillance of Ebola virus during an epidemic. Later he developed a tiling, multiplex PCR method for sequencing Zika virus from low-titre clinical samples, which was used during the 2016 outbreak in Brazil. He also developed the popular ultra-long read sequencing method used to assemble the E. coli genome in 8 reads, sequence the human genome with an N50 > 100 kb with read lengths up to 882 kb and later to generate the first telomere-to-telomere assembly of the X chromosome. He is currently working on methods to perform untargeted sequencing to bring rapid pathogen identification out of the lab and into the clinic.
Single cell isoform profiling, 10xGenomics scRNA-seq and nanopore long read sequencing
Single cell transcriptome sequencing has become a powerful tool for high-resolution analysis of gene expression in individual cells. However, current high throughput approaches only allow sequencing of one extremity of the transcript (transcriptome profiling). Information crucial for an in-depth understanding of cell-to-cell heterogeneity on splicing, chimeric transcripts and sequence diversity (SNPs, RNA editing, imprinting) is lost. Here we present an approach that uses Oxford Nanopore sequencing with unique molecular identifiers to obtain error corrected full length single cell sequence information with the 10xGenomics single cell isolation system and apply it to examine differential RNA alternative splicing and RNA editing events in the embryonic mouse brain.
Kevin Lebrigand is Head of Bioinformatics at UCAGenomiX, the functional genomics platform of Nice-Sophia-Antipolis, one of the core nodes of the "France Genomique" network, using next generation sequencing to perform a broad range of sequencing projects such as de novo genome assembly, RNA-seq, small RNA-seq, ChIP-seq and CLIP-seq. In 2014 the platform decided to focus its expertise on methodological developments around single cell transcriptomics using the Fluidigm C1, and more recently the 10xGenomics Chromium system, on which more than 120 samples have been profiled. Last summer Kevin acquired a PromethION long-read sequencer to perform isoform-level profiling at the single cell level.
Retrotransposon variation in human genome and tumorigenesis
Retrotransposons are transposable genetic sequences that copy themselves into an RNA intermediate and insert elsewhere in the genome by reverse transcription. Almost half of the human genome is derived from transposons, but only a few dozen full-length Long Interspersed Nuclear Element-1s (LINE1s) in the human germline are expected to be retrotransposition competent. Mapping and genotyping retrotransposons with short reads is complicated by their size and the high copy number of their consensus sequence in the reference genome, so we applied nanopore sequencing to study multiple aspects of LINE1 retrotransposition. First, we sequenced inverse-PCR products with MinION, detecting highly subclonal insertion sites of a particular LINE1 element in two colon cancer tumors. Second, we have whole-genome sequenced a few germline and tumor genomes with PromethION, detecting the whole range of retrotransposon insertions that are variable within humans or inserted during tumorigenesis. Finally, we studied DNA methylation around the LINE1 insertion and source sites in human tumors in order to understand the mechanisms of LINE1 activation during tumorigenesis. Presently we are extending these studies to ~300 whole genomes of Uterine Leiomyoma tumors and their respective normal sequences.
Kimmo Palin completed his PhD in Computer Science at the University of Helsinki in 2007, focusing on comparative modelling of mammalian gene enhancer elements. From 2008 to 2012, he was a postdoctoral fellow at the Wellcome Trust Sanger Institute in Hinxton, UK, working on human genetics and genome sequencing. Since 2012, he has been working as a staff scientist at the University of Helsinki with Prof. L. Aaltonen and Prof. J. Taipale on tumor genetics and genomics, including copy number variation, mutational signatures, gene regulation and chromatin structure.
Nano-C: targeted poly-contact chromatin interactions for comprehensive profiling of cell-to-cell variation of 3D genome organization
Three-dimensional (3D) genome organization is an essential aspect of genomic function. Experimentally, the spatial association of regulatory elements with their targets and the characterization of complex relationships in spatial genome architecture have emerged as key challenges. To unravel these structures, Chromosome Conformation Capture (3C) methods are used, which are based on the principle that cross-linked interacting genomic regions can be cut and re-ligated such that sequences in close physical proximity become frequently joined together. The resulting ligation junctions thus reflect the 3D organization of the genome at the time of fixation. However, common short-read Illumina-based 3C methods capture only pairwise interactions present within the library of re-ligated fragments (3C library), thus limiting the ability to untangle complex multiway interaction networks. For instance, whether different interactions happen simultaneously, exclusively or in subsets cannot be discerned by methods based on pairwise interactions. The ultimate aim of my project is to determine whether the CTCF insulator protein, a major architectural protein in the mammalian nucleus, functions by structuring mostly homogeneous or heterogeneous chromatin domains, particularly TADs (Topologically Associating Domains). To address this question, I have developed “Nano-C”, a method combining 3C with nanopore sequencing. Nano-C uses in vitro transcription of long DNA molecules in a 3C library followed by direct RNA nanopore sequencing, thereby enabling the detection of up to 10-15 uniquely mapped interactions from defined genomic loci that occur within single cells. Analysis of such poly-contact chromatin interactions, both intra- and inter-chromosomal, in hundreds of cells at a time allows complex 3D genome architecture to be uncovered. I will discuss our recent Nano-C results that provide a first measure of TAD border heterogeneity when the CTCF architectural protein is bound.
Li-Hsin Chang is a postdoctoral researcher in the Chromatin Dynamics group led by Dr. Daan Noordermeer at the Institute for Integrative Biology of the Cell (I2BC) in France. Li-Hsin received her PhD in cellular and developmental biology in 2017 from the University of Illinois at Urbana-Champaign, where she studied the function and regulatory landscape of zinc-finger transcription factors with Professor Lisa Stubbs. Her current research interests focus on 3D chromatin organization, aiming to uncover the cellular heterogeneity of Topologically Associating Domains (TADs). To this end, she has developed a new method, “Nano-C”, which combines the chromosome conformation capture (3C) technique with nanopore sequencing for detecting poly-contact chromatin interactions.
An international collaborative effort for infectious disease analyses using MinION
GRAID is the Global Research Alliance for Infectious Disease, a collaborative international effort for infectious disease research using MinION. In this framework, we aim to educate researchers and develop methods and guidelines for field analysis of many aspects of infectious disease. We have conducted four summer schools in three developing countries (Thailand, Indonesia, and Kenya) as part of our efforts to introduce MinION in those countries. We are collaborating with researchers and have produced papers about serotype identification of dengue virus and comprehensive drug resistance identification of malaria parasites. Using MinION and isothermal amplification, we identified the serotype of dengue virus in Manado, Indonesia and Hanoi, Vietnam. We found that this method simplifies the amplification of the viral nucleic acid by using only blood or serum and a water bath or a thermal block prior to library preparation. We analyzed 141 Indonesian and 80 Vietnamese patients. The overall successful detection rate was 79%, and it depended largely on the viral titer. We also determined that the serotype of dengue virus differs between Indonesia and Vietnam (DENV1 and DENV3, respectively). Our next collaborative project is to comprehensively describe the drug resistance of malaria parasites in Indonesia, Vietnam, and Thailand. Here, we used PCR to amplify nine genes correlated with the drug resistance phenotype. We sequenced 118, 11, and 5 samples from Indonesia, Thailand, and Vietnam, respectively, in a multiplex manner and described the drug resistance pattern in each country. We found a position in the non-propeller region of the K13 gene that was mutated quite frequently in our Indonesian samples. Although we believe that the mutation is not related to artemisinin resistance, we think that the parasites may be under selective pressure due to artemisinin administration in the region.
We are also working with bioinformaticians to develop graphical user interface tools for researchers or clinicians who are unfamiliar with bioinformatics analysis. We have published NanoPipe, which serves as an easy-to-use MinION data analysis tool for a ‘regular’ user. We have ongoing and prospective projects in the framework, such as HLA typing in severe dengue patients, identification of unknown fever-causing pathogens, and determining the drug resistance pattern in HIV. We are confident that our consortium will encourage the infectious disease community to switch to sequencing in the research context while laying some foundations for preventive or therapeutic medicine in the future.
After graduating from the Sam Ratulangi University Manado in Indonesia, Lucky Ronald Runtuwene worked as a medical doctor in a community health center on the island of Siau. There he became interested in infectious diseases, and so decided to pursue a research career in the field. He completed a PhD in Japan, where he probed gene expression in a vector mosquito infected with dengue virus, followed by a post-doc at the University of Tokyo, where his laboratory was one of the early adopters of the MinION technology. Lucky is interested in field work, and the portability of the MinION and its ease of processing aid his research, leading to the conception of the GRAID consortium, which he will introduce at London Calling 2019.
Mapping DNA replication using nanopore sequencing
We have harnessed nanopore sequencing to study DNA replication genome-wide at the single-molecule level. Using in vitro prepared DNA substrates, we characterized the effect of bromodeoxyuridine (BrdU) substitution for thymidine on the MinION nanopore electrical signal. Using a neural-network basecaller trained on yeast DNA containing various amounts of BrdU, we identified BrdU-labelled tracts in yeast cells synchronously entering S phase in the presence of hydroxyurea and BrdU. As expected, the BrdU-labelled tracts coincided with previously identified early-firing, but not late-firing, replication origins. After BrdU pulse-labelling of asynchronous cells, we could detect and orient tens of thousands of individual replication tracts. This allowed us to reproduce replication fork directionality (RFD) profiles obtained by OK-seq and to map thousands of initiation events, the vast majority of them coinciding with well-characterized ARSs. These results open the way to high-throughput, high-resolution, single-molecule analysis of DNA replication in many experimental systems.
Magali Hennion completed a PhD in molecular biology in Toulouse with Dr Emmanuel Käs and Dr Olivier Cuvier, where she worked on chromatin organisation and insulator proteins. She then moved to Göttingen in Germany where she worked with Dr Steven Johnson in a postdoc internship on chromatin changes during stem cell differentiation, before a second postdoc with Stefan Bonn, where she focussed on epigenetic changes associated with memory formation and maintenance in the mouse brain. Since 2016, Magali has worked in Dr Olivier Hyrien's team in Paris, where she is developing new techniques to study DNA replication based on nanopore sequencing.
Metagenomics of India's largest River Ganges confluence at Prayagraj, India using MinION sequencing
River confluences and their microbial dynamics remain little explored worldwide. The River Ganges is one of the most important and holy rivers of India; it has a great mythological history and is important for mass bathing events. The Yamuna River meets the Ganges to form a confluence at Prayagraj (Allahabad), India. However, the influence of the Yamuna on the taxonomic and functional profiles of microbial communities at the confluence of the Ganga and Yamuna, and downstream of the confluence, remains uncharacterized. Therefore, in 2017 we undertook a large-scale study under the directives of the Government of India’s river cleaning mission (under the aegis of the National Mission for Clean Ganga) and a network program led by the Council of Scientific and Industrial Research (CSIR) and the Indian Council of Medical Research (ICMR) to understand India’s largest and most sacred river, the Ganges. Water and sediment samples collected from the Ganga-Yamuna confluence were processed to decipher microbiome diversity and function using MinION sequencing technology. Preliminary investigations of the confluence microbiome at Prayagraj revealed similar taxonomic (bacterial, fungal, archaeal, and phage) and functional (resistome) microbial profiles at sites upstream (on the Ganges) and farther downstream of the confluence, indicating a transient influence of the Yamuna River on the holy River Ganges. Overall, the Ganges harbors a plethora of microbial diversity and a hidden treasure of functional potential that could help explain the non-putrefying properties of this river.
Mahesh Dharne is a scientist in the National Collection of Industrial Microorganisms (NCIM) at CSIR-National Chemical Laboratory, Pune, India. He has established state-of-the-art facilities, including molecular identification systems (such as Sanger sequencing and next generation sequencing) and biochemical identification systems (such as VITEK2 and VITEK-MS), which are usually required by industry and academia. His research interests are in environmental and industrial microbiology.
Using full-length transcript sequencing to reveal the fate of mRNA in aging seeds
After seeds mature on the mother plant, they contain all the molecular machinery they will need for germination. Generally, some time elapses between maturation and germination. At the National Center for Genetic Resource Preservation, this time may be months to decades, or, optimistically, centuries. But, during this time, seeds eventually lose the ability to germinate. One explanation for this change is that the molecular machinery has gradually accumulated damage. When trying to assess the health of a seed lot, the standard method is to use a subset of seeds for a germination test, but these results are not straightforward to interpret. We hypothesized that accumulated damage at the molecular level could be quantified as an independent measure of seed lot health and chose to examine RNA. We showed that integrity of total RNA declines with storage time in seeds of many species. To show whether mRNA was similarly affected, we compared transcript integrity in 23-year-old and 2-year-old soybean seeds by sequencing full-length cDNA using MinION. In 23-year-old seeds, certain transcripts were only partially sequenced, and we confirmed that this was because of transcript fragmentation at random sites. We quantified transcript degradation for all transcripts and found that degradation increased with transcript length. This result supports the hypothesis that random damage accumulates at the molecular level. We now anticipate using the integrity of long transcripts to assess seed health over time.
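The link between random damage and transcript length described above can be illustrated with a toy calculation. This is only a sketch under a stated assumption: breaks are modelled as a Poisson process with a constant rate per nucleotide, so the chance that a transcript is still intact falls exponentially with its length. The function name and the break rate are hypothetical illustration values, not figures from the study.

```python
import math

def intact_fraction(length_nt, breaks_per_nt):
    """Probability that a transcript of length_nt has no breaks,
    assuming breaks occur independently at a constant rate per nucleotide
    (Poisson model; an assumption made for this illustration)."""
    return math.exp(-breaks_per_nt * length_nt)

# Hypothetical break rate, chosen only to make the trend visible.
HYPOTHETICAL_RATE = 1e-4

for length in (500, 1000, 2000, 4000):
    fraction = intact_fraction(length, HYPOTHETICAL_RATE)
    print(f"{length} nt: {fraction:.3f} expected intact")
```

Under this model longer transcripts are always less likely to remain intact than shorter ones, which is the qualitative pattern the abstract reports.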
Margaret Fleming obtained her PhD in Botany from Colorado State University in 2015 for her work on the role of the structural cell wall protein extensin in biomass recalcitrance in the context of biofuel production. Margaret then completed a postdoc with Dr Christina Walters at the National Laboratory for Genetic Resource Preservation, where she studied how time and environment affect seeds of both cultivated and wild plants, focusing on the interrelationship of RNA degradation and seed viability. Her current work with Dr Chris Saski focusses on the transcriptomic effects of Armillaria (root-rot) infection of susceptible and resistant peach rootstock, and she will soon join the lab of Dr Marjorie Weber at Michigan State University to study the evolution of mite domatia in Vitis (grape).
Using long-read nanopore sequencing to unravel structural genomic variations in plants
Transposable elements (TEs) are mobile DNA elements potentially able to move and multiply within genomes. TEs account for 40% of the total genome size in cultivated rice, Oryza sativa, and 10% in Arabidopsis thaliana, and their insertion polymorphisms are responsible for most of the structural variation between varieties or ecotypes of these two species. Although large genomic datasets are available to study the diversity within these two species, most of these datasets consist of Illumina short-read files rather than fully assembled genomes. To detect TE insertion polymorphisms (TIPs) in these large datasets, we developed a new software tool, TRACKPOSON, and applied it to characterise TIPs for 31 families of retrotransposons in 3,000 cultivated rice varieties. We used DNA nanopore sequencing to unambiguously validate our in silico results. We further took advantage of cDNA nanopore sequencing to analyze structural variation within transcripts in rice and Arabidopsis and to improve transcriptome annotation with alternative splicing detection. Moreover, for the first time we could detect long reads corresponding to entire TE transcripts.
Marie-Christine Carpentier graduated from University of Paris VII Diderot in 2009, then spent two years at the Laboratory of Biometry and Evolutionary Biology (LBBE) in Lyon as a bioinformatician, working on invertebrate RNA-seq analyses. Marie-Christine then moved to the Plant Genome and Development laboratory in Perpignan, where she currently works as a bioinformatician. She is also a member of Olivier Panaud’s team, where she develops new concepts and techniques for the analysis of structural variations in plant genomes. This research increasingly involves the use of nanopore long-read technology, at both the genomic and transcriptomic levels.
Long-read sequencing technologies resolve most dark and camouflaged gene regions
Complex genomes, including the human genome, contain ‘dark’ regions that standard short-read sequencing technologies do not adequately resolve, including protein-coding genes, leaving many variants that may be relevant to disease entirely overlooked. We systematically identified gene regions that are ‘dark by depth’ (few mappable reads), and others that are ‘camouflaged’ (ambiguous alignment). More than 100 protein-coding genes are 100% camouflaged using standard short-read sequencing. Many known disease-relevant genes are also camouflaged, including CR1, a top Alzheimer’s disease gene, as well as NEB, SMN1, SMN2, and ARX. We further assessed how well long-read technologies resolve these regions, including 10x Genomics, PacBio’s Sequel, and Oxford Nanopore PromethION (Cliveome v. 3.0). We found that long-read technologies largely resolve the camouflaged gene regions, making it possible to identify mutations that may be important in human disease.
Dr. Ebbert is an Assistant Professor of Neuroscience at the Mayo Clinic with a background in computational biology and bioinformatics, focusing on Alzheimer’s disease, amyotrophic lateral sclerosis (ALS), and frontotemporal dementia (FTD). He also has experience in genomics studies and analyses, algorithm design, and statistics. He has published in respected journals across cancer, bioinformatics, and Alzheimer’s disease, and recently published a manuscript demonstrating that long-read technologies can traverse the challenging C9orf72 ‘GGGGCC’ repeat expansion.
Finding disease-causing complex mutations
Various kinds of complex mutation (e.g. tandem-repeat expansion/contraction, homologous recombination, chromosome shattering, virus/transposon insertion) are known to cause diseases but have been neglected because they are hard to identify. I will describe our pipeline to find such mutations in patients. We perform whole-genome nanopore sequencing, find and group reads with structural differences from a reference genome, then de-prioritize differences shared by humans without the disease. We find the most probable alignments between reads and reference, allowing for arbitrary rearrangements, based on probabilities of nucleotides, substitutions, insertions, and deletions. Our pipeline discovered the cause of neuronal intranuclear inclusion disease: expansion of a GGC tandem repeat in NOTCH2NLC. We can fully characterize complex congenital mutations caused by chromosome shattering. Some important properties of these rearrangements, such as sequence loss, are holistic: they are not present in any part of the rearrangement but are apparent only in the rearrangement as a whole.
Martin has a cross-appointment as a researcher at the AIST Artificial Intelligence Research Center, and as a Professor at the University of Tokyo Department of Computational Biology and Medical Sciences. He is broadly interested in analyzing genetic sequences to understand the information encoded in them, their evolutionary history, and their role in disease. He studied Physics and Philosophy at University of Oxford, Mathematics at University of Cambridge, taught English in Beijing, completed a PhD in Bioinformatics at Boston University, and carried out postdoctoral research at the University of Queensland in Australia and RIKEN in Japan.
Dissecting RNA biology, one molecule at a time
The human transcriptome is highly diverse and complex, as evidenced by cell-type specific expression of unique transcript isoforms. In particular, transcripts derived from non-coding regions are the most qualitatively diverse class of genetic elements, encompassing over 70% of the genome. Moreover, about 80% of GWAS SNPs reside in non-coding regions, which suggests long non-coding RNAs may be the missing link, at least to some degree. I will present our recent work on high-resolution cDNA sequencing of non-coding regions associated with neuropsychiatric disorders using targeted sequencing on the PromethION. I will detail the discovery of new non-coding RNAs, mRNA isoforms, and long-range exon dependencies, and how these relate to mental health and neurodegeneration. Given the apparent involvement of RNA modifications in neurological diseases, I will then segue into direct RNA sequencing and describe our in vitro strategy to train RNA base callers and detect modified RNA bases. This will include data from our recent preprint on detecting m6A in RNA with 90% accuracy. Finally, I will present how we maximise flow cell output by combining an innovative RNA barcoding strategy with 'AI'.
Martin Smith is Head of the Genomic Technologies program at the Kinghorn Centre for Clinical Genomics, located at the Garvan Institute of Medical Research in Sydney, Australia. He has been using nanopore sequencing since 2014, with a heavy focus on transcriptomic applications. Martin is a computational biologist from Canada with a background in genomics, microbiology and immunology.
Leveraging long reads for high-throughput multiomic analyses of cellular diversity in human tumours
Plant de novo genome sequencing and assembly using Oxford Nanopore Technology
Oxford Nanopore sequencing technology has made it possible to sequence and de novo assemble plant genomes at relatively low cost with fast turn-around times. Continuous improvements in the technology, as well as our optimization of DNA extraction, size selection and library preparation in combination with current long-read assemblers, enable us to assemble larger plant genomes to a high-quality draft level. Using Hi-C we are further able to improve those high-quality contiguous genomes to chromosome-scale assemblies. This talk will highlight the progress we have made in the past 3 years with this technology, using the genomes of Solanum pennellii, Vinca minor, Physalis ixocarpa and Physalis alkekengi as examples, and will also show how genes can be easily structurally annotated in de novo genome assemblies using the Oxford Nanopore RNA-seq technology. The low error rate of these gene models derived from polished nanopore assemblies also allows for high-throughput functional annotation with the Mercator 4 pipeline.
Maximilian Schmidt was awarded a BSc in Biotechnology from the University of Cooperative Education Riesa in 2012, before moving to RWTH Aachen to complete an MSc, where he studied genes involved in plant cell wall biosynthesis. He is currently a PhD student with Prof. Usadel at RWTH Aachen, where he is interested in de novo plant genome sequencing.
Untangling heterogeneity in DNA replication with nanopore sequencing
Genome replication is a stochastic process whereby each cell exhibits different patterns of origin activation and replication fork movement. Despite this heterogeneity, replication is a remarkably stable process that works quickly and correctly over hundreds of thousands of iterations. Existing methods for measuring replication dynamics largely focus on how a population of cells behaves on average, which precludes the detection of low-probability errors that may have occurred in individual cells. These errors can have a severe impact on genome integrity, yet existing single-molecule methods, such as DNA combing, are too costly, low-throughput, and low-resolution to effectively detect them. We have created a method called D-NAscent that uses Oxford Nanopore sequencing to create high-throughput genome-wide maps of DNA replication dynamics in single molecules. I will discuss the informatics approach that our software uses, as well as questions pertaining to DNA replication and genome stability that our method is uniquely positioned to answer.
Michael Boemo is a postdoctoral research assistant in the Sir William Dunn School of Pathology at University of Oxford with Professor Conrad Nieduszynski, and currently holds the Emanoel Lee Junior Research Fellowship at St. Cross College. Michael completed his PhD in condensed matter physics in 2016 at the University of Oxford where, together with Professor Andrew Turberfield and Professor Luca Cardelli, he developed a computing system comprised of autonomous robots made from DNA. Dr Boemo is interested in developing computational methods to study systems biology, and his current work aims to develop methods to study DNA replication dynamics at single-molecule resolution and a new process algebra for the simulation of biological systems.
Deep transcriptomic sampling with long-read single cell RNA sequencing
Single cell RNA-seq (scRNA-seq) is rapidly gaining favour for understanding biology. Long-read methods have great potential for scRNA-seq by facilitating the identification and quantification of gene isoforms, allowing cell specific variation in isoform expression and splicing to be characterised. Previous methods for transcriptome-wide long-read scRNA-seq have been limited by low numbers of cells and/or few reads per cell. We have developed an improved nanopore long-read scRNA-seq method that allows the profiling of hundreds of single-cells at high read-depth and demonstrate its use by profiling five human cancer cell lines. These results demonstrate the power of long-read sequencing to characterise gene and isoform expression in single cells.
Mike Clark is head of the Transcriptomics and Neurogenetics group at the University of Melbourne in Australia. His research sits at the intersection of genomics and neuroscience, utilizing a number of genomic approaches, including nanopore sequencing, to investigate gene expression and function in the human brain and in neuropsychiatric disorders.
Generating high-quality reference human genomes using PromethION nanopore sequencing
To catalogue and associate all forms of human genetic variation to health and disease, a new generation of genome sequencing and assembly technologies is required. However, current workflows for producing high-quality human genome assemblies have overall cost and production time bottlenecks that prohibit scaling to hundreds of individuals. We designed and evaluated an optimized PromethION-based workflow to produce near-reference-quality genome assemblies for the offspring of ten parent-offspring trios. We demonstrate the production of long-read, high-quality, and high-coverage genomes with a total turnaround time of less than one week from sample extraction to complete assembly, and a total projected cost of less than $10k per genome. To lower costs and improve quality we have developed three new tools: 1) Shasta - a nanopore de novo long-read assembler that on a single compute node can produce complete human genomes in around 6 hours; 2) marginPolish - a new graphical model-based assembly polisher that improves on earlier methods in both cost and accuracy; and 3) HELEN - an RNN-based multi-task learning model that further refines the base and run-length prediction for each genomic position and produces state-of-the-art results. We evaluate the performance based on assembly accuracy, throughput/timing, and cost and demonstrate improvements relative to current best-of-breed in all areas. Recognizing that even 100 kb reads are insufficient to scaffold through the most repetitive regions of the human genome, we augment this sequencing with a Hi-C long-range library to facilitate scaffolding and haplotype phasing.
Miten is an Assistant Research Scientist at the University of California, Santa Cruz. His research interests include developing methods for long-read sequencing of DNA and RNA, methods for detection of base modifications, and software for analysis of MinION and PromethION data.
Characterizing large homology directed repair (HDR) insertions by CRISPR/Cas9 using MinION long-read sequencing technology
Precise genome editing by the CRISPR/Cas9 system has proven to be ground-breaking in basic research. Cas9 protein is increasingly being used for genome editing by direct transfection into cells of an active guide RNA-Cas9 ribonucleoprotein (RNP) complex, which introduces double-stranded breaks (DSBs) at targeted genomic loci. DSBs are repaired by endogenous cellular pathways such as non-homologous end joining (NHEJ) and homology-directed repair (HDR). Providing a ssDNA template during repair allows researchers to precisely introduce a desired mutation by utilizing the HDR pathway. However, rates of HDR are often low compared to NHEJ-mediated repair, and analysis of large (>100 nt) insertions can be challenging. Long-read sequencing technology allows for a more comprehensive analysis of the outcome of large insertions or deletions created by CRISPR/Cas9 genome editing. Here, we use a target enrichment approach to selectively sequence a region of interest (ROI) around the CRISPR-edited site to measure the rates of precise insertion by HDR.
Mollie Schubert is a Research Scientist in the molecular genetics research group at Integrated DNA Technologies. Mollie received her master’s degree in biochemistry from Iowa State University and has been at IDT since 2013. For the past five years, she has focused on studying CRISPR gene editing, including high-throughput screening of CRISPR-Cas9 guides for the development of a site selection tool, optimizing the composition and delivery of synthetic RNA reagents complexed to recombinant CRISPR nucleases, and developing methods for efficient gene editing with a recent focus on improvements to homology directed repair.
Nanopore sequencing and analysis of plant pathogenic viruses: more than just rapid diagnostics?
The use of the portable MinION sequencer in plant pathology is rapidly increasing. Many studies have shown that the accuracy, portability and reduced time to result of the MinION are actively changing the way we do diagnostics and new diagnostic development for pests and diseases in agriculture. The first advantage of the MinION is clearly the ability to obtain rapid preliminary IDs of unknown pests and diseases in the field. This was demonstrated recently by the Cassava Virus Action Project, which took just 4 hours to identify the virus present in symptomatic cassava plants in the field. The other advantage, which can be overlooked, is the opportunity to reduce the turn-around time to diagnosis for unknown pests and pathogens in a laboratory setting, as well as avoiding the need for expensive specialised equipment. While much has been made of the advances in in-field diagnostics, we wanted to ask how the data stack up in a laboratory setting when compared with other technologies. The small start-up costs and easy access to MinION sequencing, compared to other technologies, make it very attractive to plant virologists. Our team took a set of plant RNA samples from field pea with a known viral composition of a Potyvirus (Pea seed-borne mosaic virus, polyadenylated) and a Polerovirus (Turnip yellows virus, non-polyadenylated), for which we already had an Illumina data set, and sequenced the samples using a cDNA and a direct RNA kit on a MinION. We compared downstream analyses performed with both the Illumina and MinION data. The results of our research suggest that not only is the MinION suitable for rapid diagnostics in the laboratory and the field, but it is also useful in a wider research capacity.
Dr. Monica Kehoe is a Plant Virologist and Molecular Plant Pathologist working for the Western Australian Department of Primary Industries and Regional Development (WA DPIRD) in the diagnostic and laboratory services section. Her current work focuses mainly on the development, validation and use of molecular methods for plant disease diagnostics across a broad range of broadacre and horticultural crops. Research interests include the cassava brown streak and mosaic viruses, luteoviruses in pulses and oilseeds, grapevine viruses, viruses of vegetable crops, supercomputing for plant disease diagnostics and the use of portable sequencing for rapid diagnostics in plant pathology, in both the field and the laboratory. Monica has a B.Sc from the University of Melbourne, Honours in Plant Virology from Murdoch University, and in 2014 completed her PhD in Plant Virology at the University of Western Australia.
Resolution of germline hereditary cancer structural variants using nanopore sequencing
Structural variants (SVs) are difficult to ascertain using short-read sequencing technology. As part of the Personalized OncoGenomics (POG) study, tumour and matched normal blood Illumina whole-genome sequencing was performed in patients with advanced cancers. We used Oxford Nanopore sequencing for validation and breakpoint resolution of four germline SVs in hereditary cancer genes: 1) ATM deletion, 2) NTHL1-TSC2-TRAF7 complex rearrangements, 3) IFT140-TSC2 inversion and 4) UIMC1-NSD1 complex rearrangements. All 12 breakpoints of these 4 SVs were seen in the nanopore data. Long-read sequencing was necessary for the resolution of the SVs and corrected the initial interpretation in 3 out of 4 cases. Our results also showed the suspected IFT140-TSC2 large inversion to be a small intronic inverted duplication event that did not disrupt either gene. Our results suggest that short-read technology may not be sufficient for SV assessment. Long-read sequencing technology may eventually be considered as an option for the detection and validation of clinically relevant germline SVs.
Dr. My Linh Thibodeau is currently training in the Medical Genetics Residency Program at the University of British Columbia. In 2017, she won entry to the Royal College of Physicians and Surgeons Canada Clinician Investigator Program to apply bioinformatic approaches to the discovery and characterization of hereditary cancer predispositions. During her work in the Personalized OncoGenomics study at BC Cancer in Vancouver, Canada, Dr. Thibodeau acquired expertise in the analysis and integration of whole genome and whole transcriptome datasets. Taken together with her medical training, these experiences have allowed Dr. Thibodeau to develop a unique clinical-bioinformatic skillset.
Extracting megabase DNA
Ultra-long DNA extraction and library prep protocols have generated sequencing reads of up to 2.3 Mb on MinION flow cells, with individual runs yielding multiple reads longer than 1 Mb and N50 values higher than 100 kb. Optimisation experiments are currently being carried out to increase flow cell yields and further improve read length statistics across all Oxford Nanopore platforms. Results shown will compare DNA extracted from a diverse range of organisms and sample types and include QC comparisons from different ultra-high molecular weight (ultra-HMW) extraction protocols including manual and automated approaches. We will also share progress updates comparing library preps from these samples run over Flongle, MinION and PromethION flow cells.
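For readers unfamiliar with the N50 statistic quoted above, the following is a minimal Python sketch of how a read-length N50 is commonly computed: reads are sorted from longest to shortest, and the N50 is the length of the read at which the running total first reaches half of all sequenced bases. The function name and the example lengths are illustrative assumptions, not data or code from the talk.

```python
def read_n50(lengths):
    """Return the read-length N50 for a collection of read lengths (in bases).

    The N50 is the length L such that reads of length >= L together
    contain at least half of all sequenced bases.
    """
    if not lengths:
        raise ValueError("no read lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical read lengths, for illustration only
example_lengths = [100, 80, 60, 40, 20]
example_n50 = read_n50(example_lengths)  # 100 + 80 = 180 >= 150, so N50 is 80
```

Because a few very long reads contribute many bases, the N50 is usually well above the median read length, which is why both are often reported.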
Nadine is currently working as the Senior Technical Specialist at Deep Seq, a multi-platform sequencing facility at the University of Nottingham. Becoming a specialist in Next Generation Sequencing (NGS) followed on from a research career in bacterial genomics that focussed on the early adoption of new sequencing and comparative genomics platforms. Deep Seq is a certified service provider for GridION and PromethION sequencing and routinely receives requests to extract DNA from a wide range of organisms and sample types for Oxford Nanopore long-read sequencing.
Ultra-long reads and ultra-long duplications: deciphering the mysteries of the Bordetella pertussis genome
In light of widespread resurgence of the respiratory disease whooping cough, ongoing research aims to identify changes to the causative bacterium, Bordetella pertussis. B. pertussis is traditionally described as a highly clonal species at the single-base level, hence our research largely focusses on identifying differences between strains on a whole-genome scale. Long-read sequencing has enabled us to produce closed genome sequences for B. pertussis isolates on an unprecedented scale, allowing visualisation of extensive inter-strain genomic rearrangements. This work also led to the unexpected discovery of a second phenomenon: large duplications which are present in some recent isolates but not in the B. pertussis reference genome. Intriguingly, these duplications may be present in only a fraction of the cells of duplication-carrying strains. At London Calling 2019, I will discuss this developing story, including the essential role of long and ultra-long nanopore sequencing in proving the existence of the duplications and characterising variable populations, alongside continuing work to quantify the phenotypic effects of the duplications.
Natalie Ring graduated from the University of Bath with a BSc in Biochemistry in 2012. She then spent four years working at MRC Harwell as a data wrangler for the International Mouse Phenotyping Consortium, as well as completing a post-graduate qualification in Science Communication from the University of Edinburgh. She is currently a PhD student at the University of Bath in the Bagby and Preston groups, studying the genome of Bordetella pertussis, the bacterium responsible for whooping cough.
Blood donor genotyping - how can long range sequencing help?
To ensure the safety of blood transfusions, it is critical to match the blood type of the donor with that of the recipient. Current typing methods use monoclonal antibodies; however, reagents for rare blood groups are expensive, unavailable or unreliable. DNA-based identification of human blood groups has been used to overcome these limitations, and its application has reduced rates of alloimmunisation in chronically transfused patients. While recent studies have shown the high degree of blood typing accuracy that can be achieved with modern high-throughput molecular techniques, structural variants and rare recombination events in the genome remain a source of error. Long-range sequencing technologies can be leveraged to produce high-quality haplotype reference sequences for the blood group encoding genes, which can be used to improve current typing algorithms.
Nicholas Gleadall is a PhD student working in the laboratory of Professor Willem Ouwehand at the University of Cambridge. His work focuses on the genetics of human blood group antigens and development of techniques for high throughput, DNA based donor typing. Nicholas has previous experience introducing new technologies into clinical service by developing and validating diagnostic laboratory assays for large organisations such as Public Health England, where he worked on HIV whole genome sequencing for national surveillance and resistance genotyping, and NHS England on a project focussed on exome sequencing for diagnosis of rare human inherited disorders.
Nanopype: processing and quantification of short tandem repeats
The availability of substantially longer reads with the Oxford Nanopore approach opens new possibilities in many fields and explains the increasing use of nanopore technology. To facilitate access and match storage as well as processing routines to the higher demand, we assembled Nanopype, a modular, parallelized and easy-to-use pipeline to process sequencing data from the raw signal output into standardized formats. Specifically, Nanopype facilitates the essential steps of base calling, quality control and alignment, as well as various downstream applications, by incorporating field-specific tools complemented by custom utility scripts. To illustrate its application, we apply it to the assessment of short tandem repeats that have been implicated in neuropsychiatric disorders. Combined with a Cas12a-based enrichment strategy and the STRique package, we show efficient targeting and quantification at the raw signal level, as well as determination of the associated methylation status.
Pay received his MSc in Electrical Engineering from the Kiel University of Applied Sciences with a focus on embedded systems and hardware accelerated signal processing. He is currently a PhD student in Alex Meissner's lab at the Max-Planck-Institute for Molecular Genetics in Berlin. Pay is interested in the epigenetic regulation of the genome, direct base modification detection and developing tools and pipelines to process third generation sequencing data.
Nanopore direct RNA sequencing enables comprehensive transcriptome profiling and modification detection
RNA sequencing provides insight into the molecules that are actively expressed in a given tissue type at a certain point in time. Traditional RNA sequencing approaches generate cDNA from RNA templates using reverse transcriptase, usually with PCR amplification. Therefore, traditional approaches are subject to biases that are associated with the reverse transcriptase enzyme and PCR and are not able to assess RNA modifications. Oxford Nanopore’s new direct RNA sequencing approach does not depend on reverse transcriptase or require PCR, allowing users to overcome biases typically associated with their use. Here, we apply this direct RNA sequencing approach to interrogate two important human cell lines: a widely used lymphoblastoid cell line (GM12878) and induced pluripotent stem cells. Using this approach, we extensively sampled the transcriptome, sequencing ~3 million molecules in a single sequencing run. These reads reached a median length of 1 kb, with maximum read lengths >13 kb. More than 90% of these reads align to the human reference genome and existing transcriptome, allowing us to examine known transcript variants. Additionally, existing RNA modifications such as 6mA and 5mC can be readily detected using Tombo at known transcript locations. Analyzing these data in two distinct cell types allows us to discover new cell-type specific transcripts in coding and noncoding regions of the genome and assess their specific associated modifications. Overall, this direct RNA sequencing methodology allows for efficient, comprehensive transcriptome profiling.
Rachel Goldfeder is a Computational Scientist on the Genome Technologies team at The Jackson Laboratory for Genomic Medicine. Her research interest is in using novel sequencing approaches to aid disease understanding, diagnosis, prognosis, and treatment. Rachel holds a BS in Biomedical Engineering from Washington University in St. Louis and a PhD in Biomedical Informatics from Stanford University.
From amplicons to metagenomes: Long read sequencing the environment
Complex environmental matrices, such as soil, sediment and excreta, are often synonymous with diverse microbial communities. Long-read sequencing of DNA extracted from such communities can yield highly contiguous genomic data and provide information on both genetic composition and structure. However, DNA extracted from such matrices is often impure and fragmented and can potentially lack complete representation. Therefore, techniques such as isolation, enrichment and metagenomic assembly are used to answer questions on function and diversity. Here we present the sequencing and assembly of two environmental AMR-harboring plasmids and one novel GC-rich genome isolated from the environment. Furthermore, we describe our exploration into the sequencing and analysis of long-range amplicon-based enrichment for AMR-associated mobile genetic elements, and undertake metagenomic analysis of the community composition of two fractions of industrial anaerobic digesters. This has permitted us to investigate the evolution and selective drivers of AMR in the environment.
Dr Rob James is currently working as a post-doctoral research fellow with Prof. E. Wellington as lead investigator on the BBSRC-funded project “Mycobacterium bovis and the farmland ecosystem: understanding transmission dynamics between animals and the environment.” This collaborative project between the University of Warwick, the Zoological Society of London and Imperial College aims to identify the environmental reservoirs of infection in agricultural land use types, and routes of transmission between mammalian hosts and the environment. Furthermore, Rob has an interest in the evolution and selection of antimicrobial resistance genes in the environment and has recently undertaken work to quantify AMR gene abundances in farmland and residential areas of Karachi and Islamabad.
Long-read sequencing and assembly of a large environmental blaCTX-M-15 harbouring plasmid
Infections caused by antimicrobial resistant bacterial pathogens are fast becoming an important global public health issue. Using next generation sequencing data of whole sediment and cultured fractions, our research group has identified wastewater treatment plants (WWTPs) as hotspots for the dissemination of antimicrobial resistance genes/bacteria (ARG/ARB) into the environment. Whilst WWTPs can remove up to 99.9% of faecal coliforms, our results suggest that anaerobic digesters and the water treatment process positively select for ARG/ARB. The persistence of plasmid-mediated ARGs outside of the host-associated system may play a compounding role in shaping the community-acquired resistome. The aim of our research is to understand the mechanisms of enzyme secretion in E. coli and determine why only ESBLs, and no other beta-lactamases, are found in the exoproteome. We investigated the secretory mechanism of an ESBL-producing E. coli ST131 strain isolated from a UK water system. Strains of E. coli ST131 carrying multiple resistance genes, including blaCTX-M-15 (encoding an extended spectrum beta-lactamase, ESBL), were isolated from rivers downstream of WWTPs. We then quantified survival under prolonged anaerobic digestion in the presence and absence of selective antibiotics. We also confirmed whether the gene was plasmid-borne and studied the secretory mechanisms associated with all beta-lactamases in the genome. Here we present a method to rapidly sequence, assemble and undertake primary annotation of the bla gene-carrying plasmid associated with our environmental E. coli ST131. The use of Oxford Nanopore long-read sequencing has permitted accurate de novo assembly and has helped further resolve the location, composition, order, function and putative mechanism of transposition of the AMR genes. Such assembly was previously unachievable using our existing short-read sequence data set.
Séverine Rangama is currently a PhD student at the School of Life Sciences, University of Warwick. Her research aims to develop an increased understanding of beta-lactam resistance gene expression and to elucidate the secretion of the enzyme via the SecA pathway.
Obtaining high quality DNA from plant tissues for nanopore sequencing
Setting up your first few nanopore sequencing runs is super exciting, but far too often they end with disappointing yields for those of us who work with plants and other more challenging sample types. Frequently, the DNA extracted using standard kits and common solvent-based techniques is contaminated with substances that are incompatible with nanopore sequencing. I will share with you some methods I have found useful to prepare high quality DNA from plants, how to recognise what contaminants are present and some methods to get rid of them.
Stella is currently a platform coordinator at Deakin Genomics Centre, a small, not-for-profit sequencing centre and NGS training facility in Melbourne, Australia. She has been a technical specialist at Deakin University for over a decade, providing training and support to undergraduate and postgraduate students in a broad range of molecular and cell biology techniques. Stella completed her undergraduate degree in Science at La Trobe University and her PhD at the University of Sydney, Australia. She has extensive experience working with difficult specimens such as formalin-fixed museum specimens, plants, insects and soil.
Targeted nanopore sequencing with Cas9 for studies of methylation, structural variants and mutations
There is an existing need for clinical tools that can be used to rapidly assess genomic variants and epigenetic changes at medically relevant genes. We have been using the CRISPR/Cas9 system for target-enrichment nanopore sequencing. We show the ability of this method to generate greater than 200X average coverage at 10 genomic loci (mean size 18kb) with a single MinION flow cell. We demonstrate that this high coverage data enables us to (1) profile DNA methylation patterns at cancer driver genes, (2) detect structural variations at known hot spots, and (3) survey for the presence of single nucleotide mutations. We demonstrate applications of this technique by examining the well-characterized GM12878 cell line as well as three breast cell lines (MCF-10A, MCF-7, MDA-MB-231) with varying tumorigenic potential as a model for cancer.
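As a rough worked example of the figures above, the on-target yield implied by 200X average coverage over 10 loci with a mean size of 18 kb can be estimated as coverage × number of loci × mean locus size. The function name below is hypothetical, and the result is a lower bound because the reported coverage is "greater than 200X".

```python
def on_target_bases(mean_coverage, n_loci, mean_locus_size_bp):
    """Approximate on-target bases implied by a mean coverage over n loci."""
    return mean_coverage * n_loci * mean_locus_size_bp


# Figures taken from the abstract: 200X, 10 loci, mean size 18 kb
min_bases = on_target_bases(200, 10, 18_000)
print(f"{min_bases / 1e6:.0f} Mb on target")  # 36 Mb on target
```

In other words, roughly 36 Mb of aligned sequence at the targets is enough to reach the stated depth, which is a small share of what a MinION flow cell can typically produce.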
Timothy Gilpatrick is a current MD/PhD student at Johns Hopkins University in Baltimore, USA. He is doing his PhD work in the lab of Winston Timp, where his studies have centred on the use of nanopore sequencing to study cancer epigenetics and structural variation. He received his BSc in Biochemistry from the University of Delaware, working in a protein structure lab to characterize the role of lipoprotein-associated enzymes in atherosclerosis. Prior to starting his graduate studies, he worked as a research fellow at the National Institutes of Health, examining how microRNAs regulate histone modifications in embryonic stem cells.
The fever tree: extracting and preparing the DNA of Cinchona pubescens
Nanopore sequencing of the CYP2D6 pharmacogene
The accurate genotyping of CYP2D6 is hindered by the very polymorphic nature of the gene, high homology with its pseudogene CYP2D7, and the occurrence of structural variations. Using the GridION nanopore sequencer, we sequenced 32 samples covering various haplotypes of CYP2D6, including four samples with gene duplication, over two sequencing runs. The haplotypes of 26 samples could be matched accurately to known alleles or subvariants, while the remaining 6 samples had either novel variants or variant patterns not matched to the current PharmVar CYP2D6 haplotype database. Small insertions/deletions associated with several key haplotypes were detected accurately, and five novel variants not yet catalogued in PharmVar were reported. Allele duplication could be determined by analyzing the allelic balance between the sample haplotypes. Nanopore sequencing of CYP2D6 offers a high-throughput method for genotyping, accurate haplotyping, and detection of new variants and duplicated alleles.
Yusmiati is currently a PhD Candidate at the Gene Structure and Function Lab in the Department of Pathology and Biomedical Science at the University of Otago, Christchurch. Her research focus includes the application of nanopore sequencing and other sequencing methods in pharmacogenetics and adverse drug reactions. She holds a master’s degree in biomedical science from the University of Hasanuddin in Indonesia and has previously worked in a molecular diagnostic lab in Jakarta, Indonesia.
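The abstract does not describe how allelic balance was analysed. As one hedged illustration of the idea, the Python sketch below assumes that reads have already been assigned to one of two haplotypes: in a heterozygous sample with one copy of each haplotype, about half of the reads are expected from each, whereas a duplication of one haplotype shifts the expected split towards 2:1. The function names, the tolerance value and the read counts are all hypothetical.

```python
def allelic_balance(reads_hap_a, reads_hap_b):
    """Fraction of haplotype-assigned reads that come from haplotype A."""
    total = reads_hap_a + reads_hap_b
    if total == 0:
        raise ValueError("no haplotype-assigned reads")
    return reads_hap_a / total


def suggest_copy_ratio(balance, tolerance=0.08):
    """Return the closest simple A:B copy ratio, or 'unclear' if none is near.

    The tolerance is an arbitrary illustrative threshold, not a validated one.
    """
    # Expected fraction of reads from haplotype A under each copy ratio
    expected = {"1:1": 1 / 2, "2:1": 2 / 3, "1:2": 1 / 3}
    label, target = min(expected.items(), key=lambda kv: abs(balance - kv[1]))
    return label if abs(balance - target) <= tolerance else "unclear"


# Hypothetical read counts, for illustration only
balanced = suggest_copy_ratio(allelic_balance(150, 150))    # "1:1"
duplicated = suggest_copy_ratio(allelic_balance(200, 100))  # "2:1"
```

A real analysis would also need to account for amplification or mapping bias between haplotypes, which this sketch ignores.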