Variation in transcript structure via RNA splicing and differences in the 5’ and 3’ untranslated regions is a key feature of gene regulation. Disruption of transcript structure is one of the primary functional changes behind a large proportion of disease variants, both common and rare. These discoveries have so far been partly based on analysis of RNA-sequencing data of 50-100 bp reads, which cannot directly measure the real biological unit of the transcriptome: transcripts. The advent of long-read technologies for RNA-seq has the potential to transform transcriptome analysis, since they can directly measure full-length isoforms. In this study, we created a large long-read transcriptome sequencing data set across diverse samples of the GTEx project using Oxford Nanopore technology and developed a new computational approach for investigating the effects of genetic variants on the transcriptome. We generated cDNA sequencing data using the PCR-based protocol from 88 samples across 15 different tissues, for which we also had access to Illumina RNA-seq data, and 73 of which had whole genome sequencing data. We identified novel transcripts and showed how transcript quantifications vary across tissues. We developed pipelines to perform allele-specific expression (ASE) and allele-specific transcript structure (ASTS) analyses genome-wide. In ASTS analysis we test for differences in transcripts originating from each haplotype of a sample by splitting the reads according to the haplotype of a heterozygous variant and determining the transcript to which each read had been assigned, an analysis that is not informative with short-read Illumina data. To do that, we first devised a new mapping strategy to mitigate the high reference bias observed in Oxford Nanopore data. Across all the samples, 14,212 genes showed allelic expression in at least one sample, of which 1,230 fulfilled the requirements for ASTS analysis.
We found 187 genes that had significant allelic differences in their transcript distributions in at least one sample, only half of which also showed ASE. ASTS events capture splicing quantitative trait loci that have been mapped as part of the GTEx consortium, verifying that they reflect true genetic effects on splicing. We also discovered rare splice-disrupting variants using ASTS data. Finally, we knocked down PTBP1, an RNA-binding protein that mediates splicing, using siRNA in fibroblast cell lines generated from five donors. We demonstrate the successful ablation of ASTS in some cases, indicating that genetic effects on splicing can be modified by the cellular state. Altogether, our results provide evidence of the widespread nature of genetically driven allelic differences in transcript structure, and of the power of long-read data and careful computational approaches to study them in human population samples.
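The ASTS idea described above (split reads by haplotype, then compare the transcript assignments of the two groups) can be illustrated with a small sketch. The abstract does not say which statistical test the authors used, so the permutation test on a Pearson chi-square statistic below is a stand-in chosen for illustration; the function names, the `(haplotype, transcript)` input layout, and the toy read counts are all hypothetical.

```python
import random
from collections import Counter


def chi_square_stat(table):
    """Pearson chi-square statistic for a haplotype x transcript count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def asts_permutation_test(reads, n_perm=2000, seed=0):
    """Test whether transcript usage differs between haplotypes.

    reads: list of (haplotype, transcript) pairs for one gene in one sample
    (hypothetical layout, not the authors' data format).
    Returns (observed statistic, permutation p-value).
    """
    rng = random.Random(seed)
    haplotypes = [h for h, _ in reads]
    transcripts = [t for _, t in reads]
    hap_levels = sorted(set(haplotypes))
    tx_levels = sorted(set(transcripts))

    def build_table(haps):
        counts = Counter(zip(haps, transcripts))
        return [[counts[(h, t)] for t in tx_levels] for h in hap_levels]

    observed = chi_square_stat(build_table(haplotypes))
    shuffled = haplotypes[:]
    hits = 0
    for _ in range(n_perm):
        # Shuffling haplotype labels breaks any link to transcript assignment.
        rng.shuffle(shuffled)
        if chi_square_stat(build_table(shuffled)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: the "ref" haplotype favours transcript T1, "alt" favours T2.
toy_reads = (
    [("ref", "T1")] * 40 + [("ref", "T2")] * 10
    + [("alt", "T1")] * 12 + [("alt", "T2")] * 38
)
stat, p_value = asts_permutation_test(toy_reads)
print(f"chi-square = {stat:.2f}, permutation p = {p_value:.4f}")
```

A permutation test avoids distributional assumptions when read counts per transcript are small, at the cost of run time; in a genome-wide setting the resulting p-values would still need multiple-testing correction.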
Dr. Dafni Glinos is a Postdoctoral Researcher in the Lappalainen lab at the New York Genome Center and Columbia University. She is interested in the contribution of coding and non-coding variants to the molecular mechanisms that define human traits, with a focus on diseases. She is leveraging allelic data from the human transcriptome to quantify variation within individuals. Dafni obtained her PhD at the Wellcome Sanger Institute and University of Cambridge, under the supervision of Gosia Trynka. Her research focused on the human gene regulatory landscape of T cells, studying the impact of non-coding variants on immune processes.
Human translational research
Medulloblastoma (MB) can be classified into four molecular subgroups (WNT group, SHH group, group 3, and group 4). The gold standard for assigning the molecular subgroup, DNA methylation profiling, uses the Illumina EPIC array. However, this tool is limited by its cost and by a turnaround time that can be too long for clinical use. We present an alternative DNA methylation assay, based on nanopore sequencing, for rapid, cheaper, and reliable subgrouping of clinical MB samples. Low-depth whole-genome sequencing with long-read, single-molecule nanopore technology was used to simultaneously assess the copy number profile and the MB subgroup based on DNA methylation. The DNA methylation data generated by nanopore sequencing were compared to a publicly available reference cohort comprising over 2,800 brain tumors, including the four subgroups of MB (Capper et al. Nature, 2018), to generate a score that estimates the confidence of each tumor group assignment. All 24 MB samples analyzed with nanopore sequencing (six WNT, nine SHH, five group 3, and four group 4) were classified into the subgroup established by expression-based Nanostring subgrouping. In addition to subgrouping, we also examined the genomic profile: all previously identified clinically relevant genomic rearrangements (mostly MYC and MYCN amplifications) were also detected with our assay. In conclusion, we confirm the full reliability of nanopore sequencing as a novel, rapid, and cheap assay for methylation-based MB subgrouping. We now plan to extend this technology to other embryonal tumors of the central nervous system.
Julien Masliah-Planchon has worked as a practitioner in oncogenetics at the Institut Curie in Paris, France since 2018. Julien completed his PhD in Oncogenetics at the Université Paris-Saclay, and worked as an assistant in Oncogenetics at the Institut Curie between 2013 and 2018.
Using customized protocols, we were able to produce molecular karyotypes from cell-free DNA (cfDNA) of 4 healthy subjects and 7 cancer patients, obtaining 18-32M raw reads from a single Oxford Nanopore run. To date, this is the first successful attempt to obtain a high-resolution copy-number variation profile from cfDNA using Oxford Nanopore technology, as the yield of the runs reported in previous studies was insufficient. For 4 patients, Illumina sequencing was also performed, and the results correlate highly with the Oxford Nanopore results (R = 0.96 – 0.99, p << 0.001), with concordant log2ratio values in 97-99% of genome positions.
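As a rough illustration of the cross-platform comparison reported above, the sketch below computes per-bin log2 ratios from binned read counts and then the Pearson correlation and the fraction of concordant bins between two platforms. The abstract does not describe the actual pipeline, so the binning, the pseudocount, the 0.2 concordance tolerance, the function names, and the toy counts are all assumptions made for this example.

```python
import math


def log2_ratios(sample_counts, control_counts, pseudocount=0.5):
    """Per-bin log2(sample / control), with each profile scaled to its total depth."""
    sample_total = sum(sample_counts)
    control_total = sum(control_counts)
    return [
        math.log2(((s + pseudocount) / sample_total) / ((c + pseudocount) / control_total))
        for s, c in zip(sample_counts, control_counts)
    ]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def concordant_fraction(x, y, tolerance=0.2):
    """Fraction of bins whose log2 ratios differ by at most `tolerance` (assumed cutoff)."""
    return sum(abs(a - b) <= tolerance for a, b in zip(x, y)) / len(x)


# Toy bins: one gained bin (index 2) and one lost bin (index 4) on both platforms.
nanopore = log2_ratios([100, 100, 200, 100, 50, 100], [100] * 6)
illumina = log2_ratios([1000, 1000, 2000, 1000, 500, 1000], [1000] * 6)
r = pearson_r(nanopore, illumina)
agreement = concordant_fraction(nanopore, illumina)
print(f"r = {r:.3f}, concordant bins = {agreement:.0%}")
```

Scaling each profile to its own total depth makes runs of different yield comparable, which matters when one platform produces far more reads than the other.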
Filippo is a bioinformatician and molecular biologist from ISPRO in Florence, Italy. During his master’s degree in Medical and Pharmaceutical Biotechnologies at the University of Florence, and his internship at IRST in Meldola, Italy, he gained experience in molecular biology and circulating cell-free DNA. He is currently completing his PhD in Genetics, Oncology and Clinical Medicine at the University of Siena in Italy, studying mobile elements, DNA/RNA editing and copy-number variations in cancer with computational approaches.
Dan Turner is Vice President, Applications at Oxford Nanopore Technologies. He provides leadership for multi-disciplinary teams in Oxford, New York and San Francisco. The Applications group aims to bring together sample prep technologies, genomics applications and bioinformatics, to expand the utility of Oxford Nanopore Technologies devices and illustrate the benefits of these technologies to the wider world. The team is also responsible for providing Field Applications Support. Before joining Oxford Nanopore Technologies, Dan was Head of Sequencing Technology Development at the Wellcome Trust Sanger Institute, and prior to this he held postdoctoral positions at the Sanger Institute and Cornell University Medical College in Manhattan.
Update from Oxford Nanopore Technologies
Fera Science Limited is the national reference laboratory for plant health in England and Wales, providing scientific advice and services to Defra, other government bodies, and commercial customers. It is responsible for providing diagnostic services to the Plant Health and Seed Inspectorate, who work at UK ports and airports protecting UK agriculture and the natural environment from the introduction of invasive plant pests and diseases. For the last 10 years these services have included an increasing use of high-throughput sequencing for environmental monitoring and disease diagnosis. Due to the high cost of the existing platforms, samples need to be batched (24 to over 100 samples per batch) to make the process viable, leading to regular runs of particular sample types and waits of more than a month for all but the highest-priority samples. The cost and scalability of the MinION with the Flongle adapter open up the potential to run single samples or small batches as soon as they are requested. This concept is being tested in a number of case study trials. Much of our work involves correctly identifying insects from larvae intercepted on imported fruit. This is carried out using morphological examination and confirmed by DNA barcoding, with the sequencing being carried out by offsite commercial suppliers using capillary sequencers. In a trial with 8 samples, within an hour the Flongle had produced enough sequence to accurately identify the insect species present. A number of problematic samples, in which the target was infected with a second organism (a parasitoid or mite), were also tested. This approach has been extended and used to assess the invertebrate populations of sticky traps, looking for invasive insect pests using metabarcoding.
The Flongle has also been successfully used to identify novel plant viruses in import samples (direct RNA sequencing), identify the species of a disease-causing bacterium (whole genome sequencing) and confirm infections with phytoplasma, which are obligate bacterial parasites of plants (amplicon sequencing). Finally, using the Flongle we have also been able to show that we can determine the diet of spittlebugs (metabarcoding), possible vectors of Xylella fastidiosa, a bacterial pest currently causing serious damage across Europe. Knowledge of the vectors and their hosts will be important if this disease comes to the UK. The MinION with the option of the Flongle is showing great promise in delivering rapid, scalable, useful data and will soon become a standard tool defending the UK from invasive plant pests and diseases.
Dr. Ian Adams is a researcher at Fera Science Ltd, previously a government agency, but now a private company part owned by the UK Department for Environment, Food & Rural Affairs. He develops molecular diagnostics for the detection of invasive pests and diseases on plants and related produce in support of UK biosecurity. For the last ten years much of the focus of this work has been on the use of sequencing technology to allow rapid, untargeted detection of plant pathogens and these techniques are now regularly deployed to protect UK agriculture and biodiversity.
The African Orphan Crops Consortium (AOCC) is a global partnership promoting strategic, genome-enabled improvement of under-researched crops for biodiversity-based nutritious food solutions in Africa. We present the current status, opportunities and example successes of the AOCC. Orphan crops like Lablab (Lablab purpureus; ~452 Mbp), African Yam bean (Sphenostylis stenocarpa; ~800 Mbp) and Moringa (Moringa oleifera; ~315 Mbp) are often of high nutritive value and are climate resilient. The Oxford Nanopore MinION was used to generate long reads of all three crops, to complement and generate contiguous draft genome assemblies. Work is under way to improve these assemblies to chromosome scale using Hi-C mapping.
Bernice Waweru has a background in plant breeding, biotechnology and bioinformatics through the Bioinformatics Community of Practice fellowship, conducted by the John Innes Centre, Earlham Institute and BecA-ILRI hub, ILRI-Nairobi. She worked at KALRO studying resistance to stem rust of wheat in collaboration with CIMMYT. Bernice now works on genomics and bioinformatics at BecA-ILRI Hub. She and her colleagues are working to develop the first African-led draft genome of the African Yam Bean, fully sequenced and analyzed in Africa.
Dr. Allen Van Deynze is the Director of the Seed Biotechnology Center and Associate Director of the Plant Breeding Center at the University of California, Davis. He has a Ph.D. in plant breeding from the University of Guelph, Canada. As part of the SBC’s mission to serve as a liaison between public institutions and the seed industry, Allen is responsible for developing, coordinating and conducting research and for generating and disseminating scientific and informational content for the Seed Biotechnology Center’s and Plant Breeding Center’s educational and outreach programs. His research focuses on developing and integrating genomics into plant breeding of California and African crops. He has programs on breeding for disease resistance and quality in pepper and spinach, and on the development and application of genomics in crops. With Dr. Kent Bradford he co-developed, and is an organizer of, the Plant Breeding Academy, and he is a past chair of the US Plant Breeding Coordinating Committee. He has been involved in international and national policy, including US Regulations for Biotechnology. He is an instructor for the African Plant Breeding Academy and Scientific Director for the African Orphan Crops Consortium.
Plant genomics
Dr. Benedict Paten is an assistant professor in the department of Biomolecular Engineering at the University of California Santa Cruz (UCSC) and an associate director of the UCSC Genomics Institute. He directs the Computational Genomics Lab at UCSC, which is broadly focused on computational genomics, creating algorithms, software and services addressing biomolecular challenges. He has a PhD from the University of Cambridge and the European Molecular Biology Laboratory in computational biology.
Acute leukemia is an aggressive malignancy of the bone marrow characterized by the accumulation of immature cells defective in their maturation and function. Recurrent translocations of the 11q23 (Mixed Lineage Leukemia-MLL) locus are found in acute myeloid and lymphoblastic leukemia (ALL). Such translocations can arise de novo or due to prior chemotherapy and can involve over 85 partner genes. The detection and study of these translocations, and other genetic changes occurring in such forms of leukemia, can be difficult using current technology. Nanopore sequencing overcomes many of these limitations, as it can generate sequences averaging up to 6 kb from a single molecule of DNA or RNA, without the need for amplification. We identified leukemia samples from the Princess Margaret Leukemia Tissue Bank with known MLL translocations. Samples at diagnosis and relapse were identified. Whole genome sequencing was performed using Oxford Nanopore PromethION on each sample, aiming for 30x coverage. Alignments were performed with minimap2, structural variants were identified with Sniffles, and methylation was called with Nanopolish. Differentially methylated regions (DMRs) were identified using dispersion shrinkage for sequencing data (DSS). Structural variants (SVs), including those involving MLL, were readily identified. In samples that had previously been FISH positive but whose fusion partner had not been identified using cytogenetics, the partner was readily observed using nanopore sequencing. One sample was found to involve the 11q23 locus but not MLL, highlighting the inaccuracy of cytogenetics. Additional SVs were identified at relapse, readily identifying clonal evolution. Methylation changes were quantifiable, and several DMRs were identified for each sample.
When compared with the established methylation analysis performed on the TCGA cohort, our samples readily clustered with the AML cohort, demonstrating that nanopore technology can be used to assess methylation in acute leukemia. Nanopore sequencing can be used in clinical acute leukemia samples to identify complex structural variants. In addition, DMRs can be analyzed concurrently. We plan to extend this study to clarify the impact of methylation changes on treatment response and prognosis, and to examine the SV and methylation changes at relapse.
Tracy received her medical degree at University College Cork in Ireland and then completed general medical training in Cork. She then moved to Cambridge in the UK to complete her clinical and laboratory haematology training, based at Addenbrooke's Hospital. She obtained her FRCPath in 2004. On completion of training, she started her leukemia fellowship at Princess Margaret Cancer Centre in Toronto. She has been on staff as a clinician investigator since 2018.
Genomic Epidemiology panel
Human transcriptomics
Cancer research
The hummingbird occupies a unique place in the vertebrate world. It has the highest known metabolic rate, needed to fuel the incredible energetic demands of the hovering flight the bird performs daily to collect nectar from flowers. Understanding the molecular basis of such extreme physiology will provide foundational knowledge to enable rational engineering of metabolic circuits in mammalian cells. To explore how the hummingbird is able to accomplish these incredible feats, we used a combination of nanopore and Illumina sequencing methods to characterize the genome and transcriptome of fasted and fed hummingbirds.
Ariel Gershman is a 2nd year graduate student in the Biochemistry, Cellular and Molecular Biology program at Johns Hopkins. She is in the Timp lab, where she focuses on using long-read sequencing for genome and transcriptome assembly. She has worked on generating reference genomes for the ruby-throated hummingbird (Archilochus colubris) and tobacco hornworm moth (Manduca sexta). She is also interested in exploring the epigenome using nanopore sequencing in hard-to-assemble areas, contributing to methylation analysis in the Telomere-to-Telomere consortium.
Techniques for plant genome characterisation
With genome sequencing no longer being the limiting factor, there is a clear need for efficient tools that can assemble, compare and analyze pan-genomes comprising hundreds of individuals at a speed matching data production. At KeyGene, we combine state-of-the-art sequence analysis software with entirely novel algorithms to tackle the complexities of plant genomes, both at the individual level (heterozygosity, polyploidy, and high repetitiveness) and at population scale (high nucleotide diversity and large-scale structural variation). In this presentation we highlight our breakthrough algorithmic innovations in this field and provide a future perspective on specialized algorithms we are developing to integrate novel Oxford Nanopore-based technologies such as Pore-C within our pan-genomics platform.
Erwin Datema has a PhD in Bioinformatics and a strong background in plant biotechnology. Currently, he is a senior scientist in KeyGene’s Genome Informatics group and focuses on generating genome insights from DNA sequencing data. His core interest is in designing and developing efficient algorithms to solve challenging bioinformatics problems through complexity reduction. Outside working hours, Erwin is most commonly found petting, talking about, or watching videos of, cats.
Plant genome assembly has been developing rapidly, with costs declining and scaffold size and genome coverage improving. However, with short-read technologies, underlying contig size remains limited and it is inevitable that some genomic regions will not be captured, while duplicated or repetitive regions are often collapsed. Concomitant with these improvements, there is a growing appreciation that copy number variants, presence/absence variants and structural rearrangements have played an important role in the adaptation of phenotype. Long-read sequencing technologies offer a unique opportunity to capture often elusive structural variation in genomes. To test the applicability of this technology to complex crop genomes, we have attempted de novo genome assemblies using Oxford Nanopore technology in repeat-rich diploids, a paleohexaploid and recent polyploid species. As an example, a genome assembly was generated for Brassica nigra, a paleohexaploid. This produced a very large N50 contig length of 17.1 Mb (58 contigs); the contigs were then error corrected and developed into pseudomolecules using Hi-C and genotype data. The Oxford Nanopore assembly extended the original reference assembly by 59 Mb, covering ~89% of the expected genome size. The majority (85%) of the additional assembled sequence represented repetitive DNA, yet ~3,500 additional genes were added to the new assembly. The contiguity and coverage allowed unprecedented access to low-complexity regions of the genome. Pericentromeric regions and coincidence of hypo-methylation enabled localization of active centromeres and identified a novel centromere-associated ALE class I element, which appears to have proliferated through relatively recent nested transposition events (<1 million years ago). Other example genome assemblies using similar approaches for the larger genomes in Lens and Poaceae species will also be presented.
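For readers unfamiliar with the N50 contig length quoted above, the following minimal sketch computes it using the standard definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The function name and the toy contig lengths are illustrative only and are not taken from the Brassica nigra assembly.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of the
    contig at which the running total first reaches half of the assembly size.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")


# Toy assembly of 30 units: contigs of 10 and 8 together reach 18, which is
# at least half of the total, so the N50 is 8.
toy_contigs = [2, 10, 4, 8, 6]
print(n50(toy_contigs))  # prints 8
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why the abstract pairs it with error correction and Hi-C based scaffolding.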
Dr. Andrew Sharpe received his PhD at the University of East Anglia in 1997 and previously led research in plant genetics and genomics at both the National Research Council and Agriculture and Agri-Food in Saskatoon. He is co-lead of the Canadian Triticum Applied Genomics project that made a major contribution to the sequencing of the wheat genome. He was also involved in the sequencing of the canola genome. He recently established the Omics and Precision Agriculture Laboratory (OPAL) at GIFS.
Live Lounge | Mini Theatre talks | Data Analysis Theatre talks | Secret Cinema | Oxford Nanopore Technologies posters | Delegate posters
Mini Theatre: Adam Ameur, Uppsala University, Sweden | Rebecca Richards, University of Auckland, New Zealand | Elizabeth Ross, The University of Queensland, Australia | Juan Lobaton Garces, University of New England, Australia
Data Analysis Theatre: Hasindu Gamaarachchi, University of New South Wales & Garvan Institute of Medical Research, Australia | Ploy Pratanwanich, Genome Institute of Singapore, Singapore