Compressing Oxford Nanopore signal into CRAM
Ewan Birney is Director of EMBL-EBI with Dr Rolf Apweiler and runs a small research group. He is also EMBL-EBI's Joint Head of Research, alongside Dr Nick Goldman. Ewan completed his PhD at the Wellcome Sanger Institute with Richard Durbin. In 2000, he became Head of Nucleotide data at EMBL-EBI and in 2012 he took on the role of Associate Director at the institute. He became Director of EMBL-EBI in 2015. Ewan led the analysis of the Human Genome gene set, mouse and chicken genomes and the ENCODE project, focusing on non-coding elements of the human genome. Ewan’s main areas of research include functional genomics, DNA algorithms, statistical methods to analyse genomic information (in particular information associated with individual differences in humans and Medaka fish) and use of images for chromatin structure. Ewan is a non-executive Director of Genomics England, and a consultant and advisor to a number of companies, including Oxford Nanopore Technologies. Ewan was elected an EMBO member in 2012, a Fellow of the Royal Society in 2014 and a Fellow of the Academy of Medical Sciences in 2015. He has received a number of awards including the 2003 Francis Crick Award from the Royal Society, the 2005 Overton Prize from the International Society for Computational Biology and the 2005 Benjamin Franklin Award for contributions in Open Source Bioinformatics.
SquiggleKit: a toolkit for manipulating nanopore signal data
The management of raw nanopore sequencing data poses a challenge that must be overcome to accelerate the development of new bioinformatics algorithms predicated on signal analysis. SquiggleKit is a toolkit for manipulating and interrogating nanopore data that simplifies file handling, data extraction, visualisation, and signal processing. Its modular tools can be used to reduce file numbers and memory footprint, identify poly-A tails, target barcodes, adapters, and find nucleotide sequence motifs in raw nanopore signal, amongst other applications. SquiggleKit serves as a bioinformatics portal into signal space, for novice and experienced users alike. It is comprehensively documented, simple to use, cross-platform compatible and freely available (https://github.com/Psy-Fer/SquiggleKit).
James Ferguson is a Genomic Systems Analyst in the Genomic Technologies Group at the Kinghorn Centre for Clinical Genomics, located at the Garvan Institute of Medical Research in Sydney, Australia. With a background in clinical pathology testing, algorithm development, and computer hacking, James applies his unique skill set to develop new bioinformatic tools, as well as design and support nanopore sequencing infrastructure.
Tapestry
Assemblies of small eukaryotic genomes using long reads are often close to complete. However, these assemblies remain difficult to validate, especially when genomes have complex features such as large inversions, translocations and ploidy variations, and where chromosome number may not be known. While many tools for assessing assemblies with short reads exist, long reads have far greater power for confirming the accuracy and completeness of contigs. I will present Tapestry, a tool for validating the contigs of a small assembly automatically and visualising the contigs so the structure of the assembly can be refined before polishing. I will show how Tapestry has helped us to resolve the complex genomes of several small eukaryotes.
John Davey is a bioinformatician at the University of York, working in the Department of Biology Technology Facility. He received his PhD from the University of Edinburgh and then worked with Mark Blaxter and Edinburgh Genomics during the development of Illumina sequencing, developing methods for analysing Restriction-site Associated DNA (RAD) Sequencing data, among many other things. He then held a fellowship at the University of Cambridge, working with Chris Jiggins on speciation of Heliconius butterflies, completing a chromosomal genome assembly of H. melpomene. He now works on a wide range of genomes and metagenomes at York, mostly trying to figure out how to turn raw nanopore sequence into completed genome assemblies.
PSI-Sigma: a comprehensive splicing-detection method for short-read and long-read RNA-seq analysis
Percent Spliced-In (PSI) values are commonly used to report alternative pre-mRNA splicing (AS) changes. Previous PSI-detection tools were limited to specific AS events and were evaluated with in silico RNA-seq data. We developed PSI-Sigma, which uses a new PSI index, and we employed actual (non-simulated) RNA-seq data from spliced synthetic genes (RNA Sequins) to benchmark its performance (i.e., precision, recall, false positive rate, and correlation) in comparison with three leading tools (rMATS, SUPPA2, and Whippet). PSI-Sigma outperformed these tools, especially in the case of AS events with multiple alternative exons and intron-retention events. We also briefly evaluated its performance in long-read RNA-seq analysis, by sequencing a mixture of human RNAs and RNA Sequins with nanopore long-read sequencers. Based on the long-read RNA-seq data of RNA Sequins, we found that nanopore long-read RNA-seq is qualitatively reliable. Also, in human U87 cells, we found that ~1 million long reads can already detect major AS changes in ~3,500 protein-coding genes with at least 10 supporting long reads. PSI-Sigma is implemented in Perl and is available at https://github.com/wososa/PSI-Sigma
Kuan-Ting Lin is a Computational post-doc at Cold Spring Harbor Laboratory, where he focusses on quantitative biology and transcriptomics technologies. His long-term research interests involve the use of mathematical, statistical or computational techniques to understand how alterations in RNA transcription contribute to human health. His academic training and research experience have provided an excellent background in drug discovery, data mining and quantitative biology.
Accelerated de novo assembly on GPUs
Recent years have seen an uptake in the use of GPUs for genomics, from basecalling (e.g. Guppy) to variant calling (e.g. DeepVariant). Long-read sequencing technology such as Oxford Nanopore sequencing holds the promise of simple and cost-effective de novo assembly. This is important for generating reference sequences (even for complex, polyploid organisms) and identifying structural variants such as deletions and translocations. One of the difficulties of high-quality de novo assembly, however, is its substantial computational cost. Post-sequencing assembly can take longer than the sequencing experiment itself. The NVIDIA genomics team is harnessing the power of GPUs to develop a pipeline for massive acceleration of de novo assembly. Our end goal is real-time long-read de novo assembly.
Mike Vella is a Senior Deep Learning and Genomics Engineer at NVIDIA Corporation. Mike works on using GPUs to help researchers with the analysis of high-throughput sequencing data. Mike has a PhD in Computational Neuroscience from the University of Cambridge and an undergraduate degree in Physics from the University of Bristol.
It is all about accessibility: Galaxy as a framework for democratizing Oxford Nanopore data analysis
Thanks to Oxford Nanopore Technologies, long-read sequencing is becoming more accessible for a much broader range of applications and end-users. Bioinformatics analysis was already a bottleneck with the previous generations of sequencing technologies, and is even more so with the new generations. Nanopore-based sequencing technologies are so much more accessible and can rapidly produce so much more data that the data analysis challenges can become fundamental. Community-driven solutions to democratize data analysis are crucial in the same way Oxford Nanopore is democratizing sequencing. Galaxy has been shown to be a successful option for short-read sequencing, but we think its advantages will shine even more in the era of long-read sequencing. Firstly, the user-friendly web interface does not require advanced computational skills, making it ideally suited for this interdisciplinary area and educational purposes. Secondly, the software and workflows can be seamlessly upgraded at the server side while maintaining 100% reproducibility of the performed analysis. Thirdly, the computational infrastructure supports a diverse spectrum from personal computers to cluster grids and the cloud. Within the scope of this project, we provide Oxford Nanopore-related tools in Galaxy. We have developed a collection of best-practice workflows for genome assembly within Galaxy. Our work is available for everyone at the European Galaxy server (https://usegalaxy.eu) and supporting self-learning training material is available. I will also introduce the Street Science Community (https://streetscience.community), a volunteer-based non-profit group that aims to teach the public the fundamental concepts of molecular biology and genetics data analysis by analyzing the “DNA of beer” using MinION and Galaxy.
Milad Miladi is a PhD candidate and research assistant in the Bioinformatics group at the University of Freiburg in Germany. With a background in computer science and RNA computational biology, his research involves transcriptomics, non-coding RNAs and reproducible data analysis with Galaxy.
Don’t let data management be your bottleneck
Once you can generate terabytes of data from a single flowcell, simply moving and storing that data can become the bottleneck for your workflow. We present some recent and upcoming MinKNOW features designed to help with your data management challenges. Find out how to trigger analyses automatically and hear some rules of thumb to help you plan what kit you need. We will also give you some insight into how Oxford Nanopore has scaled out data management to hundreds of devices.
Richard Carter has been part of the Informatics team at Oxford Nanopore since 2010. He has worked on a diverse range of applications, from bioinformatics and customer-facing software solutions through to his current role leading a team that analyses performance metrics for Oxford Nanopore devices. He has been involved in next-generation sequencing for over ten years, prior to which he worked in bioinformatics and structural biology.
Rapidly mapping raw nanopore signal with UNCALLED to enable real-time targeted sequencing
UNCALLED is a tool that maps raw nanopore reads to large DNA references as they are being sequenced. It is a streaming algorithm, meaning the mapping begins as soon as the first bit of signal comes from the sequencer. UNCALLED can currently map reads from all active MinION pores to a 31 Mbp reference containing eight bacterial genomes after less than three seconds of sequencing and analysis per read. The main application for UNCALLED is ReadUntil sequencing, where reads can be ejected from the pore depending on whether or not they map to the reference.
Sam Kovaka is a third-year PhD student in computer science at Johns Hopkins University, co-advised by Michael Schatz and Mihaela Pertea. He attended Clark University for his undergraduate degree, majoring in biology and computer science. Sam started working with nanopore sequencing in the first year of his PhD with a class project that turned into the work that he will be presenting at London Calling 2019.
Don’t let data management be your bottleneck
Once you can generate terabytes of data from a single flowcell, simply moving and storing that data can become the bottleneck for your workflow. We present some recent and upcoming MinKNOW features designed to help with your data management challenges. Find out how to trigger analyses automatically and hear some rules of thumb to help you plan what kit you need. We will also give you some insight into how Oxford Nanopore has scaled out data management to hundreds of devices.
Stephen is Associate Director of Data Engineering at Oxford Nanopore Technologies. He is the principal architect of Oxford Nanopore’s automated mirroring, analysis and archiving system, and coordinates the development of various applications supporting the Research and Development groups. Previously, Stephen worked in climate science and cheminformatics, where he developed many systems supporting UK, European and international research and played a major role in the ESGF architecture for sharing climate model outputs across the globe.
Applied bioinformatics: from basic QC to Epi2ME
The Bioinformatics Resource section of the Nanopore Community is an evolving repository of data analysis tutorials which aim to deliver best-practice workflows for researchers to explore their own data. The installation of software dependencies is managed through Bioconda and the data analysis is orchestrated using the Snakemake workflow management system. The R Markdown package is used to merge bioinformatics code and nanopore data into a reproducible and literate document. The tutorials are packaged with example data, and the complete tutorial code is placed on our GitHub pages. In this data analysis session, I will introduce our tutorials and will demonstrate how they can be used by laboratory researchers developing their bioinformatics skills. Tutorials cover topics such as the basic QC of individual flowcell runs, mapping reads to a reference genome and the quantitative analysis of transcriptome data. A more technical dissection of a tutorial should illustrate how the tutorials can be modified and customised to your needs. We would welcome your feedback and requests for future tutorial topics during the Q&A at the end of the presentation.
Stephen Rudd joined the Product Management team last year, having previously been the Strategic Account Manager at Oxford Nanopore Technologies for customers in Germany and Austria. Stephen is a classical geneticist and has a background in genome bioinformatics. He has been project manager for a taxonomically diverse range of genome studies utilising most DNA sequencing and genotyping technologies. He is looking forward to brainstorming potential solutions to challenging problems and to learning more about different research horizons.