SquiggleKit: a toolkit for manipulating nanopore signal data
The management of raw nanopore sequencing data poses a challenge that must be overcome to accelerate the development of new bioinformatics algorithms predicated on signal analysis. SquiggleKit is a toolkit for manipulating and interrogating nanopore data that simplifies file handling, data extraction, visualisation, and signal processing. Its modular tools can be used to reduce file numbers and memory footprint, identify poly-A tails, target barcodes and adapters, and find nucleotide sequence motifs in raw nanopore signal, amongst other applications. SquiggleKit serves as a bioinformatics portal into signal space, for novice and experienced users alike. It is comprehensively documented, simple to use, cross-platform compatible and freely available (https://github.com/Psy-Fer/SquiggleKit).
James Ferguson is a Genomic Systems Analyst in the Genomic Technologies Group at the Kinghorn Centre for Clinical Genomics, located at the Garvan Institute of Medical Research in Sydney, Australia. With a background in clinical pathology testing, algorithm development, and computer hacking, James applies his unique skill set to develop new bioinformatic tools, as well as design and support nanopore sequencing infrastructure.
Assemblies of small eukaryotic genomes using long reads are often close to complete. However, these assemblies remain difficult to validate, especially when genomes have complex features such as large inversions, translocations, and ploidy variations, or when the chromosome number is not known. While many tools exist for assessing assemblies with short reads, long reads have far greater power for confirming the accuracy and completeness of contigs. I will present Tapestry, a tool for automatically validating the contigs of a small assembly and visualising them so that the structure of the assembly can be refined before polishing. I will show how Tapestry has helped us to resolve the complex genomes of several small eukaryotes.
John Davey is a bioinformatician at the University of York, working in the Department of Biology Technology Facility. He received his PhD from the University of Edinburgh and then worked with Mark Blaxter and Edinburgh Genomics during the development of Illumina sequencing, developing methods for analysing Restriction-site Associated DNA (RAD) Sequencing data, among many other things. He then held a fellowship at the University of Cambridge, working with Chris Jiggins on speciation of Heliconius butterflies, completing a chromosomal genome assembly of H. melpomene. He now works on a wide range of genomes and metagenomes at York, mostly trying to figure out how to turn raw nanopore sequence into completed genome assemblies.
PSI-Sigma: a comprehensive splicing-detection method for short-read and long-read RNA-seq analysis
Percent Spliced-In (PSI) values are commonly used to report alternative pre-mRNA splicing (AS) changes. Previous PSI-detection tools were limited to specific AS events and were evaluated with in silico (simulated) RNA-seq data. We developed PSI-Sigma, which uses a new PSI index, and we employed actual (non-simulated) RNA-seq data from spliced synthetic genes (RNA Sequins) to benchmark its performance (i.e., precision, recall, false positive rate, and correlation) in comparison with three leading tools (rMATS, SUPPA2, and Whippet). PSI-Sigma outperformed these tools, especially for AS events with multiple alternative exons and for intron-retention events. We also briefly evaluated its performance in long-read RNA-seq analysis by sequencing a mixture of human RNAs and RNA Sequins with nanopore long-read sequencers. Based on the long-read RNA-seq data of RNA Sequins, we found that nanopore long-read RNA-seq is qualitatively reliable. In human U87 cells, we also found that ~1 million long reads can already detect major AS changes in ~3,500 protein-coding genes with at least 10 supporting long reads. PSI-Sigma is implemented in Perl and is available at https://github.com/wososa/PSI-Sigma.
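For readers unfamiliar with PSI, the following is a minimal sketch of the conventional definition: the percentage of reads supporting inclusion of an exon out of all reads that support either inclusion or exclusion. It is not the new PSI index used by PSI-Sigma, which the abstract does not describe. The function names and the `min_reads` parameter are hypothetical, with the default of 10 chosen only to echo the read-support threshold mentioned above.

```python
from typing import Optional


def psi(inclusion_reads: int, exclusion_reads: int) -> float:
    """Conventional Percent Spliced-In: inclusion / (inclusion + exclusion) * 100.

    Illustrative only; this is NOT the PSI-Sigma index.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no reads support either isoform")
    return 100.0 * inclusion_reads / total


def delta_psi(
    condition_a: tuple,
    condition_b: tuple,
    min_reads: int = 10,  # hypothetical support threshold
) -> Optional[float]:
    """Difference in PSI (a minus b) between two conditions.

    Each condition is an (inclusion_reads, exclusion_reads) pair.
    Returns None when either condition has fewer than `min_reads`
    supporting reads in total.
    """
    if sum(condition_a) < min_reads or sum(condition_b) < min_reads:
        return None
    return psi(*condition_a) - psi(*condition_b)


if __name__ == "__main__":
    print(psi(30, 10))
    print(delta_psi((30, 10), (10, 30)))
    print(delta_psi((3, 2), (10, 30)))
```

Returning `None` for weakly supported events, rather than a number, keeps low-coverage events from being mistaken for genuine splicing changes in downstream comparisons.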
Kuan-Ting Lin is a computational postdoc at Cold Spring Harbor Laboratory, where he focusses on quantitative biology and transcriptomics technologies. His long-term research interests involve the use of mathematical, statistical or computational techniques to understand how alterations in RNA transcription contribute to human health. His academic training and research experience have given him a strong background in drug discovery, data mining and quantitative biology.
It is all about accessibility: Galaxy as a framework for democratizing Oxford Nanopore data analysis
Thanks to Oxford Nanopore Technologies, long-read sequencing is becoming more accessible for a much broader range of applications and end-users. Bioinformatics analysis was already a bottleneck with previous generations of sequencing technologies, and it is even more so with the new ones. Nanopore-based sequencing technologies are so much more accessible and can rapidly produce so much more data that the data analysis challenges can become fundamental. Community-driven solutions to democratize data analysis are crucial in the same way that Oxford Nanopore is democratizing sequencing. Galaxy has been shown to be a successful option for short-read sequencing, but we think its advantages will shine even more in the era of long-read sequencing. Firstly, the user-friendly web interface does not require advanced computational skills, making it ideally suited for this interdisciplinary area and for educational purposes. Secondly, the software and workflows can be seamlessly upgraded on the server side while maintaining 100% reproducibility of the performed analysis. Thirdly, the computational infrastructure supports a diverse spectrum from personal computers to cluster grids and the cloud. Within the scope of this project, we provide Oxford Nanopore-related tools in Galaxy. We have developed a collection of best-practice workflows for genome assembly within Galaxy. Our work is available to everyone on the European Galaxy server (https://usegalaxy.eu), and supporting self-learning training material is available. I will also introduce the Street Science Community (https://streetscience.community), a volunteer-based non-profit group that aims to teach the public the fundamental concepts of molecular biology and genetics data analysis by analyzing the “DNA of beer” using MinION and Galaxy.
Milad Miladi is a PhD candidate and research assistant in the Bioinformatics group at the University of Freiburg in Germany. With a background in computer science and RNA computational biology, his research involves transcriptomics, non-coding RNAs and reproducible data analysis with Galaxy.
Applied bioinformatics: from basic QC to Epi2ME
Stephen Rudd joined the Product Management team last year, having previously been the Strategic Account Manager at Oxford Nanopore Technologies for customers in Germany and Austria. Stephen is a classical geneticist and has a background in genome bioinformatics. He has been project manager for a taxonomically diverse range of genome studies utilising most DNA sequencing and genotyping technologies. He is looking forward to brainstorming potential solutions to challenging problems and to learning more about different research horizons.