Tombo: detection of non-standard nucleotides using the genome-resolved raw nanopore signal


Date: 24th May 2018

The Tombo software package enables investigation, detection and visualisation of modified nucleotides. Its framework enables practical, scalable expansion to detect all modifications in both DNA and RNA.

Fig. 1 Modifications a) different epigenetic modifications b) nanopore sequencing of native RNA

Non-standard nucleotides are biologically significant and widespread in DNA and RNA

‘Epigenetics’ refers to heritable alterations of DNA that do not change the nucleotide sequence. One of the most widespread epigenetic modifications is 5-methyl cytosine (m5C), which most frequently occurs in mammalian cells in CpG dinucleotides, but which also occurs in additional sequence contexts. CpG methylation can alter patterns of gene expression by suppressing transcription. Base modification is also widespread in RNA, though the roles these changes play in biological processes are less well understood. Nanopore sequencing does not require amplification or strand synthesis, so modified bases pass through the pore during sequencing and their signature is present in the raw signal (Fig. 1).

Fig. 2 Identifying m6A and m5C a) ROC curves b) AUC values c) examples of dcm-methylation

Tombo performance on known m5C and N6-methyl A (m6A) sites in the E. coli genome

Tombo provides three distinct methods for the detection of modified bases, the choice of which depends on the data available and the experimental objectives. The performance of the different models has continued to improve as we have developed Tombo’s algorithms. Figure 2a shows ROC curves for the three detection methods at the dam- and dcm-modified motifs in E. coli. The de novo method (identifying deviations from the canonical base model) shows the best performance. dam- and dcm-methylation show strong and consistent shifts in the raw nanopore signal (Fig. 2b, top panel). Across the top 1,000 CCWGG-containing regions, the fraction of modified bases identified by Tombo is highest at the known m5C location (Fig. 2c, bottom panel). 
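The idea behind the de novo method, identifying deviations from the canonical base model, can be illustrated with a minimal sketch. This is not Tombo's implementation: the function name `de_novo_p_value`, its parameters and the choice of a simple two-sided z-test are assumptions made for illustration only. It takes the signal levels observed at one genomic position across reads and asks how unlikely their mean is under the expected canonical level.

```python
import math
from statistics import mean


def de_novo_p_value(observed_levels, expected_mean, expected_sd):
    """Two-sided z-test of observed signal levels against a canonical model.

    observed_levels: normalised signal levels at one position, one per read
    expected_mean, expected_sd: canonical model level for the k-mer at that
    position (hypothetical inputs; Tombo's own model format is not shown here)
    """
    n = len(observed_levels)
    if n == 0:
        raise ValueError("need at least one observed level")
    if expected_sd <= 0:
        raise ValueError("expected_sd must be positive")
    # Standard error of the mean shrinks with coverage, so the same shift
    # becomes more significant as more reads cover the position.
    z = (mean(observed_levels) - expected_mean) / (expected_sd / math.sqrt(n))
    # Two-sided tail probability of a standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))
```

A small p-value flags a position whose signal departs from the canonical expectation, without saying which modification is responsible.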

Fig. 3 a) ROC curve for detection of m6A and m5C in E. coli gDNA at different levels of coverage b) m5C detection on NA12878 chr20

Using Tombo to detect m6A and m5C at different levels of coverage, and m5C in a variety of sequence contexts

We applied Tombo’s de novo modified base model to the E. coli genome for detection of m6A and m5C at different levels of coverage: 1x, 30x and 376x. As might be expected, greater coverage led to higher AUCs. At 376x, the AUC for m6A is 0.975, and for m5C is 0.992, whereas at 30x the AUCs are 0.947 and 0.983 respectively (Fig. 3a). We then used Tombo to detect m5C in human genomic data generated on a PromethION from NA12878 genomic DNA. The optimisation of Tombo for use with PromethION data is not yet complete, but we obtained strong correspondence between our Tombo analysis and publicly available bisulphite data from the same genome. Fig. 3b shows raw nanopore signals (red lines) which deviate from the expected canonical levels (grey background distributions) around sites of methylation which were identified using bisulphite sequencing. The top two panels show methylation at a CpG site, which is symmetric on the positive and negative strands. The bottom two panels show methylation in examples of CHG and CHH contexts. Here the methylation was asymmetric, being present only on the positive strand.
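For readers unfamiliar with the AUC values quoted above, the following sketch shows one standard way to compute the area under a ROC curve from per-site scores and known modification labels. It uses the rank-based (Mann-Whitney) definition: the probability that a randomly chosen modified site scores higher than a randomly chosen unmodified site. The function name `roc_auc` and its inputs are hypothetical, and this is not necessarily how the figures in this poster were calculated.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: per-site detection scores, higher meaning "more likely modified"
    labels: truthy for known modified sites, falsy for unmodified sites
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one modified and one unmodified site")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # modified site ranked above unmodified site
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of sites, so a sort-based version would be preferable for genome-scale data.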


Fig. 4 Signal shifts in non-C- and C-containing motifs

Model estimation for m5C in Direct RNA data

To estimate a specific m5C model for RNA, we produced a library by in vitro transcription which contains a mixture of standard NTPs and m5CTP. The modified base caused a signal-level shift only at positions containing cytosine bases, compared to a control library (Fig. 4). From these signal distributions, we created an m5C RNA model, which is now included with Tombo for the specific detection of m5C in Direct RNA sequencing experiments.
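The comparison of signal distributions described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used to build the model shipped with Tombo: the function name `estimate_level_shifts` and the input layout (a dictionary mapping each k-mer to a list of normalised signal levels) are hypothetical, and the sketch does not attempt to separate modified from unmodified molecules within a mixed library.

```python
from statistics import mean, stdev


def estimate_level_shifts(control_levels, modified_levels):
    """Summarise per-k-mer signal levels in a modified library vs a control.

    control_levels, modified_levels: dict mapping k-mer string to a list of
    normalised signal levels (hypothetical layout for illustration).
    Returns {kmer: (mean shift, modified mean, modified sd)} for k-mers
    observed at least twice in both libraries.
    """
    model = {}
    for kmer in control_levels.keys() & modified_levels.keys():
        ctrl = control_levels[kmer]
        mod = modified_levels[kmer]
        if len(ctrl) < 2 or len(mod) < 2:
            continue  # too few observations to estimate a spread
        model[kmer] = (mean(mod) - mean(ctrl), mean(mod), stdev(mod))
    return model
```

Under the observation reported here, k-mers without a cytosine would be expected to show shifts near zero, while cytosine-containing k-mers would carry the shift that an alternative model needs to capture.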
