
Quantitative RNA-seq: PCR-cDNA, PCR-free Direct cDNA and Direct RNA sequencing


Date: 13th May 2019

PCR-cDNA, Direct cDNA and Direct RNA sequencing give low-bias datasets with a high proportion of unambiguously mapping reads.

Fig. 1 RNA-seq workflows a) PCR-cDNA b) stranded PCR-cDNA c) Direct cDNA d) Direct RNA

Long-read RNA-seq: full-length cDNA libraries and Direct RNA sequencing

We have four core methods for preparing RNA-seq libraries. PCR-cDNA libraries are created by reverse transcription, strand-switching and second-strand synthesis, followed by PCR and attachment of sequencing adapters (Fig. 1a). A stranded version of this method is also available (Fig. 1b). cDNA libraries can also be sequenced without any amplification. In the Direct cDNA protocol, sequencing adapters are attached directly to the double-stranded cDNA product, making library preparation considerably faster (Fig. 1c). Again, strand-switching is used to increase the proportion of full-length cDNAs. In Direct RNA sequencing, adapters are ligated onto the 3’ end of poly-A-tailed RNA strands before sequencing (Fig. 1d).

ERCC spike-in panel assay shows all RNA-seq library types to be quantitative

We evaluated all four RNA-seq approaches by preparing libraries from the ERCC spike-in panel (Fig. 2), a set of 92 polyadenylated RNAs, ranging from 250 to 2,000 nucleotides in length, which are present in the mixture at defined concentrations. Observed and expected read counts correlated strongly in all cases, with no evidence of length bias, indicating that each RNA-seq method is low bias. Notably, the non-stranded PCR-cDNA protocol gave a strong correlation (Spearman r = 0.989, p < 0.001) even though the protocol includes 10 cycles of PCR amplification. The cDNA protocols currently give the highest throughput, because Direct RNA sequencing uses a different motor protein.
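
As an illustration of the comparison described above, the following is a minimal sketch of a Spearman rank correlation between expected spike-in concentrations and observed read counts. The function names and the example values are hypothetical and are not taken from the ERCC experiment; the analysis behind Fig. 2 may have been carried out differently.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for illustration only (not data from the study).
expected_concentration = [0.5, 2.0, 8.0, 32.0, 128.0, 512.0]
observed_read_count = [3, 9, 41, 150, 700, 2600]

rho = spearman(expected_concentration, observed_read_count)
print(f"Spearman rho = {rho:.3f}")
```

Because Spearman's coefficient depends only on rank order, it is insensitive to the wide dynamic range of spike-in concentrations, which is one reason it suits this kind of observed-versus-expected comparison.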

Fig. 3 Isoform reconstruction from cDNA and direct RNA data

Confident genome annotation using long-read data

To compare our ability to reconstruct isoforms with the pinfish toolset, we generated datasets for the Lexogen SIRV E0 mix and selected full-length, flip-flop-basecalled reads with pychopper. The pinfish tools take spliced alignments from minimap2 and summarise the long-read information into isoforms by clustering reads with similar exon–intron structure. The reads are then polished using racon and re-mapped onto the reference genome. Finally, alignments of the polished reads are converted into GFF2 format. Fig. 3a shows direct cDNA alignments at the SIRV6 locus, together with the true annotation and the recovered annotation. For each protocol we compared the recovered annotations to the true annotations using the gffcompare tool (Fig. 3b). Polished transcript accuracy for all kits ranged between 99.0% and 99.7%.
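
To make the clustering idea concrete, here is a simplified sketch of grouping reads by exon–intron structure. It is not the pinfish implementation: the greedy strategy, the `tolerance` parameter, the function names and the example coordinates are all assumptions chosen for illustration.

```python
def intron_chain(exons):
    """Turn sorted (start, end) exons into a tuple of (donor, acceptor) introns."""
    return tuple(
        (exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)
    )


def chains_match(chain_a, chain_b, tolerance):
    """True if both chains have the same number of introns and every
    splice-site coordinate agrees to within `tolerance` bases."""
    if len(chain_a) != len(chain_b):
        return False
    return all(
        abs(a_start - b_start) <= tolerance and abs(a_end - b_end) <= tolerance
        for (a_start, a_end), (b_start, b_end) in zip(chain_a, chain_b)
    )


def cluster_reads(reads, tolerance=5):
    """Greedily assign each read to the first cluster whose representative
    intron chain matches; otherwise start a new cluster."""
    clusters = []  # list of (representative_chain, [read names])
    for name, exons in reads.items():
        chain = intron_chain(sorted(exons))
        for representative, members in clusters:
            if chains_match(representative, chain, tolerance):
                members.append(name)
                break
        else:
            clusters.append((chain, [name]))
    return clusters


# Hypothetical exon coordinates for three aligned reads on one locus.
reads = {
    "read1": [(100, 200), (300, 400), (500, 600)],
    "read2": [(110, 201), (299, 400), (500, 590)],  # same isoform, noisy ends
    "read3": [(100, 200), (500, 600)],              # skips the middle exon
}

for chain, members in cluster_reads(reads):
    print(chain, members)
```

In this sketch only the internal splice sites define a cluster, so reads with ragged 5’ or 3’ ends still group together. Single-exon reads all have an empty intron chain and would fall into one cluster, which a real tool would need to handle separately.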


Fig. 4 Performance of nanopore reads a) GC bias b) PCR cycles c) multimapping reads

Advantages of nanopore sequencing in transcript quantification

We calculated correlations between GC content and read count for three of our library types, and compared these to a 100x PE Illumina dataset of the same sample. GC bias was virtually absent from the ONT data, including the PCR-cDNA data (Fig. 4a). To determine the number of permissible PCR cycles, we calculated the variation in sequence counts that could be explained by transcript length, for a PCR-cDNA library created from 1 ng input. The results suggest that fewer than 18 cycles should be used (Fig. 4b). Finally, we evaluated the proportion of primary multimapping reads in a short-read Drosophila dataset in comparison to ONT data, using minimap2 to map to the transcriptome (Fig. 4c). The results indicate that long reads provide more confident mapping than short reads.
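
The two bias measures above can be sketched as follows. This assumes that "variation explained by transcript length" means the coefficient of determination (R²) of a simple linear fit, which the source does not specify; the function names and example numbers are hypothetical.

```python
def gc_fraction(sequence):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    sequence = sequence.upper()
    acgt = sum(sequence.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / acgt


def r_squared(x, y):
    """R² of a simple linear regression of y on x.

    For a single predictor this equals the squared Pearson correlation.
    Both inputs must be non-constant, otherwise the denominator is zero.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


# Hypothetical transcripts: length in nucleotides and log10 read count.
transcript_length = [400, 800, 1200, 1600, 2000]
log_read_count = [2.9, 3.1, 3.0, 2.8, 3.2]

print(f"GC fraction of example sequence: {gc_fraction('ATGCGCGTTA'):.2f}")
print(f"R^2 (counts vs length): {r_squared(transcript_length, log_read_count):.3f}")
```

A low R² between transcript length and read count, computed for libraries made with increasing numbers of PCR cycles, would indicate that amplification has not yet introduced a noticeable length bias.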
