Beware of Ogres: grass pea and the challenges of assembling large legume genomes
Grass pea (Lathyrus sativus) is exceptionally resilient to drought, flooding, and salinity. However, it contains a toxin which, when the plant is eaten in large quantities over several months, can cause paralysis from the waist down.
The toxin can be reduced or eliminated through plant breeding, but this requires an understanding of, among other things, the genetics of toxin synthesis, for which a genome assembly is needed.
The grass pea genome is 6.3 Gb and highly repetitive; its repeats include ‘Ogre elements’, giant retrotransposons spanning up to 25 kb.
Short-read assembly of grass pea produced a 6.2 Gb assembly across 1.6 million contigs.
Scaffolding the assembly with paired-end short reads increased contiguity, but introduced 2 billion Ns (bases of unknown identity that fill the gaps between joined contigs) into the assembly.
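To make the cost of scaffolding concrete, the number of Ns in an assembly can be tallied directly from its sequences. The following is a minimal sketch, not the method used in this work: the function name `gap_stats` and the toy scaffolds are hypothetical, and a real assembly would be read from a FASTA file rather than from a list of strings.

```python
import re


def gap_stats(sequences):
    """Return (total N bases, number of N runs) over an iterable of sequences.

    Hypothetical helper for illustration; a run of Ns is treated as one gap.
    """
    total_n = 0
    n_runs = 0
    for seq in sequences:
        # Each match is one uninterrupted stretch of N or n.
        for match in re.finditer(r"[Nn]+", seq):
            total_n += match.end() - match.start()
            n_runs += 1
    return total_n, n_runs


# Made-up scaffolds, not grass pea data.
toy_scaffolds = ["ACGTNNNNNACGT", "NNACGTNNN", "ACGTACGT"]
total_n, n_runs = gap_stats(toy_scaffolds)
print(f"{total_n} Ns in {n_runs} gaps")
```

Dividing the N total by the assembly length gives the fraction of the assembly that is placeholder rather than sequence.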
Long-read nanopore sequencing on a PromethION to 36x coverage, followed by polishing with short reads, produced a 6.2 Gb assembly with no Ns in 163 contigs, with an almost 3-fold improvement in contig N50 over the scaffolded short-read assembly.
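Contig N50 is the length of the contig at which, going from longest to shortest, the running total first reaches half of the assembly size. A minimal sketch of that calculation is given below; the function name `n50` and the contig lengths are illustrative assumptions, not values from the grass pea assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted longest first; N50 is the length of the contig at
    which the cumulative length first reaches half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths supplied")


# Two made-up assemblies of equal total size (300 units each).
contiguous = [80, 70, 50, 40, 30, 20, 10]
fragmented = [10] * 30
print(n50(contiguous), n50(fragmented))
```

The two toy assemblies have the same total length but very different N50 values, which is why N50 is reported alongside assembly size and contig count when comparing assemblies.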
Gene annotation revealed 45k protein-coding genes and more than 75k transcripts. BUSCO completeness was 82-90%.