Supplementary Figure 8: Classification of transcripts of unknown coding potential. | Nature Genetics

From: The landscape of long noncoding RNAs in the human transcriptome

(a) Decision tree showing the categorization of ab initio transcripts. Unannotated transcripts and annotated noncoding RNAs were classified as either lncRNAs or transcripts of unknown coding potential (TUCPs). Transcript categories for protein-coding genes, pseudogenes and read-throughs were imputed from overlapping reference annotations. (b) ROC curve comparing the false positive rate (x axis) with the true positive rate (y axis) for CPAT coding potential predictions of noncoding RNAs versus protein-coding genes. (c) Curve comparing the CPAT probability cutoff (x axis) with balanced accuracy (y axis). The dotted line shows the cutoff used to call TUCP transcripts. (d) Scatter plot comparing the frequency of Pfam domain occurrences in non-transcribed intergenic space versus transcribed regions. Points in red were considered valid Pfam domain hits, and points in black were considered artifacts. (e) Three-dimensional scatter plot comparing Fickett score (x axis), ORF size (y axis) and hexamer score (z axis) for all transcripts. Transcripts represented by red points contain valid Pfam domains, whereas those represented by blue points do not. (f–h) Box plots comparing ORF size (f), hexamer score (g) and Fickett score (h) for lncRNAs (red), TUCPs predicted by Pfam only (yellow), TUCPs predicted by CPAT only (green) and TUCPs predicted by both Pfam and CPAT (blue).
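
The relationship plotted in panel c can be illustrated with a minimal sketch that sweeps a probability cutoff and computes balanced accuracy (the mean of sensitivity and specificity) at each value. The scores and labels below are toy values invented for illustration, not data from the study, and the function names are hypothetical. Picking the cutoff that maximizes balanced accuracy is an assumption about how such a threshold might be chosen; the legend states only that the dotted line marks the cutoff used.

```python
def balanced_accuracy(scores, labels, cutoff):
    """Mean of sensitivity and specificity when score >= cutoff is called coding.

    labels: 1 = protein-coding (positive), 0 = noncoding (negative).
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in pairs if y == 1 and s < cutoff)
    tn = sum(1 for s, y in pairs if y == 0 and s < cutoff)
    fp = sum(1 for s, y in pairs if y == 0 and s >= cutoff)
    sensitivity = tp / (tp + fn)  # true positive rate (y axis of panel b)
    specificity = tn / (tn + fp)  # 1 - false positive rate (x axis of panel b)
    return (sensitivity + specificity) / 2


def sweep_cutoffs(scores, labels, cutoffs):
    """Return (cutoff, balanced accuracy) pairs, i.e. the curve in panel c."""
    return [(c, balanced_accuracy(scores, labels, c)) for c in cutoffs]


# Toy coding-probability scores (hypothetical, for illustration only).
scores = [0.05, 0.10, 0.20, 0.45, 0.60, 0.70, 0.90, 0.95]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoffs = [i / 10 for i in range(1, 10)]

curve = sweep_cutoffs(scores, labels, cutoffs)
# Assumption: choose the first cutoff that maximizes balanced accuracy.
best_cutoff, best_ba = max(curve, key=lambda pair: pair[1])
print(best_cutoff, best_ba)
```

Balanced accuracy is a reasonable summary here because the two classes (noncoding RNAs and protein-coding genes) need not be the same size, and it weights errors on each class equally.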
