Description

Summary: A list of identified novel conserved exons.
Exons were identified using a version of CONGO (originally developed for the Drosophila genomes; see the reference below) that was enhanced for mammalian exon prediction. The enhancements include a semi-Markov feature that models the characteristically short length distribution of mammalian exons; a synteny feature for recognizing duplicated regions; and an alternative training function that improves accuracy on this highly unbalanced prediction task (only about 1.5% of the human genome is protein-coding).
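To see why an unbalanced task calls for a different training objective, consider a hypothetical per-base labeling of a genome in which about 1.5% of positions are protein-coding. The minimal sketch below is only an illustration under that assumption. It is not CONGO's training function, and the genome size and helper names (`confusion`, `metrics`) are made up. A trivial predictor that never calls an exon scores high per-base accuracy while finding no coding bases at all.

```python
# Hypothetical illustration only: per-base truth labels where 1 = coding, 0 = non-coding.
def confusion(truth, pred):
    """Count true/false positives and negatives for binary per-base labels."""
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    tn = sum(1 for t, p in zip(truth, pred) if not t and not p)
    return tp, fp, fn, tn


def metrics(truth, pred):
    """Return (accuracy, sensitivity) for a prediction."""
    tp, fp, fn, tn = confusion(truth, pred)
    accuracy = (tp + tn) / len(truth)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, sensitivity


n = 100_000       # hypothetical genome length in bases
n_coding = 1_500  # ~1.5% coding, matching the fraction quoted above
truth = [1] * n_coding + [0] * (n - n_coding)
all_noncoding = [0] * n  # trivial predictor that never calls an exon

acc, sens = metrics(truth, all_noncoding)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f}")  # accuracy=0.985 sensitivity=0.000
```

Because plain accuracy rewards this useless predictor, a training objective that weighs errors on the rare coding class more heavily is needed for the predictor to learn anything about exons.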

Credits

Data provided by Mike Lin at mit.edu

References

Lin, M. F. et al. Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes. Genome Res 17, 1823-1836, doi:10.1101/gr.6679507 (2007).