Ensembl compara


Documentation on Ensembl server

  • genome_db: All species in this database
  • There are two different types of homologies: protein and genomic
    • Genomic homologies:
    • Protein homologies:
      • Main table: homology. A homology is placed on a tree that was obtained with an (alignment/phylogenetic) method applied to a set of species
      • The method and the set of species are referenced through homology.method_link_species_set_id, which points to method_link_species_set. That table in turn points to method_link and to species_set
      • A homology.description can be ortholog or paralog, and one-to-one or one-to-many (see Ensembl Doc); there are also putative_gene_splits, between_species_paralogs and other_paralog
      • Each homology is referenced by two or more homology_member rows
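The chain of references described above (homology to method_link_species_set, then to method_link and species_set) can be tried out on a toy database. The following is a minimal sketch using Python's built-in sqlite3 module, not the real Ensembl MySQL server. The column lists are heavily simplified assumptions, all rows are made up, and the helper describe_homology is hypothetical.

```python
import sqlite3

# Toy, simplified stand-ins for the Compara tables; the real tables have more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genome_db (genome_db_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE method_link (method_link_id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE species_set (species_set_id INTEGER, genome_db_id INTEGER);
CREATE TABLE method_link_species_set (
    method_link_species_set_id INTEGER PRIMARY KEY,
    method_link_id INTEGER,
    species_set_id INTEGER);
CREATE TABLE homology (
    homology_id INTEGER PRIMARY KEY,
    method_link_species_set_id INTEGER,
    description TEXT);

-- Made-up rows for illustration only, not real Ensembl data.
INSERT INTO genome_db VALUES (3, 'Rattus norvegicus'), (4, 'Mus musculus');
INSERT INTO method_link VALUES (201, 'ENSEMBL_ORTHOLOGUES');
INSERT INTO species_set VALUES (10, 3), (10, 4);
INSERT INTO method_link_species_set VALUES (5000, 201, 10);
INSERT INTO homology VALUES (1, 5000, 'ortholog_one2one');
""")


def describe_homology(homology_id):
    """Return (description, method type, species names) for one homology, or None."""
    rows = conn.execute(
        """
        SELECT h.description, ml.type, gd.name
        FROM homology AS h
        JOIN method_link_species_set AS mlss
          ON mlss.method_link_species_set_id = h.method_link_species_set_id
        JOIN method_link AS ml ON ml.method_link_id = mlss.method_link_id
        JOIN species_set AS ss ON ss.species_set_id = mlss.species_set_id
        JOIN genome_db AS gd ON gd.genome_db_id = ss.genome_db_id
        WHERE h.homology_id = ?
        ORDER BY gd.name
        """,
        (homology_id,),
    ).fetchall()
    if not rows:
        return None
    # One row per species in the set; description and method repeat on every row.
    return rows[0][0], rows[0][1], [r[2] for r in rows]


print(describe_homology(1))
```

The same join should carry over to the real server with little change, provided the actual column names match the ones assumed here.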

All genomes with their ids:

select * from genome_db;

All homologies between genome_db_ids 3 and 4, with their type (filter on h.description to keep only the one-to-one orthologs):

-- No ordering condition on member_id is needed: the genome_db_id filters
-- already decide which member is m1 and which is m2.
select m1.stable_id, m2.stable_id, h.description
from homology as h
  join homology_member as hm1 on hm1.homology_id = h.homology_id
  join homology_member as hm2 on hm2.homology_id = h.homology_id
  join member as m1 on m1.member_id = hm1.member_id
  join member as m2 on m2.member_id = hm2.member_id
where m1.genome_db_id = 3 and m2.genome_db_id = 4;

Result: the query timed out.
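One thing that might reduce the work for the pairwise query above is to restrict homology to a single method_link_species_set_id before joining to the members. Whether this avoids the timeout on the real server is untested and depends on which indexes exist there. The sketch below only checks the logic on a toy SQLite database: the table layouts are simplified assumptions, the stable_ids and the ids 5000 and 6000 are made up, and orthologs_between is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member (member_id INTEGER PRIMARY KEY, stable_id TEXT, genome_db_id INTEGER);
CREATE TABLE homology (
    homology_id INTEGER PRIMARY KEY,
    method_link_species_set_id INTEGER,
    description TEXT);
CREATE TABLE homology_member (homology_id INTEGER, member_id INTEGER);

-- Made-up rows. Member 1 (genome 4) has a smaller id than its partner in genome 3.
INSERT INTO member VALUES
    (1, 'SP4_GENE_A', 4), (2, 'SP3_GENE_A', 3),
    (3, 'SP3_GENE_B', 3), (4, 'SP4_GENE_B', 4), (5, 'SP5_GENE_B', 5);
INSERT INTO homology VALUES
    (100, 5000, 'ortholog_one2one'),
    (101, 5000, 'ortholog_one2one'),
    (102, 6000, 'ortholog_one2one');
INSERT INTO homology_member VALUES (100, 1), (100, 2), (101, 3), (101, 4), (102, 3), (102, 5);
""")


def orthologs_between(conn, mlss_id, genome_a, genome_b):
    """List (stable_id in genome_a, stable_id in genome_b, description) for one mlss."""
    return conn.execute(
        """
        SELECT m1.stable_id, m2.stable_id, h.description
        FROM homology AS h
        JOIN homology_member AS hm1 ON hm1.homology_id = h.homology_id
        JOIN homology_member AS hm2 ON hm2.homology_id = h.homology_id
        JOIN member AS m1 ON m1.member_id = hm1.member_id
        JOIN member AS m2 ON m2.member_id = hm2.member_id
        WHERE h.method_link_species_set_id = ?
          AND m1.genome_db_id = ?
          AND m2.genome_db_id = ?
        ORDER BY m1.stable_id
        """,
        (mlss_id, genome_a, genome_b),
    ).fetchall()


for row in orthologs_between(conn, 5000, 3, 4):
    print(row)
```

The toy data deliberately gives one genome 4 member a smaller member_id than its genome 3 partner, so the pair would be lost if the query also required hm1.member_id < hm2.member_id.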

All species_sets that include rat and mouse:

-- No ordering condition on genome_db_id is needed: the two name filters
-- already distinguish s1 (rat) from s2 (mouse).
select s1.species_set_id
from species_set s1
  join species_set s2 on s2.species_set_id = s1.species_set_id
  join genome_db gd1 on gd1.genome_db_id = s1.genome_db_id
  join genome_db gd2 on gd2.genome_db_id = s2.genome_db_id
where gd1.name = "Rattus norvegicus" and gd2.name = "Mus musculus";

All species_sets with more than two species:

select species_set_id, count(*) as n_species from species_set group by species_set_id having count(*) > 2;
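Both species_set lookups can be checked on a toy SQLite database. This sketch swaps the self-join used for rat and mouse for a GROUP BY / HAVING formulation, which extends to any number of species. The table layouts are simplified assumptions, the rows are made up, and the helpers sets_containing and sets_larger_than are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genome_db (genome_db_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE species_set (species_set_id INTEGER, genome_db_id INTEGER);

-- Made-up rows for illustration only.
INSERT INTO genome_db VALUES
    (1, 'Homo sapiens'), (3, 'Rattus norvegicus'), (4, 'Mus musculus');
INSERT INTO species_set VALUES
    (10, 3), (10, 4),
    (11, 1), (11, 3), (11, 4),
    (12, 1), (12, 4),
    (13, 1);
""")


def sets_containing(names):
    """Return ids of species_sets that include every name in names (assumed distinct)."""
    placeholders = ",".join("?" for _ in names)
    sql = f"""
        SELECT ss.species_set_id
        FROM species_set AS ss
        JOIN genome_db AS gd ON gd.genome_db_id = ss.genome_db_id
        WHERE gd.name IN ({placeholders})
        GROUP BY ss.species_set_id
        HAVING COUNT(DISTINCT gd.name) = ?
        ORDER BY ss.species_set_id
    """
    return [row[0] for row in conn.execute(sql, (*names, len(names)))]


def sets_larger_than(n):
    """Return (species_set_id, species count) for sets with more than n species."""
    sql = """
        SELECT species_set_id, COUNT(*) AS n_species
        FROM species_set
        GROUP BY species_set_id
        HAVING COUNT(*) > ?
        ORDER BY species_set_id
    """
    return conn.execute(sql, (n,)).fetchall()


print(sets_containing(["Rattus norvegicus", "Mus musculus"]))
print(sets_larger_than(2))
```

On the toy rows, sets 10 and 11 contain both rat and mouse, and only set 11 has more than two species.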