Ensembl minimum install: Difference between revisions

Revision as of 12:50, 13 September 2010

Load sequences into MySQL tables

You need the fasta and AGP files for an assembly. Ensembl supports multiple coordinate systems: Any piece of DNA can be referenced by it's chromosomal location (1:1000), its super_contig location (NT_039500:1-1000) or other coordinates

Coordinate systems have a "rank" of importance (the higher the better), and a "version" (so the database contains information for several possible assemblies of the same contigs and annotations can be loaded that are based on several different versions)

set a little shortcut:

 export $DBSPEC="-dbhost 127.0.0.1 -dbuser ens-training -dbport 3306 -dbname mouse37_mini_ref -dbpass workshop"

Create an empty database named mouse37_mini_ref and populate it with the CORE schema:

 mysql -uens-training -pworkshop -h127.0.0.1 -P3306 -D mouse37_mini_ref < $HOME/cvs_checkout/ensembl/sql/table.sql

Load coordinates and actual sequences into the empty core database:
- chromosome -> super_contig mappings:

 perl $PS/load_seq_region.pl $DBSPEC -coord_system_name chromosome -coord_system_version NCBIM37 -rank 1 -default_version -agp_file $HOME/workshop/genebuild/assembly/mini_chr_contig.agp

- super_contig -> contig mappings:

perl $PS/load_seq_region.pl -coord_system_name supercontig -default_version -rank 2 -coord_system_version NCBIM37 -agp_file $HOME/workshop/genebuild/assembly/mini_supercontig_contig.agp -verbose

- See what's going on with:

 select * from seq_region
 select * from coord_system 
 select * from dna;

- contigs:

 perl $PS/load_seq_region.pl -coord_system_name contig -default_version -rank 3 -sequence_level -coord_system_version NCBIM37 -fasta_file /home/ensembl/workshop/genebuild/assembly/clones_finished_mini.fa

- clones (only this command loads sequences into the "dna" table):

 perl $PS/load_seq_region.pl $DBSPEC -coord_system_name clone -default_version -coord_system_version NCBIM37 -rank 4 -agp_file /home/ensembl/workshop/genebuild/assembly/mini_clone_contig.agp

@@ Line 1: / Line 1: @@
+== Load sequences into MySQL tables ==
 You need the fasta and AGP files for an assembly. Ensembl supports multiple coordinate systems: Any piece of DNA can be referenced by it's chromosomal location (1:1000), its super_contig location (NT_039500:1-1000) or other coordinates
+Coordinate systems have a "rank" of importance (the higher the better), and a "version" (so the database contains information for several possible assemblies of the same contigs and annotations can be loaded that are based on several different versions)
+* set a little shortcut:
+  export $DBSPEC="-dbhost 127.0.0.1 -dbuser ens-training -dbport 3306 -dbname mouse37_mini_ref -dbpass workshop"
 * Create an empty database named mouse37_mini_ref and populate it with the CORE schema:
    mysql -uens-training -pworkshop -h127.0.0.1 -P3306 -D mouse37_mini_ref < $HOME/cvs_checkout/ensembl/sql/table.sql
-* Load sequences into the empty core database:
+* Load coordinates and actual sequences into the empty core database:
-   perl $PS/load_seq_region.pl -dbhost 127.0.0.1 -dbuser ens-training -dbport 3306 -dbname mouse37_mini_ref -dbpass workshop -coord_system_name chromosome -coord_system_version NCBIM37 -rank 1 -default_version -agp_file $HOME/workshop/genebuild/assembly/mini_chr_contig.agp
+** chromosome -> super_contig mappings:
+   perl $PS/load_seq_region.pl $DBSPEC -coord_system_name chromosome -coord_system_version NCBIM37 -rank 1 -default_version -agp_file $HOME/workshop/genebuild/assembly/mini_chr_contig.agp
+** super_contig -> contig mappings:
+perl $PS/load_seq_region.pl -coord_system_name supercontig -default_version -rank 2 -coord_system_version NCBIM37 -agp_file $HOME/workshop/genebuild/assembly/mini_supercontig_contig.agp -verbose
+** See what's going on with:
+  select * from seq_region
+  select * from coord_system
+  select * from dna;
+** contigs:
+  perl $PS/load_seq_region.pl -coord_system_name contig -default_version -rank 3 -sequence_level -coord_system_version NCBIM37 -fasta_file /home/ensembl/workshop/genebuild/assembly/clones_finished_mini.fa
+** clones (only this command loads sequences into the "dna" table):
+  perl $PS/load_seq_region.pl $DBSPEC -coord_system_name clone -default_version -coord_system_version NCBIM37 -rank 4 -agp_file /home/ensembl/workshop/genebuild/assembly/mini_clone_contig.agp
 *

Ensembl minimum install: Difference between revisions

Revision as of 12:50, 13 September 2010

Load sequences into MySQL tables

Navigation menu

Page actions

Page actions

Personal tools

Navigation

Search

related sites

hosted projects

Tools