Ensembl data load

From genomewiki

Revision as of 15:48, 13 September 2010

Load RepeatMasker file

  • To make things easier, let's set a little shortcut:
export DBSPEC="-dbhost 127.0.0.1 -dbuser ens-training -dbport 3306 -dbname mouse37_mini_ref -dbpass workshop"
  • Run RepeatMasker on a FASTA file:
RepeatMasker -species mouse -qq -dir <full_path_to_output_directory> $HOME/workshop/genebuild/test_seqs/test_sequence_to_repeatmask.fa
  • Create a "dummy analysis" file, which simply selects the sequences to analyse (here: contigs), e.g. submit_ana.conf:
[SubmitContig]
module=Dummy
input_id_type=CONTIG
  • Load the "dummy analysis":
$HOME/cvs_checkout/ensembl-pipeline/scripts/analysis_setup.pl $DBSPEC -read -file submit_ana.conf
  • Define the real analysis, e.g. in a file repeatmask_ana.conf:
[RepeatMask]
db=repbase
db_version=0129
db_file=repbase
program=RepeatMask
program_version=3.1.8
program_file=/path/to/repmasker/RepeatMask
parameters=-nolow -species mouse -s
module=RepeatMask
gff_source=RepeatMask
gff_feature=repeat
input_id_type=CONTIG
  • Load the analysis into the MySQL database:
$HOME/cvs_checkout/ensembl-pipeline/scripts/analysis_setup.pl $DBSPEC -read -file repeatmask_ana.conf
  • See what happened (in the mysql client, \G replaces the trailing semicolon):
SELECT * FROM analysis\G
*************************** 1. row ***************************
   analysis_id: 1
       created: 2010-09-13 16:14:11
    logic_name: RepeatMask
            db: repbase
    db_version: 0129
       db_file: repbase
       program: RepeatMask
program_version: 3.1.8
  program_file: /path/to/repmasker/RepeatMask
    parameters: -nolow -species mouse -s
        module: RepeatMask
module_version: NULL
    gff_source: RepeatMask
   gff_feature: repeat
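
The configuration and loading steps above can be collected into one small script. This is only a sketch under a few assumptions: it writes both config files into the current directory, it falls back to the workshop connection settings if DBSPEC is not already exported, and it skips the load (printing a note instead) when analysis_setup.pl is not found at the checkout path used on this page. The SETUP variable name is introduced here for illustration only.

```shell
#!/bin/sh
# Sketch: write both analysis config files, then load them in order.
# SETUP is a hypothetical helper variable; the path is the one used above.
SETUP="$HOME/cvs_checkout/ensembl-pipeline/scripts/analysis_setup.pl"

# Reuse DBSPEC if it is already exported, else fall back to the workshop values.
if [ -z "$DBSPEC" ]; then
    DBSPEC="-dbhost 127.0.0.1 -dbuser ens-training -dbport 3306 -dbname mouse37_mini_ref -dbpass workshop"
fi

# Dummy analysis: only selects the input sequences (contigs).
cat > submit_ana.conf <<'EOF'
[SubmitContig]
module=Dummy
input_id_type=CONTIG
EOF

# Real analysis: same values as in the repeatmask_ana.conf listing above.
cat > repeatmask_ana.conf <<'EOF'
[RepeatMask]
db=repbase
db_version=0129
db_file=repbase
program=RepeatMask
program_version=3.1.8
program_file=/path/to/repmasker/RepeatMask
parameters=-nolow -species mouse -s
module=RepeatMask
gff_source=RepeatMask
gff_feature=repeat
input_id_type=CONTIG
EOF

# Load the dummy analysis first, then the real one.
for conf in submit_ana.conf repeatmask_ana.conf; do
    if [ -x "$SETUP" ]; then
        # $DBSPEC is deliberately unquoted so it splits into separate options.
        "$SETUP" $DBSPEC -read -file "$conf"
    else
        echo "analysis_setup.pl not found at $SETUP; skipping load of $conf"
    fi
done
```

After a successful run, the SELECT on the analysis table shown above should list one row per loaded analysis.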