GBiB: From download to BLAT at assembly hubs

From genomewiki
Revision as of 20:01, 6 May 2015 by David da Silva Pires (talk | contribs) (First version of "Blat configuration" section.)

GBiB installation

  • Create a folder on your machine to hold the installation files:
    $> sudo mkdir /usr/local/src/gbib
  • Log in to the UCSC Genome Browser virtual store:
    • Genome Store
    • Click "Add to cart" in the GBiB box.
    • Click "My products" in the menu.
    • Note the download address.
    • Download GBiB to /usr/local/src/gbib:
    $> cd /usr/local/src/gbib
    $> sudo wget https://genome-store.ucsc.edu/media/products/gbib.zip
  • Uncompress and delete gbib.zip (sudo is needed because the folder was created by root):
    $> sudo unzip gbib.zip
    $> sudo rm gbib.zip
  • Start VirtualBox:
    $> sudo virtualbox &
  • Add GBiB to VirtualBox:
    • Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start
    • Wait while the first update is done.
    • Close the GBiB terminal window.
    • Select "Send the shutdown signal".
    • Confirm by clicking "OK".


GBiB Configuration

  • Click "Settings".
    • General ---> Advanced ---> Drag'n'Drop: Bidirectional.
    • General ---> Description: Schistosoma mansoni genome assembly and track hubs.
    • System ---> Motherboard ---> Base Memory: 4096 MB.
    • System ---> Processor ---> Processor(s): 2.
    • Display ---> Video ---> Video Memory: 32 MB.
    • Shared Folders ---> + ---> Folder Path: /usr/local/src/gbib/hubs/ ---> Auto-mount ---> OK.
  • Boot the GBiB virtual machine:
    • Select "browserbox" in the menu on the left.
    • Click "Start".
  • Test if everything is working at the following URLs:
  • Log in using ssh for faster access.
    • Open a terminal, such as "konsole".
    • Password: browser
    $> ssh browser@localhost -p 1235
  • Install tools that allow file manipulation:
    $> gbibAddTools
  • Turn off every kind of automatic update:
    $> gbibAutoUpdateOff
  • Do not allow users to mirror tracks:
    $> gbibMirrorTracksOff
  • Turn on the offline mode:
    $> gbibOffline
  • Reboot the virtual machine:
    $> sudo shutdown -r now


Assembly hub configuration

  • Log in again using ssh:
    $> ssh browser@localhost -p 1235
  • Create the directories that will store the assembly hub configuration files:
    $> mkdir -p /folders/sf_hubs/geneNetwork/schMan2
  • Force the configuration files to be loaded again every time the page is reloaded (instead of only after at least 300 seconds):
  • Fill in the contents of the hub.txt file (shortLabel <= 17 chars, longLabel <= 80 chars):
    $> cat > /usr/local/src/gbib/hubs/geneNetwork/hub.txt << EOI
    hub geneNetwork
    shortLabel Gene Network
    longLabel Gene Network Hub for Schistosoma mansoni
    genomesFile genomes.txt
    email admin-gene@iq.usp.br
    descriptionUrl geneNetwork.html
    
    EOI
  • Fill the contents of genomes.txt:
    $> cat > /usr/local/src/gbib/hubs/geneNetwork/genomes.txt << EOI
    genome schMan2
    trackDb schMan2/trackDb.txt
    twoBitPath schMan2/schMan2.2bit
    groups schMan2/groups.txt
    description Dec. 2011 (Sanger 5.2)
    organism Schistosoma mansoni
    defaultPos Sm.Chr_1.unplaced.SC_0010:312,104-379,754
    orderKey 2
    htmlPath schMan2/description.html
    scientificName Schistosoma mansoni
    blat 127.0.0.1 42422
    transBlat 127.0.0.1 42423
    
    EOI
  • Verify that everything is OK with the hub:
    $> hubPublicCheck hubPublic -addHub="/folders/sf_hubs/geneNetwork/hub.txt"
  • If the above command works, it prints the MySQL command that can be executed to insert the hub into the public hub table. For example:
    mysql> insert into hubPublic (hubUrl,descriptionUrl,shortLabel,longLabel,registrationTime,dbCount,dbList) values ("/folders/sf_hubs/geneNetwork/hub.txt","/folders/sf_hubs/geneNetwork/geneNetwork.html", "Gene Network", "Gene Network Hub for Schistosoma mansoni", now(),2, "schMan2,");
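
The label limits mentioned above (shortLabel <= 17 chars, longLabel <= 80 chars) are easy to exceed by accident. The following is a minimal sketch of a check, not part of GBiB: check_labels is a hypothetical helper name, and the function only assumes that each label sits on one line in the form "key value".

```shell
# Hypothetical helper (not a GBiB or UCSC tool): report labels that exceed
# the limits quoted in this page (shortLabel <= 17, longLabel <= 80).
check_labels() {
    awk '
        tolower($1) == "shortlabel" || tolower($1) == "longlabel" {
            limit = (tolower($1) == "shortlabel") ? 17 : 80
            label = $0
            # Strip leading blanks and the key, keeping only the label text.
            sub(/^[ \t]*[^ \t]+[ \t]+/, "", label)
            if (length(label) > limit) {
                printf "%s:%d: %s is %d chars (limit %d)\n", \
                       FILENAME, FNR, $1, length(label), limit
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    ' "$@"
}
```

It could be run on both hub.txt and trackDb.txt, for example "check_labels hub.txt schMan2/trackDb.txt"; the function prints one line per offending label and returns a non-zero status if any is found.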


Track hub configuration

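  • Create the contents of trackDb.txt (track name without spaces or dots and starting with a letter, shortLabel <= 17 chars, longLabel <= 80 chars). The heredoc delimiter is quoted so that the shell does not expand the $$ placeholders, and tee is used because sudo does not apply to a shell redirection:
    $> sudo tee /usr/local/src/gbib/hubs/geneNetwork/schMan2/trackDb.txt > /dev/null << 'EOI'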
    track SMPs
    bigDataUrl schMan2.bb
    shortLabel SMPs v5.2
    longLabel Schistosoma mansoni predictions (SMPs), version 5.2
    type bigBed 12
    searchIndex name
    visibility full
    html schMan2-description
    boxedCfg on
    color 96,64,0
    altColor 128,64,32
    dataVersion Dec. 2011 Sanger 5.2
    # directUrl http://verjo-server-01.iq.usp.br/genome/pires/geneNetwork/schMan1/geneView/%s
    iframeUrl https://www.google.com.br/search?q=$$
    iframeOptions height='400' width='640' scrolling='yes'
    priority 100
    url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$
    urlLabel NCBI Details:
    urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"
    
    track roche454-blat
    bigDataUrl roche454-blat.bb
    shortLabel Roche 454 Trinity
    longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat
    type bigBed 12
    searchIndex name
    visibility full
    color 64,0,96
    altColor 64,32,128
    
    EOI
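
The url, urls and iframeUrl settings above use $$ as the placeholder for the item name. Inside a heredoc with an unquoted delimiter, the shell replaces $$ with its own process ID before the file is written, so quoting the delimiter seems the safer choice for files like this. A minimal sketch of the difference, using one of the URLs above (the variable names quoted and unquoted are only illustrative):

```shell
# Quoted delimiter: the body is written literally, so $$ survives.
quoted=$(cat << 'EOI'
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$"
EOI
)

# Unquoted delimiter: the shell expands $$ to its process ID.
unquoted=$(cat << EOI
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$"
EOI
)

printf 'quoted:   %s\n' "$quoted"
printf 'unquoted: %s\n' "$unquoted"
```

After writing trackDb.txt, a quick "grep -c '\$\$' trackDb.txt" can confirm that the placeholders are still present.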
  • If the FASTA file is written with all nucleotides in lowercase, convert them all to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:
    $> change_case --in_format fasta --outfile schMan2.fasta --processes 80 -a upper Schistosoma_mansoni_v5.2.fa
  • If the chromosome names are very long, make them shorter:
    $> sed s/Schisto_mansoni/Sm/ schMan2.fasta > schMan2-shortChromNames.fasta
  • Generate the .2bit file from this FASTA file:
    $> faToTwoBit schMan2-shortChromNames.fasta schMan2.2bit
  • Generate a file with the sizes of all chromosomes of the genome of interest, sorted from largest to smallest:
    $> twoBitInfo schMan2.2bit stdout | sort -k2rn > schMan2-chromSizes-sorted.txt
  • The same substitution has to be applied to the BED file of the track:
    $> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed
  • The BED file of the track has to be sorted first by chromosome name and then by start coordinate:
    $> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed
  • Convert from BED to bigBed:
    $> bedToBigBed -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb
  • Contents of groups.txt:
    $> cat > /usr/local/src/gbib/hubs/geneNetwork/schMan2/groups.txt << EOI
    name custom
    label Custom
    priority 1
    defaultIsClosed 1
    
    name mapping
    label Mapping
    priority 2
    defaultIsClosed 1
    
    name genes
    label Genes
    priority 3
    defaultIsClosed 1
    
    name mrna
    label mRNA
    priority 4
    defaultIsClosed 1
    
    name regulation
    label Regulation
    priority 5
    defaultIsClosed 1
    
    name comparative
    label Comparative
    priority 6
    defaultIsClosed 1
    
    name variation
    label Variation
    priority 7
    defaultIsClosed 1
    
    name experimental
    label Experimental
    priority 8
    defaultIsClosed 0
    
    EOI
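
Since the chromosome names are shortened separately in the FASTA file and in the BED file, the two can drift apart, and the conversion to bigBed is likely to fail if the BED file mentions a chromosome that is absent from the sizes file. The following is a minimal sketch of a check that could be run before bedToBigBed; bed_chroms_missing is a hypothetical helper name, and both files are assumed to be tab-separated with the chromosome name in the first column.

```shell
# Hypothetical pre-flight check (not a UCSC tool): print each chromosome
# name that appears in the BED file but not in the chromosome sizes file.
#   $1 = chromosome sizes file (name<TAB>size)
#   $2 = BED file
bed_chroms_missing() {
    awk -F'\t' '
        NR == FNR { known[$1] = 1; next }          # first file: remember names
        !($1 in known) && !seen[$1]++ { print $1 } # second file: report once
    ' "$1" "$2"
}
```

For example, "bed_chroms_missing schMan2-chromSizes-sorted.txt smps-shortChromNames-sorted.bed" should print nothing when the names agree.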


Blat configuration

  • From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA searches and for translated (amino acid) searches:
    $> gfServer start 127.0.0.1 42422 -stepSize=5 schMan2.2bit &
    $> gfServer start 127.0.0.1 42423 -trans schMan2.2bit &
  • If the FASTA file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":
    $> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &
  • Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat:
    blat 127.0.0.1 42422
    transBlat 127.0.0.1 42423
  • Add these commands to cron.
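
The host and port in genomes.txt have to match the ones given to gfServer, so it can help to read them back from the file instead of retyping them. The following is a minimal sketch; blat_endpoints is a hypothetical helper name, and the status loop in the comments assumes that "gfServer status host port" returns a non-zero exit status when no server answers.

```shell
# Hypothetical helper (not a UCSC tool): list the BLAT endpoints declared
# in a genomes.txt file as "<kind> <host> <port>", one per line.
blat_endpoints() {
    awk '$1 == "blat" || $1 == "transBlat" { print $1, $2, $3 }' "$1"
}

# Possible use inside GBiB (assumption: gfServer is on the PATH):
#   blat_endpoints genomes.txt | while read -r kind host port; do
#       gfServer status "$host" "$port" > /dev/null 2>&1 ||
#           echo "$kind server not answering on $host:$port"
#   done
```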


Custom track configuration

    track type=bigBed


GBiB maintenance

  • Update all software and data:
    $> gbibOnline