GBiB: From download to BLAT at assembly hubs

From genomewiki
Revision as of 00:00, 11 May 2015

Introduction

Genome Browser in a Box (GBiB) has some clear advantages over the other options available for working with genomic data:

  • It offers many features and tools that other available genome browsers lack.
  • It is much easier to install, configure and maintain than a full mirror of the UCSC Genome Browser web site.
  • It keeps your private data inaccessible to unauthorized users while still allowing collaboration with authorized personnel.

Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically involve installing GBiB on a server with a text-only interface and opening specific firewall ports so that the data can be used only within your network.

Preparing the raw data

  • Create the directories that will store the assembly hub configuration files:
$> mkdir -p ~/var/gbib/work ~/var/gbib/hubs/geneNetwork/eboVir3
  • Download the Ebola virus genome files into the work directory:
$> cd ~/var/gbib/work
$> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/eboVir3.2bit .
$> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/eboVir3.chrom.sizes .
$> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .
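Before moving on, it can help to confirm that the three downloads arrived intact. The sketch below is one possible check and is not part of the original procedure: the function name check_downloads is hypothetical, and it only tests that each file exists, is non-empty and, for the .gz file, passes gzip -t.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the files fetched by the rsync commands above.
# Usage: check_downloads <directory>
check_downloads() {
    local dir="$1" status=0 f
    for f in eboVir3.2bit eboVir3.chrom.sizes KM034562v1.fa.gz; do
        if [ ! -s "$dir/$f" ]; then
            # File is absent or has zero length.
            echo "MISSING or empty: $f"
            status=1
        elif [ "${f##*.}" = "gz" ] && ! gzip -t "$dir/$f" 2>/dev/null; then
            # gzip -t reads the whole archive and fails on truncation.
            echo "CORRUPT gzip: $f"
            status=1
        else
            echo "OK: $f"
        fi
    done
    return "$status"
}
```

A typical call would be check_downloads ~/var/gbib/work; a non-zero exit status means at least one file should be downloaded again.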


GBiB installation

  • Create a folder on your machine to hold the installation files:
$> sudo mkdir /usr/local/src/gbib
  • Download GBiB from the UCSC Genome Browser virtual store:
    • Go to the Genome Store.
    • Click on "Login / Register".
    • Check that you agree with the terms and conditions.
    • Check that your hardware and software meet the basic requirements.
    • Click on "Add to cart" in the GBiB box.
    • Click on "Cart (1)" in the menu.
    • Click on "Proceed to checkout".
    • Click on "My products" in the menu.
    • Copy the download address (let's call it <download_link>).
    • Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive.
$> cd /usr/local/src/gbib
$> sudo wget <download_link>
$> sudo unzip gbib.zip
$> sudo rm gbib.zip
  • Give users sufficient access to the three uncompressed files and to the folder, then start VirtualBox in the background:
$> sudo chmod o+rw /usr/local/src/gbib/*
$> sudo chmod o+w /usr/local/src/gbib
$> virtualbox &
  • Add GBiB to VirtualBox and boot it for the first time:
    • Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start
    • Wait while the first update runs (it can take more than 1 hour, depending on your internet connection speed).
    • Close GBiB terminal window.
    • Select "Send the shutdown signal".
    • Confirm by clicking "OK".


GBiB configuration

  • Click on "Settings".
    • General ---> Description: Ebola virus genome assembly and track hubs.
    • System ---> Motherboard ---> Base Memory: 4096 MB.
    • System ---> Processor ---> Processor(s): 2.
    • Display ---> Video ---> Video Memory: 32 MB.
    • Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.
    • Shared Folders ---> + ---> Folder Path: ~/var/gbib/hubs ---> Read-only ---> Auto-mount ---> OK.
  • Boot GBiB virtual machine:
    • Select "browserbox" in the menu on the left.
    • Click on "Start".
  • Test if everything is working at the following URLs:
  • Log in using ssh, for faster access.
    • Open a terminal, such as "konsole".
    • Password: browser
$> ssh browser@localhost -p 1235
  • Install the tools that allow file manipulation:
$> gbibAddTools
  • Turn off all automatic updates:
$> gbibAutoUpdateOff
  • Do not allow users to mirror tracks:
$> gbibMirrorTracksOff
  • Turn on the offline mode:
$> gbibOffline
  • Reboot the virtual machine
$> sudo shutdown -r now
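The VirtualBox settings listed at the start of this section can probably also be applied from the host command line with VBoxManage while the virtual machine is powered off. The following is a minimal sketch under stated assumptions: the VM name browserbox is taken from the VirtualBox menu entry mentioned above, the shared folder names work and hubs are assumptions, and the run wrapper with its DRY_RUN variable is hypothetical. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: apply the GUI settings above with VBoxManage (VM must be powered off).
# DRY_RUN defaults to 1, so commands are only printed; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"
VM="browserbox"    # VM name as shown in the VirtualBox menu

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

configure_gbib() {
    # Base memory (MB), number of processors and video memory (MB).
    run VBoxManage modifyvm "$VM" --memory 4096 --cpus 2 --vram 32
    # Shared folder names "work" and "hubs" are assumptions.
    run VBoxManage sharedfolder add "$VM" --name work \
        --hostpath "$HOME/var/gbib/work" --automount
    run VBoxManage sharedfolder add "$VM" --name hubs \
        --hostpath "$HOME/var/gbib/hubs" --readonly --automount
}

configure_gbib
```

Reviewing the printed commands first and then re-running with DRY_RUN=0 keeps the change easy to inspect before it touches the VM definition.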


Assembly hub configuration

  • Log in again using ssh:
$> ssh browser@localhost -p 1235
  • Force the configuration files to be reloaded every time the page is reloaded (instead of only after at least 300 seconds):
  • Fill the contents of hub.txt file:
$> cat > /usr/local/src/gbib/hubs/geneNetwork/hub.txt << EOI
hub geneNetwork
shortLabel Gene Network
longLabel Gene Network Hub for Schistosoma mansoni
genomesFile genomes.txt
email admin-gene@iq.usp.br
descriptionUrl geneNetwork.html

EOI
  • The following rules must be obeyed:
    • hub: name without spaces.
    • shortLabel: limited to 17 characters.
    • longLabel: limited to 80 characters.
  • Fill the contents of genomes.txt:
$> cat > /usr/local/src/gbib/hubs/geneNetwork/genomes.txt << EOI
genome schMan2
trackDb schMan2/trackDb.txt
twoBitPath schMan2/schMan2.2bit
groups schMan2/groups.txt
description Dec. 2011 (Sanger 5.2)
organism Schistosoma mansoni
defaultPos Sm.Chr_1.unplaced.SC_0010:312,104-379,754
orderKey 2
htmlPath schMan2/description.html
scientificName Schistosoma mansoni
blat 127.0.0.1 42422
transBlat 127.0.0.1 42423

EOI
  • Verify that everything is OK with the hub:
$> hubPublicCheck hubPublic -addHub="/folders/sf_hubs/geneNetwork/hub.txt"
  • If the above command works, it prints the MySQL command that can be executed to insert the hub into the public hub table. For example:
    mysql> insert into hubPublic (hubUrl,descriptionUrl,shortLabel,longLabel,registrationTime,dbCount,dbList) values ("/folders/sf_hubs/geneNetwork/hub.txt","/folders/sf_hubs/geneNetwork/geneNetwork.html", "Gene Network", "Gene Network Hub for Schistosoma mansoni", now(),2, "schMan2,");
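The label length rules given in this section (shortLabel limited to 17 characters, longLabel to 80) can be checked mechanically before loading the hub. The sketch below is one way to do it with awk; the function name check_labels is hypothetical, and it matches the setting names case-sensitively as shortLabel and longLabel.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report shortLabel/longLabel lines that exceed the limits
# stated above (17 and 80 characters). Works on hub.txt or trackDb.txt.
# Usage: check_labels <file>
check_labels() {
    awk '
        $1 == "shortLabel" || $1 == "longLabel" {
            limit = ($1 == "shortLabel") ? 17 : 80
            value = $0
            # Strip leading blanks, the setting name and the blanks after it.
            sub(/^[ \t]*[A-Za-z]+[ \t]+/, "", value)
            if (length(value) > limit)
                printf "%s:%d: %s is %d chars (limit %d)\n",
                       FILENAME, FNR, $1, length(value), limit
        }
    ' "$1"
}
```

Running check_labels on hub.txt, and later on trackDb.txt, prints one line per offending label and nothing when all labels fit.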


Track hub configuration

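  • Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):
    • The delimiter is quoted ('EOI') so that the shell does not expand the $$ placeholders, and sudo tee is used because the redirection after "sudo cat" would be performed without root privileges.
$> sudo tee /usr/local/share/gbib/hubs/geneNetwork/schMan2/trackDb.txt > /dev/null << 'EOI'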
track SMPs
bigDataUrl schMan2.bb
shortLabel SMPs v5.2
longLabel Schistosoma mansoni predictions (SMPs), version 5.2
type bigBed 12
searchIndex name
visibility full
html schMan2-description
boxedCfg on
color 96,64,0
altColor 128,64,32
dataVersion Dec. 2011 Sanger 5.2
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/geneNetwork/schMan1/geneView/%s
iframeUrl https://www.google.com.br/search?q=$$
iframeOptions height='400' width='640' scrolling='yes'
priority 100
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$
urlLabel NCBI Details:
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"

track roche454-blat
bigDataUrl roche454-blat.bb
shortLabel Roche 454 Trinity
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat
type bigBed 12
searchIndex name
visibility full
color 64,0,96
altColor 64,32,128

EOI
  • If the fasta file is written with all nucleotides in lowercase, convert them all to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:
$> change_case --in_format fasta --outfile schMan2.fasta --processes 80 -a upper Schistosoma_mansoni_v5.2.fa
  • If the names of the chromosomes are very long, we need to make them shorter:
$> sed s/Schisto_mansoni/Sm/ schMan2.fasta > schMan2-shortChromNames.fasta
  • Get the .2bit file from this fasta:
$> faToTwoBit schMan2-shortChromNames.fasta schMan2.2bit
  • Write the sizes of all chromosomes of the genome of interest to a file, sorted from the largest to the smallest:
$> twoBitInfo schMan2.2bit stdout | sort -k2rn > schMan2-chromSizes-sorted.txt
  • The same substitution has to be applied to the bed file of the track:
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed
  • The bed file of the track has to be sorted first by chromosome name and then by start coordinate:
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed
  • Convert from bed to bigBed:
$> bedToBigBed -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb
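The renaming and sorting steps above can be tried on a small scale before running them on the real files. The sketch below uses three made-up BED records (the chromosome names, coordinates and gene names are invented for illustration) and needs only sed and sort.

```shell
#!/usr/bin/env bash
# Toy demonstration of the renaming and sorting steps on made-up BED records.
workdir=$(mktemp -d)

# Three invented, tab-separated records in deliberately unsorted order.
printf '%s\t%s\t%s\t%s\n' \
    Schisto_mansoni.Chr_2 500 900 geneC \
    Schisto_mansoni.Chr_1 700 950 geneB \
    Schisto_mansoni.Chr_1 100 400 geneA > "$workdir/toy.bed"

# Shorten the chromosome names, then sort by chromosome and start coordinate.
sed 's/Schisto_mansoni/Sm/' "$workdir/toy.bed" \
    | sort -k1,1 -k2,2n > "$workdir/toy-sorted.bed"

cat "$workdir/toy-sorted.bed"
```

Since sort order can depend on the locale, prefixing the sort command with LC_ALL=C is a common precaution when the result is fed to tools that expect byte-wise ordering.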
  • Contents of groups.txt:
$> cat > /usr/local/src/gbib/hubs/geneNetwork/schMan2/groups.txt << EOI
name custom
label Custom
priority 1
defaultIsClosed 1

name mapping
label Mapping
priority 2
defaultIsClosed 1

name genes
label Genes
priority 3
defaultIsClosed 1

name mrna
label mRNA
priority 4
defaultIsClosed 1

name regulation
label Regulation
priority 5
defaultIsClosed 1

name comparative
label Comparative
priority 6
defaultIsClosed 1

name variation
label Variation
priority 7
defaultIsClosed 1

name experimental
label Experimental
priority 8
defaultIsClosed 0

EOI
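One detail of the here documents used in this section deserves attention: with an unquoted delimiter (<< EOI) the shell expands $$ to a process ID before the text is written, which would alter the $$ placeholders used in the url and iframeUrl settings of trackDb.txt. The sketch below, using a made-up example.org URL, shows the difference when the delimiter is quoted.

```shell
#!/usr/bin/env bash
# Unquoted delimiter: the shell performs expansion, so $$ becomes a PID.
unquoted=$(cat << EOI
url http://example.org/search?q=$$
EOI
)

# Quoted delimiter: the text is taken literally and $$ is preserved.
quoted=$(cat << 'EOI'
url http://example.org/search?q=$$
EOI
)

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

For files such as groups.txt, which contain no $ characters, either form gives the same result.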


Blat configuration

  • From the folder that contains the .2bit file, start two gfServer instances, specifying the assembly hub ports that will be used to query the DNA sequence and the translated (amino acid) sequence:
$> gfServer start 127.0.0.1 42422 -stepSize=5 schMan2.2bit &
$> gfServer start 127.0.0.1 42423 -trans schMan2.2bit &
  • If the fasta file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &
  • Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat:
blat 127.0.0.1 42422
transBlat 127.0.0.1 42423
  • Add these commands to cron.
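The cron step is not spelled out above. A minimal sketch, assuming a cron implementation that supports the @reboot keyword and that gfServer is on the PATH seen by cron, is to generate two crontab lines that restart both servers at boot. The function name gf_cron_lines and the directory /path/to/twoBit/dir are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print crontab lines that start both gfServers at boot.
# Usage: gf_cron_lines <directory containing schMan2.2bit>
# Note: the directory name must not contain spaces in this simple sketch.
gf_cron_lines() {
    local dir="$1"
    printf '@reboot cd %s && gfServer start 127.0.0.1 42422 -stepSize=5 schMan2.2bit\n' "$dir"
    printf '@reboot cd %s && gfServer start 127.0.0.1 42423 -trans schMan2.2bit\n' "$dir"
}

# Print the lines for review; nothing is installed by this script.
gf_cron_lines /path/to/twoBit/dir
```

After reviewing the output, the lines could be appended to the current user's crontab with something like ( crontab -l 2>/dev/null; gf_cron_lines "$PWD" ) | crontab -, run from the folder that holds the .2bit file.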


Custom track configuration

track type=bigBed name="Name" description="Description"
bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb
color=204,51,51 altColor=204,51,51 visibility=full


GBiB maintenance

  • Update all software and data:
$> gbibOnline
$> gbibAutoUpdateOn
$> updateBrowser
$> gbibAutoUpdateOff
$> gbibOffline

