GBiB: From download to BLAT at assembly hubs

GBiB installation

Create a folder on your machine to hold the installation files:

    $> sudo mkdir /usr/local/src/gbib

Log in to the UCSC Genome Browser virtual store:
- Open the Genome Store.
- Click "Add to cart" in the GBiB box.
- Click "My products" in the menu.
- Note the download address.
- Download GBiB to /usr/local/src/gbib:

    $> cd /usr/local/src/gbib
    $> sudo wget https://genome-store.ucsc.edu/media/products/gbib.zip

Uncompress and delete gbib.zip:

    $> sudo unzip gbib.zip
    $> sudo rm gbib.zip

Start VirtualBox:

    $> sudo virtualbox &

Add GBiB to VirtualBox:
- Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start
- Wait for the first update to finish.
- Close the GBiB terminal window.
- Select "Send the shutdown signal".
- Confirm by clicking "OK".

GBiB Configuration

Click "Settings":
- General ---> Advanced ---> Drag'n'Drop: Bidirectional.
- General ---> Description: Schistosoma mansoni genome assembly and track hubs.
- System ---> Motherboard ---> Base Memory: 4096 MB.
- System ---> Processor ---> Processor(s): 2.
- Display ---> Video ---> Video Memory: 32 MB.
- Shared Folders ---> + ---> Folder Path: /usr/local/src/gbib/hubs/ ---> Auto-mount ---> OK.
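
The memory, processor, video memory and shared folder settings above can also be applied from the host command line with VBoxManage. The following is a minimal sketch, assuming the VM is registered as "browserbox", is powered off, and that the shared folder is named "hubs" (so that it shows up as /folders/sf_hubs inside GBiB). The function name configure_gbib_vm and the VBOXMANAGE override variable are illustrative and are not part of GBiB or VirtualBox.

```shell
#!/bin/sh
# VBOXMANAGE can be overridden (e.g. VBOXMANAGE=echo) to print the
# commands instead of running them.
VBOXMANAGE="${VBOXMANAGE:-VBoxManage}"

# Apply base memory, CPU count, video memory and the shared folder.
configure_gbib_vm() (
    vm="$1"        # VM name as registered in VirtualBox (assumed: browserbox)
    hub_dir="$2"   # host folder to share with the VM
    "$VBOXMANAGE" modifyvm "$vm" --memory 4096 --cpus 2 --vram 32 || exit 1
    "$VBOXMANAGE" sharedfolder add "$vm" --name hubs --hostpath "$hub_dir" --automount
)

# Usage (with the VM powered off):
#   configure_gbib_vm browserbox /usr/local/src/gbib/hubs
```

The drag-and-drop and description settings are left to the GUI here, since their option names differ between VirtualBox versions.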

Boot the GBiB virtual machine:
- Select "browserbox" in the menu on the left.
- Click "Start".

Test whether everything is working by opening the following URLs:
- http://127.0.0.1:1234
- http://127.0.0.1:1234/folders
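
The two URLs can also be checked from a terminal on the host. This is a small sketch, assuming curl is installed and that an HTTP 200 response (after following redirects) means the page is working; the function names http_status and check_gbib_urls are illustrative.

```shell
#!/bin/sh
# Print the final HTTP status code for a URL (000 if there is no response).
http_status() {
    curl -s -L -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Check both GBiB URLs and report which ones answered with HTTP 200.
check_gbib_urls() {
    for url in "http://127.0.0.1:1234" "http://127.0.0.1:1234/folders"; do
        if [ "$(http_status "$url")" = "200" ]; then
            echo "OK   $url"
        else
            echo "FAIL $url"
        fi
    done
}

# Usage:
#   check_gbib_urls
```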

Log in using ssh for faster access:
- Open a terminal, such as "konsole".
- Password: browser

    $> ssh browser@localhost -p 1235

Install the tools that allow file manipulation:

    $> gbibAddTools

Turn off every kind of automatic update:

    $> gbibAutoUpdateOff

Do not allow users to mirror tracks:

    $> gbibMirrorTracksOff

Turn on the offline mode:

    $> gbibOffline

Reboot the virtual machine:

    $> sudo shutdown -r now

Assembly hub configuration

Log in again using ssh:

    $> ssh browser@localhost -p 1235

Create the directories that will store the assembly hub configuration files:

    $> mkdir -p /folders/sf_hubs/geneNetwork/schMan2

To force the configuration files to be loaded again every time the page is reloaded (instead of only after at least 300 seconds):
- Insert "udcTimeout=1&" right after http://genome.ucsc.edu/cgi-bin/hgTracks? in the URL.
- To disable this feature, click "clear" in the message that appears at the top of the page.
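
The URL edit described above can be sketched as a small helper. The function name add_udc_timeout is illustrative, and the "position=chr1" parameter in the usage line is only a placeholder.

```shell
#!/bin/sh
# Insert "udcTimeout=1&" right after the "?" of an hgTracks URL.
add_udc_timeout() {
    printf '%s\n' "$1" | sed 's/hgTracks?/hgTracks?udcTimeout=1\&/'
}

# Usage:
#   add_udc_timeout 'http://genome.ucsc.edu/cgi-bin/hgTracks?position=chr1'
```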

Fill in the contents of the hub.txt file (shortLabel <= 17 chars, longLabel <= 80 chars):

    $> cat > /usr/local/src/gbib/hubs/geneNetwork/hub.txt << EOI
    hub geneNetwork
    shortLabel Gene Network
    longLabel Gene Network Hub for Schistosoma mansoni
    genomesFile genomes.txt
    email admin-gene@iq.usp.br
    descriptionUrl geneNetwork.html
    
    EOI
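
The label length limits mentioned above can be checked before loading the hub. The following is a minimal sketch using awk; the function name check_label_lengths is illustrative, and the keyword match is case-insensitive on purpose.

```shell
#!/bin/sh
# Report shortLabel/longLabel lines whose text exceeds 17 or 80 characters.
# Returns non-zero if any label is too long.
check_label_lengths() {
    awk '
        tolower($1) == "shortlabel" || tolower($1) == "longlabel" {
            limit = (tolower($1) == "shortlabel") ? 17 : 80
            label = $0
            sub(/^[ \t]*[A-Za-z]+[ \t]+/, "", label)   # drop the keyword
            if (length(label) > limit) {
                printf "%s:%d: %s has %d chars (limit %d)\n", FILENAME, FNR, $1, length(label), limit
                bad = 1
            }
        }
        END { exit bad }
    ' "$1"
}

# Usage:
#   check_label_lengths /folders/sf_hubs/geneNetwork/hub.txt
```

The same function can be pointed at trackDb.txt, which uses the same two keywords.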

Fill in the contents of genomes.txt:

    $> cat > /usr/local/src/gbib/hubs/geneNetwork/genomes.txt << EOI
    genome schMan2
    trackDb schMan2/trackDb.txt
    twoBitPath schMan2/schMan2.2bit
    groups schMan2/groups.txt
    description Dec. 2011 (Sanger 5.2)
    organism Schistosoma mansoni
    defaultPos Sm.Chr_1.unplaced.SC_0010:312,104-379,754
    orderKey 2
    htmlPath schMan2/description.html
    scientificName Schistosoma mansoni
    blat 127.0.0.1 42422
    transBlat 127.0.0.1 42423
    
    EOI
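
Since genomes.txt refers to several files by relative path, it can help to confirm that they exist before testing the hub. This is a sketch under the assumption that all paths are relative to the directory containing genomes.txt and contain no spaces; the function name missing_hub_files is illustrative.

```shell
#!/bin/sh
# List files referenced by genomes.txt (trackDb, twoBitPath, groups, htmlPath)
# that do not exist under the hub directory. Returns non-zero if any is missing.
missing_hub_files() (
    hub_dir="$1"
    status=0
    for path in $(awk '$1 == "trackDb" || $1 == "twoBitPath" || $1 == "groups" || $1 == "htmlPath" { print $2 }' "$hub_dir/genomes.txt"); do
        if [ ! -f "$hub_dir/$path" ]; then
            echo "missing: $path"
            status=1
        fi
    done
    exit "$status"
)

# Usage:
#   missing_hub_files /folders/sf_hubs/geneNetwork
```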

Verify that everything is OK with the hub:

    $> hubPublicCheck hubPublic -addHub="/folders/sf_hubs/geneNetwork/hub.txt"

If the above command works, it prints the MySQL statement that can be executed to insert the hub into the public hub table. For example:

    mysql> insert into hubPublic (hubUrl,descriptionUrl,shortLabel,longLabel,registrationTime,dbCount,dbList) values ("/folders/sf_hubs/geneNetwork/hub.txt","/folders/sf_hubs/geneNetwork/geneNetwork.html", "Gene Network", "Gene Network Hub for Schistosoma mansoni", now(),2, "schMan2,");

Track hub configuration

Create the contents of trackDb.txt (the track name must have no spaces or dots and must start with a letter; shortLabel <= 17 chars, longLabel <= 80 chars). The heredoc delimiter is quoted so that the shell does not expand the "$$" placeholders:

    $> sudo tee /usr/local/src/gbib/hubs/geneNetwork/schMan2/trackDb.txt > /dev/null << 'EOI'
    track SMPs
    bigDataUrl schMan2.bb
    shortLabel SMPs v5.2
    longLabel Schistosoma mansoni predictions (SMPs), version 5.2
    type bigBed 12
    searchIndex name
    visibility full
    html schMan2-description
    boxedCfg on
    color 96,64,0
    altColor 128,64,32
    dataVersion Dec. 2011 Sanger 5.2
    # directUrl http://verjo-server-01.iq.usp.br/genome/pires/geneNetwork/schMan1/geneView/%s
    iframeUrl https://www.google.com.br/search?q=$$
    iframeOptions height='400' width='640' scrolling='yes'
    priority 100
    url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$
    urlLabel NCBI Details:
    urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"
    
    track roche454-blat
    bigDataUrl roche454-blat.bb
    shortLabel Roche 454 Trinity
    longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat
    type bigBed 12
    searchIndex name
    visibility full
    color 64,0,96
    altColor 64,32,128
    
    EOI
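
The track naming rule stated above (no spaces or dots, first character a letter) can be checked with a short helper. This sketch implements only that rule as written here; the function names valid_track_name and invalid_track_names are illustrative.

```shell
#!/bin/sh
# Succeeds if a track name starts with a letter and has no spaces or dots.
valid_track_name() {
    case "$1" in
        ""|[!A-Za-z]*|*" "*|*.*) return 1 ;;
        *) return 0 ;;
    esac
}

# Print every invalid track name found in a trackDb.txt file.
invalid_track_names() {
    awk '$1 == "track" { sub(/^[ \t]*track[ \t]+/, ""); print }' "$1" |
    while IFS= read -r name; do
        valid_track_name "$name" || echo "invalid track name: $name"
    done
}

# Usage:
#   invalid_track_names /folders/sf_hubs/geneNetwork/schMan2/trackDb.txt
```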

If the fasta file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command from seq_crumbs:

    $> change_case --in_format fasta --outfile schMan2.fasta --processes 80 -a upper Schistosoma_mansoni_v5.2.fa

If the chromosome names are very long, shorten them:

    $> sed s/Schisto_mansoni/Sm/ schMan2.fasta > schMan2-shortChromNames.fasta

Get the .2bit file from this fasta:

    $> faToTwoBit schMan2-shortChromNames.fasta schMan2.2bit

Create a file with the sizes of all chromosomes of the genome of interest, sorted from largest to smallest:

    $> twoBitInfo schMan2.2bit stdout | sort -k2rn > schMan2-chromSizes-sorted.txt

The same substitution has to be applied to the bed file of the track:

    $> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed
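
After renaming, every chromosome name in the bed file should also appear in the chromosome sizes file. The following sketch lists names that do not; it assumes the bed file has no "track" or "browser" header lines, and the function name unknown_chroms is illustrative.

```shell
#!/bin/sh
# Print each chromosome name used in the bed file ($2) that is
# absent from the chromosome sizes file ($1), once per name.
unknown_chroms() {
    awk 'NR == FNR { known[$1] = 1; next }
         !($1 in known) && !seen[$1]++ { print $1 }' "$1" "$2"
}

# Usage:
#   unknown_chroms schMan2-chromSizes-sorted.txt smps-shortChromNames.bed
```

An empty output means the names are consistent.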

The bed file of the track has to be sorted first by chromosome name and then by start coordinate:

    $> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed
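
Whether a bed file is already in this order can be tested with the check mode of sort. This is a small sketch that reuses the same sort keys; it should be run under the same locale as the sort command above, and the function name bed_is_sorted is illustrative.

```shell
#!/bin/sh
# Succeeds if the bed file is sorted by chromosome name, then by start coordinate.
bed_is_sorted() {
    sort -c -k1,1 -k2,2n "$1" 2>/dev/null
}

# Usage:
#   bed_is_sorted smps-shortChromNames-sorted.bed && echo "sorted"
```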

Convert from bed to bigBed:

    $> bedToBigBed -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb

Contents of groups.txt:

    $> cat > /usr/local/src/gbib/hubs/geneNetwork/schMan2/groups.txt << EOI
    name custom
    label Custom
    priority 1
    defaultIsClosed 1
    
    name mapping
    label Mapping
    priority 2
    defaultIsClosed 1
    
    name genes
    label Genes
    priority 3
    defaultIsClosed 1
    
    name mrna
    label mRNA
    priority 4
    defaultIsClosed 1
    
    name regulation
    label Regulation
    priority 5
    defaultIsClosed 1
    
    name comparative
    label Comparative
    priority 6
    defaultIsClosed 1
    
    name variation
    label Variation
    priority 7
    defaultIsClosed 1
    
    name experimental
    label Experimental
    priority 8
    defaultIsClosed 0
    
    EOI

Blat configuration

From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA queries and for translated (amino acid) queries:

    $> gfServer start 127.0.0.1 42422 -stepSize=5 schMan2.2bit &
    $> gfServer start 127.0.0.1 42423 -trans schMan2.2bit &
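
Once started, the servers can be probed with the gfServer "status" subcommand. The sketch below assumes that gfServer exits with a non-zero status when no server answers; the function names and the GFSERVER override variable are illustrative.

```shell
#!/bin/sh
# GFSERVER can be overridden, e.g. to test the logic without the kent tools.
GFSERVER="${GFSERVER:-gfServer}"

# Succeeds if a gfServer answers on the given host and port.
gfserver_alive() {
    "$GFSERVER" status "$1" "$2" >/dev/null 2>&1
}

# Report the state of the DNA (42422) and translated (42423) servers.
report_blat_servers() {
    for port in 42422 42423; do
        if gfserver_alive 127.0.0.1 "$port"; then
            echo "port $port: up"
        else
            echo "port $port: down"
        fi
    done
}

# Usage:
#   report_blat_servers
```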

If the fasta file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":

    $> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &

Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat:

    blat 127.0.0.1 42422
    transBlat 127.0.0.1 42423

Add these commands to cron.
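
One way to do this is with "@reboot" crontab entries, sketched below. This assumes a cron implementation that supports "@reboot" and that gfServer is on the PATH seen by cron; the function name gfserver_cron_lines and the directory in the usage line are illustrative.

```shell
#!/bin/sh
# Print crontab entries that start both gfServer instances at boot.
# $1 is the directory that holds schMan2.2bit.
gfserver_cron_lines() {
    printf '@reboot cd %s && gfServer start 127.0.0.1 42422 -stepSize=5 schMan2.2bit\n' "$1"
    printf '@reboot cd %s && gfServer start 127.0.0.1 42423 -trans schMan2.2bit\n' "$1"
}

# To install (appends to the current user's crontab):
#   { crontab -l 2>/dev/null; gfserver_cron_lines /folders/sf_hubs/geneNetwork/schMan2; } | crontab -
```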

Custom track configuration

    track type=bigBed

GBiB maintenance

Update all software and data:

    $> gbibOnline
