https://genomewiki.ucsc.edu/api.php?action=feedcontributions&user=David+da+Silva+Pires&feedformat=atomgenomewiki - User contributions [en]2024-03-28T22:37:10ZUser contributionsMediaWiki 1.38.4https://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=25852GBiB: From download to BLAT at assembly hubs2021-06-25T19:38:42Z<p>David da Silva Pires: /* Track hub configuration */ Changing from eboVir3 to tryCru32-CLBrenerEsmeraldoLike.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 19.10 (Eoan Ermine). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically install GBiB on a server with a text-only interface and open only specific firewall ports, so that the data can be used only from within your network.<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo chmod o+x /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check if you agree with the terms and conditions in the GBiB box.<br />
** Check if your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive.<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget --no-check-certificate <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (the update process can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/{final,input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/input<br />
<br />
* Download input data file: genome FASTA.<br />
<font color=green>browser@browserbox:$></font> wget 'https://tritrypdb.org/common/downloads/release-32/TcruziCLBrenerEsmeraldo-like/fasta/data/TriTrypDB-32_TcruziCLBrenerEsmeraldo-like_Genome.fasta'<br />
<br />
* Make a symbolic link to the input data file (just to keep a consistent naming pattern).<br />
<font color=green>browser@browserbox:$></font> ln -s TriTrypDB-32_TcruziCLBrenerEsmeraldo-like_Genome.fasta tryCru32-CLBrenerEsmeraldoLike.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Write the contents of the hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/assemblyHub/hub.txt << EOI<br />
hub assemblyHub<br />
shortLabel Assembly Hub<br />
longLabel Assembly Hub for Trypanosoma cruzi<br />
genomesFile genomes.txt<br />
email admin@assemblyhub.edu<br />
descriptionUrl http://genome.assemblyhub.edu<br />
EOI<br />
<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
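<br />
These limits are easy to overlook when editing by hand. The following is a minimal sketch of a pre-flight check, assuming a POSIX shell and awk are available inside GBiB; the function name check_hub_labels is hypothetical and is not part of the UCSC tools (hubCheck remains the authoritative validator).<br />
<br />
```shell
# check_hub_labels FILE (hypothetical helper): report hub.txt fields that
# break the three rules listed above. Returns 0 when all rules hold.
check_hub_labels() {
    awk '
        # The field value is everything after the first word.
        { value = $0; sub(/^[^ \t]+[ \t]*/, "", value) }
        $1 == "hub" && NF != 2 {
            print "hub: expected one name without spaces"; bad = 1
        }
        $1 == "shortLabel" && length(value) > 17 {
            print "shortLabel: longer than 17 characters"; bad = 1
        }
        $1 == "longLabel" && length(value) > 80 {
            print "longLabel: longer than 80 characters"; bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```
<br />
For the hub.txt written above, check_hub_labels /folders/sf_work/assemblyHub/hub.txt prints nothing and returns 0.<br />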
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Write the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/assemblyHub/genomes.txt << EOI<br />
genome tryCru32-CLBrenerEsmeraldoLike<br />
organism Trypanosoma cruzi CL Brener Esmeraldo-like<br />
scientificName Trypanosoma cruzi<br />
orderKey 1<br />
description TriTrypDB Release 32 (20 Apr 2017)<br />
defaultPos TcChr1-S:1-77,957<br />
twoBitPath tryCru32-CLBrenerEsmeraldoLike/genome/final/tryCru32-CLBrenerEsmeraldoLike.2bit<br />
htmlPath tryCru32-CLBrenerEsmeraldoLike/htmlPage/description.html<br />
groups tryCru32-CLBrenerEsmeraldoLike/groups.txt<br />
trackDb tryCru32-CLBrenerEsmeraldoLike/trackDb.txt<br />
blat 127.0.0.1 2302<br />
transBlat 127.0.0.1 2303<br />
EOI<br />
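<br />
The paths in genomes.txt are relative to the directory that holds genomes.txt, and several of them point to files that are only created in later steps. As a hedged sketch (the function name missing_hub_files is hypothetical, and it assumes that only the twoBitPath, htmlPath, groups and trackDb lines name local files), the following lists which referenced files do not exist yet:<br />
<br />
```shell
# missing_hub_files HUBDIR (hypothetical helper): print every local file
# named in HUBDIR/genomes.txt that is not present under HUBDIR.
missing_hub_files() {
    hubdir=$1
    awk '$1 == "twoBitPath" || $1 == "htmlPath" ||
         $1 == "groups"     || $1 == "trackDb" { print $2 }' \
        "$hubdir/genomes.txt" |
    while read -r path; do
        # Paths are resolved relative to the hub directory.
        [ -e "$hubdir/$path" ] || echo "$path"
    done
}
```
<br />
Running missing_hub_files /folders/sf_work/assemblyHub after each of the following sections shows what is still left to build.<br />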
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the FASTA file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/output/tryCru32-CLBrenerEsmeraldoLike-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/input/tryCru32-CLBrenerEsmeraldoLike.fasta<br />
<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/output<br />
<font color=green>browser@browserbox:$></font> sed 's/^>Trypanosoma_cruzi/>Tc/' tryCru32-CLBrenerEsmeraldoLike-uppercase.fasta > tryCru32-CLBrenerEsmeraldoLike-uppercase-shortChromNames.fasta<br />
<br />
* Make a symbolic link (just to shorten the name) and get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> ln -s tryCru32-CLBrenerEsmeraldoLike-uppercase-shortChromNames.fasta tryCru32-CLBrenerEsmeraldoLike.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit tryCru32-CLBrenerEsmeraldoLike.fasta tryCru32-CLBrenerEsmeraldoLike.2bit<br />
* Generate a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo tryCru32-CLBrenerEsmeraldoLike.2bit stdout | sort -k2nr > tryCru32-CLBrenerEsmeraldoLike-chromSizes-sorted.txt<br />
* Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 tryCru32-CLBrenerEsmeraldoLike.fasta tryCru32-CLBrenerEsmeraldoLike.agp<br />
* Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n tryCru32-CLBrenerEsmeraldoLike.agp > tryCru32-CLBrenerEsmeraldoLike-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa tryCru32-CLBrenerEsmeraldoLike-sorted.agp tryCru32-CLBrenerEsmeraldoLike.2bit<br />
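<br />
If seq_crumbs is not installed, the uppercase conversion can also be done with awk alone. This is a minimal sketch, not the method used above; the function name fasta_upper is hypothetical. It upper-cases the sequence lines and leaves the ">" header lines untouched, so chromosome names keep their original case.<br />
<br />
```shell
# fasta_upper (hypothetical helper): copy FASTA from stdin to stdout,
# upper-casing sequence lines and passing header lines through unchanged.
fasta_upper() {
    awk '/^>/ { print; next } { print toupper($0) }'
}
```
<br />
Usage, with the file names from this section: fasta_upper < ../input/tryCru32-CLBrenerEsmeraldoLike.fasta > tryCru32-CLBrenerEsmeraldoLike-uppercase.fasta<br />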
<br />
== Track hub configuration ==<br />
<br />
We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps are the holes in the track.<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file and should not contain the character '.' (dot).<br />
<br />
* Construction of the assembly track directly from the AGP file:<br />
 <font color=green>browser@browserbox:$></font> ln -s ../../../../genome/output/tryCru32-CLBrenerEsmeraldoLike-sorted.agp input/<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" input/tryCru32-CLBrenerEsmeraldoLike-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2-1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
 <font color=green>browser@browserbox:$></font> ln -s ../../../../genome/output/tryCru32-CLBrenerEsmeraldoLike-chromSizes-sorted.txt input/<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/tryCru32-CLBrenerEsmeraldoLike-chromSizes-sorted.txt output/assembly.bb<br />
<br />
* Edit the main trackDb.txt file to include the assembly track configuration.<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
* Double check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/assemblyHub/hub.txt<br />
<font color=green>browser@browserbox:$></font> hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/assemblyHub/hub.txt</nowiki><br />
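<br />
The AGP-to-BED step of the assembly track can be tried on a few lines before running it on the whole genome. AGP coordinates are 1-based and inclusive, whereas BED starts are 0-based, so this sketch subtracts 1 from the start column. The function name agp_to_bed6 is hypothetical; like the pipeline in this section, it treats only component type "N" as a gap.<br />
<br />
```shell
# agp_to_bed6 (hypothetical helper): read AGP on stdin and write a sorted
# six-column BED with one block per non-gap component.
agp_to_bed6() {
    grep -v '^#' |
    awk 'BEGIN { FS = OFS = "\t" }
         # Columns: object, start, end, part number, type, component id, ..., orientation
         $5 != "N" { print $1, $2 - 1, $3, $6, 0, $9 }' |
    LC_ALL=C sort -k1,1 -k2,2n
}
```
<br />
For example, an AGP component line covering TcChr1-S from 1 to 100 becomes a BED block from 0 to 100, while the "N" lines are dropped and show up as holes in the track.<br />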
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use to query the DNA sequence and the translated (amino acid) sequence:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat (pay attention to the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
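<br />
The host and port on the blat and transBlat lines have to match the ones given to gfServer start, otherwise the browser cannot reach the servers. As a small sketch (the function name blat_endpoints is hypothetical), the following prints the endpoints declared in genomes.txt so they can be compared with the gfServer commands:<br />
<br />
```shell
# blat_endpoints FILE (hypothetical helper): for every blat/transBlat line
# in a genomes.txt file, print "genome kind host port".
blat_endpoints() {
    awk '$1 == "genome" { genome = $2 }
         $1 == "blat" || $1 == "transBlat" { print genome, $1, $2, $3 }' "$1"
}
```
<br />
Usage: blat_endpoints /folders/sf_work/assemblyHub/genomes.txt<br />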
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
=== Gap track ===<br />
<br />
It is easy to build a gap track directly from the AGP file. The gap track highlights the genome loci that contain N's, which makes it the complement of the assembly track.<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{ if ($5 == "N") printf "%s\t%d\t%d\t%s\n", $1, $2-1, $3, $8}' | sort -k1,1 -k2,2n > output/gap.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed4 -tab output/gap.bed input/eboVir3-chromSizes-sorted.txt output/gap.bb<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track gap<br />
shortLabel Gap<br />
longLabel Gap Locations<br />
type bigBed 4 .<br />
bigDataUrl tracks/map/gap/output/gap.bb<br />
EOI<br />
<font color=green>browser@browserbox:$></font> cat >> /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
<br />
# Gap Locations.<br />
include tracks/map/gap/trackDb.txt<br />
EOI<br />
<br />
<br />
=== Basic HTML page ===<br />
<br />
Let's compose a basic page for our organism of interest:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<p><br />
<i>Ebola</i> virus genome assembly and track hub.<br />
<ul><br />
<li><a HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</a></li><br />
</ul><br />
</p><br />
<p><br />
<b>UCSC Genome Browser assembly ID:</b> eboVir3<br><br />
</p></nowiki><br />
EOI<br />
<br />
For a more complete example, see the code of the following page: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html.<br />
<br />
== Other tips ==<br />
<br />
At this point the hubCheck command should run without reporting any error. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
* Forcing configuration files to be loaded again every time that the page is reloaded (instead of after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] at URL.<br />
** To disable this feature, click "clear" on the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><br />
 <HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
 <BODY><br />
 <P><br />
 Ebola virus genome assembly and track hub.<br />
 <UL><br />
 <LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
 NCBI genome/4887 (Ebola virus)</A></LI><br />
 </UL><br />
 </P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same chromosome-name substitution has to be made in the bed file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the fasta file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be loaded by adding its URL directly to the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is attached to the browser directly (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand=0,0,50 0,50,0 useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Since Kent's tools (like bedToBigBed) expect their input to be sorted in case-sensitive order, we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc on your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
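<br />
The effect of the "C" locale on sorting can be seen with a three-line example. This is a sketch only; the wrapper name sort_c is hypothetical. In the "C" locale lines are compared byte by byte, so every uppercase ASCII letter sorts before every lowercase one.<br />
<br />
```shell
# sort_c (hypothetical helper): sort BED-like lines by chromosome name in
# byte order, then numerically by start coordinate, whatever the session locale is.
sort_c() {
    LC_ALL=C sort -k1,1 -k2,2n "$@"
}

# "Chr10" comes first because "C" (byte 67) sorts before "c" (byte 99).
printf 'chr2\t5\t9\nChr10\t7\t8\nchr2\t1\t4\n' | sort_c
```
<br />
Setting LC_ALL for a single command, as in this wrapper, is an alternative to the .bashrc settings when only one sort needs the "C" order.<br />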
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=25849GBiB: From download to BLAT at assembly hubs2021-06-18T20:30:36Z<p>David da Silva Pires: /* Preparing the data */ Changing from eboVir3 to tryCru32-CLBrenerEsmeraldoLike.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 19.10 (Eoan Ermine). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically install GBiB on a server with a text-only interface and open only specific firewall ports, so that the data can be used only from within your network.<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo chmod o+x /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check if you agree with the terms and conditions in the GBiB box.<br />
** Check if your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive.<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget --no-check-certificate <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (the update process can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/{final,input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/input<br />
<br />
* Download input data file: genome FASTA.<br />
<font color=green>browser@browserbox:$></font> wget 'https://tritrypdb.org/common/downloads/release-32/TcruziCLBrenerEsmeraldo-like/fasta/data/TriTrypDB-32_TcruziCLBrenerEsmeraldo-like_Genome.fasta'<br />
<br />
* Make a symbolic link to the input data file (just to keep a consistent naming pattern).<br />
<font color=green>browser@browserbox:$></font> ln -s TriTrypDB-32_TcruziCLBrenerEsmeraldo-like_Genome.fasta tryCru32-CLBrenerEsmeraldoLike.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill in the contents of the hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/assemblyHub/hub.txt << EOI<br />
hub assemblyHub<br />
shortLabel Assembly Hub<br />
longLabel Assembly Hub for Trypanosoma cruzi<br />
genomesFile genomes.txt<br />
email admin@assemblyhub.edu<br />
descriptionUrl http://genome.assemblyhub.edu<br />
EOI<br />
<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
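The rules above can be checked mechanically before the hub is loaded. The following is only a minimal sketch: the function name check_hub_labels is hypothetical (it is not part of GBiB or of Kent's tools), and it covers just the three rules listed here, not everything that hubCheck verifies.<br />

```shell
#!/bin/sh
# check_hub_labels FILE (hypothetical helper): report hub.txt fields that
# break the three rules listed above. Prints one line per problem and
# returns non-zero if any problem was found.
check_hub_labels() {
    awk '
        {
            # Keep only the value: drop the field name and surrounding blanks.
            value = $0
            sub(/^[^ \t]+[ \t]+/, "", value)
            sub(/[ \t]+$/, "", value)
        }
        $1 == "hub"        && NF > 2             { print "hub: name must not contain spaces"; bad = 1 }
        $1 == "shortLabel" && length(value) > 17 { print "shortLabel: longer than 17 characters"; bad = 1 }
        $1 == "longLabel"  && length(value) > 80 { print "longLabel: longer than 80 characters"; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Demo on a temporary copy of the hub.txt content shown above.
tmp=$(mktemp)
printf 'hub assemblyHub\nshortLabel Assembly Hub\nlongLabel Assembly Hub for Trypanosoma cruzi\n' > "$tmp"
check_hub_labels "$tmp" && echo "hub.txt labels look fine"
rm -f "$tmp"
```

On the GBiB it would be called as check_hub_labels /folders/sf_work/assemblyHub/hub.txt.<br />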
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill in the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/assemblyHub/genomes.txt << EOI<br />
genome tryCru32-CLBrenerEsmeraldoLike<br />
organism Trypanosoma cruzi CL Brener Esmeraldo-like<br />
scientificName Trypanosoma cruzi<br />
orderKey 1<br />
description TriTrypDB Release 32 (20 Apr 2017)<br />
defaultPos TcChr1-S:1-77,957<br />
twoBitPath tryCru32-CLBrenerEsmeraldoLike/genome/final/tryCru32-CLBrenerEsmeraldoLike.2bit<br />
htmlPath tryCru32-CLBrenerEsmeraldoLike/htmlPage/description.html<br />
groups tryCru32-CLBrenerEsmeraldoLike/groups.txt<br />
trackDb tryCru32-CLBrenerEsmeraldoLike/trackDb.txt<br />
blat 127.0.0.1 2302<br />
transBlat 127.0.0.1 2303<br />
EOI<br />
<br />
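The paths in genomes.txt are relative to the directory that holds genomes.txt, so a quick way to catch typos is to list the referenced files that do not exist yet. This is a sketch only; list_missing_hub_files is a hypothetical helper name, and it inspects just the four path-valued settings used above.<br />

```shell
#!/bin/sh
# list_missing_hub_files HUB_DIR (hypothetical helper): print every file
# referenced by HUB_DIR/genomes.txt through twoBitPath, htmlPath, groups or
# trackDb that is not present on disk. Prints nothing when all files exist.
list_missing_hub_files() {
    hub_dir=$1
    awk '$1 ~ /^(twoBitPath|htmlPath|groups|trackDb)$/ { print $2 }' "$hub_dir/genomes.txt" |
    while read -r rel; do
        [ -e "$hub_dir/$rel" ] || echo "missing: $rel"
    done
}

# Example call on the GBiB (commented out so the sketch has no side effects):
# list_missing_hub_files /folders/sf_work/assemblyHub
```

Right after genomes.txt is created, every referenced file is expected to be reported as missing; the list should shrink to nothing as the following sections are completed.<br />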
=== Preparing the data ===<br />
<br />
<br />
* If the FASTA file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/output/tryCru32-CLBrenerEsmeraldoLike-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/input/tryCru32-CLBrenerEsmeraldoLike.fasta<br />
<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/output<br />
<font color=green>browser@browserbox:$></font> sed 's/^>Trypanosoma_cruzi/>Tc/' tryCru32-CLBrenerEsmeraldoLike-uppercase.fasta > tryCru32-CLBrenerEsmeraldoLike-uppercase-shortChromNames.fasta<br />
<br />
* Make a symbolic link (just to shorten the name) and get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> ln -s tryCru32-CLBrenerEsmeraldoLike-uppercase-shortChromNames.fasta tryCru32-CLBrenerEsmeraldoLike.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit tryCru32-CLBrenerEsmeraldoLike.fasta tryCru32-CLBrenerEsmeraldoLike.2bit<br />
* Generate a file with the name and the size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo tryCru32-CLBrenerEsmeraldoLike.2bit stdout | sort -k2nr > tryCru32-CLBrenerEsmeraldoLike-chromSizes-sorted.txt<br />
* Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 tryCru32-CLBrenerEsmeraldoLike.fasta tryCru32-CLBrenerEsmeraldoLike.agp<br />
* Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n tryCru32-CLBrenerEsmeraldoLike.agp > tryCru32-CLBrenerEsmeraldoLike-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa tryCru32-CLBrenerEsmeraldoLike-sorted.agp tryCru32-CLBrenerEsmeraldoLike.2bit<br />
<br />
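If seq_crumbs is not installed, the uppercase conversion and the header shortening can be approximated in a single awk pass. This is a swapped-in alternative, not the change_case tool used above, and fasta_upper_short is a hypothetical name. Note that uppercasing everything discards any soft-masking present in the input.<br />

```shell
#!/bin/sh
# fasta_upper_short OLD NEW (hypothetical helper): read FASTA on stdin,
# replace a leading ">OLD" in header lines with ">NEW", and convert all
# sequence lines to uppercase. Header text is otherwise left untouched.
fasta_upper_short() {
    awk -v old="$1" -v new="$2" '
        /^>/ {
            # Only rewrite headers that actually start with ">" followed by OLD.
            if (index($0, ">" old) == 1)
                $0 = ">" new substr($0, length(old) + 2)
            print
            next
        }
        { print toupper($0) }
    '
}

# Toy input: one lowercase record with a long chromosome name.
printf '>Trypanosoma_cruziChr1\nacgtnnac\n' | fasta_upper_short Trypanosoma_cruzi Tc
# prints:
# >TcChr1
# ACGTNNAC
```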
== Track hub configuration ==<br />
<br />
We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps are the holes in the track.<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file and should not contain the character '.' (dot).<br />
<br />
* Build the assembly track directly from the AGP file (AGP coordinates are 1-based while BED start coordinates are 0-based, so 1 is subtracted from the start):<br />
 <font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
<br />
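The AGP-to-BED conversion can be tried on a toy file before running it on a whole genome. AGP uses 1-based inclusive coordinates, whereas BED uses 0-based starts and exclusive ends, which is why the sketch below subtracts 1 from the start column. The function name agp_to_bed is hypothetical, and the sketch assumes a tab-delimited AGP file as produced by hgFakeAgp.<br />

```shell
#!/bin/sh
# agp_to_bed MODE (hypothetical helper): read a tab-delimited AGP file on
# stdin and write sorted BED lines on stdout.
#   MODE=components -> BED6: name = component id, score = 0, strand = orientation
#   MODE=gaps       -> BED4: name = linkage column of the gap line
agp_to_bed() {
    awk -v want="$1" 'BEGIN { FS = OFS = "\t" }
        /^#/ { next }                                   # skip AGP comment lines
        { is_gap = ($5 == "N" || $5 == "U") }           # N/U mark gap lines
        want == "components" && !is_gap { print $1, $2 - 1, $3, $6, 0, $9 }
        want == "gaps"       &&  is_gap { print $1, $2 - 1, $3, $8 }
    ' | sort -k1,1 -k2,2n
}

# Toy AGP: two contigs separated by a 10-base gap.
printf 'chr1\t1\t100\t1\tD\tctg1\t1\t100\t+\nchr1\t101\t110\t2\tN\t10\tscaffold\tyes\nchr1\t111\t200\t3\tD\tctg2\t1\t90\t-\n' |
    agp_to_bed components
```

The three toy lines cover bases 1 to 200 without overlap; after conversion the component blocks are 0-100 and 110-200, and the gap (mode "gaps") is 100-110.<br />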
* Edit the main trackDb.txt file to include the assembly track configuration.<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
* Double check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<font color=green>browser@browserbox:$></font> hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA searches and for translated (amino acid) searches:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the blat and transBlat lines (pay attention to the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
<br />
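Because the ports given to gfServer and the ports written in genomes.txt must agree, it can help to print what genomes.txt currently declares. A small sketch, assuming the genomes.txt layout used on this page; blat_endpoints is a hypothetical helper name.<br />

```shell
#!/bin/sh
# blat_endpoints FILE (hypothetical helper): for each blat/transBlat line in
# a genomes.txt file, print "genome setting host port".
blat_endpoints() {
    awk '$1 == "genome" { g = $2 }                       # remember current genome
         $1 == "blat" || $1 == "transBlat" { print g, $1, $2, $3 }' "$1"
}

# Example call on the GBiB (commented out so the sketch has no side effects):
# blat_endpoints /folders/sf_work/assemblyHub/genomes.txt
```

Each host and port pair printed this way can then be compared with the gfServer start commands, or queried with "gfServer status host port" while the servers are running.<br />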
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
=== Gap track ===<br />
<br />
It is easy to build a gap track directly from the AGP file. The gap track highlights the genome loci that contain N's, making it the complement of the assembly track. As with the assembly track, 1 is subtracted from the AGP start to obtain the 0-based BED start.<br />
 <font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/{input,output}<br />
 <font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/<br />
 <font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{ if ($5 == "N") printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > output/gap.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed4 -tab output/gap.bed input/eboVir3-chromSizes-sorted.txt output/gap.bb<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track gap<br />
shortLabel Gap<br />
longLabel Gap Locations<br />
type bigBed 4 .<br />
bigDataUrl tracks/map/gap/output/gap.bb<br />
EOI<br />
<font color=green>browser@browserbox:$></font> cat >> /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
<br />
# Gap Locations.<br />
include tracks/map/gap/trackDb.txt<br />
EOI<br />
<br />
<br />
=== Basic HTML page ===<br />
<br />
Let's compose a basic page for our organism of interest:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<p><br />
<i>Ebola</i> virus genome assembly and track hub.<br />
<ul><br />
<li><a HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</a></li><br />
</ul><br />
</p><br />
<p><br />
<b>UCSC Genome Browser assembly ID:</b> eboVir3<br><br />
</p></nowiki><br />
EOI<br />
<br />
For a more complete example, see the code of the following page: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html.<br />
<br />
== Other tips ==<br />
<br />
At this point the hubCheck command should run without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
* To force the configuration files to be reloaded every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
 <BODY><br />
 <P><br />
 Ebola virus genome assembly and track hub.<br />
 <UL><br />
 <LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
 NCBI genome/4887 (Ebola virus)</A></LI><br />
 </UL><br />
 </P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same chromosome-name substitution has to be applied to the BED file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Build the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
     eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was soft-masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added by putting the hub's URL directly in the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Since Kent's tools (like bedToBigBed) expect their input to be sorted with a case-sensitive sort, we have to set the locale-related environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc in your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
<br />
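The effect of the "C" locale on sorting can be seen on a three-line toy BED file. In the "C" locale, sort compares raw byte values, so every uppercase ASCII letter sorts before every lowercase one; many other locales interleave the two cases. The helper name sort_c is hypothetical and the data is made up for illustration.<br />

```shell
#!/bin/sh
# sort_c (hypothetical helper): sort BED-like input by chromosome name and
# then numerically by start, forcing byte-order comparison for this command
# only, whatever the locale of the calling shell is.
sort_c() { LC_ALL=C sort -k1,1 -k2,2n; }

printf 'chr1\t50\nChr2\t10\nchr1\t5\n' | sort_c
# prints (tab-separated):
# Chr2    10
# chr1    5
# chr1    50
```

Setting LC_ALL on a single command like this is an alternative when editing .bashrc is not desirable.<br />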
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=25844GBiB: From download to BLAT at assembly hubs2021-06-11T18:16:04Z<p>David da Silva Pires: /* Creating a basic genomes.txt file */ Adding a lot of fields (not so basic anymore ;-) and changing from eboVir3 to Trypanosoma cruzi.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 19.10 (Eoan Ermine). Most commands are the same for other GNU/Linux distributions, with the differences probably being limited to package names and crontab settings. Other architectures should involve the installation of GBiB on a server using only a text interface and with specific ports enabled on the firewall to restrict the use of the data to just your network.<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo chmod o+x /usr/local/src/gbib<br />
* Download GBiB from the UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check if you agree with the terms and conditions in the box relative to GBiB.<br />
** Check if your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive.<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget --no-check-certificate <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour to finish the update process, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click on "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/{final,input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/input<br />
<br />
* Download input data file: genome FASTA.<br />
<font color=green>browser@browserbox:$></font> wget 'https://tritrypdb.org/common/downloads/release-32/TcruziCLBrenerEsmeraldo-like/fasta/data/TriTrypDB-32_TcruziCLBrenerEsmeraldo-like_Genome.fasta'<br />
<br />
* Make a symbolic link for input data file (just to keep a pattern).<br />
<font color=green>browser@browserbox:$></font> ln -s TriTrypDB-32_TcruziCLBrenerEsmeraldo-like_Genome.fasta tryCru32-CLBrenerEsmeraldoLike.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/assemblyHub/hub.txt << EOI<br />
hub assemblyHub<br />
shortLabel Assembly Hub<br />
longLabel Assembly Hub for Trypanosoma cruzi<br />
genomesFile genomes.txt<br />
email admin@assemblyhub.edu<br />
descriptionUrl http://genome.assemblyhub.edu<br />
EOI<br />
<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/assemblyHub/genomes.txt << EOI<br />
genome tryCru32-CLBrenerEsmeraldoLike<br />
organism Trypanosoma cruzi CL Brener Esmeraldo-like<br />
scientificName Trypanosoma cruzi<br />
orderKey 1<br />
description TriTrypDB Release 32 (20 Apr 2017)<br />
defaultPos TcChr1-S:1-77,957<br />
twoBitPath tryCru32-CLBrenerEsmeraldoLike/genome/final/tryCru32-CLBrenerEsmeraldoLike.2bit<br />
htmlPath tryCru32-CLBrenerEsmeraldoLike/htmlPage/description.html<br />
groups tryCru32-CLBrenerEsmeraldoLike/groups.txt<br />
trackDb tryCru32-CLBrenerEsmeraldoLike/trackDb.txt<br />
blat 127.0.0.1 2302<br />
transBlat 127.0.0.1 2303<br />
EOI<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the FASTA file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ eboVir3-uppercase.fasta > eboVir3-uppercase-shortChromNames.fasta<br />
* Make a symbolic link (just to shorten the name) and get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> ln -s eboVir3-uppercase-shortChromNames.fasta eboVir3.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Generate a file with the name and the size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
* Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
<br />
== Track hub configuration ==<br />
<br />
We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps are the holes in the track.<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file and should not contain the character '.' (dot).<br />
<br />
* Build the assembly track directly from the AGP file (AGP coordinates are 1-based while BED start coordinates are 0-based, so 1 is subtracted from the start):<br />
 <font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
<br />
* Edit the main trackDb.txt file to include the assembly track configuration.<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
* Check the integrity of your hub with the hubCheck command, both through the file system and through the web server:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
 <font color=green>browser@browserbox:$></font> hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_work/virusNetwork/hub.txt</nowiki><br />
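<br />
Before running hubCheck, it can help to confirm that the files referenced from genomes.txt actually exist. The following is a minimal sketch under the assumption that twoBitPath, trackDb and groups hold paths relative to genomes.txt (as they do on this page) rather than URLs; check_genomes_paths is a hypothetical helper name.<br />

```shell
# Sketch: confirm that the files referenced from genomes.txt exist.
# check_genomes_paths is a hypothetical helper; paths are resolved
# relative to the directory that holds genomes.txt.
check_genomes_paths() {
    dir=$(dirname "$1")
    status=0
    # Read "key value" pairs and test the settings that point at local files.
    while read -r key value; do
        case "$key" in
            twoBitPath|trackDb|groups)
                if [ ! -e "$dir/$value" ]; then
                    echo "missing $key file: $dir/$value"
                    status=1
                fi
                ;;
        esac
    done < "$1"
    return "$status"
}
```

For example, "check_genomes_paths /folders/sf_work/virusNetwork/genomes.txt" lists every referenced file that is missing.<br />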
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA queries and for translated (protein) queries:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat (pay attention to the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
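<br />
Since a misspelled key is easy to overlook, the two lines can be verified with a small script. The following is a minimal sketch; check_blat_lines is a hypothetical helper name, and it only checks the shape of the lines (key, host, numeric port), not whether a gfServer is actually listening.<br />

```shell
# Sketch: verify that genomes.txt carries both a blat and a transBlat line.
# check_blat_lines is a hypothetical helper name.
check_blat_lines() {
    awk '
        $1 == "blat"      && NF == 3 && $3 ~ /^[0-9]+$/ { dna = 1 }
        $1 == "transBlat" && NF == 3 && $3 ~ /^[0-9]+$/ { prot = 1 }
        # Catch spellings such as "transblat" or "TransBlat".
        tolower($1) == "transblat" && $1 != "transBlat" {
            print "line " NR ": write transBlat with a capital B"
        }
        END { exit !(dna && prot) }
    ' "$1"
}
```

The function returns a zero status only when both lines are present and well formed. Whether each server is answering can be checked separately with "gfServer status 127.0.0.1 42420".<br />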
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
=== Gap track ===<br />
<br />
It is easy to build a gap track directly from the AGP file. The gap track highlights the genome loci that contain N's, which makes it the complement of the assembly track.<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{ if ($5 == "N") printf "%s\t%d\t%d\t%s\n", $1, $2, $3, $8}' | sort -k1,1 -k2,2n > output/gap.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed4 -tab output/gap.bed input/eboVir3-chromSizes-sorted.txt output/gap.bb<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track gap<br />
shortLabel Gap<br />
longLabel Gap Locations<br />
type bigBed 4 .<br />
bigDataUrl tracks/map/gap/output/gap.bb<br />
EOI<br />
<font color=green>browser@browserbox:$></font> cat >> /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
<br />
# Gap Locations.<br />
include tracks/map/gap/trackDb.txt<br />
EOI<br />
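<br />
Because the gap track is the complement of the assembly track, the blocks of both tracks together should add up to the size of each chromosome. The following is a minimal sketch of that sanity check; check_tiling is a hypothetical helper name, it compares totals only (it does not detect overlaps), and it assumes that both BED files use 0-based, half-open coordinates. If the AGP coordinates were copied unchanged from a 1-based AGP file, the totals fall one base short per block.<br />

```shell
# Sketch: report chromosomes whose assembly and gap blocks do not add up
# to the chromosome size. check_tiling is a hypothetical helper name.
# Arguments: chrom sizes file, assembly BED file, gap BED file.
check_tiling() {
    awk -F '\t' '
        FNR == 1 { file++ }                  # count the input files as they start
        file == 1 { size[$1] = $2; next }    # first file: chromosome name and size
        { covered[$1] += $3 - $2 }           # BED block length is end minus start
        END {
            for (chrom in size)
                if (covered[chrom] != size[chrom]) {
                    print chrom ": covered " (covered[chrom] + 0) " of " size[chrom]
                    bad = 1
                }
            exit bad + 0
        }
    ' "$1" "$2" "$3"
}
```

For example, from the eboVir3 folder: "check_tiling genome/eboVir3-chromSizes-sorted.txt tracks/map/assembly/output/assembly.bed tracks/map/gap/output/gap.bed".<br />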
<br />
<br />
=== Basic HTML page ===<br />
<br />
Let's compose a basic description page for our organism of interest:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<p><br />
<i>Ebola</i> virus genome assembly and track hub.<br />
<ul><br />
<li><a HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</a></li><br />
</ul><br />
</p><br />
<p><br />
<b>UCSC Genome Browser assembly ID:</b> eboVir3<br><br />
</p></nowiki><br />
EOI<br />
<br />
For a more complete example, see the code of the following page: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html.<br />
<br />
== Other tips ==<br />
<br />
At this point the hubCheck command should run without reporting any error. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
* To force the configuration files to be reloaded every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click on "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same chromosome name substitution has to be applied to the BED file of the track:<br />
 $> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML description page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
* Examples of further trackDb.txt settings and track stanzas:<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was soft-masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* To start the gfServer daemons at boot, add these commands to /etc/rc.local, just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be attached by adding the hub's URL directly to the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
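<br />
The length limits on name and description can be enforced when the track line is generated. The following is a minimal sketch that builds only the opening of the line; custom_track_line is a hypothetical helper name, and the remaining settings (type, visibility, bigDataUrl and so on) would be appended by hand.<br />

```shell
# Sketch: build the opening of a custom track line, enforcing the length
# limits listed above. custom_track_line is a hypothetical helper name.
custom_track_line() {
    name=$1
    description=$2
    if [ "${#name}" -gt 15 ]; then
        echo "name longer than 15 characters: $name" >&2
        return 1
    fi
    if [ "${#description}" -gt 60 ]; then
        echo "description longer than 60 characters" >&2
        return 1
    fi
    # Always quote both values so that embedded spaces are kept together.
    printf 'track name="%s" description="%s"\n' "$name" "$description"
}
```

For example, custom_track_line "Track label" "Chromosome coordinates list" prints the first two settings of the track line shown above.<br />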
<br />
=== Setting locale ===<br />
<br />
Since Kent's tools (such as bedToBigBed) expect their input to be sorted in case-sensitive order, we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc in your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
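<br />
The effect of the "C" locale on sorting can be seen with a short demonstration. In this sketch the chromosome names are made up; LC_ALL is set only for the sort command, and since LC_ALL takes precedence over LANG and the other LC_* variables, it is enough on its own for a single command.<br />

```shell
# Sketch: in the C locale, sort compares bytes, so every uppercase letter
# sorts before every lowercase letter. The names below are made up.
printf 'chrb\nchrA\nchra\nchrB\n' | LC_ALL=C sort
# Prints chrA, chrB, chra, chrb (one name per line).
```

In many other locales the same input would be ordered without regard to case, which is the ordering that bedToBigBed rejects.<br />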
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs], [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=25842GBiB: From download to BLAT at assembly hubs2021-06-11T15:20:25Z<p>David da Silva Pires: /* Assembly hub configuration */ Just adjusting some spacing between sections.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 19.10 (Eoan Ermine). Most commands are the same for other GNU/Linux distributions, with the differences probably limited to package names and crontab settings. Other architectures would involve installing GBiB on a server with a text-only interface and opening specific firewall ports so that the data can be used only from within your network.<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo chmod o+x /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click on "Login / Register".<br />
** In the GBiB box, check whether you agree with the terms and conditions.<br />
** Check whether your hardware and software meet the basic requirements.<br />
** Click on "Add to cart".<br />
** Click on "Cart (1)" in the menu.<br />
** Click on "Proceed to checkout".<br />
** Click on "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget --no-check-certificate <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour to finish, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click on "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on menu at left.<br />
** Click on "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh, for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/{final,input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/input<br />
<br />
* Download input data file: genome FASTA.<br />
<font color=green>browser@browserbox:$></font> wget 'https://tritrypdb.org/common/downloads/release-32/TcruziCLBrenerEsmeraldo-like/fasta/data/TriTrypDB-32_TcruziCLBrenerEsmeraldo-like_Genome.fasta'<br />
<br />
* Make a symbolic link to the input data file (just to keep a consistent naming pattern).<br />
<font color=green>browser@browserbox:$></font> ln -s TriTrypDB-32_TcruziCLBrenerEsmeraldo-like_Genome.fasta tryCru32-CLBrenerEsmeraldoLike.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/assemblyHub/hub.txt << EOI<br />
hub assemblyHub<br />
shortLabel Assembly Hub<br />
longLabel Assembly Hub for Trypanosoma cruzi<br />
genomesFile genomes.txt<br />
email admin@assemblyhub.edu<br />
descriptionUrl http://genome.assemblyhub.edu<br />
EOI<br />
<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the FASTA file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ eboVir3-uppercase.fasta > eboVir3-uppercase-shortChromNames.fasta<br />
* Make a symbolic link (just to shorten the name) and get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> ln -s eboVir3-uppercase-shortChromNames.fasta eboVir3.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Create a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
* Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
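<br />
Two of the preparation steps above can be cross-checked with standard tools alone. The following is a minimal sketch, not a replacement for change_case or twoBitInfo: fasta_to_upper and fasta_chrom_sizes are hypothetical helper names, and the second one assumes that the sequence name is the first word of each FASTA header line.<br />

```shell
# Sketch: uppercase every sequence line of a FASTA file, leaving the
# header lines untouched. fasta_to_upper is a hypothetical helper name.
fasta_to_upper() {
    awk '/^>/ { print; next } { print toupper($0) }' "$1"
}

# Sketch: print "name<TAB>size" for each FASTA record, largest first.
# fasta_chrom_sizes is a hypothetical helper name.
fasta_chrom_sizes() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len
            name = substr($1, 2)    # first word of the header, without ">"
            len = 0
            next
        }
        { len += length($0) }       # add up the bases of the current record
        END { if (name != "") print name "\t" len }
    ' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k2,2nr
}
```

The output of "fasta_chrom_sizes eboVir3.fasta" can be compared with eboVir3-chromSizes-sorted.txt; a mismatch would suggest that the FASTA file and the .2bit file are out of sync.<br />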
<br />
== Track hub configuration ==<br />
<br />
We must configure at least one track in our track hub in order to have a working assembly hub. The first track we will configure is the assembly track, which shows a block for every range of nucleotides other than 'N'. In other words, the gaps appear as holes in the track.<br />
<br />
* Create trackDb.txt (the track name must start with a letter and contain no spaces or dots; shortLabel is limited to 17 characters and longLabel to 80 characters):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track (the "track" field) must be unique within the entire file and should not contain the character '.' (dot).<br />
<br />
* Construction of the assembly track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
<br />
* Edit the main trackDb.txt file to include the assembly track configuration.<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
* Check the integrity of your hub with the hubCheck command, both through the file system and through the web server:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
 <font color=green>browser@browserbox:$></font> hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_work/virusNetwork/hub.txt</nowiki><br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA queries and for translated (protein) queries:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat (pay attention to the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
=== Gap track ===<br />
<br />
It is easy to build a gap track directly from the AGP file. The gap track highlights the genome loci that contain N's, which makes it the complement of the assembly track.<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{ if ($5 == "N") printf "%s\t%d\t%d\t%s\n", $1, $2, $3, $8}' | sort -k1,1 -k2,2n > output/gap.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed4 -tab output/gap.bed input/eboVir3-chromSizes-sorted.txt output/gap.bb<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track gap<br />
shortLabel Gap<br />
longLabel Gap Locations<br />
type bigBed 4 .<br />
bigDataUrl tracks/map/gap/output/gap.bb<br />
EOI<br />
<font color=green>browser@browserbox:$></font> cat >> /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
<br />
# Gap Locations.<br />
include tracks/map/gap/trackDb.txt<br />
EOI<br />
<br />
<br />
=== Basic HTML page ===<br />
<br />
Let's compose a basic description page for our organism of interest:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<p><br />
<i>Ebola</i> virus genome assembly and track hub.<br />
<ul><br />
<li><a HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</a></li><br />
</ul><br />
</p><br />
<p><br />
<b>UCSC Genome Browser assembly ID:</b> eboVir3<br><br />
</p></nowiki><br />
EOI<br />
<br />
For a more complete example, see the code of the following page: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html.<br />
<br />
== Other tips ==<br />
<br />
At this point the hubCheck command should run without reporting any error. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
* To force the configuration files to be reloaded every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click on "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same chromosome name substitution has to be applied to the BED file of the track:<br />
 $> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be loaded by adding the hub's URL directly to the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is attached directly to the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Since Kent's tools (like bedToBigBed) expect a case-sensitive sort, we have to set the locale-related environment variables to "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc on your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
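<br />
The effect of the "C" locale on sorting can be checked with a small sketch. It assumes only a POSIX shell and the standard sort command; the helper name sort_c and the sample chromosome names are made up for illustration.<br />

```shell
# sort_c: sort in plain byte order (C locale), regardless of the user's locale.
# In byte order every uppercase letter sorts before every lowercase letter.
sort_c() {
    LC_ALL=C sort "$@"
}

# Hypothetical chromosome names that differ only in letter case.
printf 'chrb\nchrA\nchrB\nchra\n' | sort_c
```

With the "C" locale the uppercase names come first; a language-specific locale may interleave upper and lower case instead, which is the ordering the Kent tools reject.<br />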
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=25841GBiB: From download to BLAT at assembly hubs2021-06-11T15:15:17Z<p>David da Silva Pires: /* Creating a basic hub.txt file */ Changing from a virus to a generic assembly Hub.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 19.10 (Eoan Ermine). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically install GBiB on a server with a text-only interface, with specific firewall ports enabled so that the data is accessible only from your own network.<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo chmod o+x /usr/local/src/gbib<br />
* Download GBiB from the UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check whether you agree with the terms and conditions in the box for GBiB.<br />
** Check whether your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget --no-check-certificate <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on menu at left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access (password: browser):<br />
** Open a terminal, such as "konsole".<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/{final,input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/input<br />
<br />
* Download input data file: genome FASTA.<br />
<font color=green>browser@browserbox:$></font> wget 'https://tritrypdb.org/common/downloads/release-32/TcruziCLBrenerEsmeraldo-like/fasta/data/TriTrypDB-32_TcruziCLBrenerEsmeraldo-like_Genome.fasta'<br />
<br />
* Create a symbolic link to the input data file (just to keep a consistent naming pattern):<br />
<font color=green>browser@browserbox:$></font> ln -s TriTrypDB-32_TcruziCLBrenerEsmeraldo-like_Genome.fasta tryCru32-CLBrenerEsmeraldoLike.fasta<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/assemblyHub/hub.txt << EOI<br />
hub assemblyHub<br />
shortLabel Assembly Hub<br />
longLabel Assembly Hub for Trypanosoma cruzi<br />
genomesFile genomes.txt<br />
email admin@assemblyhub.edu<br />
descriptionUrl http://genome.assemblyhub.edu<br />
EOI<br />
<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the FASTA file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ eboVir3-uppercase.fasta > eboVir3-uppercase-shortChromNames.fasta<br />
* Make a symbolic link (just to shorten the name) and get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> ln -s eboVir3-uppercase-shortChromNames.fasta eboVir3.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Generate a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
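<br />
The ordering step can be tried without a .2bit file. This is only a sketch: the helper name sort_chrom_sizes and the sample names and sizes are invented, and the input mimics the two-column (name, size) tab-separated layout that the command above writes.<br />

```shell
# sort_chrom_sizes: order "name<TAB>size" lines by size, largest first.
sort_chrom_sizes() {
    LC_ALL=C sort -k2,2nr "$@"
}

# Hypothetical chromosome sizes, deliberately out of order.
printf 'chrA\t1200\nchrB\t98000\nchrC\t450\n' | sort_chrom_sizes
```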
* Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
<br />
== Track hub configuration ==<br />
<br />
We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps are the holes in the track.<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file and should not contain the character '.' (dot).<br />
<br />
* Construction of the assembly track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
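<br />
The AGP-to-BED conversion can be tested on a few lines of sample data. The sketch below assumes a POSIX shell with grep, awk and sort; the function name agp_to_bed6 and the sample AGP records are hypothetical. AGP coordinates are 1-based and inclusive while BED start coordinates are 0-based, so the sketch subtracts 1 from the AGP start column.<br />

```shell
# agp_to_bed6: read AGP on stdin, write a sorted BED6 file on stdout.
# Gap lines (component type "N") are skipped; the component id becomes the name.
agp_to_bed6() {
    grep -v '^#' |
        awk 'BEGIN { OFS = "\t" } $5 != "N" { print $1, $2 - 1, $3, $6, 0, $9 }' |
        LC_ALL=C sort -k1,1 -k2,2n
}

# Hypothetical AGP: two contigs separated by a 50-base gap.
printf 'Ev1\t1\t100\t1\tW\tcontig1\t1\t100\t+\n'            >  sample.agp
printf 'Ev1\t101\t150\t2\tN\t50\tscaffold\tyes\tpaired-ends\n' >> sample.agp
printf 'Ev1\t151\t300\t3\tW\tcontig2\t1\t150\t+\n'          >> sample.agp
agp_to_bed6 < sample.agp
rm -f sample.agp
```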
<br />
* Edit the main trackDb.txt file to include the assembly track configuration.<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
* Double check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<font color=green>browser@browserbox:$></font> hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA searches (blat) and for translated protein searches (transBlat):<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat (pay attention to the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
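<br />
Because the capitalization of "transBlat" is easy to get wrong, a quick check of genomes.txt may help. This is a hedged sketch for a POSIX shell with awk; check_blat_lines is a hypothetical helper, not a UCSC command, and it inspects only the setting names, not whether the servers are actually running.<br />

```shell
# check_blat_lines FILE: report missing or mis-capitalized blat/transBlat lines.
# Prints nothing when both settings are present with the expected spelling.
check_blat_lines() {
    awk '
        $1 == "blat"      { blat = 1 }
        $1 == "transBlat" { trans = 1 }
        tolower($1) == "transblat" && $1 != "transBlat" {
            print "wrong capitalization: " $1
        }
        END {
            if (!blat)  print "missing blat line"
            if (!trans) print "missing transBlat line"
        }
    ' "$1"
}

# Example run against a throwaway file with the two lines shown above.
demo=$(mktemp)
printf 'genome eboVir3\nblat 127.0.0.1 42420\ntransBlat 127.0.0.1 42421\n' > "$demo"
check_blat_lines "$demo"
rm -f "$demo"
```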
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
=== Gap track ===<br />
<br />
It is easy to build a gap track directly from the AGP file. The gap track highlights the genome loci that contain N's, so it is the complement of the assembly track.<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{ if ($5 == "N") printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > output/gap.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed4 -tab output/gap.bed input/eboVir3-chromSizes-sorted.txt output/gap.bb<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track gap<br />
shortLabel Gap<br />
longLabel Gap Locations<br />
type bigBed 4 .<br />
bigDataUrl tracks/map/gap/output/gap.bb<br />
EOI<br />
<font color=green>browser@browserbox:$></font> cat >> /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
<br />
# Gap Locations.<br />
include tracks/map/gap/trackDb.txt<br />
EOI<br />
<br />
<br />
=== Basic HTML page ===<br />
<br />
Let's compose a basic page for our organism of interest:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<p><br />
<i>Ebola</i> virus genome assembly and track hub.<br />
<ul><br />
<li><a HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</a></li><br />
</ul><br />
</p><br />
<p><br />
<b>UCSC Genome Browser assembly ID:</b> eboVir3<br><br />
</p></nowiki><br />
EOI<br />
<br />
For a more complete example, see the code of the following page: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html.<br />
<br />
= Other tips =<br />
<br />
At this point, the hubCheck command should run without reporting any error. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
* To force configuration files to be reloaded every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
<HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
</BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the BED file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track must be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
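<br />
Before converting, it can be useful to confirm that a BED file really is in the required order. A minimal sketch, assuming a POSIX shell and the standard sort command; bed_is_sorted is a hypothetical helper and the sample coordinates are invented.<br />

```shell
# bed_is_sorted FILE: succeed (exit status 0) when FILE is ordered by
# chromosome name and then by numeric start coordinate, in the C locale.
bed_is_sorted() {
    LC_ALL=C sort -c -k1,1 -k2,2n "$1" 2>/dev/null
}

# Example with a throwaway, deliberately unsorted file.
demo=$(mktemp)
printf 'chr2\t5\t9\nchr1\t10\t20\n' > "$demo"
if bed_is_sorted "$demo"; then
    echo "sorted"
else
    echo "not sorted: run sort -k1,1 -k2,2n first"
fi
rm -f "$demo"
```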
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be loaded by adding the hub's URL directly to the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is attached directly to the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Since Kent's tools (like bedToBigBed) expect a case-sensitive sort, we have to set the locale-related environment variables to "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc on your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=25840GBiB: From download to BLAT at assembly hubs2021-06-11T15:04:34Z<p>David da Silva Pires: /* Downloading the raw data */ Changing from Ebola virus to Trypanosoma cruzi genome.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 19.10 (Eoan Ermine). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically install GBiB on a server with a text-only interface and open only specific ports on the firewall, so that the data can be used only from within your network.<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine for the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo chmod o+x /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check the box for GBiB if you agree with the terms and conditions.<br />
** Check that your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" on the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" on the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive.<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget --no-check-certificate <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour to finish the update process, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on the menu at the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh, for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/{final,input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/input<br />
<br />
* Download input data file: genome FASTA.<br />
<font color=green>browser@browserbox:$></font> wget 'https://tritrypdb.org/common/downloads/release-32/TcruziCLBrenerEsmeraldo-like/fasta/data/TriTrypDB-32_TcruziCLBrenerEsmeraldo-like_Genome.fasta'<br />
<br />
* Make a symbolic link to the input data file (just to keep a consistent naming pattern).<br />
<font color=green>browser@browserbox:$></font> ln -s TriTrypDB-32_TcruziCLBrenerEsmeraldo-like_Genome.fasta tryCru32-CLBrenerEsmeraldoLike.fasta<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
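The three rules above can be checked mechanically before loading the hub. The following is only a sketch: check_hub_labels is a hypothetical helper written for this page (it is not one of the UCSC tools), and it tests only the limits listed above.<br />

```shell
# check_hub_labels FILE: hypothetical helper, not part of the UCSC tools.
# Prints one message per broken rule in a hub.txt-style file and
# returns non-zero if any rule is broken.
check_hub_labels() {
    awk '
        $1 == "hub" && NF != 2 { print "hub: name must not contain spaces"; bad = 1 }
        $1 == "shortLabel" || $1 == "longLabel" {
            label = $0
            sub(/^[^ \t]+[ \t]+/, "", label)   # drop the keyword, keep the label text
            max = ($1 == "shortLabel") ? 17 : 80
            if (length(label) > max) {
                printf "%s: %d characters (limit %d)\n", $1, length(label), max
                bad = 1
            }
        }
        END { exit bad }
    ' "$1"
}

# Example: validate a copy of the hub.txt shown on this page.
hub_file=$(mktemp)
cat > "$hub_file" << EOI
hub virusNetwork
email admin@virus.edu
shortLabel Virus Network
longLabel Virus Network Hub for Ebola virus
genomesFile genomes.txt
EOI
check_hub_labels "$hub_file" && echo "hub.txt labels OK"
rm -f "$hub_file"
```

hubCheck remains the authoritative validator; this helper only catches the label limits early.<br />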
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the fasta file is written with all nucleotides in lowercase, convert them all to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ eboVir3-uppercase.fasta > eboVir3-uppercase-shortChromNames.fasta<br />
* Make a symbolic link (just to shorten the name) and get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> ln -s eboVir3-uppercase-shortChromNames.fasta eboVir3.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Get a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
* Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
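If seq_crumbs is not installed, the uppercase conversion can probably be done with awk instead. This is a swapped-in alternative to change_case, not the method used above; fasta_upper is a hypothetical helper name and the FASTA record is made up for the example.<br />

```shell
# fasta_upper [FILE...]: hypothetical helper.
# Uppercases sequence lines and leaves header lines (starting with ">") untouched.
fasta_upper() {
    awk '/^>/ { print; next } { print toupper($0) }' "$@"
}

# Example with a tiny made-up record; the sed step shortens the
# chromosome name in the header only, as in the step above.
printf '>Ebola_virus_demo\nacgtnnACGT\n' | fasta_upper | sed 's/^>Ebola_virus/>Ev/'
```

Anchoring the sed pattern to the header marker keeps the substitution away from sequence lines.<br />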
<br />
== Track hub configuration ==<br />
<br />
We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps are the holes in the track.<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file and should not contain the character '.' (dot).<br />
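The naming rules for the "track" field can be checked with a few lines of awk. This is a minimal sketch, assuming one "track NAME" line per stanza; check_track_names is a hypothetical helper, not a UCSC tool.<br />

```shell
# check_track_names [FILE...]: hypothetical helper.
# Reports track names that contain a dot, do not start with a letter,
# or appear more than once; returns non-zero if any problem is found.
check_track_names() {
    awk '
        $1 == "track" {
            if ($2 ~ /\./)         { print "dot in track name: " $2; bad = 1 }
            if ($2 !~ /^[A-Za-z]/) { print "track name must start with a letter: " $2; bad = 1 }
            if (seen[$2]++)        { print "duplicate track name: " $2; bad = 1 }
        }
        END { exit bad }
    ' "$@"
}

# Example on two stanzas like the ones used on this page.
printf 'track assembly\nshortLabel Assembly\n\ntrack gap\nshortLabel Gap\n' |
    check_track_names && echo "track names OK"
```

Note that trackDb files pulled in through "include" lines have to be passed explicitly as extra arguments.<br />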
<br />
* Construction of the assembly track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2-1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
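AGP coordinates are 1-based and inclusive, while BED start coordinates are 0-based, so the start column has to be decremented by one during the conversion. The sketch below shows this on a made-up three-line AGP (two contigs separated by a 10-base gap); agp_to_assembly_bed is a hypothetical helper name, and it only handles gap lines of type "N", which is what hgFakeAgp was used for above.<br />

```shell
# agp_to_assembly_bed [FILE...]: hypothetical helper.
# Converts the non-gap lines of an AGP file into a sorted BED6 file.
agp_to_assembly_bed() {
    grep -v '^#' "$@" |
        awk '$5 != "N" { printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9 }' |
        LC_ALL=C sort -k1,1 -k2,2n
}

# Made-up AGP: contig (1-100), gap (101-110), contig (111-200).
printf 'Ev1\t1\t100\t1\tD\tEv1_1\t1\t100\t+\nEv1\t101\t110\t2\tN\t10\tcontig\tno\nEv1\t111\t200\t3\tD\tEv1_2\t1\t90\t+\n' |
    agp_to_assembly_bed
```

The first contig comes out as 0-100 rather than 1-100, so its first base is not lost in the track.<br />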
<br />
* Edit the main trackDb.txt file to include the assembly track configuration.<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
* Double check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<font color=green>browser@browserbox:$></font> hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServers, specifying the assembly hub ports that will be used to access the DNA sequence and the amino acid sequence:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the file genomes.txt of the assembly hub in order to include the lines relating to blat and transBlat (pay attention to the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
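The two lines have to end up inside the stanza of the right genome: the genomes.txt created earlier ends with a blank line, so simply appending to the file would place them after the stanza has ended. A minimal sketch that inserts them right after the "genome" line instead (add_blat_lines is a hypothetical helper, and it does not check whether the lines are already present):<br />

```shell
# add_blat_lines GENOME HOST BLAT_PORT TRANS_PORT: hypothetical helper.
# Reads a genomes.txt on stdin and writes it to stdout with the blat and
# transBlat lines inserted right after the matching "genome" line.
add_blat_lines() {
    awk -v db="$1" -v host="$2" -v port="$3" -v tport="$4" '
        { print }
        $1 == "genome" && $2 == db {
            print "blat " host " " port
            print "transBlat " host " " tport
        }
    '
}

# Example on a shortened stanza.
printf 'genome eboVir3\ntrackDb eboVir3/trackDb.txt\n' |
    add_blat_lines eboVir3 127.0.0.1 42420 42421
```

Write the output to a new file and move it over genomes.txt once it looks right.<br />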
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
=== Gap track ===<br />
<br />
It is easy to build a gap track directly from the AGP file. The gap track highlights the genome loci that contain N's, so it complements the assembly track.<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{ if ($5 == "N") printf "%s\t%d\t%d\t%s\n", $1, $2-1, $3, $8}' | sort -k1,1 -k2,2n > output/gap.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed4 -tab output/gap.bed input/eboVir3-chromSizes-sorted.txt output/gap.bb<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track gap<br />
shortLabel Gap<br />
longLabel Gap Locations<br />
type bigBed 4 .<br />
bigDataUrl tracks/map/gap/output/gap.bb<br />
EOI<br />
<font color=green>browser@browserbox:$></font> cat >> /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
<br />
# Gap Locations.<br />
include tracks/map/gap/trackDb.txt<br />
EOI<br />
<br />
<br />
=== Basic HTML page ===<br />
<br />
Let's compose a basic page for our organism of interest:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<p><br />
<i>Ebola</i> virus genome assembly and track hub.<br />
<ul><br />
<li><a HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</a></li><br />
</ul><br />
</p><br />
<p><br />
<b>UCSC Genome Browser assembly ID:</b> eboVir3<br><br />
</p></nowiki><br />
EOI<br />
<br />
For a more complete example, see the code of the following page: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html.<br />
<br />
= Other tips =<br />
<br />
At this point the hubCheck command should run without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
* Force the configuration files to be loaded again every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" on the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
<HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
</BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same chromosome name substitution has to be done in the bed file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes-sorted.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the fasta file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added by putting the hub's URL directly in the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
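The length limits on name and description can be checked before pasting a track line into the browser. The following is only a sketch: check_custom_track_line is a hypothetical helper, it handles only quoted values, and it leaves its working variables set in the calling shell.<br />

```shell
# check_custom_track_line LINE: hypothetical helper.
# Checks the quoted name= and description= values of a custom track line
# against the 15- and 60-character limits listed above.
check_custom_track_line() {
    line=$1
    name=$(printf '%s\n' "$line" | sed -n 's/.* name="\([^"]*\)".*/\1/p')
    desc=$(printf '%s\n' "$line" | sed -n 's/.* description="\([^"]*\)".*/\1/p')
    status=0
    if [ "${#name}" -gt 15 ]; then
        echo "name too long: ${#name} > 15"
        status=1
    fi
    if [ "${#desc}" -gt 60 ]; then
        echo "description too long: ${#desc} > 60"
        status=1
    fi
    return "$status"
}

# Example with a shortened version of the track line above.
check_custom_track_line 'track name="Track label" description="Chromosome coordinates list" type=bigBed' &&
    echo "track line OK"
```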
<br />
=== Setting locale ===<br />
<br />
Since Kent's tools (such as bedToBigBed) expect their input to be sorted in case-sensitive order, we have to set the locale environment variables to "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc in your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
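<br />
The effect of the "C" locale can be seen on a small example. LC_ALL takes precedence over LANG and the other LC_* variables, so it can also be set for a single command without touching .bashrc. The helper name sort_bed_c and the coordinates below are made up for the illustration.<br />

```shell
# sort_bed_c [FILE...]: hypothetical helper.
# Sorts BED lines by chromosome name in byte order (uppercase before
# lowercase) and then numerically by start coordinate.
sort_bed_c() {
    LC_ALL=C sort -k1,1 -k2,2n "$@"
}

# Made-up BED lines: in the C locale "Chr2" sorts before "chr1".
printf 'chr10\t5\t9\nChr2\t1\t4\nchr1\t7\t8\nchr1\t2\t3\n' | sort_bed_c
```

In many other locales the same input would be ordered case-insensitively, which is the kind of ordering that bedToBigBed rejects.<br />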
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=25520GBiB: From download to BLAT at assembly hubs2020-01-29T15:51:40Z<p>David da Silva Pires: /* GBiB installation */ Using a wget flag to pass through certification verification.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 19.10 (Eoan Ermine). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically install GBiB on a server with a text-only interface and open only specific ports on the firewall, so that the data can be used only from within your network.<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine for the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo chmod o+x /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check the box for GBiB if you agree with the terms and conditions.<br />
** Check that your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" on the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" on the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive.<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget --no-check-certificate <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour to finish the update process, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on the menu at the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh, for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.soe.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Write the contents of the hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
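<br />
The three rules above can be checked before running hubCheck. The following is only a minimal sketch: the check_hub function is a hypothetical helper (it is not part of GBiB or of Kent's tools), and it works on a throwaway copy of hub.txt in a temporary directory.<br />

```shell
#!/bin/sh
# Hypothetical helper, not part of GBiB: checks the three hub.txt rules
# listed above. The file below is a throwaway copy used for illustration.
tmp=$(mktemp -d)
cat > "$tmp/hub.txt" << EOI
hub virusNetwork
email admin@virus.edu
shortLabel Virus Network
longLabel Virus Network Hub for Ebola virus
genomesFile genomes.txt
EOI

check_hub() {
    awk '
        # The value is everything after the first word and the blanks that follow it.
        { key = $1; value = $0; sub(/^[^ \t]+[ \t]*/, "", value) }
        key == "hub"        && value ~ /[ \t]/    { print "hub: name contains spaces"; bad = 1 }
        key == "shortLabel" && length(value) > 17 { print "shortLabel: longer than 17 characters"; bad = 1 }
        key == "longLabel"  && length(value) > 80 { print "longLabel: longer than 80 characters"; bad = 1 }
        END { if (!bad) print "OK"; exit (bad ? 1 : 0) }
    ' "$1"
}

check_hub "$tmp/hub.txt"
```

The function prints one message per broken rule and returns a non-zero status, so it can be combined with other commands using "&&".<br />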
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Write the contents of the genomes.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the FASTA file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as entirely masked. We can use the change_case command from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ eboVir3-uppercase.fasta > eboVir3-uppercase-shortChromNames.fasta<br />
* Make a symbolic link (just to shorten the name) and build the .2bit file from the FASTA file. The "-f" option replaces the eboVir3.fasta link created earlier:<br />
 <font color=green>browser@browserbox:$></font> ln -sf eboVir3-uppercase-shortChromNames.fasta eboVir3.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Create a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
* Build an AGP file from the FASTA file, marking all N's as gaps, using the hgFakeAgp command:<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Sort the new AGP file and check that it matches the sequence in the .2bit file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
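<br />
If seq_crumbs is not available, the uppercase conversion can also be done with standard tools. The sketch below is an alternative under that assumption, not the method used above: it runs on a toy FASTA file with hypothetical names, uppercases only the sequence lines and then shortens the sequence name with the same sed substitution.<br />

```shell
#!/bin/sh
# Toy FASTA file with hypothetical names, created in a temporary directory.
tmp=$(mktemp -d)
printf '>Ebola_virus_toy\nacgtnnac\ngtTT\n' > "$tmp/toy.fasta"

# Uppercase the sequence lines only; header lines (starting with ">") are kept as they are.
awk '/^>/ { print; next } { print toupper($0) }' "$tmp/toy.fasta" > "$tmp/toy-uppercase.fasta"

# Shorten the sequence name in the header, as the sed step above does.
sed 's/Ebola_virus/Ev/' "$tmp/toy-uppercase.fasta" > "$tmp/toy-uppercase-shortChromNames.fasta"

cat "$tmp/toy-uppercase-shortChromNames.fasta"
```

Note that converting everything to uppercase discards any soft-masking information stored in the original file.<br />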
<br />
== Track hub configuration ==<br />
<br />
We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range of nucleotides other than 'N'. In other words, the gaps appear as holes in the track.<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file and should not contain the character '.' (dot).<br />
<br />
* Build the assembly track directly from the AGP file (AGP coordinates are 1-based, whereas BED start coordinates are 0-based, hence the "$2-1"):<br />
 <font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2-1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
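<br />
To see what the awk step does, it helps to run it on a tiny example. The sketch below uses a toy AGP file (hypothetical data, laid out like the nine AGP columns) with two contigs separated by a gap of two bases. Since AGP coordinates are 1-based and inclusive while BED starts are 0-based, the sketch subtracts 1 from the start column.<br />

```shell
#!/bin/sh
# Toy AGP (tab-separated, hypothetical data): two contigs (type D)
# separated by a gap of 2 bases (type N).
tmp=$(mktemp -d)
printf '# toy AGP\n' > "$tmp/toy.agp"
printf 'chrT\t1\t4\t1\tD\tchrT_1\t1\t4\t+\n' >> "$tmp/toy.agp"
printf 'chrT\t5\t6\t2\tN\t2\tcontig\tno\n' >> "$tmp/toy.agp"
printf 'chrT\t7\t12\t3\tD\tchrT_2\t1\t6\t+\n' >> "$tmp/toy.agp"

agp_to_assembly_bed() {
    # Drop comment lines, keep non-gap rows and emit BED6:
    # chrom, start (0-based), end, component name, score 0, strand.
    grep -v "^#" "$1" |
        awk '{ if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9 }' |
        LC_ALL=C sort -k1,1 -k2,2n
}

agp_to_assembly_bed "$tmp/toy.agp"
```

The two contigs become the BED intervals 0-4 and 6-12, and the gap (bases 5 and 6 in AGP coordinates) is left out.<br />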
<br />
* Edit the main trackDb.txt file to include the assembly track configuration:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
* Double-check the integrity of your hub with the hubCheck command:<br />
 <font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
 <font color=green>browser@browserbox:$></font> hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_work/virusNetwork/hub.txt</nowiki><br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA queries and for translated (amino acid) queries:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat (pay attention to the capital 'B' in "transBlat"). Add them inside the eboVir3 stanza, before the blank line that closes it:<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
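<br />
Because the stanzas of genomes.txt are separated by blank lines, simply appending the two lines to the end of the file created earlier would place them after the blank line, outside the eboVir3 stanza. The sketch below, which assumes the file layout shown earlier and uses a hypothetical add_blat function on a throwaway copy, inserts them right after the trackDb line instead.<br />

```shell
#!/bin/sh
# Throwaway copy of genomes.txt, with the same layout as the file created earlier.
tmp=$(mktemp -d)
cat > "$tmp/genomes.txt" << EOI
genome eboVir3
organism Ebola virus
defaultPos KM034562v1:1-18,957
twoBitPath eboVir3/genome/eboVir3.2bit
trackDb eboVir3/trackDb.txt

EOI

# Hypothetical helper: adds the blat and transBlat lines after the
# trackDb line, unless a blat line is already present.
add_blat() {
    if grep -q '^blat ' "$1"; then
        return 0
    fi
    awk '{ print }
         $1 == "trackDb" { print "blat 127.0.0.1 42420"; print "transBlat 127.0.0.1 42421" }' \
        "$1" > "$1.new" && mv "$1.new" "$1"
}

add_blat "$tmp/genomes.txt"
add_blat "$tmp/genomes.txt"   # the second call changes nothing
cat "$tmp/genomes.txt"
```

After editing the real file, run hubCheck again to confirm that the hub is still valid.<br />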
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
=== Gap track ===<br />
<br />
It is easy to build a gap track directly from the AGP file. The gap track highlights the genome loci that contain N's, making it complementary to the assembly track.<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{ if ($5 == "N") printf "%s\t%d\t%d\t%s\n", $1, $2-1, $3, $8}' | sort -k1,1 -k2,2n > output/gap.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed4 -tab output/gap.bed input/eboVir3-chromSizes-sorted.txt output/gap.bb<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track gap<br />
shortLabel Gap<br />
longLabel Gap Locations<br />
type bigBed 4 .<br />
bigDataUrl tracks/map/gap/output/gap.bb<br />
EOI<br />
<font color=green>browser@browserbox:$></font> cat >> /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
<br />
# Gap Locations.<br />
include tracks/map/gap/trackDb.txt<br />
EOI<br />
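<br />
Since the gap track is the complement of the assembly track, the two BED files together should cover every base of each chromosome exactly once. The sketch below is a simple sanity check under that assumption: it adds up the interval lengths (BED intervals are 0-based and half-open, so each length is end minus start) and compares the total with the chromosome size. The check_cover function and the toy files are hypothetical.<br />

```shell
#!/bin/sh
# Toy inputs (hypothetical data): one chromosome of 12 bases,
# two assembly blocks and one gap between them.
tmp=$(mktemp -d)
printf 'chrT\t0\t4\tchrT_1\t0\t+\nchrT\t6\t12\tchrT_2\t0\t+\n' > "$tmp/assembly.bed"
printf 'chrT\t4\t6\tno\n' > "$tmp/gap.bed"
printf 'chrT\t12\n' > "$tmp/chromSizes.txt"

# Usage: check_cover assembly.bed gap.bed chromSizes.txt
check_cover() {
    awk -F'\t' -v sizes="$3" '
        FILENAME == sizes { size[$1] = $2; next }   # chromosome sizes
        { covered[$1] += $3 - $2 }                  # BED interval lengths
        END {
            for (c in size)
                if (covered[c] != size[c]) {
                    printf "%s: covered %d of %d bases\n", c, covered[c], size[c]
                    bad = 1
                }
            if (!bad) print "OK"
            exit (bad ? 1 : 0)
        }
    ' "$3" "$1" "$2"
}

check_cover "$tmp/assembly.bed" "$tmp/gap.bed" "$tmp/chromSizes.txt"
```

A mismatch of one base per block is the typical sign that the AGP start coordinate was copied into the BED file without subtracting 1.<br />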
<br />
<br />
=== Basic HTML page ===<br />
<br />
Let's compose a basic page for our organism of interest:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<p><br />
<i>Ebola</i> virus genome assembly and track hub.<br />
<ul><br />
<li><a HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</a></li><br />
</ul><br />
</p><br />
<p><br />
<b>UCSC Genome Browser assembly ID:</b> eboVir3<br><br />
</p></nowiki><br />
EOI<br />
<br />
For a more complete example, see the code of the following page: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html.<br />
<br />
== Other tips ==<br />
<br />
At this point, the hubCheck command should run without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
* To force the configuration files to be loaded again every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
 <BODY><br />
 <P><br />
 Ebola virus genome assembly and track hub.<br />
 <UL><br />
 <LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
 NCBI genome/4887 (Ebola virus)</A></LI><br />
 </UL><br />
 </P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same chromosome name substitution has to be applied to the BED file of the track:<br />
 $> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Build the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that the servers start at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added directly through the browser URL. If you add hubUrl=[URL] to your hgTracks URL line, it will add the hub directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Kent's tools (like bedToBigBed) expect their input to be sorted in case-sensitive order, so we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc at your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, reload .bashrc:<br />
<br />
$> . ~browser/.bashrc<br />
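<br />
The effect of the "C" locale on sorting can be seen with a small example. The names below are hypothetical; with LC_ALL=C, sort compares bytes, so all uppercase letters come before all lowercase letters.<br />

```shell
#!/bin/sh
# Case-sensitive (byte order) sort under the "C" locale: prints
# chrA, chrB, chra, chrb, one per line.
printf 'chrb\nchrA\nchra\nchrB\n' | LC_ALL=C sort
```

LC_ALL takes precedence over LANG and over the individual LC_* variables, so it can also be set for a single command, as done here, without changing .bashrc.<br />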
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=25519GBiB: From download to BLAT at assembly hubs2020-01-29T15:49:18Z<p>David da Silva Pires: Adding command to set right permissions to access /usr/local/src/gbib</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 19.10 (Eoan Ermine). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically involve installing GBiB on a server with only a text interface and with specific ports opened on the firewall, so that the data can be used only from your own network.<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo chmod o+x /usr/local/src/gbib<br />
* Download GBiB from the UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** In the GBiB box, check that you agree with the terms and conditions.<br />
** Check if your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour to finish, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access:<br />
** Open a terminal, such as "konsole".<br />
** When prompted, enter the password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.soe.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Write the contents of the hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Write the contents of the genomes.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the FASTA file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as entirely masked. We can use the change_case command from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ eboVir3-uppercase.fasta > eboVir3-uppercase-shortChromNames.fasta<br />
* Make a symbolic link (just to shorten the name) and build the .2bit file from the FASTA file. The "-f" option replaces the eboVir3.fasta link created earlier:<br />
 <font color=green>browser@browserbox:$></font> ln -sf eboVir3-uppercase-shortChromNames.fasta eboVir3.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Create a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
* Build an AGP file from the FASTA file, marking all N's as gaps, using the hgFakeAgp command:<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Sort the new AGP file and check that it matches the sequence in the .2bit file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
<br />
== Track hub configuration ==<br />
<br />
We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range of nucleotides other than 'N'. In other words, the gaps appear as holes in the track.<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file and should not contain the character '.' (dot).<br />
<br />
* Build the assembly track directly from the AGP file (AGP coordinates are 1-based, whereas BED start coordinates are 0-based, hence the "$2-1"):<br />
 <font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2-1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
<br />
* Edit the main trackDb.txt file to include the assembly track configuration:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
* Double-check the integrity of your hub with the hubCheck command:<br />
 <font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
 <font color=green>browser@browserbox:$></font> hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_work/virusNetwork/hub.txt</nowiki><br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA queries and for translated (amino acid) queries:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat (pay attention to the capital 'B' in "transBlat"). Add them inside the eboVir3 stanza, before the blank line that closes it:<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
=== Gap track ===<br />
<br />
It is easy to build a gap track directly from the AGP file. The gap track highlights the genome loci that contain N's, making it complementary to the assembly track.<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{ if ($5 == "N") printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > output/gap.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed4 -tab output/gap.bed input/eboVir3-chromSizes-sorted.txt output/gap.bb<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track gap<br />
shortLabel Gap<br />
longLabel Gap Locations<br />
type bigBed 4 .<br />
bigDataUrl tracks/map/gap/output/gap.bb<br />
EOI<br />
<font color=green>browser@browserbox:$></font> cat >> /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
<br />
# Gap Locations.<br />
include tracks/map/gap/trackDb.txt<br />
EOI<br />
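AGP coordinates are 1-based and inclusive, whereas BED starts are 0-based and BED ends are exclusive. The sketch below runs the gap filter on a made-up three-line AGP (the names chrT, ctg1 and ctg2 are hypothetical) and subtracts 1 from the start, which is the conversion assumed here.<br />

```shell
# Keep gap rows (column 5 is "N") and shift the 1-based AGP start to a 0-based BED start.
agp_gaps_to_bed() {
    grep -v '^#' "$1" \
      | awk -F'\t' '$5 == "N" { printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8 }' \
      | sort -k1,1 -k2,2n
}

# Made-up AGP fragment: two contigs separated by a 50-base gap.
agp=$(mktemp)
printf 'chrT\t1\t100\t1\tD\tctg1\t1\t100\t+\n'    > "$agp"
printf 'chrT\t101\t150\t2\tN\t50\tcontig\tyes\n' >> "$agp"
printf 'chrT\t151\t300\t3\tD\tctg2\t1\t150\t+\n' >> "$agp"

agp_gaps_to_bed "$agp"
rm -f "$agp"
```

The single BED row produced spans 150 - 100 = 50 bases, which matches the gap length given in column 6 of the AGP row.<br />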
<br />
<br />
=== Basic HTML page ===<br />
<br />
Let's compose a basic page for our organism of interest:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<p><br />
<i>Ebola</i> virus genome assembly and track hub.<br />
<ul><br />
<li><a HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</a></li><br />
</ul><br />
</p><br />
<p><br />
<b>UCSC Genome Browser assembly ID:</b> eboVir3<br><br />
</p></nowiki><br />
EOI<br />
<br />
For a more complete example, see the code of the following page: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html.<br />
<br />
= Other tips =<br />
<br />
At this point the hubCheck command should run without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
* To force the configuration files to be loaded again every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the BED file of the track:<br />
 $> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was soft-masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added directly through the browser URL. If you add hubUrl=[URL] to your hgTracks URL line, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
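The two URL parameters mentioned on this page, udcTimeout and hubUrl, can be combined in one address. The helper below is a hypothetical convenience function, shown only as a sketch; it places udcTimeout=1 right after "hgTracks?" and reuses the example values given above.<br />

```shell
# Hypothetical helper: build an hgTracks URL that attaches a hub and sets
# udcTimeout=1 so that hub files are re-read on every page load.
# Usage: hub_url <browser base URL> <db> <hub.txt URL>
hub_url() {
    printf '%s/cgi-bin/hgTracks?udcTimeout=1&db=%s&hubUrl=%s\n' "$1" "$2" "$3"
}

hub_url http://genome.ucsc.edu hub_47033_schMan1 'http://www.vision.ime.usp.br/~davidsp/hub/hub.txt'
```

When pasting such a URL into a shell command, keep it quoted, since the unquoted "&" would otherwise be interpreted by the shell.<br />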
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
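The two length limits above are easy to check before loading a custom track. The function below is a hypothetical helper, given as a minimal sketch; it only tests the 15- and 60-character limits listed here and nothing else about the track line.<br />

```shell
# Hypothetical check for the name and description length limits.
# Usage: check_track_labels <name> <description>; returns non-zero on violation.
check_track_labels() {
    rc=0
    if [ "${#1}" -gt 15 ]; then
        echo "name is ${#1} characters (limit 15): $1" >&2
        rc=1
    fi
    if [ "${#2}" -gt 60 ]; then
        echo "description is ${#2} characters (limit 60): $2" >&2
        rc=1
    fi
    return "$rc"
}

check_track_labels "Track label" "Chromosome coordinates list" && echo "labels OK"
```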
<br />
=== Setting locale ===<br />
<br />
Since Kent's tools (like bedToBigBed) expect their input to be sorted in case-sensitive order, we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc on your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, reload .bashrc:<br />
<br />
$> . ~browser/.bashrc<br />
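The effect of the C locale on sort can be checked with a few made-up chromosome names (chrA, chrB, chra and chrb are illustrative only). Under LC_ALL=C, sort compares bytes, so every uppercase letter comes before every lowercase one; other locales may order the same lines differently.<br />

```shell
# Sort four made-up names bytewise: uppercase letters come before lowercase ones.
printf 'chrb\nchrA\nchra\nchrB\n' | LC_ALL=C sort
# prints:
# chrA
# chrB
# chra
# chrb
```

Setting LC_ALL for a single command, as done here, is a way to test the behaviour without editing .bashrc.<br />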
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=25518GBiB: From download to BLAT at assembly hubs2020-01-27T19:22:24Z<p>David da Silva Pires: /* Introduction */ Updating the tested Linux distribution.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 19.10 (Eoan Ermine). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups could involve installing GBiB on a server with a text-only interface, with specific ports enabled on the firewall to restrict use of the data to your own network.<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from the UCSC Genome Browser store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check whether you agree with the terms and conditions in the box for GBiB.<br />
** Check whether your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it, and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour to finish, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on menu at left.<br />
** Click at "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access:<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directory that will store the genome files of the assembly hub and download the genome sequence into it:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.soe.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
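The three rules above can be checked mechanically before running hubCheck. The function below is a hypothetical helper, offered as a sketch only; it tests nothing beyond these three rules and assumes one "key value" pair per line, as in the hub.txt shown above.<br />

```shell
# Hypothetical checker for the hub.txt rules; prints one line per violation
# and prints nothing when the file passes.
# Usage: check_hub_txt <hub.txt>
check_hub_txt() {
    awk '
        $1 == "hub" && NF != 2 { print "hub: name must not contain spaces" }
        $1 == "shortLabel" || $1 == "longLabel" {
            limit = ($1 == "shortLabel") ? 17 : 80
            label = $0
            sub(/^[^ \t]+[ \t]+/, "", label)       # drop the key, keep the label text
            if (length(label) > limit)
                print $1 ": " length(label) " characters (limit " limit ")"
        }
    ' "$1"
}

# Example with a deliberately overlong shortLabel.
tmp=$(mktemp)
printf 'hub virusNetwork\nshortLabel Virus Network Hub Extended\nlongLabel Virus Network Hub for Ebola virus\n' > "$tmp"
check_hub_txt "$tmp"
rm -f "$tmp"
```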
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
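The twoBitPath and trackDb entries are paths relative to the location of genomes.txt, so a typo in either one only shows up later. The function below is a hypothetical helper, given as a sketch; it assumes genomes.txt sits directly in the hub directory, as on this page, and it is meant to be run once the .2bit and trackDb.txt files have been created in the following steps.<br />

```shell
# Hypothetical check: every twoBitPath and trackDb entry in genomes.txt
# should point to an existing file, relative to the hub directory.
# Usage: check_genome_paths <hub directory>; returns non-zero if a file is missing.
check_genome_paths() {
    missing=0
    while read -r key value; do
        case "$key" in
            twoBitPath|trackDb)
                if [ ! -e "$1/$value" ]; then
                    echo "missing $key target: $1/$value" >&2
                    missing=1
                fi
                ;;
        esac
    done < "$1/genomes.txt"
    return "$missing"
}
```

On GBiB it would be called as check_genome_paths /folders/sf_work/virusNetwork.<br />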
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the FASTA file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ eboVir3-uppercase.fasta > eboVir3-uppercase-shortChromNames.fasta<br />
* Make a symbolic link (just to shorten the name) and get the .2bit file from the FASTA file:<br />
 <font color=green>browser@browserbox:$></font> ln -sf eboVir3-uppercase-shortChromNames.fasta eboVir3.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Create a file with the name and size of every chromosome of the genome of interest, sorted from largest to smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
* Build an AGP file from the FASTA file, marking all N's as gaps, using the hgFakeAgp command:<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
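If seq_crumbs is not installed, the uppercase conversion can be approximated with awk. This is a substitute sketch, not the change_case tool used above: header lines are copied untouched and every sequence line is uppercased. The function name fasta_to_upper and the sample record are made up for illustration.<br />

```shell
# Uppercase the sequence lines of a FASTA stream, leaving ">" headers unchanged.
# Reads the files given as arguments, or standard input when there are none.
fasta_to_upper() {
    awk '/^>/ { print; next } { print toupper($0) }' "$@"
}

printf '>Ebola_virus test record\nacgtNNacgt\nggcc\n' | fasta_to_upper
```

Keeping the header lines untouched matters because the sequence names are reused later in the chromosome sizes and AGP files.<br />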
<br />
== Track hub configuration ==<br />
<br />
We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps are the holes in the track.<br />
<br />
* Create the contents of trackDb.txt (the track name must start with a letter and contain no spaces or dots; shortLabel <= 17 chars; longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file and should not contain the character '.' (dot).<br />
<br />
* Construction of the assembly track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
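Before converting a BED file it can help to compare it against the chromosome sizes file. The function below is a hypothetical pre-flight check, given as a sketch only; it assumes tab-separated input and is not a replacement for the validation that bedToBigBed performs itself. The sample names chrT and chrU are made up.<br />

```shell
# Hypothetical pre-flight check: report BED rows whose chromosome is unknown,
# whose start is not below the end, or whose end exceeds the chromosome size.
# Usage: check_bed <chrom sizes file> <BED file>
check_bed() {
    awk -F'\t' '
        NR == FNR { size[$1] = $2 + 0; next }      # first file: chromosome sizes
        !($1 in size)          { print "unknown chromosome: " $0; next }
        ($2 + 0) >= ($3 + 0)   { print "empty or reversed interval: " $0; next }
        ($3 + 0) > size[$1]    { print "end beyond chromosome size: " $0 }
    ' "$1" "$2"
}

# Example: the second row runs past the end of a 300-base sequence.
sizes=$(mktemp); bed=$(mktemp)
printf 'chrT\t300\n' > "$sizes"
printf 'chrT\t0\t100\tctg1\t0\t+\nchrT\t250\t301\tctg2\t0\t+\n' > "$bed"
check_bed "$sizes" "$bed"
rm -f "$sizes" "$bed"
```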
<br />
* Edit the main trackDb.txt file to include the assembly track configuration:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
* Double check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<font color=green>browser@browserbox:$></font> hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA queries and for translated (amino acid) queries:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to add the blat and transBlat lines (note the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
=== Gap track ===<br />
<br />
It is easy to build a gap track directly from the AGP file. The gap track highlights the genome loci that contain N's, making it the complement of the assembly track.<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{ if ($5 == "N") printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > output/gap.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed4 -tab output/gap.bed input/eboVir3-chromSizes-sorted.txt output/gap.bb<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track gap<br />
shortLabel Gap<br />
longLabel Gap Locations<br />
type bigBed 4 .<br />
bigDataUrl tracks/map/gap/output/gap.bb<br />
EOI<br />
<font color=green>browser@browserbox:$></font> cat >> /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
<br />
# Gap Locations.<br />
include tracks/map/gap/trackDb.txt<br />
EOI<br />
<br />
<br />
=== Basic HTML page ===<br />
<br />
Let's compose a basic page for our organism of interest:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<p><br />
<i>Ebola</i> virus genome assembly and track hub.<br />
<ul><br />
<li><a HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</a></li><br />
</ul><br />
</p><br />
<p><br />
<b>UCSC Genome Browser assembly ID:</b> eboVir3<br><br />
</p></nowiki><br />
EOI<br />
<br />
For a more complete example, see the code of the following page: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html.<br />
<br />
= Other tips =<br />
<br />
At this point the hubCheck command should run without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
* To force the configuration files to be loaded again every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the BED file of the track:<br />
 $> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was soft-masked (i.e., it had nucleotides written in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be loaded by adding its URL to the browser URL: if you append hubUrl=[URL] to your hgTracks URL, the hub is added directly to the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Kent's tools (such as bedToBigBed) expect their input to be sorted in case-sensitive order, as produced by sort under the C locale, so we have to set the locale environment variables to "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc at your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=25484GBiB: From download to BLAT at assembly hubs2019-12-02T13:37:12Z<p>David da Silva Pires: /* Introduction */ Updating the tested Linux distribution.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 19.04 (Disco Dingo). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically involve installing GBiB on a server with a text-only interface, with specific firewall ports opened so that the data can be used only from within your network.<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click on "Login / Register".<br />
** Read the terms and conditions in the GBiB box and make sure you agree with them.<br />
** Check that your hardware and software meet the basic requirements.<br />
** Click on "Add to cart".<br />
** Click on "Cart (1)" in the menu.<br />
** Click on "Proceed to checkout".<br />
** Click on "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (the update process can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click on "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot the GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click on "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access:<br />
** Open a terminal, such as "konsole".<br />
** When prompted, enter the password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directory that will store the genome files of the assembly hub and download the genome sequence into it:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.soe.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
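<br />
The rules above can be checked mechanically before running hubCheck. The following is only a minimal sketch: the function name check_hub_labels and the scratch copy of hub.txt are assumptions made for illustration, and the limits are the ones listed above.<br />

```shell
#!/bin/sh
# Hypothetical scratch copy of hub.txt; point "hub" at your real file instead.
workdir=$(mktemp -d)
hub="$workdir/hub.txt"
cat > "$hub" << 'EOI'
hub virusNetwork
email admin@virus.edu
shortLabel Virus Network
longLabel Virus Network Hub for Ebola virus
genomesFile genomes.txt
EOI

# Print one warning per violated rule; print nothing when the file passes.
check_hub_labels() {
    awk '
        {
            key = $1
            value = $0
            sub(/^[^ \t]+[ \t]+/, "", value)   # strip the setting name
        }
        key == "hub" && NF != 2                   { print "hub: name must not contain spaces" }
        key == "shortLabel" && length(value) > 17 { print "shortLabel: longer than 17 characters" }
        key == "longLabel" && length(value) > 80  { print "longLabel: longer than 80 characters" }
    ' "$1"
}

warnings=$(check_hub_labels "$hub")
echo "warnings: ${warnings:-none}"
```

This check only covers the three rules listed here; hubCheck remains the reference for validating the hub as a whole.<br />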
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the FASTA file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ eboVir3-uppercase.fasta > eboVir3-uppercase-shortChromNames.fasta<br />
* Make a symbolic link (just to shorten the name; "-f" replaces the eboVir3.fasta link created earlier) and generate the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> ln -sf eboVir3-uppercase-shortChromNames.fasta eboVir3.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Create a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
* Build an AGP file from the FASTA file, marking all N's as gaps, using the hgFakeAgp command:<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
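<br />
As an independent cross-check of the chromosome sizes file, the sizes can also be computed straight from a FASTA file with standard tools. This is just a sketch on a made-up two-sequence file; the function name fasta_sizes and the sample sequences are assumptions for illustration and are not part of Kent's tools.<br />

```shell
#!/bin/sh
# Hypothetical sample FASTA; replace with your own file (e.g. eboVir3.fasta).
workdir=$(mktemp -d)
cat > "$workdir/sample.fasta" << 'EOI'
>chrA
ACGTACGTAC
NNNNN
>chrB
ACGTACGTACGTACGTACGT
ACGT
EOI

# Print "name<TAB>size" for every sequence, largest first.
fasta_sizes() {
    awk '
        /^>/ { if (name != "") printf "%s\t%d\n", name, len
               name = substr($1, 2); len = 0; next }
             { len += length($0) }
        END  { if (name != "") printf "%s\t%d\n", name, len }
    ' "$1" | sort -k2,2nr
}

sizes=$(fasta_sizes "$workdir/sample.fasta")
printf '%s\n' "$sizes"
```

If the output differs from what twoBitInfo reports for the same genome, the FASTA file and the .2bit file are probably out of sync.<br />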
<br />
== Track hub configuration ==<br />
<br />
We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps appear as holes in the track.<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file and should not contain the character '.' (dot).<br />
<br />
* Build the assembly track directly from the AGP file (AGP coordinates are 1-based, whereas BED start coordinates are 0-based, hence the "$2 - 1"):<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
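<br />
To see what the AGP to BED conversion does, it helps to run it on a tiny file first. The sketch below uses a hand-written three-line AGP file (an assumption for illustration, not real hgFakeAgp output) and produces both the assembly and the gap BED files. It assumes that AGP coordinates are 1-based and inclusive while BED starts are 0-based, so it subtracts one from the start column.<br />

```shell
#!/bin/sh
# Hand-written sample AGP: two sequence blocks separated by a 5 bp gap.
workdir=$(mktemp -d)
agp="$workdir/sample.agp"
printf '# hand-written sample, not real hgFakeAgp output\n' > "$agp"
printf 'chrA\t1\t10\t1\tD\tchrA_1\t1\t10\t+\n' >> "$agp"
printf 'chrA\t11\t15\t2\tN\t5\tcontig\tno\n' >> "$agp"
printf 'chrA\t16\t30\t3\tD\tchrA_2\t1\t15\t+\n' >> "$agp"

# Non-gap lines become the assembly track (BED6).
grep -v '^#' "$agp" \
  | awk -F '\t' '$5 != "N" { printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9 }' \
  | LC_ALL=C sort -k1,1 -k2,2n > "$workdir/assembly.bed"

# Gap lines become the gap track (BED4).
grep -v '^#' "$agp" \
  | awk -F '\t' '$5 == "N" { printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8 }' \
  | LC_ALL=C sort -k1,1 -k2,2n > "$workdir/gap.bed"

# sort -c only checks the order and fails if the file is not sorted.
LC_ALL=C sort -c -k1,1 -k2,2n "$workdir/assembly.bed" && echo "assembly.bed is sorted"
cat "$workdir/assembly.bed" "$workdir/gap.bed"
```

The blocks and the gap tile the sequence without overlap (0-10, 10-15, 15-30), which is a quick way to confirm that the coordinates were converted consistently.<br />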
<br />
* Edit the main trackDb.txt file to include the assembly track configuration.<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
* Double check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<font color=green>browser@browserbox:$></font> hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_work/virusNetwork/hub.txt</nowiki><br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA queries (blat) and for translated amino acid queries (transBlat):<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat (pay attention to the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
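<br />
The two lines have to belong to the stanza of the genome, so they should not be appended after a trailing blank line. One possible way to add them from the command line is sketched below on a scratch copy of genomes.txt. It assumes the file holds a single genome stanza, because it drops every blank line; with several genomes the file should be edited by hand instead.<br />

```shell
#!/bin/sh
# Hypothetical scratch copy of genomes.txt with a trailing blank line.
workdir=$(mktemp -d)
genomes="$workdir/genomes.txt"
printf 'genome eboVir3\norganism Ebola virus\ntrackDb eboVir3/trackDb.txt\n\n' > "$genomes"

# Drop blank lines (single-stanza assumption), then append the BLAT settings.
awk 'NF' "$genomes" > "$genomes.new"
printf 'blat 127.0.0.1 42420\ntransBlat 127.0.0.1 42421\n' >> "$genomes.new"
mv "$genomes.new" "$genomes"

cat "$genomes"
```

Running hubCheck again after the change is a good idea, since a misspelled setting name will not be noticed by this snippet.<br />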
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
=== Gap track ===<br />
<br />
It is easy to build a gap track directly from the AGP file. The gap track highlights the genome loci that contain N's, which makes it complementary to the assembly track. As AGP coordinates are 1-based and BED start coordinates are 0-based, the start column is written as "$2 - 1":<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{ if ($5 == "N") printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > output/gap.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed4 -tab output/gap.bed input/eboVir3-chromSizes-sorted.txt output/gap.bb<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track gap<br />
shortLabel Gap<br />
longLabel Gap Locations<br />
type bigBed 4 .<br />
bigDataUrl tracks/map/gap/output/gap.bb<br />
EOI<br />
<font color=green>browser@browserbox:$></font> cat >> /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
<br />
# Gap Locations.<br />
include tracks/map/gap/trackDb.txt<br />
EOI<br />
<br />
<br />
=== Basic HTML page ===<br />
<br />
Let's compose a basic page for our organism of interest:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<p><br />
<i>Ebola</i> virus genome assembly and track hub.<br />
<ul><br />
<li><a HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</a></li><br />
</ul><br />
</p><br />
<p><br />
<b>UCSC Genome Browser assembly ID:</b> eboVir3<br><br />
</p></nowiki><br />
EOI<br />
<br />
For a more complete example, see the code of the following page: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html.<br />
<br />
== Other tips ==<br />
<br />
With this, the hubCheck command finally runs without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
* To force the configuration files to be loaded again every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click on "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
<HTML><br />
<HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
</BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be done in the BED file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from BED to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes-sorted.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was soft-masked (i.e., it had nucleotides written in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be loaded by adding its URL to the browser URL: if you append hubUrl=[URL] to your hgTracks URL, the hub is added directly to the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Kent's tools (such as bedToBigBed) expect their input to be sorted in case-sensitive order, as produced by sort under the C locale, so we have to set the locale environment variables to "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc at your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
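<br />
The effect of the "C" locale on sorting can be seen with a small test. This is only an illustrative sketch with made-up chromosome names: under LC_ALL=C the comparison is done byte by byte, so uppercase letters sort before lowercase ones, whereas many other locales ignore case in a first pass and may produce a different order.<br />

```shell
#!/bin/sh
# Made-up names that differ in case, just to expose the ordering.
names=$(printf 'chr10\nChr2\nchr1\n')

# Byte-wise ordering: "C" (0x43) sorts before "c" (0x63).
sorted_c=$(printf '%s\n' "$names" | LC_ALL=C sort)
printf '%s\n' "$sorted_c"
```

If your BED files are rejected as unsorted even after running sort, comparing the output of sort with and without LC_ALL=C on the chromosome names is a quick way to check whether the locale is the cause.<br />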
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=24986GBiB: From download to BLAT at assembly hubs2018-09-04T19:22:02Z<p>David da Silva Pires: /* Track hub configuration */ Including a warning about the presence of a '.' (dot) at a track name, which causes problems at mysql tables when configuring searchTrix.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 17.04 (Zesty). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically involve installing GBiB on a server with a text-only interface, with specific firewall ports opened so that the data can be used only from within your network.<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click on "Login / Register".<br />
** Read the terms and conditions in the GBiB box and make sure you agree with them.<br />
** Check that your hardware and software meet the basic requirements.<br />
** Click on "Add to cart".<br />
** Click on "Cart (1)" in the menu.<br />
** Click on "Proceed to checkout".<br />
** Click on "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access:<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.soe.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the fasta file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ eboVir3-uppercase.fasta > eboVir3-uppercase-shortChromNames.fasta<br />
* Make a symbolic link (just to shorten the name) and get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> ln -s eboVir3-uppercase-shortChromNames.fasta eboVir3.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Create a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
* Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
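If seq_crumbs is not available, the uppercase conversion described in this section can be approximated with awk. This is a sketch of an alternative, not the method used on this page; the function name fasta_to_upper is hypothetical:<br />

```shell
# Hypothetical helper: uppercase the sequence lines of a FASTA file.
# Header lines (starting with ">") are printed unchanged.
fasta_to_upper() {
    awk '/^>/ { print; next } { print toupper($0) }' "$1"
}
```

It writes to standard output, so a call such as fasta_to_upper eboVir3.fasta > eboVir3-uppercase.fasta would produce the file used in the next steps.<br />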
<br />
== Track hub configuration ==<br />
<br />
We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps are the holes in the track.<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file and should not contain the character '.' (dot).<br />
<br />
* Construction of the assembly track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
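To see what the AGP filter above keeps and drops, it can be run on a tiny hand-made AGP fragment. The sketch below wraps the same grep/awk/sort pipeline in a function; the function name and the sample lines (including the contig names) are made up for illustration:<br />

```shell
# Same filter as in the step above: skip comments, drop gap ("N") lines,
# and print a 6-column BED line for every remaining component.
agp_to_assembly_bed() {
    grep -v "^#" "$1" \
        | awk '{ if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2, $3, $6, $9 }' \
        | sort -k1,1 -k2,2n
}

# Made-up AGP fragment: two sequence components separated by one gap.
sample=$(mktemp)
printf '# made-up example\n' > "$sample"
printf 'KM034562v1\t1\t100\t1\tD\tKM034562v1_1\t1\t100\t+\n' >> "$sample"
printf 'KM034562v1\t101\t110\t2\tN\t10\tcontig\tno\n' >> "$sample"
printf 'KM034562v1\t111\t200\t3\tD\tKM034562v1_2\t1\t90\t+\n' >> "$sample"

# Prints two BED lines; the gap line (component type N) is left out.
agp_to_assembly_bed "$sample"
```

Note that AGP coordinates are 1-based while BED start positions are 0-based, so it may be worth checking whether the start column should be $2 - 1 for your data.<br />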
<br />
* Edit the main trackDb.txt file to include the assembly track configuration.<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
* Double check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
 <font color=green>browser@browserbox:$></font> hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_work/virusNetwork/hub.txt</nowiki><br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the assembly hub ports that will be used to access the DNA sequence and the amino acid sequence:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat (pay attention to the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
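Because stanzas in genomes.txt are separated by blank lines, and the file created earlier ends with a blank line, simply appending the two lines with ">>" would place them outside the eboVir3 stanza. One way to place them inside it is sketched below, assuming a single-genome genomes.txt with a trackDb line; the function name add_blat_lines is hypothetical:<br />

```shell
# Hypothetical helper: print a copy of genomes.txt ($1) with the blat and
# transBlat lines added right after the trackDb line of the stanza.
# $2 is the DNA (blat) port, $3 the translated (transBlat) port.
add_blat_lines() {
    awk -v dna="$2" -v prot="$3" '
        { print }
        $1 == "trackDb" {
            print "blat 127.0.0.1 " dna
            print "transBlat 127.0.0.1 " prot
        }
    ' "$1"
}
```

The result goes to standard output, so it can be inspected first and then moved into place, for example: add_blat_lines genomes.txt 42420 42421 > genomes.new && mv genomes.new genomes.txt<br />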
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
=== Gap track ===<br />
<br />
It is easy to build a gap track directly from the AGP file. The gap track highlights the genome loci that contain N's, making it the complement of the assembly track.<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{ if ($5 == "N") printf "%s\t%d\t%d\t%s\n", $1, $2, $3, $8}' | sort -k1,1 -k2,2n > output/gap.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed4 -tab output/gap.bed input/eboVir3-chromSizes-sorted.txt output/gap.bb<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track gap<br />
shortLabel Gap<br />
longLabel Gap Locations<br />
type bigBed 4 .<br />
bigDataUrl tracks/map/gap/output/gap.bb<br />
EOI<br />
<font color=green>browser@browserbox:$></font> cat >> /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
<br />
# Gap Locations.<br />
include tracks/map/gap/trackDb.txt<br />
EOI<br />
<br />
<br />
=== Basic HTML page ===<br />
<br />
Let's compose a basic page for our organism of interest:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<p><br />
<i>Ebola</i> virus genome assembly and track hub.<br />
<ul><br />
<li><a HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</a></li><br />
</ul><br />
</p><br />
<p><br />
<b>UCSC Genome Browser assembly ID:</b> eboVir3<br><br />
</p></nowiki><br />
EOI<br />
<br />
For a more complete example, see the code of the following page: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html.<br />
<br />
= Other tips =<br />
<br />
At this point the hubCheck command should run without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
* Forcing configuration files to be loaded again every time that the page is reloaded (instead of after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] at URL.<br />
** To disable this feature, click at "clear" on the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
 <BODY><br />
 <P><br />
 Ebola virus genome assembly and track hub.<br />
 <UL><br />
 <LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
 NCBI genome/4887 (Ebola virus)</A></LI><br />
 </UL><br />
 </P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the bed file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the fasta file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that the servers start at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
 You can also add a track hub by appending its URL to the browser URL: if you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Since Kent's tools (like bedToBigBed) expect to find a system that provides a case-sensitive sort, we have to set the environment variables relative to locale to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc at your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
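The effect of the "C" locale on sorting can be seen with a small example. This is only an illustration with made-up sequence names; in the "C" locale, sort compares bytes, so uppercase letters come before lowercase ones:<br />

```shell
# Byte-order sort, as expected by tools such as bedToBigBed.
# Prints:
#   Chr2
#   chr1
#   chr10
printf 'chr10\nChr2\nchr1\n' | LC_ALL=C sort
```

Setting LC_ALL=C for a single command, as done here, is an alternative to changing .bashrc when only one sort needs the byte-order behaviour.<br />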
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=24184GBiB: From download to BLAT at assembly hubs2017-06-20T15:53:00Z<p>David da Silva Pires: /* Introduction */ Updating the Kubuntu version.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 17.04 (Zesty). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically involve installing GBiB on a server with a text-only interface and opening specific firewall ports so that the data can be used only from within your network.<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check whether you agree with the terms and conditions in the GBiB box.<br />
** Check whether your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access:<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the fasta file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ eboVir3-uppercase.fasta > eboVir3-uppercase-shortChromNames.fasta<br />
* Make a symbolic link (just to shorten the name) and get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> ln -s eboVir3-uppercase-shortChromNames.fasta eboVir3.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Create a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
* Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
<br />
== Track hub configuration ==<br />
<br />
We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps are the holes in the track.<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file, including every file pulled in by an "include" line.<br />
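The rules above (unique track names, shortLabel of at most 17 characters) can be checked with standard tools before running hubCheck. The following is a minimal sketch, assuming a made-up trackDb.txt that deliberately breaks both rules; the file contents and variable names are hypothetical.<br />

```shell
# Hypothetical trackDb.txt with a duplicated track name and an overlong shortLabel.
workdir=$(mktemp -d)
cat > "$workdir/trackDb.txt" << EOI
track assembly
shortLabel Assembly
longLabel Assembly

track gap
shortLabel Gap locations in the assembly
longLabel Gap Locations

track assembly
shortLabel Assembly copy
longLabel Assembly
EOI

# Track names that appear more than once.
duplicated=$(awk '$1 == "track" { print $2 }' "$workdir/trackDb.txt" | sort | uniq -d)

# shortLabel values longer than 17 characters (the field name is stripped first).
too_long=$(awk '$1 == "shortLabel" {
    sub(/^[ \t]*shortLabel[ \t]+/, "")
    if (length($0) > 17) print
}' "$workdir/trackDb.txt")

printf 'Duplicated track names: %s\n' "$duplicated"
printf 'Overlong shortLabels: %s\n' "$too_long"
```

Both variables are empty when the file obeys the rules.<br />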
<br />
* Construction of the assembly track directly from the AGP file (AGP coordinates are 1-based, so the start is decremented by one to obtain the 0-based BED start):<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
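To see what the AGP to BED conversion does, it can be tried on a tiny AGP file first. The sketch below assumes a made-up three-line AGP file (two sequence blocks around one gap); the sequence and component names are hypothetical. It subtracts one from the start column because AGP coordinates are 1-based and inclusive, while BED starts are 0-based.<br />

```shell
# Hypothetical AGP file: block, gap, block on a single sequence called Ev1.
workdir=$(mktemp -d)
printf 'Ev1\t1\t100\t1\tD\tEv1_1\t1\t100\t+\n'   >  "$workdir/sample.agp"
printf 'Ev1\t101\t150\t2\tN\t50\tcontig\tno\n'   >> "$workdir/sample.agp"
printf 'Ev1\t151\t400\t3\tD\tEv1_2\t1\t250\t+\n' >> "$workdir/sample.agp"

# Keep only the non-gap lines and write them as BED6:
# chrom, start (0-based), end, name (component id), score, strand.
assembly_bed=$(grep -v "^#" "$workdir/sample.agp" |
    awk -F'\t' '$5 != "N" { printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9 }' |
    sort -k1,1 -k2,2n)

printf '%s\n' "$assembly_bed"
```

The gap line (component type N) is skipped, which is what leaves the holes in the assembly track.<br />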
<br />
* Edit the main trackDb.txt file to include the assembly track configuration.<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
* Double check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<font color=green>browser@browserbox:$></font> hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the assembly hub ports that will be used to query the DNA sequence and the translated (amino acid) sequence:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat (pay attention to the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
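Since the genomes.txt file written earlier ends with a blank line, simply appending the two lines would place them after that blank line, outside the eboVir3 stanza. The sketch below shows one possible way to insert them right after the trackDb line instead; it works on a copy of the file in a temporary directory, and the file name genomes.new.txt is hypothetical.<br />

```shell
# Copy of the genomes.txt stanza used on this page, including the trailing blank line.
workdir=$(mktemp -d)
cat > "$workdir/genomes.txt" << EOI
genome eboVir3
organism Ebola virus
defaultPos KM034562v1:1-18,957
twoBitPath eboVir3/genome/eboVir3.2bit
trackDb eboVir3/trackDb.txt

EOI

# Print every line and add the blat lines right after the trackDb line,
# so that they stay inside the same stanza.
awk '{ print }
     $1 == "trackDb" { print "blat 127.0.0.1 42420"; print "transBlat 127.0.0.1 42421" }
' "$workdir/genomes.txt" > "$workdir/genomes.new.txt"
mv "$workdir/genomes.new.txt" "$workdir/genomes.txt"

cat "$workdir/genomes.txt"
```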
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
=== Gap track ===<br />
<br />
It is easy to build a gap track directly from the AGP file. The gap track highlights the genome loci that contain N's, so it complements the assembly track. Since AGP coordinates are 1-based and BED start coordinates are 0-based, the start is decremented by one:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{ if ($5 == "N") printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > output/gap.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed4 -tab output/gap.bed input/eboVir3-chromSizes-sorted.txt output/gap.bb<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track gap<br />
shortLabel Gap<br />
longLabel Gap Locations<br />
type bigBed 4 .<br />
bigDataUrl tracks/map/gap/output/gap.bb<br />
EOI<br />
<font color=green>browser@browserbox:$></font> cat >> /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
<br />
# Gap Locations.<br />
include tracks/map/gap/trackDb.txt<br />
EOI<br />
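Because the gap track and the assembly track complement each other, the bases covered by both BED files should add up to the size of each sequence. The sketch below illustrates that sanity check on made-up data; the file names and the sequence Ev1 are hypothetical, and the BED files are assumed to use 0-based starts.<br />

```shell
# Hypothetical inputs: two assembly blocks, one gap, and a 400-base sequence.
workdir=$(mktemp -d)
printf 'Ev1\t0\t100\tEv1_1\t0\t+\nEv1\t150\t400\tEv1_2\t0\t+\n' > "$workdir/assembly.bed"
printf 'Ev1\t100\t150\tno\n' > "$workdir/gap.bed"
printf 'Ev1\t400\n' > "$workdir/chromSizes.txt"

# The first file gives the expected size of each sequence; the other files are
# BED files whose interval lengths (end - start) are added up per sequence.
coverage_report=$(awk -F'\t' '
    FILENAME == ARGV[1] { size[$1] = $2; next }
                        { covered[$1] += $3 - $2 }
    END { for (name in size)
              printf "%s\t%s\n", name, (covered[name] == size[name] ? "OK" : "MISMATCH") }
' "$workdir/chromSizes.txt" "$workdir/assembly.bed" "$workdir/gap.bed")

printf '%s\n' "$coverage_report"
```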
<br />
<br />
=== Basic HTML page ===<br />
<br />
Let's compose a basic page for our organism of interest:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<p><br />
<i>Ebola</i> virus genome assembly and track hub.<br />
<ul><br />
<li><a HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</a></li><br />
</ul><br />
</p><br />
<p><br />
<b>UCSC Genome Browser assembly ID:</b> eboVir3<br><br />
</p></nowiki><br />
EOI<br />
<br />
For a more complete example, see the code of the following page: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html.<br />
<br />
= Other tips =<br />
<br />
At this point, the hubCheck command should run without reporting any error. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
* To force the configuration files to be loaded again every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click on "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
<HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
</BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same chromosome name substitution has to be applied to the bed file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the fasta file that was used to create the .2bit file was masked (i.e., it had nucleotides written in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added directly through the browser URL. If you add hubUrl=[URL] to your hgTracks URL line, it will add the hub directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Kent's tools (like bedToBigBed) expect their input to be sorted in a case-sensitive order, so we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc in your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
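The effect of the "C" locale on sorting can be seen with a few sequence names that mix uppercase and lowercase letters. The sketch below is only an illustration with made-up names; it sets LC_ALL for a single sort command instead of changing the whole session.<br />

```shell
# Hypothetical sequence names mixing uppercase and lowercase letters.
names=$(printf 'chr2\nChr1\nchr10\n')

# In the "C" locale the comparison is done byte by byte, so every uppercase
# letter sorts before every lowercase letter.
c_sorted=$(printf '%s\n' "$names" | LC_ALL=C sort)

printf '%s\n' "$c_sorted"
```

In many other locales the sort command ignores case in a first pass, which can produce an order that tools like bedToBigBed reject.<br />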
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=24015GBiB: From download to BLAT at assembly hubs2017-04-22T22:02:39Z<p>David da Silva Pires: /* Basic HTML page */ Tip: closing point.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other architectures would involve installing GBiB on a server with a text-only interface, with specific ports enabled on the firewall to restrict the use of the data to your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click on "Login / Register".<br />
** Check if you agree with the terms and conditions at the box relative to GBiB.<br />
** Check if your hardware and software meet the basic requirements.<br />
** Click on "Add to cart".<br />
** Click on "Cart (1)" in the menu.<br />
** Click on "Proceed to checkout".<br />
** Click on "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour to finish, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click on "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click on "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh, for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
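These limits can be verified with standard tools before loading the hub. The following is a minimal sketch that works on a copy of the hub.txt shown above; the helper function label_length is hypothetical and is defined only for this example.<br />

```shell
# Copy of the hub.txt used on this page, written to a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/hub.txt" << EOI
hub virusNetwork
email admin@virus.edu
shortLabel Virus Network
longLabel Virus Network Hub for Ebola virus
genomesFile genomes.txt
EOI

# Hypothetical helper: print the number of characters in the value of a field.
# Usage: label_length FILE FIELD
label_length() {
    awk -v key="$2" '$1 == key {
        sub(/^[ \t]*[^ \t]+[ \t]+/, "")   # drop the field name and the separator
        print length($0)
    }' "$1"
}

short_len=$(label_length "$workdir/hub.txt" shortLabel)
long_len=$(label_length "$workdir/hub.txt" longLabel)
# A hub name without spaces leaves exactly two fields on the "hub" line.
hub_name_fields=$(awk '$1 == "hub" { print NF }' "$workdir/hub.txt")

printf 'shortLabel: %s characters (limit 17)\n' "$short_len"
printf 'longLabel: %s characters (limit 80)\n' "$long_len"
printf 'fields on the hub line: %s (expected 2)\n' "$hub_name_fields"
```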
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the fasta file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ eboVir3-uppercase.fasta > eboVir3-uppercase-shortChromNames.fasta<br />
* Make a symbolic link (just to shorten the name) and get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> ln -s eboVir3-uppercase-shortChromNames.fasta eboVir3.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Create a file with the name and the size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
* Build an AGP file from the FASTA file with the hgFakeAgp command, marking every run of N's as a gap:<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Sort the new AGP file and check that it matches the sequence stored in the .2bit file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
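If seq_crumbs is not available, the uppercase conversion and the name shortening can be approximated with awk and sed. This is a sketch under assumptions, not the procedure used above: the sample file and its header are made up, the conversion leaves header lines untouched, and the sed substitution is restricted to header lines so that sequence data is never modified.<br />

```shell
# Hypothetical FASTA file with lowercase nucleotides and a long sequence name.
workdir=$(mktemp -d)
cat > "$workdir/sample.fasta" << EOI
>Ebola_virus_1 acgt example
acgtnnacgt
ACgt
EOI

# Header lines (starting with ">") are printed as they are; every other line
# is converted to uppercase. The sed command then shortens the name in headers only.
awk '/^>/ { print; next } { print toupper($0) }' "$workdir/sample.fasta" |
    sed '/^>/ s/Ebola_virus/Ev/' > "$workdir/sample-short.fasta"

cat "$workdir/sample-short.fasta"
```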
<br />
== Track hub configuration ==<br />
<br />
We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps appear as holes in the track.<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file, including every file pulled in by an "include" line.<br />
<br />
* Construction of the assembly track directly from the AGP file (AGP coordinates are 1-based, so the start is decremented by one to obtain the 0-based BED start):<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
<br />
* Edit the main trackDb.txt file to include the assembly track configuration.<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
* Double check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<font color=green>browser@browserbox:$></font> hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the assembly hub ports that will be used to query the DNA sequence and the translated (amino acid) sequence:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat (pay attention to the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
=== Gap track ===<br />
<br />
It is easy to build a gap track directly from the AGP file. The gap track highlights the genome loci that contain N's, so it complements the assembly track. Since AGP coordinates are 1-based and BED start coordinates are 0-based, the start is decremented by one:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{ if ($5 == "N") printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > output/gap.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed4 -tab output/gap.bed input/eboVir3-chromSizes-sorted.txt output/gap.bb<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track gap<br />
shortLabel Gap<br />
longLabel Gap Locations<br />
type bigBed 4 .<br />
bigDataUrl tracks/map/gap/output/gap.bb<br />
EOI<br />
<font color=green>browser@browserbox:$></font> cat >> /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
<br />
# Gap Locations.<br />
include tracks/map/gap/trackDb.txt<br />
EOI<br />
<br />
<br />
=== Basic HTML page ===<br />
<br />
Let's compose a basic page for our organism of interest:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<p><br />
<i>Ebola</i> virus genome assembly and track hub.<br />
<ul><br />
<li><a HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</a></li><br />
</ul><br />
</p><br />
<p><br />
<b>UCSC Genome Browser assembly ID:</b> eboVir3<br><br />
</p></nowiki><br />
EOI<br />
<br />
For a more complete example, see the code of the following page: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html.<br />
<br />
= Other tips =<br />
<br />
At this point, the hubCheck command should run without reporting any error. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
* To force the configuration files to be loaded again every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click on "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
<HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
</BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same chromosome name substitution has to be applied to the bed file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be loaded by adding the hub's URL to the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is attached directly in the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Kent's tools (like bedToBigBed) expect their input to be sorted in a case-sensitive, byte-wise order, so we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc on your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=24014GBiB: From download to BLAT at assembly hubs2017-04-22T22:01:03Z<p>David da Silva Pires: /* Gap track */ New subsection explaining how to create a basic HTML page for the description of the assembly hub.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically install GBiB on a server with a text-only interface, with specific firewall ports enabled so that the data can be used only from your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check whether you agree with the terms and conditions in the GBiB box.<br />
** Check whether your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on menu at left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access:<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill in the contents of the hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
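The label limits listed above can be checked before running hubCheck. The following is only a minimal sketch: check_labels is a hypothetical helper (it is not part of Kent's tools), and it assumes one setting per line, as in the hub.txt shown above.

```shell
#!/bin/sh
# check_labels FILE: hypothetical helper (not a UCSC tool) that reports
# shortLabel values longer than 17 characters and longLabel values longer
# than 80 characters in a hub.txt or trackDb.txt file.
# Exit status is 0 when every label fits and 1 otherwise.
check_labels() {
    awk '
        $1 == "shortLabel" || $1 == "longLabel" {
            key = $1
            limit = (key == "shortLabel") ? 17 : 80
            value = $0
            sub(/^[ \t]*[A-Za-z]+[ \t]+/, "", value)   # drop the key, keep the label text
            sub(/[ \t\r]+$/, "", value)                # ignore trailing whitespace
            if (length(value) > limit) {
                printf "%s:%d: %s has %d characters (limit %d)\n", FILENAME, FNR, key, length(value), limit
                bad = 1
            }
        }
        END { exit (bad + 0) }
    ' "$1"
}
```

For example, check_labels /folders/sf_work/virusNetwork/hub.txt prints nothing when all labels are within the limits.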
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill in the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the FASTA file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ eboVir3-uppercase.fasta > eboVir3-uppercase-shortChromNames.fasta<br />
* Make a symbolic link (just to shorten the name) and get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> ln -s eboVir3-uppercase-shortChromNames.fasta eboVir3.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Generate a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
* Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
<br />
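The chromosome sizes reported by twoBitInfo can be cross-checked against the FASTA file itself using only standard tools. This is a sketch under stated assumptions: fasta_sizes is a hypothetical helper, it takes the first word of each header line as the sequence name, and it expects a plain (uncompressed) FASTA file.

```shell
#!/bin/sh
# fasta_sizes FILE: hypothetical helper that prints "name<TAB>length" for each
# sequence in a FASTA file, largest first (the same two-column layout as the
# chromosome sizes file).
fasta_sizes() {
    awk '
        /^>/ {
            if (name != "") printf "%s\t%d\n", name, len
            name = substr($1, 2)    # sequence name: first word after ">"
            len = 0
            next
        }
        { gsub(/[ \t\r]/, ""); len += length($0) }   # count residues, ignoring whitespace
        END { if (name != "") printf "%s\t%d\n", name, len }
    ' "$1" | sort -k2,2nr
}
```

Comparing the output of fasta_sizes eboVir3.fasta with eboVir3-chromSizes-sorted.txt should show the same names and lengths; the order of sequences with equal length may differ.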
== Track hub configuration ==<br />
<br />
We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps are the holes in the track.<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
<br />
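Uniqueness of the track names can be verified with a one-line awk filter. This is a sketch only: duplicate_tracks is a hypothetical helper, and it does not follow "include" statements, so every trackDb.txt file has to be passed explicitly.

```shell
#!/bin/sh
# duplicate_tracks FILE...: hypothetical helper that prints each track name
# declared more than once across the given trackDb files (once per name).
duplicate_tracks() {
    awk '$1 == "track" && seen[$2]++ == 1 { print $2 }' "$@"
}
```

An empty output means that no name is repeated, for instance when running duplicate_tracks tracks/map/*/trackDb.txt from the genome folder.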
* Build the assembly track directly from the AGP file (AGP coordinates are 1-based while BED start coordinates are 0-based, hence the "$2 - 1"):<br />
  <font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
  <font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
<br />
* Edit the main trackDb.txt file to include the assembly track configuration.<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
* Double check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
  <font color=green>browser@browserbox:$></font> hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_work/virusNetwork/hub.txt</nowiki><br />
<br />
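To see what an AGP-to-BED conversion produces, it helps to run one on a toy AGP file. The sketch below makes a few assumptions: agp_to_bed6 is a hypothetical helper, the AGP file is tab-separated, and gap rows carry "N" or "U" in the fifth column. Because AGP coordinates are 1-based and inclusive while BED start coordinates are 0-based, the start column is decremented by one.

```shell
#!/bin/sh
# agp_to_bed6 FILE: hypothetical helper that prints one BED6 line for every
# non-gap row of an AGP file.
# AGP columns: object, start, end, part number, type, component id,
# component start, component end, orientation.
agp_to_bed6() {
    grep -v '^#' "$1" |
    awk -F '\t' -v OFS='\t' '$5 != "N" && $5 != "U" { print $1, $2 - 1, $3, $6, 0, $9 }' |
    sort -k1,1 -k2,2n
}
```

A component row covering bases 1 to 100 of Ev1 becomes a BED line with start 0 and end 100, so the block length (end minus start) stays at 100 bases.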
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, one for DNA queries and one for translated (amino acid) queries, each listening on its own port:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat (pay attention to the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
=== Gap track ===<br />
<br />
It is easy to build a gap track directly from the AGP file. The gap track highlights the genome loci that contain N's, making it the complement of the assembly track. AGP coordinates are 1-based while BED start coordinates are 0-based, hence the "$2 - 1" in the awk command.<br />
  <font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/{input,output}<br />
  <font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/<br />
  <font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
  <font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{ if ($5 == "N") printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > output/gap.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed4 -tab output/gap.bed input/eboVir3-chromSizes-sorted.txt output/gap.bb<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track gap<br />
shortLabel Gap<br />
longLabel Gap Locations<br />
type bigBed 4 .<br />
bigDataUrl tracks/map/gap/output/gap.bb<br />
EOI<br />
<font color=green>browser@browserbox:$></font> cat >> /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
<br />
# Gap Locations.<br />
include tracks/map/gap/trackDb.txt<br />
EOI<br />
<br />
<br />
=== Basic HTML page ===<br />
<br />
Let's compose a basic page for our organism of interest:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<p><br />
<i>Ebola</i> virus genome assembly and track hub.<br />
<ul><br />
<li><a HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</a></li><br />
</ul><br />
</p><br />
<p><br />
<b>UCSC Genome Browser assembly ID:</b> eboVir3<br><br />
</p></nowiki><br />
EOI<br />
<br />
For a more complete example, see the code of the following page: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
<br />
= Other tips =<br />
<br />
At this point the hubCheck command should run without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
* Force the configuration files to be loaded again every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
<HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
</BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the BED file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be loaded by adding the hub's URL to the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is attached directly in the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Kent's tools (like bedToBigBed) expect their input to be sorted in a case-sensitive, byte-wise order, so we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc on your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
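<br />
To see what the "C" locale changes, the following sketch sorts three made-up sequence names. With LC_ALL=C, sort compares bytes, so every uppercase letter comes before every lowercase letter; in many other locales the order typically differs. The names are invented for the illustration.<br />

```shell
#!/usr/bin/env bash
# Sort three invented sequence names in byte order (the "C" locale).
# 'C' (byte 67) sorts before 'c' (byte 99), and 'A' (65) before 'b' (98).
printf 'chrb\nchrA\nChrC\n' | LC_ALL=C sort
```

In the "C" locale the names come out as ChrC, chrA, chrb, which is the ordering the text above says Kent's tools rely on.<br />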
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23990GBiB: From download to BLAT at assembly hubs2017-04-20T10:18:00Z<p>David da Silva Pires: /* Gap track */ Writing the trackDb.txt files.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically install GBiB on a server with a text-only interface, with specific ports opened on the firewall so that the data can be used only from your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder at your machine to place the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from the UCSC Genome Browser store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Read the terms and conditions in the GBiB box and check that you agree with them.<br />
** Check that your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive.<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (the update process can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click at "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on menu at left.<br />
** Click at "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Login using ssh, for a faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
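<br />
The three rules above can be checked mechanically before running hubCheck. The sketch below writes a sample hub.txt to a temporary file and inspects it with awk; check_hub is a hypothetical helper written for this page, and it covers only these three rules, not everything hubCheck verifies.<br />

```shell
#!/usr/bin/env bash
# Write a sample hub.txt (same fields as above) to a temporary file.
hub_file="$(mktemp)"
cat > "$hub_file" << 'EOI'
hub virusNetwork
email admin@virus.edu
shortLabel Virus Network
longLabel Virus Network Hub for Ebola virus
genomesFile genomes.txt
EOI

# check_hub is a hypothetical helper: it reports values that break the rules.
check_hub() {
    awk '
        # Everything after the first space is the value of the field.
        { value = substr($0, length($1) + 2) }
        $1 == "hub"        && NF != 2            { print "hub: name must not contain spaces"; bad = 1 }
        $1 == "shortLabel" && length(value) > 17 { print "shortLabel: longer than 17 characters"; bad = 1 }
        $1 == "longLabel"  && length(value) > 80 { print "longLabel: longer than 80 characters"; bad = 1 }
        END { if (!bad) print "hub.txt labels look fine"; exit (bad ? 1 : 0) }
    ' "$1"
}

check_hub "$hub_file"
```

For the sample file the check passes, since "Virus Network" has 13 characters and the long label has 33.<br />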
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the FASTA file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ eboVir3-uppercase.fasta > eboVir3-uppercase-shortChromNames.fasta<br />
* Make a symbolic link (just to shorten the name) and get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> ln -s eboVir3-uppercase-shortChromNames.fasta eboVir3.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Create a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
* Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
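<br />
As a rough illustration of what "the AGP file matches the FASTA file" means, the sketch below checks two simple properties on made-up data: the parts of each sequence are contiguous, and the last part ends at the sequence size. check_agp is a hypothetical helper and the sequence names and sizes are invented; this is not how checkAgpAndFa is implemented and it does not replace it.<br />

```shell
#!/usr/bin/env bash
# Invented sample data: a chrom sizes file and a tab-separated AGP file.
sizes="$(mktemp)"; agp="$(mktemp)"
printf 'Ev1\t100\nEv2\t50\n' > "$sizes"
{
    printf 'Ev1\t1\t40\t1\tD\tcontig1\t1\t40\t+\n'
    printf 'Ev1\t41\t50\t2\tN\t10\tcontig\tyes\n'
    printf 'Ev1\t51\t100\t3\tD\tcontig2\t1\t50\t+\n'
    printf 'Ev2\t1\t50\t1\tD\tcontig3\t1\t50\t+\n'
} > "$agp"

# check_agp is a hypothetical helper. Usage: check_agp CHROM_SIZES AGP
check_agp() {
    awk -F'\t' '
        NR == FNR { size[$1] = $2; next }   # first file: sequence sizes
        /^#/ { next }                       # skip AGP comment lines
        # Each part must start right after the previous part of the same sequence.
        $2 != last[$1] + 1 {
            printf "%s: part %s starts at %d, expected %d\n", $1, $4, $2, last[$1] + 1
            bad = 1
        }
        { last[$1] = $3 }
        END {
            # The last part of each sequence must end at the sequence size.
            for (c in size) if (last[c] != size[c]) {
                printf "%s: AGP ends at %d, size is %d\n", c, last[c], size[c]
                bad = 1
            }
            if (!bad) print "AGP is contiguous and matches the chromosome sizes"
            exit (bad ? 1 : 0)
        }
    ' "$1" "$2"
}

check_agp "$sizes" "$agp"
```

If a part is missing or a sequence is truncated, the helper prints one line per problem and returns a non-zero status.<br />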
<br />
== Track hub configuration ==<br />
<br />
We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps are the holes in the track.<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique across the entire file.<br />
<br />
* Construction of the assembly track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
<br />
* Edit the main trackDb.txt file to include the assembly track configuration.<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
* Double check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<font color=green>browser@browserbox:$></font> hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA queries and for translated (amino acid) queries:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat (pay attention to the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
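<br />
The two lines can also be appended from the command line. The sketch below works on a temporary copy of a single-genome genomes.txt; add_setting is a hypothetical helper that adds a line only if the key is not present yet. It assumes that the file holds one genome stanza and does not end with a blank line, because a blank line would start a new stanza.<br />

```shell
#!/usr/bin/env bash
# Temporary single-genome genomes.txt (shortened version of the file above).
genomes="$(mktemp)"
printf 'genome eboVir3\ntrackDb eboVir3/trackDb.txt\n' > "$genomes"

# add_setting is a hypothetical helper. Usage: add_setting FILE KEY VALUE...
# It appends "KEY VALUE..." unless a line starting with KEY already exists.
add_setting() {
    local file="$1" key="$2"
    shift 2
    grep -q "^$key " "$file" || echo "$key $*" >> "$file"
}

add_setting "$genomes" blat 127.0.0.1 42420
add_setting "$genomes" transBlat 127.0.0.1 42421
add_setting "$genomes" blat 127.0.0.1 42420   # second call changes nothing

cat "$genomes"
```

Because the helper checks for the key first, the script can be run again without duplicating the blat lines.<br />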
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
=== Gap track ===<br />
<br />
It is easy to build a gap track directly from the AGP file. The gap track highlights the genome loci that contain N's, so it is the complement of the assembly track.<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{ if ($5 == "N") printf "%s\t%d\t%d\t%s\n", $1, $2, $3, $8}' | sort -k1,1 -k2,2n > output/gap.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed4 -tab output/gap.bed input/eboVir3-chromSizes-sorted.txt output/gap.bb<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track gap<br />
shortLabel Gap<br />
longLabel Gap Locations<br />
type bigBed 4 .<br />
bigDataUrl tracks/map/gap/output/gap.bb<br />
EOI<br />
<font color=green>browser@browserbox:$></font> cat >> /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
<br />
# Gap Locations.<br />
include tracks/map/gap/trackDb.txt<br />
EOI<br />
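<br />
To see how the two awk filters divide an AGP file between the assembly track and the gap track, the sketch below runs them on three invented AGP lines. The filters are the ones used on this page; the sample data and the function names agp_to_assembly_bed and agp_to_gap_bed are assumptions made for the illustration.<br />

```shell
#!/usr/bin/env bash
# Invented sample AGP: two components (type D) separated by one gap (type N).
agp="$(mktemp)"
{
    printf '# sample AGP file\n'
    printf 'Ev1\t1\t40\t1\tD\tcontig1\t1\t40\t+\n'
    printf 'Ev1\t41\t50\t2\tN\t10\tcontig\tyes\n'
    printf 'Ev1\t51\t100\t3\tD\tcontig2\t1\t50\t+\n'
} > "$agp"

# Component lines (column 5 is not "N") become the assembly track.
agp_to_assembly_bed() {
    grep -v "^#" "$1" \
        | awk '{ if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2, $3, $6, $9 }' \
        | sort -k1,1 -k2,2n
}

# Gap lines (column 5 is "N") become the gap track.
agp_to_gap_bed() {
    grep -v "^#" "$1" \
        | awk '{ if ($5 == "N") printf "%s\t%d\t%d\t%s\n", $1, $2, $3, $8 }' \
        | sort -k1,1 -k2,2n
}

echo "Assembly:"; agp_to_assembly_bed "$agp"
echo "Gap:";      agp_to_gap_bed "$agp"
```

Every non-comment AGP line ends up in exactly one of the two files. Note that the filters copy the AGP coordinates unchanged; AGP positions are 1-based while BED start positions are 0-based, so it may be worth checking the first base of each block in the browser.<br />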
<br />
At last the hubCheck command will run without any error being identified. But, if you point your browser to http://127.0.0.1:1234 and go to: Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Forcing configuration files to be loaded again every time that the page is reloaded (instead of after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] at URL.<br />
** To disable this feature, click at "clear" on the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
 <BODY><br />
 <P><br />
 Ebola virus genome assembly and track hub.<br />
 <UL><br />
 <LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
 NCBI genome/4887 (Ebola virus)</A></LI><br />
 </UL><br />
 </P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be done in the bed file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
You can also add a track hub by putting the hub's URL directly in the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is attached directly to the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Kent's tools (like bedToBigBed) expect their input to be sorted in a case-sensitive order, so we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc at your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23954GBiB: From download to BLAT at assembly hubs2017-04-10T14:03:51Z<p>David da Silva Pires: /* Gap track */ Adapting the gap track input and output files to the new location of AGP file.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically install GBiB on a server with a text-only interface, with specific ports opened on the firewall so that the data can be used only from your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder at your machine to place the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from the UCSC Genome Browser store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Read the terms and conditions in the GBiB box and check that you agree with them.<br />
** Check that your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive.<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (the update process can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click at "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on menu at left.<br />
** Click at "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Login using ssh, for a faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
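The three rules above can be checked mechanically before running hubCheck. The following is a minimal sketch, not part of GBiB: the function name check_hub_labels and the temporary demo directory are assumptions made for illustration, and only the rules listed above are tested.<br />

```shell
#!/usr/bin/env bash
# check_hub_labels is a hypothetical helper (not a UCSC tool).
# It prints one message per violated rule and returns non-zero if any rule fails.
check_hub_labels() {
    awk '
        $1 == "hub" && NF != 2 { print "hub name must not contain spaces"; bad = 1 }
        $1 == "shortLabel" || $1 == "longLabel" {
            label = $0
            sub(/^[^ \t]+[ \t]+/, "", label)       # strip the keyword, keep the label text
            limit = ($1 == "shortLabel") ? 17 : 80
            if (length(label) > limit) {
                printf "%s too long (%d > %d)\n", $1, length(label), limit
                bad = 1
            }
        }
        END { exit bad }
    ' "$1"
}

# Demo on a throwaway copy of the hub.txt shown above.
demo_dir=$(mktemp -d)
printf 'hub virusNetwork\nemail admin@virus.edu\nshortLabel Virus Network\nlongLabel Virus Network Hub for Ebola virus\ngenomesFile genomes.txt\n' > "$demo_dir/hub.txt"
check_hub_labels "$demo_dir/hub.txt" && echo "hub.txt labels OK"
```

On the GBiB, the same function could be pointed at /folders/sf_work/virusNetwork/hub.txt.<br />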
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill in the contents of the genomes.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the FASTA file is written with all nucleotides in lowercase, convert them to uppercase; otherwise the whole genome is treated as if it were masked. We can use the change_case command from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ eboVir3-uppercase.fasta > eboVir3-uppercase-shortChromNames.fasta<br />
* Make a symbolic link (just to shorten the name) and get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> ln -s eboVir3-uppercase-shortChromNames.fasta eboVir3.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Create a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
* Build an AGP file from the FASTA file, marking all N's as gaps, using the hgFakeAgp command:<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Sort the new AGP file and check that it matches the sequence in the .2bit file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
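As a complementary sanity check, the sorted AGP file can be compared against the chromosome sizes file: for each sequence, the last AGP row should end exactly at the recorded length. This is only a sketch under stated assumptions: agp_matches_sizes is a hypothetical helper, and the three AGP rows in the demo are made-up illustrative data, not the real output of hgFakeAgp for eboVir3.<br />

```shell
#!/usr/bin/env bash
# agp_matches_sizes is a hypothetical helper (not a UCSC tool).
# usage: agp_matches_sizes <file.agp> <chromSizes.txt>
agp_matches_sizes() {
    awk -F'\t' '
        NR == FNR { size[$1] = $2; next }                     # first file: chrom sizes
        /^#/ { next }                                         # skip AGP comment lines
        { if ($3 + 0 > last[$1] + 0) last[$1] = $3 + 0 }      # largest end coordinate seen
        END {
            for (chrom in size)
                if (last[chrom] + 0 != size[chrom] + 0) {
                    printf "%s: AGP ends at %d, chrom sizes says %d\n", chrom, last[chrom], size[chrom]
                    bad = 1
                }
            exit bad
        }
    ' "$2" "$1"
}

# Demo with illustrative data (two contigs separated by a 10-base gap).
demo_dir=$(mktemp -d)
printf 'KM034562v1\t18957\n' > "$demo_dir/sizes.txt"
printf 'KM034562v1\t1\t10000\t1\tD\tcontig1\t1\t10000\t+\n'      >  "$demo_dir/demo.agp"
printf 'KM034562v1\t10001\t10010\t2\tN\t10\tcontig\tno\n'        >> "$demo_dir/demo.agp"
printf 'KM034562v1\t10011\t18957\t3\tD\tcontig2\t1\t8947\t+\n'   >> "$demo_dir/demo.agp"
agp_matches_sizes "$demo_dir/demo.agp" "$demo_dir/sizes.txt" && echo "AGP and chrom sizes agree"
```

The check is deliberately narrow: it does not replace checkAgpAndFa, which also compares the AGP against the sequence itself.<br />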
<br />
== Track hub configuration ==<br />
<br />
We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps appear as holes in the track.<br />
<br />
* Create the contents of trackDb.txt (the track name must start with a letter and contain no spaces or dots; shortLabel <= 17 chars; longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track (the "track" field) must be unique within the entire file.<br />
<br />
* Build the assembly track directly from the AGP file. AGP coordinates are 1-based, while BED start coordinates are 0-based, so 1 is subtracted from the start column:<br />
 <font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
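To make the AGP to BED conversion concrete, here is a small worked sketch. It assumes the standard coordinate conventions (AGP is 1-based and inclusive, BED is 0-based and half-open), so the start column is shifted by one. The function name agp_to_assembly_bed and the demo rows are illustrative assumptions; the filter also skips rows of type U (gaps of unknown size), which the AGP format allows in addition to N.<br />

```shell
#!/usr/bin/env bash
# agp_to_assembly_bed is a hypothetical helper (not a UCSC tool).
# It writes a bed6 line for every non-gap AGP row to standard output.
agp_to_assembly_bed() {
    grep -v '^#' "$1" |
        awk -F'\t' -v OFS='\t' '$5 != "N" && $5 != "U" { print $1, $2 - 1, $3, $6, 0, $9 }' |
        LC_ALL=C sort -k1,1 -k2,2n
}

# Demo with illustrative data: two contigs separated by a 10-base gap.
demo_dir=$(mktemp -d)
printf 'KM034562v1\t1\t10000\t1\tD\tcontig1\t1\t10000\t+\n'      >  "$demo_dir/demo.agp"
printf 'KM034562v1\t10001\t10010\t2\tN\t10\tcontig\tno\n'        >> "$demo_dir/demo.agp"
printf 'KM034562v1\t10011\t18957\t3\tD\tcontig2\t1\t8947\t+\n'   >> "$demo_dir/demo.agp"
agp_to_assembly_bed "$demo_dir/demo.agp"
# prints (tab-separated):
# KM034562v1  0      10000  contig1  0  +
# KM034562v1  10010  18957  contig2  0  +
```

The gap row disappears from the output, which is what leaves a hole in the assembly track at positions 10001 to 10010.<br />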
<br />
* Edit the main trackDb.txt file to include the assembly track configuration:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
* Double check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
 <font color=green>browser@browserbox:$></font> hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_work/virusNetwork/hub.txt</nowiki><br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA queries and for translated (amino acid) queries:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the blat and transBlat lines (pay attention to the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
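The two lines have to land inside the stanza of the genome, that is, before the blank line that ends it; simply appending them to the end of the file would place them after that blank line. A minimal sketch of one way to do the insertion is shown below. It assumes that stanzas in genomes.txt are separated by blank lines; add_blat_lines is a hypothetical helper, and the demo works on a temporary copy rather than on the real file.<br />

```shell
#!/usr/bin/env bash
# add_blat_lines is a hypothetical helper (not a UCSC tool).
# usage: add_blat_lines <genomes.txt> <genome> <host> <dnaPort> <transPort>
# The edited file is written to standard output; stale blat/transBlat lines are replaced.
add_blat_lines() {
    awk -v genome="$2" -v host="$3" -v dna="$4" -v trans="$5" '
        function emit() { print "blat " host " " dna; print "transBlat " host " " trans }
        $1 == "genome" { in_stanza = ($2 == genome) }
        in_stanza && ($1 == "blat" || $1 == "transBlat") { next }   # drop old entries
        in_stanza && NF == 0 { emit(); in_stanza = 0 }              # blank line ends the stanza
        { print }
        END { if (in_stanza) emit() }                                # file ended inside the stanza
    ' "$1"
}

# Demo on a throwaway copy of the genomes.txt shown earlier.
demo_dir=$(mktemp -d)
printf 'genome eboVir3\norganism Ebola virus\ndefaultPos KM034562v1:1-18,957\ntwoBitPath eboVir3/genome/eboVir3.2bit\ntrackDb eboVir3/trackDb.txt\n\n' > "$demo_dir/genomes.txt"
add_blat_lines "$demo_dir/genomes.txt" eboVir3 127.0.0.1 42420 42421
```

Because old blat and transBlat lines are dropped before the new ones are written, running the helper twice does not duplicate the entries.<br />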
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
=== Gap track ===<br />
<br />
It is easy to build a gap track directly from the AGP file. The gap track highlights the genome loci that contain N's, which makes it the complement of the assembly track.<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{ if ($5 == "N") printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > output/gap.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed4 -tab output/gap.bed input/eboVir3-chromSizes-sorted.txt output/gap.bb<br />
At this point the hubCheck command should run without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic description page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Force the configuration files to be reloaded every time the page is reloaded (instead of being cached for at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click on "clear" in the message that appears at the top of the page.<br />
* Create the HTML description page for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
 <BODY><br />
 <P><br />
 Ebola virus genome assembly and track hub.<br />
 <UL><br />
 <LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
 NCBI genome/4887 (Ebola virus)</A></LI><br />
 </UL><br />
 </P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the BED file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Build the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
 <font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes-sorted.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML description page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* To start the servers automatically at boot, add these commands to /etc/rc.local, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
You can also add a track hub by placing the hub's URL directly in the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Since Kent's tools (like bedToBigBed) expect input that was sorted in case-sensitive order, we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc in your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
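<br />
The effect of the "C" locale on sorting can be seen with a few sequence names. The sketch below uses made-up names, and sort_bytewise is a hypothetical wrapper that forces byte-order comparison for a single command instead of relying on the settings in .bashrc.<br />

```shell
#!/usr/bin/env bash
# sort_bytewise is a hypothetical wrapper: it runs sort with LC_ALL=C for this call only.
sort_bytewise() {
    LC_ALL=C sort "$@"
}

# In the C locale, uppercase letters sort before "_" and "_" sorts before lowercase letters.
printf 'chr_b\nchrA\nchra\nchrB\n' | sort_bytewise
# prints:
# chrA
# chrB
# chr_b
# chra
```

In many other locales the same input is ordered without regard to case, which is the kind of ordering that Kent's tools reject.<br />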
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23953GBiB: From download to BLAT at assembly hubs2017-04-10T13:50:18Z<p>David da Silva Pires: /* Track hub configuration */ Fixing the path for the sorted AGP file.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically install GBiB on a server with a text-only interface and open only specific ports on the firewall, so that the data can be used only from within your network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click on "Login / Register".<br />
** Check that you agree with the terms and conditions shown in the GBiB box.<br />
** Check that your hardware and software meet the basic requirements.<br />
** Click on "Add to cart".<br />
** Click on "Cart (1)" in the menu.<br />
** Click on "Proceed to checkout".<br />
** Click on "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it, and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give the user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour to finish, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click on "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click on "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directory that will store the genome files of the assembly hub and download the raw sequence:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill in the contents of the hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill in the contents of the genomes.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
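Since the paths in genomes.txt are relative to the location of the file, a quick check that they resolve can save a round of hubCheck errors. The following is a minimal sketch under stated assumptions: check_genomes_paths is a hypothetical helper, it only handles local relative paths (not URLs), and it only looks at the twoBitPath, trackDb and groups settings.<br />

```shell
#!/usr/bin/env bash
# check_genomes_paths is a hypothetical helper (not a UCSC tool).
# It reports every twoBitPath, trackDb or groups entry whose file does not exist.
check_genomes_paths() {
    local genomes_file="$1" hub_dir key path status=0
    hub_dir=$(dirname "$genomes_file")
    while read -r key path; do
        case "$key" in
            twoBitPath|trackDb|groups)
                if [ ! -e "$hub_dir/$path" ]; then
                    echo "missing $key: $path"
                    status=1
                fi
                ;;
        esac
    done < "$genomes_file"
    return "$status"
}

# Demo in a throwaway hub directory where only the .2bit file exists so far.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/eboVir3/genome"
touch "$demo_dir/eboVir3/genome/eboVir3.2bit"
printf 'genome eboVir3\norganism Ebola virus\ndefaultPos KM034562v1:1-18,957\ntwoBitPath eboVir3/genome/eboVir3.2bit\ntrackDb eboVir3/trackDb.txt\n\n' > "$demo_dir/genomes.txt"
check_genomes_paths "$demo_dir/genomes.txt" || echo "some files still have to be created"
```

At this stage of the walkthrough a missing trackDb.txt is expected, because that file is only created in the track hub configuration section.<br />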
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the FASTA file is written with all nucleotides in lowercase, convert them to uppercase; otherwise the whole genome is treated as if it were masked. We can use the change_case command from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ eboVir3-uppercase.fasta > eboVir3-uppercase-shortChromNames.fasta<br />
* Make a symbolic link (just to shorten the name) and get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> ln -s eboVir3-uppercase-shortChromNames.fasta eboVir3.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Create a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
* Build an AGP file from the FASTA file, marking all N's as gaps, using the hgFakeAgp command:<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Sort the new AGP file and check that it matches the sequence in the .2bit file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
<br />
== Track hub configuration ==<br />
<br />
We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps appear as holes in the track.<br />
<br />
* Create the contents of trackDb.txt (the track name must start with a letter and contain no spaces or dots; shortLabel <= 17 chars; longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track (the "track" field) must be unique within the entire file.<br />
<br />
* Build the assembly track directly from the AGP file. AGP coordinates are 1-based, while BED start coordinates are 0-based, so 1 is subtracted from the start column:<br />
 <font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
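bedToBigBed expects its input sorted by chromosome name and then by start coordinate, so it can be useful to verify the order before converting. The sketch below relies on the check mode of sort (-c) and on the -s option, so that lines with equal keys are not compared any further; bed_is_sorted is a hypothetical helper and the demo files are made-up data.<br />

```shell
#!/usr/bin/env bash
# bed_is_sorted is a hypothetical helper (not a UCSC tool).
# It returns 0 when the file is ordered by column 1 (byte order) and column 2 (numeric).
bed_is_sorted() {
    LC_ALL=C sort -c -s -k1,1 -k2,2n "$1" 2>/dev/null
}

# Demo with two tiny BED files, one sorted and one with its lines swapped.
demo_dir=$(mktemp -d)
printf 'KM034562v1\t0\t10000\tcontig1\t0\t+\nKM034562v1\t10010\t18957\tcontig2\t0\t+\n' > "$demo_dir/sorted.bed"
printf 'KM034562v1\t10010\t18957\tcontig2\t0\t+\nKM034562v1\t0\t10000\tcontig1\t0\t+\n' > "$demo_dir/unsorted.bed"
for bed in "$demo_dir/sorted.bed" "$demo_dir/unsorted.bed"; do
    if bed_is_sorted "$bed"; then
        echo "$(basename "$bed"): sorted"
    else
        echo "$(basename "$bed"): NOT sorted"
    fi
done
```

Forcing LC_ALL=C inside the helper keeps the check independent of the locale settings of the calling shell.<br />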
<br />
* Edit the main trackDb.txt file to include the assembly track configuration:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
* Double check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
 <font color=green>browser@browserbox:$></font> hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_work/virusNetwork/hub.txt</nowiki><br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA queries and for translated (amino acid) queries:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the blat and transBlat lines (pay attention to the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
=== Gap track ===<br />
<br />
It is easy to build a gap track directly from the AGP file. The gap track highlights the genome loci that contain N's, which makes it the complement of the assembly track.<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes eboVir3-gap.bb<br />
At this point the hubCheck command should run without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic description page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Force the configuration files to be reloaded every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML description page for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
<HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
</BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the BED file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
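Before running the conversion, it can be useful to confirm that a BED file really is in the expected order. A minimal sketch, assuming the standard sort utility with its -c (check only) option; the helper name bed_is_sorted and the sample content are hypothetical:<br />

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file is already sorted by
# chromosome name (byte-wise) and then by numeric start coordinate.
bed_is_sorted() {
    LC_ALL=C sort -c -k1,1 -k2,2n "$1" 2> /dev/null
}

# Made-up BED content: one sorted file and one with two rows swapped.
good=$(mktemp)
bad=$(mktemp)
printf 'Sm1\t10\t20\ta\nSm1\t100\t200\tb\nSm2\t5\t9\tc\n' > "$good"
printf 'Sm1\t100\t200\tb\nSm1\t10\t20\ta\nSm2\t5\t9\tc\n' > "$bad"

for f in "$good" "$bad"; do
    if bed_is_sorted "$f"; then
        echo "$f: sorted"
    else
        echo "$f: NOT sorted, run sort -k1,1 -k2,2n first"
    fi
done
```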
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
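Every "group" value used in trackDb.txt should match a "name" defined in groups.txt. The sketch below lists the group names that are referenced but not defined. The helper name undefined_groups and both sample files are assumptions made for illustration:<br />

```shell
#!/bin/sh
# Hypothetical helper: print each group that trackDb.txt references
# but groups.txt does not define.
undefined_groups() {
    # $1 = groups.txt, $2 = trackDb.txt
    awk '
        FNR == NR { if ($1 == "name") known[$2] = 1; next }   # first file: groups.txt
        $1 == "group" && !($2 in known) { print $2 }          # second file: trackDb.txt
    ' "$1" "$2" | sort -u
}

# Throwaway example files with made-up content.
groups_file=$(mktemp)
trackdb_file=$(mktemp)
printf 'name map\nlabel Mapping\npriority 2\ndefaultIsClosed 0\n' > "$groups_file"
printf 'track gap\ngroup map\n\ntrack est\ngroup rna\n' > "$trackdb_file"

undefined_groups "$groups_file" "$trackdb_file"
```

With the sample files above, only "rna" is reported, because "map" is defined in the groups file.<br />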
* Let's write an HTML description page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added by appending its URL to the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
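Such a link can be composed with a small shell function. This is only a sketch: the function name hub_link is hypothetical, the local addresses are the GBiB ones used on this page, and the hub URL is passed through unencoded, as in the example above. The optional third argument adds the udcTimeout parameter described earlier:<br />

```shell
#!/bin/sh
# Hypothetical helper: build an hgTracks link that loads a hub directly.
hub_link() {
    # $1 = hgTracks URL, $2 = URL of hub.txt, $3 = optional udcTimeout (seconds)
    if [ -n "${3:-}" ]; then
        printf '%s?udcTimeout=%s&hubUrl=%s\n' "$1" "$3" "$2"
    else
        printf '%s?hubUrl=%s\n' "$1" "$2"
    fi
}

hub_link http://127.0.0.1:1234/cgi-bin/hgTracks \
         http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt 1
```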
<br />
Additional configuration for hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration for genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Kent's tools (like bedToBigBed) expect their input to be sorted in case-sensitive order, so the locale environment variables have to be set to "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc in your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, reload .bashrc:<br />
<br />
$> . ~browser/.bashrc<br />
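The effect of the "C" locale on sorting can be seen with a few made-up chromosome names. In this locale, names are compared byte by byte, so uppercase letters come before lowercase ones:<br />

```shell
#!/bin/sh
# Made-up chromosome names that differ in case.
names=$(printf 'chr2\nChr1\nchr10\n')

# Byte-wise sort, as obtained once the locale variables are set to "C".
sorted=$(printf '%s\n' "$names" | LC_ALL=C sort)
printf '%s\n' "$sorted"
# prints:
#   Chr1
#   chr10
#   chr2
```

In many other locales the same input is sorted without regard to case, which is the kind of order the tools may reject.<br />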
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23952GBiB: From download to BLAT at assembly hubs2017-04-10T13:37:59Z<p>David da Silva Pires: Since the AGP file is used to construct both assembly and gap tracks, it is better to keep it in just one directory. I chose the genome directory.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups may involve installing GBiB on a server with a text-only interface and opening only specific firewall ports, so that the data is available only within your network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from the UCSC Genome Browser store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check whether you agree with the terms and conditions in the GBiB box.<br />
** Check whether your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it, and delete the archive.<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give the user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in the background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill in the contents of the hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
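The three rules can be checked mechanically before loading the hub. The following is a minimal sketch with awk, not a replacement for hubCheck; the helper name check_hub_txt is hypothetical, and the sample file repeats the hub.txt shown above:<br />

```shell
#!/bin/sh
# Hypothetical checker for the hub, shortLabel and longLabel rules.
check_hub_txt() {
    awk '
        # Return everything after the first word of the current line.
        function value(    v) { v = $0; sub(/^[^ \t]+[ \t]+/, "", v); return v }
        $1 == "hub" && NF != 2                     { print "hub: name must not contain spaces"; bad = 1 }
        $1 == "shortLabel" && length(value()) > 17 { print "shortLabel: longer than 17 characters"; bad = 1 }
        $1 == "longLabel"  && length(value()) > 80 { print "longLabel: longer than 80 characters"; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Throwaway copy of the hub.txt from this page.
hub=$(mktemp)
cat > "$hub" << EOI
hub virusNetwork
email admin@virus.edu
shortLabel Virus Network
longLabel Virus Network Hub for Ebola virus
genomesFile genomes.txt
EOI

check_hub_txt "$hub" && echo "hub.txt follows the three rules"
```

The function returns a non-zero status when any rule is broken, so it can be used as a guard in a script before hubCheck is run.<br />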
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill in the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the FASTA file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
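If seq_crumbs is not installed, the same conversion can be sketched with awk. This is an alternative to change_case, not the command used on this page; the helper name fasta_upper and the sample sequence are made up:<br />

```shell
#!/bin/sh
# Hypothetical helper: uppercase the sequence lines of a FASTA file
# and leave the ">" header lines untouched.
fasta_upper() {
    awk '/^>/ { print; next } { print toupper($0) }' "$1"
}

# Made-up two-line sequence with mixed case.
fa=$(mktemp)
printf '>KM034562v1\nacgtnnac\nGGtt\n' > "$fa"

fasta_upper "$fa"
```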
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ eboVir3-uppercase.fasta > eboVir3-uppercase-shortChromNames.fasta<br />
* Make a symbolic link (just to shorten the name) and get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> ln -s eboVir3-uppercase-shortChromNames.fasta eboVir3.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Create a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
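To see what the resulting file looks like without the Kent tools at hand, the sketch below computes the same two columns (name and size) directly from a FASTA file with awk and applies the same descending numeric sort. It only imitates the output of twoBitInfo; the helper name fasta_sizes and the sequences are assumptions:<br />

```shell
#!/bin/sh
# Hypothetical stand-in for "twoBitInfo file.2bit stdout": print
# name<TAB>size for every sequence, largest first.
fasta_sizes() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len    # flush the previous sequence
            name = substr($1, 2); len = 0; next
        }
        { len += length($0) }
        END { if (name != "") print name "\t" len }
    ' "$1" | sort -k2,2nr
}

# Made-up FASTA with a 10-base and a 16-base sequence.
fa=$(mktemp)
printf '>Ev1\nACGTNNACGT\n>Ev2\nACGTACGT\nACGTACGT\n' > "$fa"

fasta_sizes "$fa"
```

For the sample file, Ev2 (16 bases) is listed before Ev1 (10 bases).<br />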
* Build an AGP file from the FASTA file with the hgFakeAgp command, marking all N's as gaps:<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
<br />
== Track hub configuration ==<br />
<br />
We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps are the holes in the track.<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
<br />
* Construction of the assembly track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" output/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
<br />
* Edit the main trackDb.txt file to include the assembly track configuration.<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
* Double check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<font color=green>browser@browserbox:$></font> hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, one for DNA queries and one for translated (amino acid) queries, each listening on its own port:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the blat and transBlat lines (note the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
=== Gap track ===<br />
<br />
It is easy to build a gap track directly from the AGP file. The gap track highlights the genome loci that contain N's, so it is the complement of the assembly track.<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes eboVir3-gap.bb<br />
After this, the hubCheck command runs without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's write a basic description page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Force the configuration files to be reloaded every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML description page for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
<HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
</BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the BED file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's write an HTML description page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added by appending its URL to the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration for hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration for genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
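The length rules for name and description can be checked from the shell before the track line is pasted into the browser. This is a minimal sketch; the two values are only placeholders taken from the example track line and should be replaced with your own.<br />

```shell
# Placeholder values; replace them with your own track attributes.
name="Track label"
description="Chromosome coordinates list"

# ${#var} expands to the length of the variable in characters.
if [ "${#name}" -le 15 ] && [ "${#description}" -le 60 ]; then
    result="ok"
else
    result="too long"
fi
echo "name=${#name} chars, description=${#description} chars: $result"
```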
<br />
=== Setting locale ===<br />
<br />
Kent's tools (like bedToBigBed) expect their input to be sorted in a case-sensitive (byte-wise) order, so we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc on your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
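The effect of the "C" locale on sorting can be seen with a small test. This is only an illustrative sketch with three made-up chromosome names; in the "C" locale the comparison is byte-wise, so every uppercase letter sorts before any lowercase one.<br />

```shell
# Three made-up chromosome names that differ in letter case.
names=$(printf 'chrB\nchra\nchrA\n')

# Byte-wise (C locale) sort: uppercase letters come before lowercase ones.
sorted=$(printf '%s\n' "$names" | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

With other locales the same input may come out in a different order, which is the mismatch that the settings above avoid.<br />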
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23951GBiB: From download to BLAT at assembly hubs2017-04-10T13:03:59Z<p>David da Silva Pires: /* Additional configuration */ Initial configuration of the gap track.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups may involve installing GBiB on a server with a text-only interface and opening only specific ports on the firewall, so that the data can be used only from your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to place the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click on "Login / Register".<br />
** Check that you agree with the terms and conditions in the box for GBiB.<br />
** Check that your hardware and software meet the basic requirements.<br />
** Click on "Add to cart".<br />
** Click on "Cart (1)" in the menu.<br />
** Click on "Proceed to checkout".<br />
** Click on "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give the user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in the background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour to finish the update process, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click on "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot the GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click on "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh, for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub files and download the genome sequence:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill in the contents of the hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
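These three rules can be checked mechanically. The following is a minimal sketch that writes a copy of the hub.txt shown above into a scratch directory (so that nothing real is overwritten) and uses awk to report any violation; the messages are just illustrative wording.<br />

```shell
# Work in a scratch directory so that the real hub.txt is not touched.
tmpdir=$(mktemp -d)
cat > "$tmpdir/hub.txt" << EOI
hub virusNetwork
email admin@virus.edu
shortLabel Virus Network
longLabel Virus Network Hub for Ebola virus
genomesFile genomes.txt
EOI

# Strip the keyword and the following space, then measure what is left.
problems=$(awk '
    $1 == "hub" && NF != 2 { print "hub name contains spaces" }
    $1 == "shortLabel" { sub(/^shortLabel /, ""); if (length($0) > 17) print "shortLabel too long" }
    $1 == "longLabel"  { sub(/^longLabel /, "");  if (length($0) > 80) print "longLabel too long" }
' "$tmpdir/hub.txt")
echo "problems: ${problems:-none}"
rm -r "$tmpdir"
```

The hubCheck command remains the authoritative validator; this check only catches the label limits early.<br />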
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill in the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the fasta file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ eboVir3-uppercase.fasta > eboVir3-uppercase-shortChromNames.fasta<br />
* Make a symbolic link (just to shorten the name) and get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> ln -s eboVir3-uppercase-shortChromNames.fasta eboVir3.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Generate a file with the name and the size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
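The preparation steps above can be tried on a toy FASTA file before touching the real genome. This is a minimal sketch with made-up sequences; it uses awk to uppercase the sequence lines as a stand-in for change_case (an assumption, in case seq_crumbs is not installed), and a hand-written name/size table instead of the twoBitInfo output.<br />

```shell
tmpdir=$(mktemp -d)
# Toy FASTA with long sequence names and lowercase nucleotides (made-up data).
printf '>Ebola_virus_1\nacgtnnacgt\n>Ebola_virus_2\nacgt\n' > "$tmpdir/toy.fasta"

# Uppercase only the sequence lines, leaving the ">" header lines untouched.
awk '/^>/ { print; next } { print toupper($0) }' "$tmpdir/toy.fasta" > "$tmpdir/toy-uppercase.fasta"

# Shorten the sequence names with the same kind of sed substitution.
sed 's/Ebola_virus/Ev/' "$tmpdir/toy-uppercase.fasta" > "$tmpdir/toy-short.fasta"
short=$(cat "$tmpdir/toy-short.fasta")
printf '%s\n' "$short"

# Made-up name/size table, sorted from the largest to the smallest sequence.
sizes=$(printf 'Ev_2\t4\nEv_1\t10\n' | sort -k2,2nr)
printf '%s\n' "$sizes"
first=$(printf '%s\n' "$sizes" | head -n 1 | cut -f1)
rm -r "$tmpdir"
```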
<br />
== Track hub configuration ==<br />
<br />
We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps are the holes in the track.<br />
<br />
* Create the contents of trackDb.txt (the track name must not contain spaces or dots and must start with a letter; shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
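Duplicated track names are easy to miss once several trackDb.txt stanzas are collected in one file. As a minimal sketch, the following lists every name that appears more than once; the toy file repeats "assembly" on purpose and is written to a scratch directory.<br />

```shell
tmpdir=$(mktemp -d)
# Toy trackDb.txt in which the name "assembly" is used twice on purpose.
cat > "$tmpdir/trackDb.txt" << EOI
track assembly
shortLabel Assembly

track gap
shortLabel Gap

track assembly
shortLabel Assembly again
EOI

# Collect the track names and print each name that occurs more than once.
duplicates=$(awk '$1 == "track" { print $2 }' "$tmpdir/trackDb.txt" | sort | uniq -d)
echo "duplicated track names: ${duplicates:-none}"
rm -r "$tmpdir"
```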
<br />
* Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3.fasta input/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 input/eboVir3.fasta output/eboVir3.agp<br />
<br />
* Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n output/eboVir3.agp > output/eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3.2bit input/<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa output/eboVir3-sorted.agp input/eboVir3.2bit<br />
<br />
* Build the assembly track directly from the AGP file:<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" output/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
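The AGP-to-BED conversion can be tried on a toy AGP fragment before running it on the real file. This is a minimal sketch with made-up coordinates: two contigs separated by a 10-base gap. Because AGP coordinates are 1-based and inclusive while BED start positions are 0-based, the sketch subtracts 1 from the start column; check this against your own AGP file.<br />

```shell
# Toy AGP lines (made-up data): two contigs separated by a 10-base gap.
agp=$(printf 'Ev\t1\t100\t1\tD\tEv_1\t1\t100\t+\nEv\t101\t110\t2\tN\t10\tscaffold\tyes\tpaired-ends\nEv\t111\t200\t3\tD\tEv_2\t1\t90\t+\n')

# Keep the non-gap rows and emit BED6: chrom, start-1, end, name, score, strand.
bed=$(printf '%s\n' "$agp" | grep -v '^#' \
    | awk 'BEGIN { OFS = "\t" } $5 != "N" { print $1, $2 - 1, $3, $6, 0, $9 }' \
    | sort -k1,1 -k2,2n)
printf '%s\n' "$bed"
```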
<br />
* Edit the main trackDb.txt file to include the assembly track configuration.<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
* Double check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<font color=green>browser@browserbox:$></font> hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA queries and for translated (amino acid) queries:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the file genomes.txt of the assembly hub to include the lines for blat and transBlat (pay attention to the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
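The two lines have to end up inside the stanza of the genome, that is, before the blank line that closes it; simply appending them to the end of the file would place them after that blank line. The following is a minimal sketch, run on a scratch copy of genomes.txt, that inserts them right after the trackDb line with awk.<br />

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/genomes.txt" << EOI
genome eboVir3
organism Ebola virus
defaultPos KM034562v1:1-18,957
twoBitPath eboVir3/genome/eboVir3.2bit
trackDb eboVir3/trackDb.txt

EOI

# Print every line; right after the trackDb line, add the two gfServer lines
# so that they stay inside the same stanza (before the blank line).
awk '{ print } $1 == "trackDb" { print "blat 127.0.0.1 42420"; print "transBlat 127.0.0.1 42421" }' \
    "$tmpdir/genomes.txt" > "$tmpdir/genomes.new.txt"
mv "$tmpdir/genomes.new.txt" "$tmpdir/genomes.txt"
cat "$tmpdir/genomes.txt"
blat_line=$(grep -n '^blat ' "$tmpdir/genomes.txt" | cut -d: -f1)
rm -r "$tmpdir"
```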
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
=== Gap track ===<br />
<br />
It is easy to build a gap track directly from the AGP file. The gap track highlights the genome loci that contain N's, which makes it the complement of the assembly track.<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes eboVir3-gap.bb<br />
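The gap extraction can be checked on a toy AGP fragment first. This is a minimal sketch with made-up coordinates; in a gap row the eighth column holds the linkage value ("yes" or "no"), which becomes the name of the BED item, and the start is shifted by 1 because AGP coordinates are 1-based while BED starts are 0-based.<br />

```shell
# Toy AGP lines (made-up data): two contigs separated by a 10-base gap.
agp=$(printf 'Ev\t1\t100\t1\tD\tEv_1\t1\t100\t+\nEv\t101\t110\t2\tN\t10\tscaffold\tyes\tpaired-ends\nEv\t111\t200\t3\tD\tEv_2\t1\t90\t+\n')

# Keep only the gap rows and emit BED4: chrom, start-1, end, linkage.
gap_bed=$(printf '%s\n' "$agp" | grep -v '^#' \
    | awk 'BEGIN { OFS = "\t" } $5 == "N" { print $1, $2 - 1, $3, $8 }' \
    | sort -k1,1 -k2,2n)
printf '%s\n' "$gap_bed"
```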
At this point the hubCheck command should run without reporting any error. However, if you point your browser to http://127.0.0.1:1234 and go to: Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Force the configuration files to be loaded again every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click on "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be done in the bed file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Build the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the fasta file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that the servers start at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be loaded by adding the hub's URL directly to the browser URL. If you append hubUrl=[URL] to your hgTracks URL, the hub is added directly to the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Kent's tools (like bedToBigBed) expect their input to be sorted in a case-sensitive (byte-wise) order, so we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc on your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23950GBiB: From download to BLAT at assembly hubs2017-04-10T12:44:37Z<p>David da Silva Pires: /* Additional configuration */ Removing data refering an incomplete description of a BIGBED track.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups may involve installing GBiB on a server with a text-only interface and opening only specific ports on the firewall, so that the data can be used only from your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to place the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click on "Login / Register".<br />
** Check that you agree with the terms and conditions in the box for GBiB.<br />
** Check that your hardware and software meet the basic requirements.<br />
** Click on "Add to cart".<br />
** Click on "Cart (1)" in the menu.<br />
** Click on "Proceed to checkout".<br />
** Click on "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give the user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in the background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour to finish the update process, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click on "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot the GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click on "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh, for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub files and download the genome sequence:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Write the contents of the hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
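<br />
The rules above can be checked before loading the hub. The following is a minimal sketch, not part of the UCSC tools: check_hub_labels is a hypothetical helper name, and it only tests the three rules listed above using standard awk.<br />

```shell
# Sketch: warn when hub.txt breaks the naming rules listed above.
# check_hub_labels is a hypothetical helper, not a UCSC command.
check_hub_labels() {
    awk '
        # The value of a setting is everything after the first word.
        { value = $0; sub(/^[^ \t]+[ \t]+/, "", value) }
        $1 == "hub"        && NF > 2             { print "hub: name contains spaces"; bad = 1 }
        $1 == "shortLabel" && length(value) > 17 { print "shortLabel: longer than 17 characters"; bad = 1 }
        $1 == "longLabel"  && length(value) > 80 { print "longLabel: longer than 80 characters"; bad = 1 }
        # Exit status 1 when at least one rule was broken.
        END { exit bad }
    ' "$1"
}
```

It could be run as: check_hub_labels /folders/sf_work/virusNetwork/hub.txt && echo "labels OK". The hubCheck command remains the authoritative check.<br />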
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Write the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the FASTA file is written entirely in lowercase, convert all nucleotides to uppercase so that the genome is not treated as fully masked. We can use the change_case command from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ eboVir3-uppercase.fasta > eboVir3-uppercase-shortChromNames.fasta<br />
* Make a symbolic link (just to shorten the name; -f replaces the eboVir3.fasta link created earlier) and build the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> ln -sf eboVir3-uppercase-shortChromNames.fasta eboVir3.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Create a file with the name and size of every chromosome of the genome of interest, sorted from largest to smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
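<br />
If seq_crumbs is not available, the uppercase conversion can be approximated with standard tools. This is a minimal sketch under that assumption, not the author's method: fasta_to_upper is a hypothetical helper name, and it uppercases every sequence line while leaving the header lines (those starting with ">") untouched.<br />

```shell
# Sketch: uppercase the sequence lines of a FASTA file, leaving headers as they are.
# fasta_to_upper is a hypothetical helper name; it relies only on POSIX awk.
fasta_to_upper() {
    awk '
        /^>/ { print; next }        # header line: copy unchanged
             { print toupper($0) }  # sequence line: convert to uppercase
    ' "$1"
}
```

For example: fasta_to_upper eboVir3.fasta > eboVir3-uppercase.fasta. Note that this removes all soft-masking information, which is the intent of this step.<br />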
<br />
== Track hub configuration ==<br />
<br />
We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps appear as holes in the track.<br />
<br />
* Create the contents of trackDb.txt (the track name must have no spaces or dots and must start with a letter; shortLabel <= 17 chars; longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
<br />
* Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3.fasta input/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 input/eboVir3.fasta output/eboVir3.agp<br />
<br />
* Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n output/eboVir3.agp > output/eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3.2bit input/<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa output/eboVir3-sorted.agp input/eboVir3.2bit<br />
<br />
* Build the assembly track directly from the AGP file (AGP coordinates are 1-based while BED starts are 0-based, hence "$2-1"):<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" output/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2-1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
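<br />
The AGP-to-BED step can also be written as a small function. This is a minimal sketch, not the author's command: agp_to_bed is a hypothetical name, and it assumes a tab-separated AGP file such as the one written by hgFakeAgp. AGP object coordinates are 1-based and inclusive while BED starts are 0-based, so the sketch subtracts one from the start.<br />

```shell
# Sketch: convert AGP component lines to BED6 (chrom, start, end, name, score, strand).
# agp_to_bed is a hypothetical helper. Gap lines (column 5 equal to "N" or "U") are skipped.
agp_to_bed() {
    awk -F '\t' -v OFS='\t' '
        /^#/ { next }                        # skip AGP comment lines
        $5 != "N" && $5 != "U" {
            # AGP start is 1-based, BED start is 0-based: shift by one.
            print $1, $2 - 1, $3, $6, 0, $9
        }
    ' "$1" | LC_ALL=C sort -k1,1 -k2,2n
}
```

With this sketch, an AGP component covering positions 1 to 100 becomes a BED block from 0 to 100, which keeps the block length at 100 bases.<br />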
<br />
* Edit the main trackDb.txt file to include the assembly track configuration.<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
* Double check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<font color=green>browser@browserbox:$></font> hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that will serve DNA queries and translated (amino acid) queries:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat (pay attention to the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
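<br />
The edit to genomes.txt can be scripted so that running it twice does not duplicate the lines. This is a minimal sketch under the assumption that genomes.txt describes a single genome: add_blat_lines is a hypothetical helper name, and it places both lines right after the trackDb line.<br />

```shell
# Sketch: add blat/transBlat lines after the trackDb line of genomes.txt.
# add_blat_lines is a hypothetical helper; it assumes a single-genome file.
add_blat_lines() {
    genomes_file=$1 host=$2 dna_port=$3 trans_port=$4
    # Do nothing when a blat line is already present.
    if grep -q '^blat ' "$genomes_file"; then
        return 0
    fi
    awk -v host="$host" -v dna="$dna_port" -v trans="$trans_port" '
        { print }
        $1 == "trackDb" {
            print "blat " host " " dna
            print "transBlat " host " " trans
        }
    ' "$genomes_file" > "$genomes_file.tmp" && mv "$genomes_file.tmp" "$genomes_file"
}
```

For the hub described here it could be called as: add_blat_lines /folders/sf_work/virusNetwork/genomes.txt 127.0.0.1 42420 42421.<br />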
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* Build the gap track directly from the AGP file (AGP coordinates are 1-based while BED starts are 0-based, hence "$2-1"):<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2-1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes eboVir3-gap.bb<br />
At this point the hubCheck command runs without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Force the configuration files to be loaded again every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
<HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
</BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same chromosome-name substitution has to be applied to the BED file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
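<br />
Every "group" value used in a trackDb file should match a "name" defined in groups.txt. The following is a minimal sketch for cross-checking the two files: undefined_groups is a hypothetical helper name, and it only looks at the first two words of each line.<br />

```shell
# Sketch: list "group" values used in trackDb that have no "name" entry in groups.txt.
# undefined_groups is a hypothetical helper, not a UCSC command.
undefined_groups() {
    groups_file=$1 trackdb_file=$2
    awk '
        # First file (groups.txt): remember every defined group name.
        NR == FNR { if ($1 == "name") defined[$2] = 1; next }
        # Second file (trackDb): report each unknown group once.
        $1 == "group" && !($2 in defined) && !seen[$2]++ { print $2 }
    ' "$groups_file" "$trackdb_file"
}
```

An empty output means that every group referenced by the tracks is defined.<br />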
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be loaded by adding the hub's URL directly to the browser URL. If you append hubUrl=[URL] to your hgTracks URL, the hub is attached directly to the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Kent's tools (like bedToBigBed) expect input that was sorted in a case-sensitive, byte-wise order, so we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc on your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
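<br />
The effect of the "C" locale can also be applied to a single command instead of the whole session. This is a minimal sketch: sort_bed is a hypothetical helper name, and it sorts a BED file by chromosome name and then by start coordinate using byte-wise collation.<br />

```shell
# Sketch: sort a BED file the way bedToBigBed expects (byte-wise chromosome order,
# then numeric start). sort_bed is a hypothetical helper name.
sort_bed() {
    # LC_ALL=C applies only to this sort invocation.
    LC_ALL=C sort -k1,1 -k2,2n "$1"
}
```

In the "C" locale uppercase letters sort before lowercase ones, so a chromosome named "Chr1" comes before "chr10", which in turn comes before "chr2".<br />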
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23949GBiB: From download to BLAT at assembly hubs2017-04-10T12:37:42Z<p>David da Silva Pires: Making a double check with hubCheck: file and URL.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically involve installing GBiB on a server with a text-only interface and with specific ports enabled on the firewall to restrict access to the data to just your network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from the UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check whether you agree with the terms and conditions in the box for GBiB.<br />
** Check whether your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive.<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour to finish the update process, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Write the contents of the hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Write the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the FASTA file is written entirely in lowercase, convert all nucleotides to uppercase so that the genome is not treated as fully masked. We can use the change_case command from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ eboVir3-uppercase.fasta > eboVir3-uppercase-shortChromNames.fasta<br />
* Make a symbolic link (just to shorten the name; -f replaces the eboVir3.fasta link created earlier) and build the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> ln -sf eboVir3-uppercase-shortChromNames.fasta eboVir3.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Create a file with the name and size of every chromosome of the genome of interest, sorted from largest to smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
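<br />
The chromosome sizes can be cross-checked directly against the FASTA file with standard tools. This is a minimal sketch, not a replacement for twoBitInfo: fasta_chrom_sizes is a hypothetical helper name, and it assumes that the sequence name is the first word of each header line and that sequence lines contain no spaces or carriage returns.<br />

```shell
# Sketch: print "name<TAB>size" for each FASTA record, largest first.
# fasta_chrom_sizes is a hypothetical helper name; it relies only on awk and sort.
fasta_chrom_sizes() {
    awk '
        /^>/ {
            # A new header: emit the previous record, then start counting again.
            if (name != "") print name "\t" len
            name = substr($1, 2); len = 0
            next
        }
        { len += length($0) }                       # add up the bases on each sequence line
        END { if (name != "") print name "\t" len } # emit the last record
    ' "$1" | sort -k2,2nr
}
```

Comparing its output with eboVir3-chromSizes-sorted.txt, for example with diff, is a quick way to notice a truncated download or a renaming mistake.<br />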
<br />
== Track hub configuration ==<br />
<br />
We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps appear as holes in the track.<br />
<br />
* Create the contents of trackDb.txt (the track name must have no spaces or dots and must start with a letter; shortLabel <= 17 chars; longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
<br />
* Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3.fasta input/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 input/eboVir3.fasta output/eboVir3.agp<br />
<br />
* Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n output/eboVir3.agp > output/eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3.2bit input/<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa output/eboVir3-sorted.agp input/eboVir3.2bit<br />
<br />
* Build the assembly track directly from the AGP file (AGP coordinates are 1-based while BED starts are 0-based, hence "$2-1"):<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" output/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2-1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
<br />
* Edit the main trackDb.txt file to include the assembly track configuration.<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
* Double check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<font color=green>browser@browserbox:$></font> hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that will serve DNA queries and translated (amino acid) queries:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat (pay attention to the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy (link?) this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Build the gap track directly from the AGP file (AGP coordinates are 1-based while BED starts are 0-based, hence "$2-1"):<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2-1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes eboVir3-gap.bb<br />
At this point the hubCheck command runs without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* To force the configuration files to be loaded again every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
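The URL edit can also be scripted. The following sketch inserts the parameter with sed; the "db=eboVir3" query string is only a placeholder, since the real database name of a hub genome is assigned by the browser:<br />

```shell
# Hypothetical starting URL; the query string after "?" is a placeholder.
base_url='http://genome.ucsc.edu/cgi-bin/hgTracks?db=eboVir3'

# Insert "udcTimeout=1&" right after the "?" that follows hgTracks.
# In the sed replacement, "\&" stands for a literal ampersand.
nocache_url=$(printf '%s\n' "$base_url" | sed 's/hgTracks?/hgTracks?udcTimeout=1\&/')
echo "$nocache_url"
```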
* Create the HTML description page for the hub:<br />
 $> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><br />
 <HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
 <BODY><br />
 <P><br />
 Ebola virus genome assembly and track hub.<br />
 <UL><br />
 <LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
 NCBI genome/4887 (Ebola virus)</A></LI><br />
 </UL><br />
 </P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same chromosome-name substitution has to be applied to the BED file of the track:<br />
 $> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
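The renaming and sorting steps can be tried on a small made-up BED file before running them on real data. In this sketch the three records and the file names are invented; LC_ALL=C is set on the sort call so that the ordering does not depend on the local language settings:<br />

```shell
tmp=$(mktemp -d)
# Made-up three-column BED with long chromosome names, deliberately unsorted.
printf 'Schisto_mansoni.Chr_2\t500\t900\n'   >  "$tmp/smps.bed"
printf 'Schisto_mansoni.Chr_1\t3000\t3500\n' >> "$tmp/smps.bed"
printf 'Schisto_mansoni.Chr_1\t200\t800\n'   >> "$tmp/smps.bed"

# Shorten the names, then sort by chromosome name (text) and start coordinate (numeric).
sorted_bed=$(sed 's/Schisto_mansoni/Sm/' "$tmp/smps.bed" | LC_ALL=C sort -k1,1 -k2,2n)
printf '%s\n' "$sorted_bed"
```

The "-k2,2n" key matters: without the numeric flag, a start of 3000 would sort before a start of 500.<br />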
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
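To make clear what a GC percent track in 5-base windows contains, the sketch below computes the same kind of values with awk for a short made-up sequence. It is a simplified illustration, not a reimplementation of hgGcPercent: the exact number formatting and gap handling of the real tool may differ, and the sequence and the chromosome name "demo" are invented:<br />

```shell
# Made-up 15-base sequence, i.e. three windows of 5 bases.
seq='GGCCAATTGCATGCA'

# For each window, count G and C bases and print a variableStep wiggle line.
# gsub returns the number of substitutions it made, which is the G+C count.
gc_wig=$(printf '%s\n' "$seq" | awk -v win=5 '
  BEGIN { print "variableStep chrom=demo span=" win }
  {
    for (start = 1; start + win - 1 <= length($0); start += win) {
      window = substr($0, start, win)
      gc = gsub(/[GCgc]/, "", window)
      printf "%d\t%.0f\n", start, 100 * gc / win
    }
  }')
printf '%s\n' "$gc_wig"
```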
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
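Since every stanza above uses the same four settings, a missing line is easy to detect automatically. The sketch below checks a deliberately incomplete sample file; treating all four settings as mandatory is an assumption based on the stanzas shown here, and the sample file is invented:<br />

```shell
tmp=$(mktemp -d)
# Sample file: the second stanza lacks defaultIsClosed on purpose.
cat > "$tmp/groups.txt" << EOI
name map
label Mapping
priority 2
defaultIsClosed 0

name genes
label Genes
priority 3

EOI

# Paragraph mode (RS set to empty) makes awk read one blank-line-separated stanza per record.
incomplete=$(awk 'BEGIN { RS = ""; FS = "\n" }
{
  seen = ""
  for (i = 1; i <= NF; i++) { split($i, kv, " "); seen = seen " " kv[1] " " }
  if (seen !~ / name / || seen !~ / label / || seen !~ / priority / || seen !~ / defaultIsClosed /)
    print $1
}' "$tmp/groups.txt")
echo "Incomplete stanzas: ${incomplete:-none}"
```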
* Let's write an HTML description page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file used to create the .2bit file was soft-masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local, just before the "exit" command on the last line, so that they are run at boot:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added by putting the hub's URL directly into the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: must be the name of one of the groups defined in groups.txt (here: user, map, genes, mrna, regulation, comparative, varRep, and x).<br />
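The two length limits can be checked in the shell before the track line is pasted into the browser. In this sketch the two values are hypothetical labels modeled on the track line above:<br />

```shell
# Hypothetical labels to be used as name= and description= in the track line.
track_name='Track label'
track_description='Chromosome coordinates list'

# ${#var} expands to the length of the value in characters.
if [ "${#track_name}" -le 15 ] && [ "${#track_description}" -le 60 ]; then
  label_check='ok'
else
  label_check='too long'
fi
echo "name=${#track_name} description=${#track_description} -> $label_check"
```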
<br />
=== Setting locale ===<br />
<br />
Kent's tools (like bedToBigBed) expect their input to be sorted in a case-sensitive order, so we have to set the locale environment variables to "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc in your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
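The effect of the "C" locale on sorting can be seen with a small test. Here the variable is set only for the single sort call, and the three names are made up:<br />

```shell
# Under the C locale, sort compares raw bytes, so uppercase letters come before lowercase ones.
c_sorted=$(printf 'chr1\nChr2\nchr10\n' | LC_ALL=C sort)
printf '%s\n' "$c_sorted"
```

This prints Chr2 first, followed by chr1 and chr10. Under other locales the order of the same three names may differ, which is why the sort has to be done under "C".<br />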
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs], [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23948GBiB: From download to BLAT at assembly hubs2017-04-10T12:17:34Z<p>David da Silva Pires: Removing a call for the hubCheck command and placing the explanation in a better place.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups may involve installing GBiB on a server with a text-only interface and with specific ports enabled on the firewall, to restrict the use of the data to your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from the UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check if you agree with the terms and conditions in the GBiB box.<br />
** Check if your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give the user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close the GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh, for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub files and download the genome sequence:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Write the contents of the hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
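The two label limits listed above can be checked with a short awk call. This is only a sketch: it runs on a scratch copy of hub.txt in a temporary folder, and the variable names are invented:<br />

```shell
tmp=$(mktemp -d)
cat > "$tmp/hub.txt" << EOI
hub virusNetwork
email admin@virus.edu
shortLabel Virus Network
longLabel Virus Network Hub for Ebola virus
genomesFile genomes.txt
EOI

# Strip the setting name, then compare the length of what is left with the limit.
too_long=$(awk '
  $1 == "shortLabel" { sub(/^shortLabel +/, ""); if (length($0) > 17) print "shortLabel" }
  $1 == "longLabel"  { sub(/^longLabel +/, "");  if (length($0) > 80) print "longLabel" }
' "$tmp/hub.txt")
echo "Labels over the limit: ${too_long:-none}"
```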
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Write the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the FASTA file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ eboVir3-uppercase.fasta > eboVir3-uppercase-shortChromNames.fasta<br />
* Make a symbolic link (just to shorten the name) and get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> ln -s eboVir3-uppercase-shortChromNames.fasta eboVir3.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Create a file with the name and the size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
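To make the effect of the steps above concrete, the sketch below reproduces them on a tiny made-up FASTA file using only awk, sed and sort. It is a stand-in for illustration, not a replacement for change_case, faToTwoBit or twoBitInfo; the sequence names and file names are invented:<br />

```shell
tmp=$(mktemp -d)
# Made-up two-sequence FASTA written entirely in lowercase.
printf '>Ebola_virus_1\nacgtacgt\nacg\n>Ebola_virus_2\nacgtacgtacgtacgt\n' > "$tmp/in.fasta"

# Uppercase the sequence lines only (headers are left alone), then shorten the names.
awk '/^>/ { print; next } { print toupper($0) }' "$tmp/in.fasta" \
  | sed 's/Ebola_virus/Ev/' > "$tmp/short.fasta"

# Add up the sequence length under each header and sort from the largest to the smallest.
chrom_sizes=$(awk '/^>/ { name = substr($1, 2); next } { len[name] += length($0) }
  END { for (n in len) printf "%s\t%d\n", n, len[n] }' "$tmp/short.fasta" | sort -k2,2nr)
printf '%s\n' "$chrom_sizes"
```

The result has the same two-column layout (name, size) as the chromosome sizes file created above.<br />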
<br />
== Track hub configuration ==<br />
<br />
We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps appear as holes in the track.<br />
<br />
* Create the contents of trackDb.txt (the track name must have no spaces or dots and must start with a letter; shortLabel <= 17 chars; longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
 The name of each track ("track" field) must be unique within the entire file.<br />
<br />
* Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3.fasta input/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 input/eboVir3.fasta output/eboVir3.agp<br />
<br />
* Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n output/eboVir3.agp > output/eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3.2bit input/<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa output/eboVir3-sorted.agp input/eboVir3.2bit<br />
<br />
* Construction of the assembly track directly from the AGP file:<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" output/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
<br />
* Edit the main trackDb.txt file to include the assembly track configuration.<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
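Before running hubCheck, it can help to confirm that every file named on an "include" line really exists. The sketch below does this on a scratch folder tree; it assumes that include paths are resolved relative to the folder of the main trackDb.txt, and the second include line is added on purpose to show a missing file:<br />

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/eboVir3/tracks/map/assembly"
printf 'track assembly\n' > "$tmp/eboVir3/tracks/map/assembly/trackDb.txt"
cat > "$tmp/eboVir3/trackDb.txt" << EOI
# Assembly.
include tracks/map/assembly/trackDb.txt
include tracks/map/gap/trackDb.txt
EOI

# Resolve every include relative to the folder of the main trackDb.txt.
missing=""
for inc in $(awk '$1 == "include" { print $2 }' "$tmp/eboVir3/trackDb.txt"); do
  [ -f "$tmp/eboVir3/$inc" ] || missing="$missing $inc"
done
echo "Missing include files:${missing:- none}"
```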
<br />
* Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use to query the DNA sequence and the translated (amino acid) sequence:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the blat and transBlat lines (note the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now hubCheck reports an error stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. Copy this file to the shared work folder and link it from the hub folder:<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Construction of the gap track directly from the AGP file:<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N" {printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes eboVir3-gap.bb<br />
After that, the hubCheck command runs without reporting any error. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/description.html</nowiki>"<br />
<br />
Let's write a basic description page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* To force the configuration files to be loaded again every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML description page for the hub:<br />
 $> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><br />
 <HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
 <BODY><br />
 <P><br />
 Ebola virus genome assembly and track hub.<br />
 <UL><br />
 <LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
 NCBI genome/4887 (Ebola virus)</A></LI><br />
 </UL><br />
 </P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same chromosome-name substitution has to be applied to the BED file of the track:<br />
 $> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's write an HTML description page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file used to create the .2bit file was soft-masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local, just before the "exit" command on the last line, so that they are run at boot:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added by putting the hub's URL directly into the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Since Kent's tools (like bedToBigBed) expect a case-sensitive sort order, we have to set the locale-related environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc at your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23947GBiB: From download to BLAT at assembly hubs2017-04-10T12:01:20Z<p>David da Silva Pires: Dealing with genomes that have chromosomes names too long.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically involve installing GBiB on a server with a text-only interface and opening only specific ports on the firewall, so that the data is accessible just from your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from the UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check whether you agree with the terms and conditions in the box for GBiB.<br />
** Check whether your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (the update process can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on menu at left.<br />
** Click at "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh, for faster access:<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
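<br />
The three rules above can be checked before running hubCheck. The following is only a minimal sketch, not part of GBiB: the function name check_hub_labels is hypothetical, and it tests nothing beyond the three rules listed above.<br />

```shell
# check_hub_labels FILE
# Hypothetical helper: reports violations of the hub.txt rules listed above
# and returns a non-zero status if any is found.
check_hub_labels() {
    awk '
        $1 == "hub" {
            # The hub name must be a single word (no spaces).
            if (NF != 2) { print "hub: name must not contain spaces"; bad = 1 }
            next
        }
        $1 == "shortLabel" {
            sub(/^shortLabel[ \t]+/, "")
            if (length($0) > 17) { print "shortLabel longer than 17 characters: " $0; bad = 1 }
            next
        }
        $1 == "longLabel" {
            sub(/^longLabel[ \t]+/, "")
            if (length($0) > 80) { print "longLabel longer than 80 characters: " $0; bad = 1 }
            next
        }
        END { exit bad + 0 }
    ' "$1"
}
```

On GBiB it could be called as: check_hub_labels /folders/sf_work/virusNetwork/hub.txt && echo "labels OK"<br />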
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the FASTA file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ eboVir3-uppercase.fasta > eboVir3-uppercase-shortChromNames.fasta<br />
* Make a symbolic link (just to shorten the name) and get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> ln -s eboVir3-uppercase-shortChromNames.fasta eboVir3.fasta<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Get a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
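<br />
To see what the sort step does without a .2bit file at hand, the sketch below applies the same numeric, descending sort on the second column to made-up data. The function name sort_chrom_sizes and the scaffold names are assumptions used only for illustration.<br />

```shell
# Sort "name<TAB>size" lines from the largest to the smallest sequence.
# The sample data is made up; on GBiB the input would come from
# `twoBitInfo eboVir3.2bit stdout`.
sort_chrom_sizes() {
    sort -t "$(printf '\t')" -k2,2nr
}

printf 'scaffold_3\t500\nscaffold_1\t18957\nscaffold_2\t1200\n' | sort_chrom_sizes
```

The largest sequence (scaffold_1 in this sample) comes out first and the smallest last.<br />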
<br />
== Track hub configuration ==<br />
<br />
The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps appear as holes in the track.<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
<br />
* Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3.fasta input/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 input/eboVir3.fasta output/eboVir3.agp<br />
<br />
* Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n output/eboVir3.agp > output/eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3.2bit input/<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa output/eboVir3-sorted.agp input/eboVir3.2bit<br />
<br />
* Construction of the assembly track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" output/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
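<br />
The AGP-to-BED step can be tried on a few hand-written AGP lines. The sketch below is a variation of the command above, not a copy: since AGP coordinates are 1-based and the BED start column is 0-based, it subtracts 1 from the start. The function name agp_to_bed and the contig names are assumptions.<br />

```shell
# agp_to_bed: read AGP on stdin, write BED6 (chrom, start, end, name, score, strand).
# Gap lines (component type "N") and comment lines are dropped.
agp_to_bed() {
    grep -v '^#' \
        | awk -F '\t' -v OFS='\t' '$5 != "N" { print $1, $2 - 1, $3, $6, 0, $9 }' \
        | LC_ALL=C sort -k1,1 -k2,2n
}

# Made-up AGP: two contigs separated by a 10-base gap.
printf 'KM034562v1\t111\t200\t3\tD\tcontig2\t1\t90\t+\n' >  sample.agp
printf 'KM034562v1\t101\t110\t2\tN\t10\tscaffold\tyes\n' >> sample.agp
printf 'KM034562v1\t1\t100\t1\tD\tcontig1\t1\t100\t+\n'  >> sample.agp
agp_to_bed < sample.agp
rm -f sample.agp
```

The two contig lines come out in coordinate order with starts 0 and 110, and the gap line is skipped.<br />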
<br />
* Edit the main trackDb.txt file to include the assembly track configuration.<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use to search the DNA sequence and the translated (amino acid) sequence:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat (pay attention to the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
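<br />
The two lines belong to the eboVir3 stanza, so they have to come before the blank line that ends it. As a sketch, the whole file could be rewritten as below, assuming the other fields are unchanged from the basic genomes.txt created earlier. HUB_DIR is a hypothetical variable; on GBiB it would be set to /folders/sf_work/virusNetwork.<br />

```shell
# HUB_DIR is an assumption of this sketch; it falls back to a temporary
# directory so the example can be tried outside GBiB.
HUB_DIR="${HUB_DIR:-$(mktemp -d)}"

cat > "$HUB_DIR/genomes.txt" << EOI
genome eboVir3
organism Ebola virus
defaultPos KM034562v1:1-18,957
twoBitPath eboVir3/genome/eboVir3.2bit
trackDb eboVir3/trackDb.txt
blat 127.0.0.1 42420
transBlat 127.0.0.1 42421

EOI
```

The ports must match the ones given to the two gfServer commands above.<br />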
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track at our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy (link?) this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Construction of the gap track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes eboVir3-gap.bb<br />
At last, the hubCheck command runs without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to: Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* To force the configuration files to be reloaded every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" on the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be done in the bed file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file used to create the .2bit file was soft-masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added directly through the browser URL. If you append hubUrl=[URL] to your hgTracks URL, the hub is loaded into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand=0,0,50 0,50,0 useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Since Kent's tools (like bedToBigBed) expect a case-sensitive sort order, we have to set the locale-related environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc at your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
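<br />
The effect of the "C" locale can be seen with plain sort. The sketch below is an illustration under stated assumptions: the function names c_sort and bed_is_sorted are hypothetical, and the chromosome names are made up. In the "C" locale, sort compares bytes, so all uppercase letters come before all lowercase ones.<br />

```shell
# Byte-wise (case-sensitive) sort, independent of the user's locale settings.
c_sort() {
    LC_ALL=C sort "$@"
}

# Succeeds only if a BED file is already sorted by chromosome name and
# then by start coordinate.
bed_is_sorted() {
    LC_ALL=C sort -c -k1,1 -k2,2n "$1" 2>/dev/null
}

# Uppercase names are listed first: chrA, chrB, chra, chrb.
printf 'chrb\nchrA\nchra\nchrB\n' | c_sort
```

A call such as bed_is_sorted output/assembly.bed could be used as a quick check before bedToBigBed.<br />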
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23946GBiB: From download to BLAT at assembly hubs2017-04-10T10:23:28Z<p>David da Silva Pires: Dealing with genomes that have all nucleotides as lowecase letters.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically involve installing GBiB on a server with a text-only interface and opening only specific ports on the firewall, so that the data is accessible just from your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from the UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check whether you agree with the terms and conditions in the box for GBiB.<br />
** Check whether your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (the update process can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on menu at left.<br />
** Click at "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh, for faster access:<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Write the contents of the hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
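The rules above can be checked before running hubCheck. The following is only a minimal sketch: the function name check_hub_labels is hypothetical (it is not part of Kent's tools) and it assumes one setting per line, as in the hub.txt written above.<br />

```shell
# Hypothetical helper (not a UCSC tool): verify the hub.txt rules listed above.
# Assumes one "setting value" pair per line.
check_hub_labels() {
    awk '
        # The hub name must be a single word (no spaces).
        $1 == "hub" && NF != 2 { print "hub name must not contain spaces"; bad = 1 }
        # Strip the setting name, then measure what is left of the line.
        $1 == "shortLabel" {
            sub(/^shortLabel[ \t]+/, "")
            if (length($0) > 17) { print "shortLabel too long: " $0; bad = 1 }
        }
        $1 == "longLabel" {
            sub(/^longLabel[ \t]+/, "")
            if (length($0) > 80) { print "longLabel too long: " $0; bad = 1 }
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Usage: check_hub_labels /folders/sf_work/virusNetwork/hub.txt && echo "labels OK"<br />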
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Write the contents of the genomes.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* If the FASTA file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* Build the .2bit file from the FASTA file and create a symbolic link to shorten its name:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3-uppercase.fasta eboVir3-uppercase.2bit<br />
<font color=green>browser@browserbox:$></font> ln -s eboVir3-uppercase.2bit eboVir3.2bit<br />
* Write a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
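If seq_crumbs is not installed, the uppercase conversion of the first step above can probably be done with awk alone. This is a sketch, not the method used on this page: the function name fasta_to_upper is hypothetical, and it deliberately throws away any soft-masking that the lowercase letters might encode.<br />

```shell
# Hypothetical alternative to change_case: uppercase the sequence lines only.
# Header lines (starting with ">") are copied unchanged.
fasta_to_upper() {
    awk '/^>/ { print; next } { print toupper($0) }' "$1"
}
```

Usage: fasta_to_upper eboVir3.fasta > eboVir3-uppercase.fasta<br />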
<br />
== Track hub configuration ==<br />
<br />
The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps appear as holes in the track.<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
<br />
* Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3.fasta input/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 input/eboVir3.fasta output/eboVir3.agp<br />
<br />
* Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n output/eboVir3.agp > output/eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3.2bit input/<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa output/eboVir3-sorted.agp input/eboVir3.2bit<br />
<br />
* Build the assembly track directly from the AGP file (AGP coordinates are 1-based, while BED start coordinates are 0-based, so 1 is subtracted from the start):<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" output/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2-1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
<br />
* Edit the main trackDb.txt file to include the assembly track configuration.<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
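The conversion from AGP to BED used for the assembly track can be wrapped in a small function. This is a sketch under two assumptions that are not stated on this page: AGP coordinates are 1-based and inclusive, so the BED start is the AGP start minus one, and gaps of unknown size (component type "U") are skipped together with the "N" gaps. The function name agp_to_assembly_bed is hypothetical.<br />

```shell
# Hypothetical helper: turn the non-gap records of an AGP file into BED6.
# Columns used: 1 = object, 2 = start (1-based), 3 = end, 5 = component type,
# 6 = component id, 9 = orientation.
agp_to_assembly_bed() {
    grep -v '^#' "$1" |
        awk -F'\t' -v OFS='\t' '$5 != "N" && $5 != "U" { print $1, $2 - 1, $3, $6, 0, $9 }' |
        LC_ALL=C sort -k1,1 -k2,2n
}
```

Usage: agp_to_assembly_bed output/eboVir3-sorted.agp > output/assembly.bed<br />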
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use to query the DNA sequence and the translated (amino acid) sequence:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat (note the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
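The two lines have to land inside the stanza of the right genome, before the blank line that closes it, otherwise they would most likely be read as a separate stanza. A minimal sketch of doing this with awk follows; the function name add_blat_lines and its argument order are hypothetical. It prints the edited file to standard output and leaves a stanza untouched if it already has a blat or transBlat line.<br />

```shell
# Hypothetical helper: print genomes.txt with blat/transBlat lines added to
# the stanza of one genome. Arguments: file genome host dnaPort transPort.
add_blat_lines() {
    awk -v g="$2" -v h="$3" -v p="$4" -v tp="$5" '
        # Called at the end of every stanza (blank line, next genome, or EOF).
        function emit() {
            if (inside && !seen) { print "blat " h " " p; print "transBlat " h " " tp }
            inside = 0; seen = 0
        }
        $1 == "genome" { emit(); inside = ($2 == g) }
        ($1 == "blat" || $1 == "transBlat") && inside { seen = 1 }
        NF == 0 { emit() }
        { print }
        END { emit() }
    ' "$1"
}
```

Usage: add_blat_lines genomes.txt eboVir3 127.0.0.1 42420 42421 > genomes.new && mv genomes.new genomes.txt<br />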
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
 You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track in our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
 Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy or link this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Build the gap track directly from the AGP file (AGP coordinates are 1-based, while BED start coordinates are 0-based, so 1 is subtracted from the start):<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2-1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes eboVir3-gap.bb<br />
 At this point the hubCheck command will run without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to: Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
 Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* To force the configuration files to be loaded again every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click on "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
 <BODY><br />
 <P><br />
 Ebola virus genome assembly and track hub.<br />
 <UL><br />
 <LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
 NCBI genome/4887 (Ebola virus)</A></LI><br />
 </UL><br />
 </P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be done in the bed file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Build the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was masked (i.e., it had lowercase letters marking the masked regions), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
To add a track hub directly through the browser URL, append hubUrl=[URL] to your hgTracks URL; the hub will then be loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
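Such URLs are easy to mistype, so they can be assembled by a small function. The sketch below is only an illustration: the function name hgtracks_url is hypothetical, it assumes that hgTracks lives under /cgi-bin/ on the given server, and it also adds the udcTimeout=1 setting described earlier, which should be dropped once the hub is stable. No URL escaping is done.<br />

```shell
# Hypothetical helper: build an hgTracks URL that loads a hub directly.
# Arguments: server base URL, database name, hub.txt URL.
hgtracks_url() {
    printf '%s/cgi-bin/hgTracks?udcTimeout=1&db=%s&hubUrl=%s\n' "$1" "$2" "$3"
}
```

Usage: hgtracks_url http://127.0.0.1:1234 eboVir3 http://127.0.0.1:1234/folders/sf_work/virusNetwork/hub.txt<br />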
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand=0,0,50 0,50,0 useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
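The two length limits above can be checked before pasting a track line into the browser. This is a minimal sketch; the function name check_track_line is hypothetical and it only looks at the name and description values.<br />

```shell
# Hypothetical helper: check the custom track name and description lengths.
# Arguments: track name, track description (both without the quotes).
check_track_line() {
    name=$1
    description=$2
    [ "${#name}" -le 15 ] || { echo "name longer than 15 characters" >&2; return 1; }
    [ "${#description}" -le 60 ] || { echo "description longer than 60 characters" >&2; return 1; }
}
```

Usage: check_track_line "Track label" "Chromosome coordinates list" && echo "track line OK"<br />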
<br />
=== Setting locale ===<br />
<br />
Since Kent's tools (like bedToBigBed) expect their input to be sorted in a case-sensitive order, we have to set the locale-related environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc in your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
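To see whether a bed file is already in the order that results from sorting under the "C" locale, sort can be asked to check it instead of sorting it. This is a sketch; the function name is_bed_sorted is hypothetical. Note that records that tie on both keys are additionally compared as whole lines, so the check may be slightly stricter than strictly necessary.<br />

```shell
# Hypothetical helper: succeed only if the file is sorted by chromosome name
# (byte order, as in the C locale) and then numerically by start coordinate.
is_bed_sorted() {
    LC_ALL=C sort -c -k1,1 -k2,2n "$1" 2>/dev/null
}
```

Usage: is_bed_sorted smps-shortChromNames-sorted.bed || echo "sort the file again with LC_ALL=C"<br />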
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23942GBiB: From download to BLAT at assembly hubs2017-04-07T16:53:33Z<p>David da Silva Pires: /* Track hub configuration */ Including the assembly track configuration in the main trackDb.txt.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with many features and tools that do not exist in other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups may involve installing GBiB on a server with a text-only interface and with specific ports opened on the firewall, so that the data can be used only within your network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click on "Login / Register".<br />
** Check if you agree with the terms and conditions in the box relative to GBiB.<br />
** Check if your hardware and software meet the basic requirements.<br />
** Click on "Add to cart".<br />
** Click on "Cart (1)" in the menu.<br />
** Click on "Proceed to checkout".<br />
** Click on "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour to finish, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click on "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click on "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub files and download the genome sequence:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* Build the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Write a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
<br />
== Track hub configuration ==<br />
<br />
The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps appear as holes in the track.<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
<br />
* Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3.fasta input/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 input/eboVir3.fasta output/eboVir3.agp<br />
<br />
* Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n output/eboVir3.agp > output/eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3.2bit input/<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa output/eboVir3-sorted.agp input/eboVir3.2bit<br />
<br />
* Build the assembly track directly from the AGP file (AGP coordinates are 1-based, while BED start coordinates are 0-based, so 1 is subtracted from the start):<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" output/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2-1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
<br />
* Edit the main trackDb.txt file to include the assembly track configuration.<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
#==========================================<br />
# MAPPING AND SEQUENCING.<br />
<br />
# Assembly.<br />
include tracks/map/assembly/trackDb.txt<br />
EOI<br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use to query the DNA sequence and the translated (amino acid) sequence:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat (note the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the FASTA file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
 You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track in our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
 Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy or link this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Construct the gap track directly from the AGP file (AGP coordinates are 1-based, while BED start coordinates are 0-based, hence $2-1):<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2-1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes eboVir3-gap.bb<br />
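The AGP-to-BED step can be tried on a tiny, made-up AGP file before running it on real data. This is a sketch under stated assumptions: the file names, the function name agp_to_gap_bed and the sample rows are hypothetical. AGP coordinates are 1-based and inclusive, whereas BED intervals are 0-based and half-open, so the sketch subtracts 1 from the start column.

```shell
# Hypothetical three-row AGP (tab-separated): contig, gap, contig.
printf 'chrEv\t1\t100\t1\tD\tcontig1\t1\t100\t+\n'       > demo.agp
printf 'chrEv\t101\t150\t2\tN\t50\tscaffold\tyes\tna\n' >> demo.agp
printf 'chrEv\t151\t300\t3\tD\tcontig2\t1\t150\t+\n'    >> demo.agp

# Keep gap rows (column 5 is "N"), convert the 1-based start to 0-based,
# and use the linkage column (8) as the item name, as in the command above.
agp_to_gap_bed() {
    grep -v '^#' "$1" \
      | awk 'BEGIN { FS = OFS = "\t" } $5 == "N" { print $1, $2 - 1, $3, $8 }' \
      | LC_ALL=C sort -k1,1 -k2,2n
}

agp_to_gap_bed demo.agp > demo-gap.bed
cat demo-gap.bed
```

The resulting four-column BED file is what bedToBigBed expects as its first argument.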
At last, the hubCheck command runs without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Force the configuration files to be reloaded every time the page is reloaded (instead of being cached for at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
<HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
</BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the BED file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track must be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they start at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added directly through the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is attached directly to the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
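Such a link can be assembled with a small helper. This is a minimal sketch: the function name hub_link and its argument order are assumptions made for illustration, and only the hubUrl and udcTimeout parameters mentioned on this page are used.

```shell
# Build an hgTracks URL that attaches a hub through the hubUrl parameter.
#   $1: base URL of the browser (for example a local GBiB)
#   $2: URL of hub.txt
#   $3: optional udcTimeout value in seconds
hub_link() {
    base=$1
    hub=$2
    timeout=${3:-}
    query="hubUrl=$hub"
    if [ -n "$timeout" ]; then
        # udcTimeout goes right after "hgTracks?", before the other parameters.
        query="udcTimeout=$timeout&$query"
    fi
    printf '%s/cgi-bin/hgTracks?%s\n' "$base" "$query"
}

hub_link http://127.0.0.1:1234 http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt
hub_link http://127.0.0.1:1234 http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt 1
```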
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
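The two length limits above are easy to check before pasting a track line into the browser. A minimal sketch, assuming a hypothetical helper named check_track_labels that receives the name and the description as separate arguments:

```shell
# Report whether a custom-track name and description fit the limits above
# (name: up to 15 characters, description: up to 60 characters).
check_track_labels() {
    name=$1
    description=$2
    status=0
    if [ "${#name}" -gt 15 ]; then
        echo "name too long: ${#name} > 15 characters"
        status=1
    fi
    if [ "${#description}" -gt 60 ]; then
        echo "description too long: ${#description} > 60 characters"
        status=1
    fi
    return "$status"
}

check_track_labels "Track label" "Chromosome coordinates list" && echo "labels OK"
```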
<br />
=== Setting locale ===<br />
<br />
Kent's tools (like bedToBigBed) expect their input to be sorted in case-sensitive order, so we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc at your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
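The effect of the "C" locale on sorting can be seen with three made-up chromosome names. This is only an illustration: the names are hypothetical, and LC_ALL is set for the single sort command rather than for the whole session.

```shell
# Sort three hypothetical chromosome names in plain byte order.
# In the C locale every uppercase ASCII letter sorts before every lowercase
# one, so "Chr2" comes first, followed by "chr1" and "chr10".
printf 'chr10\nChr2\nchr1\n' | LC_ALL=C sort
```

In many other locales the comparison ignores case at first, which can place "Chr2" after "chr10" and make bedToBigBed reject the file as unsorted.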
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23941GBiB: From download to BLAT at assembly hubs2017-04-07T16:46:00Z<p>David da Silva Pires: Checking AGP and FASTA file in the right place.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions, with the differences probably limited to package names and crontab settings. Other architectures should involve the installation of GBiB on a server using only a text interface and with specific ports enabled on the firewall to restrict the use of the data to your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder at your machine to place the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check if you agree with the terms and conditions in the box for GBiB.<br />
** Check if your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" on the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" on the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, unzip it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour to finish, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click at "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on menu at left.<br />
** Click at "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access:<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
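These three rules can be checked mechanically before running hubCheck. The sketch below is an illustration only: the function name check_hub_txt and the demo file are assumptions, and it covers just the rules listed above, not everything hubCheck verifies.

```shell
# Check the three hub.txt rules: hub name without spaces,
# shortLabel up to 17 characters, longLabel up to 80 characters.
check_hub_txt() {
    awk '
        $1 == "hub" && NF != 2 {
            print "hub: name must not contain spaces"; bad = 1
        }
        $1 == "shortLabel" {
            v = $0; sub(/^shortLabel[ \t]+/, "", v)
            if (length(v) > 17) { print "shortLabel: more than 17 characters"; bad = 1 }
        }
        $1 == "longLabel" {
            v = $0; sub(/^longLabel[ \t]+/, "", v)
            if (length(v) > 80) { print "longLabel: more than 80 characters"; bad = 1 }
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Demo copy of the hub.txt shown above.
cat > demo-hub.txt << 'EOI'
hub virusNetwork
email admin@virus.edu
shortLabel Virus Network
longLabel Virus Network Hub for Ebola virus
genomesFile genomes.txt
EOI

check_hub_txt demo-hub.txt && echo "hub.txt labels OK"
```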
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* Get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Create a file with the name and size of every chromosome of the genome of interest, sorted from largest to smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
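As a cross-check that does not depend on the .2bit file, the same table can be computed from the FASTA file with standard tools. This is a sketch under assumptions: the function name fasta_sizes and the demo sequences are made up, and it takes the first word of each header line as the sequence name, which should be compared against what twoBitInfo reports for your data.

```shell
# Print "name<TAB>length" for every record of a FASTA file, longest first.
fasta_sizes() {
    awk '
        /^>/ { if (name != "") print name "\t" len
               name = substr($1, 2); len = 0; next }
        { len += length($0) }
        END  { if (name != "") print name "\t" len }
    ' "$1" | LC_ALL=C sort -k2,2nr
}

# Hypothetical two-record FASTA used only for this demonstration.
printf '>seqA demo\nACGT\nAC\n>seqB\nACGTACGTAC\n' > demo.fasta
fasta_sizes demo.fasta
```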
<br />
== Track hub configuration ==<br />
<br />
The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps are the holes in the track.<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
<br />
* Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3.fasta input/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 input/eboVir3.fasta output/eboVir3.agp<br />
<br />
* Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n output/eboVir3.agp > output/eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3.2bit input/<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa output/eboVir3-sorted.agp input/eboVir3.2bit<br />
<br />
* Construct the assembly track directly from the AGP file (AGP coordinates are 1-based, while BED start coordinates are 0-based, hence $2-1):<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" output/eboVir3-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2-1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
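The mapping from AGP rows to BED6 rows can be checked on a small, made-up example. The file name, the function name agp_to_assembly_bed and the rows are hypothetical. Because AGP coordinates are 1-based and inclusive and BED intervals are 0-based and half-open, the sketch subtracts 1 from the start column.

```shell
# Hypothetical AGP rows (tab-separated): two contigs separated by one gap.
printf 'chrEv\t1\t100\t1\tD\tcontig1\t1\t100\t+\n'       > demo-asm.agp
printf 'chrEv\t101\t150\t2\tN\t50\tscaffold\tyes\tna\n' >> demo-asm.agp
printf 'chrEv\t151\t300\t3\tD\tcontig2\t1\t150\t-\n'    >> demo-asm.agp

# Non-gap rows become BED6: chrom, 0-based start, end, component id, score 0, strand.
agp_to_assembly_bed() {
    awk 'BEGIN { FS = OFS = "\t" } !/^#/ && $5 != "N" { print $1, $2 - 1, $3, $6, 0, $9 }' "$1" \
      | LC_ALL=C sort -k1,1 -k2,2n
}

agp_to_assembly_bed demo-asm.agp
```

The gap row is dropped, so the two remaining blocks leave a visible hole between positions 100 and 150.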
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the assembly hub ports that will serve DNA queries and translated (amino acid) queries:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat (note the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the FASTA file is written entirely in lowercase, convert it to uppercase so that the genome is not treated as if it were fully masked. We can use the change_case command from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the chromosome names are very long, shorten them:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track at our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy or link this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Construct the gap track directly from the AGP file (AGP coordinates are 1-based, while BED start coordinates are 0-based, hence $2-1):<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2-1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes eboVir3-gap.bb<br />
At last, the hubCheck command runs without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Force the configuration files to be reloaded every time the page is reloaded (instead of being cached for at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
<HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
</BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the BED file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track must be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they start at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added by appending the hub's URL to the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Kent's tools (like bedToBigBed) expect their input to be sorted in plain byte order, which is case-sensitive. Since many locales make sort ignore case, we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc at your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23940GBiB: From download to BLAT at assembly hubs2017-04-07T16:28:32Z<p>David da Silva Pires: /* Track hub configuration */ Fixing the path after organizing the assembly track inside the group map.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically install GBiB on a server with a text-only interface, with only specific ports opened on the firewall so that the data is accessible just from your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder at your machine to place the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check that you agree with the terms and conditions in the GBiB box.<br />
** Check that your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" on the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" on the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive.<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour to finish, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on menu at left.<br />
** Click at "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
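<br />
The label limits listed above can also be checked locally before running hubCheck. The following is only a minimal sketch: check_hub_labels is a hypothetical helper (it is not part of the Kent tools), and it assumes that each hub.txt setting sits on one line as "field value".<br />

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Kent tool): verify the hub.txt rules listed above.
#   hub        : a single word (no spaces)
#   shortLabel : at most 17 characters
#   longLabel  : at most 80 characters
check_hub_labels() {
    awk '
        $1 == "hub" && NF != 2 {
            print "hub name must be a single word: " $0
            bad = 1
        }
        $1 == "shortLabel" || $1 == "longLabel" {
            limit = ($1 == "shortLabel") ? 17 : 80
            label = $0
            sub(/^[^ \t]+[ \t]+/, "", label)   # drop the field name, keep the value
            if (length(label) > limit) {
                printf "%s too long (%d > %d): %s\n", $1, length(label), limit, label
                bad = 1
            }
        }
        END { exit bad + 0 }                   # non-zero exit status if any rule failed
    ' "$1"
}
```

For example, "check_hub_labels /folders/sf_work/virusNetwork/hub.txt" prints nothing and returns success for the hub.txt shown above. hubCheck remains the authoritative test.<br />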
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* Get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Get a file with the name and the size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2,2nr > eboVir3-chromSizes-sorted.txt<br />
* Once an AGP file is available (one is built with hgFakeAgp in the track hub section), check that it matches the sequence in the .2bit file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
<br />
== Track hub configuration ==<br />
<br />
The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps appear as holes in the track.<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
<font color=green>browser@browserbox:$></font> cat > ../../../trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/map/assembly/output/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
<br />
* Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3.fasta input/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 input/eboVir3.fasta output/eboVir3.agp<br />
<br />
* Build the assembly track directly from the AGP file (AGP coordinates are 1-based, while BED start coordinates are 0-based, hence the "$2 - 1"):<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" output/eboVir3.agp | awk '$5 != "N" {printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/assembly.bb<br />
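<br />
The AGP-to-BED step can be tried on a small input before running it on a real assembly. The sketch below is only an illustration: agp_to_assembly_bed is a hypothetical helper and the three-line AGP is invented. It assumes the standard AGP column layout (object, start, end, part number, component type, component id, ...). It subtracts one from the start because AGP coordinates are 1-based and BED starts are 0-based, and it skips both "N" and "U" gap lines.<br />

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert the non-gap lines of an AGP file into BED6.
# AGP is 1-based and inclusive; BED is 0-based and half-open, so start = $2 - 1.
agp_to_assembly_bed() {
    grep -v '^#' "$1" |
        awk 'BEGIN { OFS = "\t" }
             $5 != "N" && $5 != "U" { print $1, $2 - 1, $3, $6, 0, $9 }' |
        LC_ALL=C sort -k1,1 -k2,2n
}

# Invented example: two contigs separated by a 10-base gap.
sample=$(mktemp)
printf 'chr1\t1\t100\t1\tD\tcontig1\t1\t100\t+\n'   >  "$sample"
printf 'chr1\t101\t110\t2\tN\t10\tcontig\tno\tna\n' >> "$sample"
printf 'chr1\t111\t200\t3\tD\tcontig2\t1\t90\t+\n'  >> "$sample"

# Prints one BED line per contig; the gap line is dropped.
agp_to_assembly_bed "$sample"
rm -f "$sample"
```

The first contig comes out as "chr1 0 100" and the second as "chr1 110 200" (tab-separated), so the gap covers BED positions 100 to 110.<br />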
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use to query the DNA sequence and the translated (amino acid) sequence:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat (pay attention to the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
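<br />
The blat and transBlat lines must sit inside the stanza of the genome they refer to, not after the blank line that ends it. One possible way to add them without hand-editing is sketched below. add_blat_lines is a hypothetical helper, not a Kent tool, and it assumes that stanzas in genomes.txt are separated by blank lines.<br />

```shell
#!/usr/bin/env bash
# Hypothetical helper: print genomes.txt with blat/transBlat lines added to
# the stanza of one genome. Stanzas that already have them are left alone.
# Usage: add_blat_lines genomes.txt genomeName host blatPort transBlatPort
add_blat_lines() {
    awk -v g="$2" -v host="$3" -v p="$4" -v tp="$5" '
        # Emit the two lines at the end of the target stanza, at most once.
        function emit_blat() {
            if (in_stanza && !has_blat) {
                print "blat " host " " p
                print "transBlat " host " " tp
            }
            in_stanza = 0
        }
        $1 == "genome" { emit_blat(); in_stanza = ($2 == g); has_blat = 0 }
        ($1 == "blat" || $1 == "transBlat") && in_stanza { has_blat = 1 }
        NF == 0 { emit_blat() }     # a blank line closes the stanza
        { print }
        END { emit_blat() }         # the file may end without a blank line
    ' "$1"
}
```

The result goes to standard output, so a typical call would be "add_blat_lines genomes.txt eboVir3 127.0.0.1 42420 42421 > genomes.new", followed by a review of genomes.new before it replaces genomes.txt.<br />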
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the fasta file has all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track at our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy (link?) this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Construction of the gap track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N" {printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes eboVir3-gap.bb<br />
Now the hubCheck command runs without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Forcing configuration files to be loaded again every time that the page is reloaded (instead of after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] at URL.<br />
** To disable this feature, click at "clear" on the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
<HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
</BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be done in the bed file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the fasta file that was used to create the .2bit file was soft-masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added by appending the hub's URL to the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Kent's tools (like bedToBigBed) expect their input to be sorted in plain byte order, which is case-sensitive. Since many locales make sort ignore case, we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc at your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
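<br />
The effect of the "C" locale on sorting can be seen with a few names that differ only in case. This is a small illustration with invented sequence names. c_sort is a hypothetical wrapper that forces byte-order sorting for a single command, whatever the session locale is.<br />

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run sort in the "C" locale for one command only.
c_sort() {
    LC_ALL=C sort "$@"
}

# Invented names that differ only in case.
# In byte order every uppercase letter sorts before every lowercase letter,
# so this prints chrA, chrB, chra, chrb (one per line).
printf 'chrb\nchrA\nchra\nchrB\n' | c_sort
```

In many non-C locales the plain sort command would interleave the names instead (for example chra, chrA, chrb, chrB), which is the ordering that bedToBigBed rejects.<br />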
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23939GBiB: From download to BLAT at assembly hubs2017-04-07T13:52:10Z<p>David da Silva Pires: /* Track hub configuration */ Changing the order of presentation, creating all the directories in just one command, separating the assembly track at map grouping and making use of input and output.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically install GBiB on a server with a text-only interface, with only specific ports opened on the firewall so that the data is accessible just from your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder at your machine to place the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check that you agree with the terms and conditions in the GBiB box.<br />
** Check that your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" on the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" on the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive.<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour to finish, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on menu at left.<br />
** Click at "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
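The limits above can be checked before running hubCheck. The following is a minimal sketch, assuming the hub.txt layout shown above; check_hub_labels is a hypothetical helper and is not one of the UCSC tools.<br />

```shell
#!/usr/bin/env bash
# check_hub_labels FILE (hypothetical helper, not part of the UCSC tools).
# Prints one line per violated rule and returns non-zero if any rule fails.
check_hub_labels() {
    awk '
        # Split each line into the setting name and the rest of the line.
        {
            key = $1
            value = $0
            sub(/^[^ \t]+[ \t]+/, "", value)
        }
        key == "hub" && NF != 2 {
            print "hub: name must not contain spaces"; bad = 1
        }
        key == "shortLabel" && length(value) > 17 {
            print "shortLabel: longer than 17 characters"; bad = 1
        }
        key == "longLabel" && length(value) > 80 {
            print "longLabel: longer than 80 characters"; bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

It could be called as check_hub_labels /folders/sf_work/virusNetwork/hub.txt; hubCheck remains the authoritative test.<br />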
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* Get the .2bit file from the FASTA file:<br />
 <font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
 <font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Get a file with the name and the size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
 <font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
* Once an AGP file exists (one is built with hgFakeAgp in the "Track hub configuration" section), check that it matches the .2bit file, adjusting the paths to the two files as needed:<br />
 <font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
 <font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
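The chromosome sizes file can also be used to sanity-check the defaultPos line of genomes.txt. The following is only a sketch: pos_within_chrom is a hypothetical helper, and it assumes a tab-separated "name size" file such as the one written by twoBitInfo.<br />

```shell
#!/usr/bin/env bash
# pos_within_chrom POSITION SIZES_FILE (hypothetical helper).
# Succeeds when POSITION (e.g. "KM034562v1:1-18,957") names a sequence
# listed in SIZES_FILE and its range lies within that sequence.
pos_within_chrom() {
    local pos=${1//,/}        # drop thousands separators
    local chrom=${pos%%:*}    # text before the colon
    local range=${pos#*:}     # "start-end"
    local start=${range%-*}
    local end=${range#*-}
    awk -F '\t' -v c="$chrom" -v s="$start" -v e="$end" '
        $1 == c { found = 1; ok = (s >= 1 && s <= e && e <= $2) }
        END { exit !(found && ok) }
    ' "$2"
}
```

For example, pos_within_chrom "KM034562v1:1-18,957" eboVir3-chromSizes-sorted.txt should succeed if the sequence is at least 18,957 bases long.<br />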
<br />
== Track hub configuration ==<br />
<br />
The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps are the holes in the track.<br />
<br />
* Create the contents of trackDb.txt at the location given in genomes.txt (the track name must have no spaces or dots and must start with a letter; shortLabel <= 17 chars; longLabel <= 80 chars):<br />
 <font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/{input,output}<br />
 <font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/assembly/<br />
 <font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
 track assembly<br />
 shortLabel Assembly<br />
 longLabel Assembly<br />
 type bigBed 6<br />
 bigDataUrl tracks/map/assembly/output/eboVir3-assembly.bb<br />
 <br />
 EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
<br />
* Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../genome/eboVir3.fasta input/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 input/eboVir3.fasta output/eboVir3.agp<br />
<br />
* Build the assembly track directly from the AGP file (AGP coordinates are 1-based, so 1 is subtracted from the start to obtain the 0-based BED start):<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" output/eboVir3.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/eboVir3-assembly.bb<br />
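To see what the AGP-to-BED conversion produces, it can be tried on a tiny input. This is a sketch under stated assumptions: the three AGP lines are invented for illustration (hgFakeAgp output may differ in detail), and the start is decremented because AGP coordinates are 1-based while BED starts are 0-based.<br />

```shell
#!/usr/bin/env bash
# agp_to_assembly_bed: reads AGP lines on stdin, writes BED6 on stdout.
# Comment lines and gap lines (column 5 equal to "N") are skipped.
agp_to_assembly_bed() {
    grep -v '^#' |
        awk -F '\t' '$5 != "N" { printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9 }' |
        LC_ALL=C sort -k1,1 -k2,2n
}

# Hypothetical AGP: two contigs on "chrA" separated by a 10-base gap.
sample_agp() {
    printf 'chrA\t1\t100\t1\tD\tchrA_1\t1\t100\t+\n'
    printf 'chrA\t101\t110\t2\tN\t10\tcontig\tno\n'
    printf 'chrA\t111\t200\t3\tD\tchrA_2\t1\t90\t+\n'
}

sample_agp | agp_to_assembly_bed
```

The two contig lines come out as BED intervals 0-100 and 110-200, and the gap line is dropped, which is why gaps appear as holes in the track.<br />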
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the assembly hub ports that will be used to query the DNA sequence and the translated (amino acid) sequence:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat (pay attention to the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
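The two lines must land inside the stanza of the genome, that is, before the blank line that ends it. A minimal sketch of doing this without a text editor follows; add_blat_lines is a hypothetical helper that prints the modified file and leaves the original untouched.<br />

```shell
#!/usr/bin/env bash
# add_blat_lines GENOMES_FILE GENOME HOST DNA_PORT TRANS_PORT
# Hypothetical helper: prints GENOMES_FILE with "blat" and "transBlat"
# lines appended to the stanza of GENOME. A stanza ends at a blank line
# or at end of file; a stanza that already has a blat line is unchanged.
add_blat_lines() {
    awk -v g="$2" -v host="$3" -v dna="$4" -v trans="$5" '
        function finish_stanza() {
            if (inside && !has_blat) {
                print "blat " host " " dna
                print "transBlat " host " " trans
            }
            inside = 0
            has_blat = 0
        }
        $1 == "genome" { inside = ($2 == g) }
        $1 == "blat"   { has_blat = 1 }
        NF == 0        { finish_stanza() }
        { print }
        END { finish_stanza() }
    ' "$1"
}
```

For example, add_blat_lines genomes.txt eboVir3 127.0.0.1 42420 42421 > genomes.new writes the result to a new file that can be reviewed before it replaces genomes.txt.<br />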
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the fasta file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track at our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy (link?) this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Build the gap track directly from the AGP file (as AGP coordinates are 1-based, 1 is subtracted from the start to obtain the 0-based BED start):<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes-sorted.txt eboVir3-gap.bb<br />
Now the hubCheck command runs without reporting any error. However, if you point your browser to http://127.0.0.1:1234 and go to: Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* To force the configuration files to be reloaded every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click on "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
 <BODY><br />
 <P><br />
 Ebola virus genome assembly and track hub.<br />
 <UL><br />
 <LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
 NCBI genome/4887 (Ebola virus)</A></LI><br />
 </UL><br />
 </P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the bed file of the track:<br />
 $> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the fasta file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added by putting the hub's URL directly in the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
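Building such a URL by hand is error-prone because the separator depends on whether the browser URL already has a query string. The sketch below shows one way to assemble it; hub_session_url is a hypothetical helper, the GBiB address with /cgi-bin/hgTracks is an assumption, and the hub URL is passed through without percent-encoding, as in the example above.<br />

```shell
#!/usr/bin/env bash
# hub_session_url BROWSER_URL HUB_URL (hypothetical helper).
# Prints BROWSER_URL with "hubUrl=HUB_URL" added as a query parameter.
hub_session_url() {
    case $1 in
        *\?)  printf '%shubUrl=%s\n'  "$1" "$2" ;;   # URL already ends with "?"
        *\?*) printf '%s&hubUrl=%s\n' "$1" "$2" ;;   # URL already has a query
        *)    printf '%s?hubUrl=%s\n' "$1" "$2" ;;   # no query string yet
    esac
}

hub_session_url 'http://127.0.0.1:1234/cgi-bin/hgTracks' \
    'http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt'
```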
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand=0,0,50 0,50,0 useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Since Kent's tools (like bedToBigBed) expect their input to be sorted with a case-sensitive sort, we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc at your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
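The effect of the "C" locale on sorting can be seen with a few names that differ only in letter case. This is a small sketch: the sequence names are invented, and bed_is_sorted is a hypothetical helper built on the standard "sort -c" check.<br />

```shell
#!/usr/bin/env bash
# Hypothetical sequence names that differ only in letter case.
names() { printf 'chrb\nchrA\nchra\nchrB\n'; }

# With LC_ALL=C, sort compares raw bytes, so every uppercase letter
# orders before every lowercase one.
names | LC_ALL=C sort

# bed_is_sorted FILE (hypothetical helper): succeeds when FILE is already
# sorted by name and then by numeric start, using the byte-wise "C" order.
bed_is_sorted() {
    LC_ALL=C sort -c -k1,1 -k2,2n "$1" 2>/dev/null
}
```

Under the "C" locale the names are printed as chrA, chrB, chra, chrb; other locales may interleave upper and lower case, which is the ordering mismatch the settings above avoid.<br />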
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23938GBiB: From download to BLAT at assembly hubs2017-04-07T13:34:59Z<p>David da Silva Pires: /* GBiB installation */ Typo.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other architectures should involve the installation of GBiB on a server using only a text interface, with specific ports enabled on the firewall to restrict the use of the data to your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder at your machine to place the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click on "Login / Register".<br />
** Check if you agree with the terms and conditions at the box relative to GBiB.<br />
** Check if your hardware and software meet the basic requirements.<br />
** Click on "Add to cart".<br />
** Click on "Cart (1)" in the menu.<br />
** Click on "Proceed to checkout".<br />
** Click on "My products" in the menu.<br />
** Copy the address of download (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress and delete it.<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
== GBiB configuration ==<br />
<br />
* Click on "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on menu at left.<br />
** Click at "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh, for faster access:<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* Get the .2bit file from the FASTA file:<br />
 <font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
 <font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Get a file with the name and the size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
 <font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
* Once an AGP file exists (one is built with hgFakeAgp in the "Track hub configuration" section), check that it matches the .2bit file, adjusting the paths to the two files as needed:<br />
 <font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
 <font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
<br />
== Track hub configuration ==<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
 bigDataUrl tracks/assembly/output/eboVir3-assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
<br />
The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps are the holes in the track. Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> mkdir input/ output/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../genome/eboVir3.fasta input/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 input/eboVir3.fasta output/eboVir3.agp<br />
<br />
* Build the assembly track directly from the AGP file (AGP coordinates are 1-based, so 1 is subtracted from the start to obtain the 0-based BED start):<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" output/eboVir3.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/eboVir3-assembly.bb<br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the assembly hub ports that will be used to query the DNA sequence and the translated (amino acid) sequence:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat (pay attention to the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the fasta file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track at our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy (link?) this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Construction of the gap track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes eboVir3-gap.bb<br />
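To see what the AGP-to-BED step does before running it on real data, it can be tried on a tiny hand-written AGP file. The sample below is an assumption made up for illustration (one gap between two contigs); the sketch keeps only the gap lines (component type "N") and shifts the start coordinate from the 1-based AGP convention to the 0-based BED convention.

```shell
# Hypothetical three-line AGP file: contig, gap, contig (tab-separated).
workdir=$(mktemp -d)
{
    printf '# sample AGP\n'
    printf 'chr1\t1\t100\t1\tW\tcontig1\t1\t100\t+\n'
    printf 'chr1\t101\t150\t2\tN\t50\tscaffold\tyes\tpaired-ends\n'
    printf 'chr1\t151\t300\t3\tW\tcontig2\t1\t150\t+\n'
} > "$workdir/sample.agp"

# Skip comments, keep gap lines, and write chrom, start - 1, end, linkage as BED4.
awk -F'\t' '!/^#/ && $5 == "N" { printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8 }' \
    "$workdir/sample.agp" | sort -k1,1 -k2,2n > "$workdir/sample-gap.bed"

cat "$workdir/sample-gap.bed"
```

The BED interval length (150 - 100 = 50) matches the gap length given in column 6 of the AGP line, which is a quick sanity check for the coordinate shift.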
Finally, the hubCheck command will run without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Force the configuration files to be reloaded every time the page is reloaded (instead of being cached for at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
<HTML><br />
<HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
</BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the BED file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
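The sort order can be checked on a small example before converting to bigBed. This is a sketch with an invented three-line BED file; LC_ALL=C is set only for the sort commands, which is an assumption about how you may prefer to scope the locale.

```shell
# Hypothetical unsorted BED file (tab-separated).
workdir=$(mktemp -d)
printf 'chr2\t50\t60\tb\nchr10\t5\t9\tc\nchr2\t7\t20\ta\n' > "$workdir/unsorted.bed"

# Sort by chromosome name (byte order), then numerically by start coordinate.
LC_ALL=C sort -k1,1 -k2,2n "$workdir/unsorted.bed" > "$workdir/sorted.bed"

# "sort -c" only checks the order; it exits with a non-zero status if the file is unsorted.
LC_ALL=C sort -c -k1,1 -k2,2n "$workdir/sorted.bed" && echo "sorted.bed is correctly sorted"
```

Note that in byte order "chr10" comes before "chr2", which is the order the sort keys above produce.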
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
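Since every stanza of groups.txt above carries the same four settings, a quick consistency check can catch a stanza with a missing line. The following is a minimal sketch on a deliberately incomplete sample file (the sample content is an assumption); it is not a replacement for hubCheck.

```shell
# Hypothetical groups.txt in which the second stanza lacks "defaultIsClosed".
workdir=$(mktemp -d)
cat > "$workdir/groups.txt" << 'EOI'
name map
label Mapping
priority 2
defaultIsClosed 0

name genes
label Genes
priority 3

EOI

# Paragraph mode (RS = "") turns each blank-line-separated stanza into one record.
awk 'BEGIN { RS = ""; FS = "\n" }
{
    split("", seen)                     # clear the per-stanza lookup table
    name = "(unnamed)"
    for (i = 1; i <= NF; i++) {
        split($i, kv, " ")              # kv[1] is the setting, kv[2] its first value
        seen[kv[1]] = 1
        if (kv[1] == "name") name = kv[2]
    }
    n = split("name label priority defaultIsClosed", required, " ")
    for (j = 1; j <= n; j++)
        if (!(required[j] in seen))
            print "stanza " name ": missing " required[j]
}' "$workdir/groups.txt" > "$workdir/report.txt"

cat "$workdir/report.txt"
```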
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the fasta file that was used to create the .2bit file was soft-masked (i.e., it had masked nucleotides written in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added directly through the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
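The length limits listed above are easy to check before pasting a track line into the browser. The sketch below uses a hypothetical helper function, check_length, written only for this illustration; the two values are the name and description from the track line in this section.

```shell
# Values taken from the track line above.
track_name="Track label"
track_description="Chromosome coordinates list"

# Hypothetical helper: $1 = setting name, $2 = value, $3 = maximum length.
check_length() {
    if [ "${#2}" -le "$3" ]; then
        echo "$1 OK (${#2}/$3)"
    else
        echo "$1 too long (${#2}/$3)"
    fi
}

name_result=$(check_length name "$track_name" 15)
description_result=$(check_length description "$track_description" 60)
echo "$name_result"
echo "$description_result"
```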
<br />
=== Setting locale ===<br />
<br />
Kent's tools (such as bedToBigBed) expect their input to be sorted in a case-sensitive, byte-wise order, so we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc on your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
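To confirm that the "C" locale gives the case-sensitive order described above, a small test with made-up chromosome names can be run. This is only an illustrative sketch; here the locale is set for a single command instead of through .bashrc.

```shell
# Hypothetical chromosome names that differ in case.
sorted=$(printf 'chr2\nChr10\nchr1\n' | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

In the "C" locale, uppercase letters sort before lowercase ones, so "Chr10" is listed first, followed by "chr1" and "chr2".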
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23937GBiB: From download to BLAT at assembly hubs2017-04-06T17:48:48Z<p>David da Silva Pires: /* Blat configuration */ Fixing the working directory and warning about the capital 'B' in "transBlat".</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other architectures would involve installing GBiB on a server with a text-only interface and with specific ports enabled on the firewall, to restrict the use of the data to your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to place the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check if you agree with the terms and conditions in the box relative to GBiB.<br />
** Check if your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" on the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" on the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it, and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour to finish, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on the menu at the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh, for faster access.<br />
** Open a terminal, such as "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill in the contents of the hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill in the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
* Get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Create a file with the name and size of every chromosome of the genome of interest, sorted from largest to smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
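As a rough cross-check of the chromosome sizes, the same table can be computed directly from a FASTA file with awk. This sketch uses an invented two-sequence file and is not a substitute for twoBitInfo; it only illustrates the expected two-column layout and the largest-first ordering.

```shell
# Hypothetical FASTA file with two short sequences.
workdir=$(mktemp -d)
printf '>chrA\nACGT\nAC\n>chrB some description\nACGTACGTAC\n' > "$workdir/sample.fasta"

# Sum the sequence line lengths per record, then sort by size, largest first.
awk '
/^>/ { if (name != "") print name "\t" len; name = substr($1, 2); len = 0; next }
     { len += length($0) }
END  { if (name != "") print name "\t" len }
' "$workdir/sample.fasta" | sort -k2,2nr > "$workdir/sample-chromSizes-sorted.txt"

cat "$workdir/sample-chromSizes-sorted.txt"
```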
<br />
== Track hub configuration ==<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/assembly/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
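Duplicate track names can be spotted with a short awk command. The trackDb.txt below is a made-up sample that repeats the name "assembly" on purpose; the sketch prints every track name that occurs more than once.

```shell
# Hypothetical trackDb.txt with a repeated track name.
workdir=$(mktemp -d)
cat > "$workdir/trackDb.txt" << 'EOI'
track assembly
shortLabel Assembly

track gap
shortLabel Gap

track assembly
shortLabel Assembly again
EOI

# Count how often each name follows the "track" keyword and report the repeats.
awk '$1 == "track" { count[$2]++ }
     END { for (t in count) if (count[t] > 1) print t }' \
    "$workdir/trackDb.txt" | sort > "$workdir/duplicates.txt"

cat "$workdir/duplicates.txt"
```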
<br />
The first track that we will configure is the assembly track, which shows a block wherever the sequence is different from 'N'. In other words, the gaps appear as holes in the track. Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> mkdir input/ output/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../genome/eboVir3.fasta input/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 input/eboVir3.fasta output/eboVir3.agp<br />
<br />
* Construction of the assembly track directly from the AGP file (AGP coordinates are 1-based, while BED start coordinates are 0-based, hence the "$2 - 1"):<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" output/eboVir3.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/eboVir3-assembly.bb<br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA queries and for translated (amino acid) queries:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> mkdir ../log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat (pay attention to the capital 'B' in "transBlat"):<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
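Because the capitalisation of "transBlat" is easy to get wrong, a quick check of genomes.txt may help. This is a sketch on a hypothetical fragment that contains the mistake on purpose; it reports every line whose first word matches "blat" or "transBlat" only when case is ignored.

```shell
# Hypothetical genomes.txt fragment with a miscapitalised setting.
workdir=$(mktemp -d)
printf 'blat 127.0.0.1 42420\ntransblat 127.0.0.1 42421\n' > "$workdir/genomes.txt"

awk '
tolower($1) == "transblat" && $1 != "transBlat" { print "line " NR ": expected transBlat, found " $1 }
tolower($1) == "blat"      && $1 != "blat"      { print "line " NR ": expected blat, found " $1 }
' "$workdir/genomes.txt" > "$workdir/report.txt"

cat "$workdir/report.txt"
```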
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the FASTA file is written entirely in lowercase, convert all nucleotides to uppercase so that the genome is not treated as if it were fully masked. We can use the change_case command from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track in our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy or link this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Construction of the gap track directly from the AGP file (AGP coordinates are 1-based, while BED start coordinates are 0-based, hence the "$2 - 1"):<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N" {printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes eboVir3-gap.bb<br />
Finally, the hubCheck command will run without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Force the configuration files to be reloaded every time the page is reloaded (instead of being cached for at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
<HTML><br />
<HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
</BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the BED file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the fasta file that was used to create the .2bit file was soft-masked (i.e., it had masked nucleotides written in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added directly through the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosomes coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Since Kent's tools (like bedToBigBed) expect to find a system that provides a case-sensitive sort, we have to set the environment variables relative to locale to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc at your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
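<br />
The effect of the "C" locale on sorting can be seen without any of Kent's tools. The following is a minimal sketch, assuming three made-up sequence names that differ only in letter case; with LC_ALL=C, sort compares raw byte values, so uppercase letters come before lowercase ones:<br />

```shell
# Made-up sequence names that differ only in letter case (illustrative only).
names='chrB
ChrA
chrA'

# With LC_ALL=C, sort compares raw byte values, so every uppercase letter
# (bytes 65-90) is ordered before every lowercase letter (bytes 97-122).
c_sorted=$(printf '%s\n' "$names" | LC_ALL=C sort)
printf '%s\n' "$c_sorted"
```

Under other locales the same input may come out in a different order, which is why the variables above are set before sorting files for bedToBigBed.<br />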
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23936GBiB: From download to BLAT at assembly hubs2017-04-06T17:06:11Z<p>David da Silva Pires: /* Blat configuration */ Creating a log directory and fixing gfServer commands.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically involve installing GBiB on a server with only a text interface, with specific ports opened on the firewall so that access to the data is restricted to your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder at your machine to place the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Read the terms and conditions in the GBiB box and check that you agree with them.<br />
** Check that your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour to finish the update process, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
<br />
== GBiB configuration ==<br />
<br />
* Click at "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on menu at left.<br />
** Click at "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access:<br />
** Open a terminal, like "konsole".<br />
** When prompted, use the password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
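<br />
hubCheck validates the hub as a whole, but the label rules listed above can also be checked by hand before running it. The following is a minimal sketch, assuming a temporary copy of the hub.txt shown above; the awk program is only an illustration and is not part of the UCSC tools:<br />

```shell
# Write a temporary copy of the hub.txt contents shown above.
hub_file=$(mktemp)
cat > "$hub_file" << 'EOI'
hub virusNetwork
email admin@virus.edu
shortLabel Virus Network
longLabel Virus Network Hub for Ebola virus
genomesFile genomes.txt
EOI

# For each line, split the setting name from its value and apply the rules:
# no spaces in the hub name, shortLabel <= 17 chars, longLabel <= 80 chars.
problems=$(awk '
    {
        key = $1
        value = $0
        sub(/^[^ \t]+[ \t]+/, "", value)
    }
    key == "hub" && value ~ /[ \t]/ { print "hub name contains spaces" }
    key == "shortLabel" && length(value) > 17 { print "shortLabel longer than 17 characters" }
    key == "longLabel" && length(value) > 80 { print "longLabel longer than 80 characters" }
' "$hub_file")
rm -f "$hub_file"

if [ -z "$problems" ]; then
    echo "labels OK"
else
    printf '%s\n' "$problems"
fi
```

This only covers the three rules listed above; hubCheck remains the reference check for everything else.<br />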
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* Get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Create a file with the name and size of every chromosome of the genome of interest, sorted from largest to smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
* Once an AGP file is available (it is built with hgFakeAgp in the track hub configuration), check that it matches the .2bit file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
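<br />
The size-based ordering of the chromosome sizes file can be tried on made-up data. The following is a minimal sketch, assuming a hypothetical three-sequence table (the names and sizes are invented for illustration) in the same two-column, tab-separated layout that twoBitInfo writes:<br />

```shell
# Hypothetical chromosome sizes: name <TAB> size, in arbitrary order.
sizes=$(printf 'chrC\t500\nchrA\t18957\nchrB\t1200')

# Sort numerically on the second column, largest first.
sorted=$(printf '%s\n' "$sizes" | sort -k2,2nr)
printf '%s\n' "$sorted"

# The first line now holds the largest sequence.
largest=$(printf '%s\n' "$sorted" | head -n 1 | cut -f1)
```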
<br />
== Track hub configuration ==<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/assembly/output/eboVir3-assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
<br />
The first track that we will configure is the assembly track, which shows a block wherever the sequence contains bases other than 'N'; in other words, the gaps appear as holes in the track. Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> mkdir input/ output/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../genome/eboVir3.fasta input/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 input/eboVir3.fasta output/eboVir3.agp<br />
<br />
* Construction of the assembly track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" output/eboVir3.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2-1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/eboVir3-assembly.bb<br />
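<br />
The AGP-to-BED conversion can be tried on a small example without the real genome. The following is a minimal sketch, assuming a made-up three-line AGP (two contigs separated by one gap; the sequence and contig names are hypothetical). AGP coordinates are 1-based and inclusive, whereas BED start coordinates are 0-based, so the sketch subtracts 1 from the start column:<br />

```shell
# Made-up, tab-separated AGP: contig, gap (component type N), contig.
agp_file=$(mktemp)
{
    printf 'chrEx\t1\t100\t1\tD\tcontig1\t1\t100\t+\n'
    printf 'chrEx\t101\t150\t2\tN\t50\tcontig\tno\n'
    printf 'chrEx\t151\t300\t3\tD\tcontig2\t1\t150\t+\n'
} > "$agp_file"

# Keep non-gap lines and emit BED6: chrom, start-1, end, name, score, strand.
bed=$(grep -v '^#' "$agp_file" \
    | awk '$5 != "N" {printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' \
    | LC_ALL=C sort -k1,1 -k2,2n)
rm -f "$agp_file"
printf '%s\n' "$bed"
```

The gap line is dropped, so the region between the two blocks shows up as a hole in the assembly track.<br />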
<br />
== Blat configuration ==<br />
<br />
* From the assembly folder, start two gfServer instances on the .2bit file, one for DNA queries and one for translated (amino acid) queries, specifying the port that each one will listen on:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/<br />
<font color=green>browser@browserbox:$></font> mkdir log/<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=log/gfServer.log genome/eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=log/gfServer-trans.log genome/eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines relative to blat and transBlat:<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the fasta file is written with all nucleotides in lowercase, convert all letters to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track at our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy this file to the work folder and link it from the hub folder.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Construction of the gap track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2-1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes eboVir3-gap.bb<br />
At last the hubCheck command will run without any error being identified. But, if you point your browser to http://127.0.0.1:1234 and go to: Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Forcing configuration files to be loaded again every time that the page is reloaded (instead of after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] at URL.<br />
** To disable this feature, click at "clear" on the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
<HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
</BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the bed file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track must be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the fasta file that was used to create the .2bit file was soft-masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be loaded by adding the hub's URL directly to the browser URL. If you append hubUrl=[URL] to your hgTracks URL, the hub is added directly to the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosomes coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Since Kent's tools (like bedToBigBed) expect to find a system that provides a case-sensitive sort, we have to set the environment variables relative to locale to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc at your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23933GBiB: From download to BLAT at assembly hubs2017-04-06T13:07:31Z<p>David da Silva Pires: /* Track hub configuration */ A better bedToBigBed command.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically involve installing GBiB on a server with only a text interface, with specific ports opened on the firewall so that access to the data is restricted to your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder at your machine to place the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Read the terms and conditions in the GBiB box and check that you agree with them.<br />
** Check that your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour to finish the update process, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
<br />
== GBiB configuration ==<br />
<br />
* Click at "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on menu at left.<br />
** Click at "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access:<br />
** Open a terminal, like "konsole".<br />
** When prompted, use the password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Write the contents of the hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
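<br />
The rules above can be checked quickly before running hubCheck. The following is a minimal sketch, not part of the GBiB tools: the function name check_hub_labels and the temporary sample file are hypothetical, and hubCheck remains the authoritative validator.<br />

```shell
#!/bin/sh
# Hypothetical helper: report hub.txt settings that break the rules listed above.
check_hub_labels() {
    awk '
        $1 == "hub" {
            if (NF != 2) { print "hub: name must not contain spaces"; bad = 1 }
            next
        }
        $1 == "shortLabel" {
            sub(/^shortLabel[ \t]+/, "")
            if (length($0) > 17) { print "shortLabel exceeds 17 characters: " $0; bad = 1 }
            next
        }
        $1 == "longLabel" {
            sub(/^longLabel[ \t]+/, "")
            if (length($0) > 80) { print "longLabel exceeds 80 characters: " $0; bad = 1 }
            next
        }
        END { exit bad + 0 }
    ' "$1"
}

# Sample file mirroring the hub.txt written above (temporary path, for illustration only).
sample_hub=$(mktemp)
printf 'hub virusNetwork\nshortLabel Virus Network\nlongLabel Virus Network Hub for Ebola virus\n' > "$sample_hub"
check_hub_labels "$sample_hub" && echo "labels OK"
```

The function exits with a non-zero status when any rule is broken, so it can be chained in front of hubCheck with "&&".<br />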
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Write the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* Get the .2bit file from the FASTA file:<br />
 <font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
 <font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Get a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
 <font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
* Once the AGP file has been built (see "Track hub configuration"), check that it matches the sequence in the .2bit file, adjusting the paths to where the files are stored:<br />
 <font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
 <font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
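<br />
The largest-first ordering of the chromosome sizes file can be tried out on a small sample. This is only an illustration: the three sequence names and sizes are invented. It uses "-k2,2nr", which limits the sort key to the second column; for a two-column file this gives the same order as the "-k2nr" used in this section.<br />

```shell
# Invented three-sequence listing (name<TAB>size), sorted from largest to smallest.
printf 'scaffoldB\t500\nscaffoldA\t18957\nscaffoldC\t1200\n' | sort -k2,2nr
```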
<br />
== Track hub configuration ==<br />
<br />
* Create the contents of trackDb.txt (the track name must not contain spaces or dots and must start with a letter; shortLabel <= 17 chars; longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
 bigDataUrl tracks/assembly/output/eboVir3-assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
<br />
The first track that we will configure is the assembly track, which draws a block for every stretch of sequence other than 'N'. In other words, the gaps are the holes in the track. Build an AGP file from the FASTA file, marking all N's as gaps, using the hgFakeAgp command:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> mkdir input/ output/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../genome/eboVir3.fasta input/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 input/eboVir3.fasta output/eboVir3.agp<br />
<br />
* Build the assembly track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" output/eboVir3.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/eboVir3-chromSizes-sorted.txt output/eboVir3-assembly.bb<br />
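<br />
AGP object coordinates are 1-based and inclusive, while BED start coordinates are 0-based with an exclusive end. The command above copies the AGP start unchanged. The following is a minimal sketch of a variant that subtracts 1 from the start; the function name agp_to_bed6 and the two sample AGP lines are invented for illustration, and whether the shift is wanted for a given hub is left to the reader.<br />

```shell
# Hypothetical helper: convert the non-gap lines of an AGP file to BED6,
# shifting the 1-based AGP start to a 0-based BED start.
agp_to_bed6() {
    grep -v '^#' "$1" |
        awk 'BEGIN { OFS = "\t" } $5 != "N" { print $1, $2 - 1, $3, $6, 0, $9 }' |
        LC_ALL=C sort -k1,1 -k2,2n
}

# Invented two-line AGP: one sequence component followed by one gap.
sample_agp=$(mktemp)
printf 'seq1\t1\t100\t1\tD\tseq1_1\t1\t100\t+\n' >  "$sample_agp"
printf 'seq1\t101\t110\t2\tN\t10\tcontig\tno\n'   >> "$sample_agp"
agp_to_bed6 "$sample_agp"
```

The gap line is dropped and the component line becomes a single BED6 record starting at 0.<br />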
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA queries and for translated (protein) queries:<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42420 -stepSize=5 -log=/var/log/gfServer-eboVir3.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42421 -trans -log=/var/log/gfServer-eboVir3-trans.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the blat and transBlat lines:<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
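<br />
Stanzas in genomes.txt are separated by blank lines, so the two lines have to sit inside the eboVir3 stanza rather than after its trailing blank line. The following is a minimal sketch of one way to do that; the function name add_blat_lines and the temporary file are hypothetical, and it assumes a genomes.txt with a single genome stanza (with several stanzas it would add the lines to each one).<br />

```shell
# Hypothetical helper: print genomes.txt with the blat/transBlat lines
# added right after the trackDb line of the genome stanza.
add_blat_lines() {
    awk '
        { print }
        $1 == "trackDb" {
            print "blat 127.0.0.1 42420"
            print "transBlat 127.0.0.1 42421"
        }
    ' "$1"
}

# Temporary copy of part of the genomes.txt stanza written earlier on this page.
sample_genomes=$(mktemp)
printf 'genome eboVir3\norganism Ebola virus\ntrackDb eboVir3/trackDb.txt\n\n' > "$sample_genomes"
add_blat_lines "$sample_genomes"
```

Redirect the output to a new file and review it before replacing the real genomes.txt.<br />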
<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the FASTA file has all of its nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the chromosome names are very long, shorten them:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
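<br />
The substitution above is applied to every line of the file. A small variant restricts it to the FASTA header lines, which makes the intent explicit. This is only a sketch: the two sample records are invented, and the short prefix "Ev" is the one used above.<br />

```shell
# Invented FASTA records with long names, for illustration only.
sample_fa=$(mktemp)
printf '>Ebola_virus_segment1\nACGTNNACGT\n>Ebola_virus_segment2\nacgtacgt\n' > "$sample_fa"

# Apply the substitution to header lines only (those starting with ">").
sed '/^>/ s/Ebola_virus/Ev/' "$sample_fa"
```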
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track in our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy or link this file to the correct place:<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Build the gap track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes eboVir3-gap.bb<br />
Finally, the hubCheck command will run without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* To force the configuration files to be loaded again every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML description page for the hub:<br />
 $> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
 <BODY><br />
 <P><br />
 Ebola virus genome assembly and track hub.<br />
 <UL><br />
 <LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
 NCBI genome/4887 (Ebola virus)</A></LI><br />
 </UL><br />
 </P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the bed file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
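<br />
Before converting, the sort order of a bed file can be verified with the "-c" option of sort, which only checks the order and writes nothing. The following is a minimal sketch; the function name bed_is_sorted and the two sample records are hypothetical.<br />

```shell
# Hypothetical helper: succeed only if a BED file is sorted by chromosome name, then by start.
bed_is_sorted() {
    LC_ALL=C sort -c -k1,1 -k2,2n "$1" 2>/dev/null
}

# Invented two-record BED file, deliberately out of order.
sample_bed=$(mktemp)
printf 'Sm1\t300\t400\tb\nSm1\t100\t200\ta\n' > "$sample_bed"
bed_is_sorted "$sample_bed" || echo "not sorted"

# Sort in place with the same keys, then check again.
LC_ALL=C sort -o "$sample_bed" -k1,1 -k2,2n "$sample_bed"
bed_is_sorted "$sample_bed" && echo "sorted"
```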
* Build the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
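<br />
To see what the values in this track mean, the GC percentage of consecutive 5-base windows can be computed by hand on a short sequence. This sketch only illustrates the arithmetic: the function name gc_windows is hypothetical, the output layout (1-based window start, tab, percentage) is a simplification, and it does not reproduce how hgGcPercent handles gaps or a trailing partial window.<br />

```shell
# Hypothetical helper: GC percentage of consecutive, non-overlapping windows.
# $1 = sequence, $2 = window size (default 5). A trailing partial window is skipped.
gc_windows() {
    printf '%s\n' "$1" | awk -v win="${2:-5}" '{
        for (i = 1; i + win - 1 <= length($0); i += win) {
            w = substr($0, i, win)
            gc = gsub(/[GCgc]/, "", w)      # gsub returns the number of G/C bases replaced
            printf "%d\t%d\n", i, 100 * gc / win
        }
    }'
}

gc_windows GGCCATATAT
```

For the sample sequence, the first window (GGCCA) has 4 of 5 bases as G or C and the second (TATAT) has none.<br />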
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was soft-masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added by appending the hub's URL to the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
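<br />
The hubUrl and udcTimeout parameters can be combined in a single link. The following sketch only builds the URL string. It assumes that GBiB answers on 127.0.0.1:1234, as tested earlier on this page, and that it serves hgTracks under /cgi-bin/ like the public site; the variable names are hypothetical.<br />

```shell
# Assumption: GBiB serves hgTracks under /cgi-bin/ on the port used earlier on this page.
gbib_base='http://127.0.0.1:1234/cgi-bin/hgTracks'
# Hub location as used in the hubCheck call earlier on this page.
hub_url='http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt'

# udcTimeout=1 makes the hub files be re-read on each reload; hubUrl attaches the hub.
gbib_hub_link=$(printf '%s?udcTimeout=1&hubUrl=%s' "$gbib_base" "$hub_url")
echo "$gbib_hub_link"
```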
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand=0,0,50 0,50,0 useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Kent's tools (such as bedToBigBed) expect their input to be sorted in a case-sensitive order, so we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc on your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, reload .bashrc:<br />
<br />
$> . ~browser/.bashrc<br />
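<br />
The effect of the "C" locale on sorting can be seen on a small invented list of sequence names. In the "C" locale strings are compared byte by byte, so uppercase letters sort before lowercase ones; in other locales the order may differ.<br />

```shell
# Invented sequence names; LC_ALL=C applies the byte-wise ordering for this one command only.
printf 'chr10\nChr2\nchr1\n' | LC_ALL=C sort
```

With the "C" locale the result is Chr2, chr1, chr10.<br />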
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23932GBiB: From download to BLAT at assembly hubs2017-04-06T13:05:08Z<p>David da Silva Pires: /* Track hub configuration */ Fixing the filename.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions, with the differences probably being limited to package names and crontab settings. Other setups would involve installing GBiB on a server with only a text interface, with specific ports enabled on the firewall to restrict use of the data to your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check whether you agree with the terms and conditions shown in the GBiB box.<br />
** Check whether your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give the user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in the background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access:<br />
** Open a terminal, such as "konsole".<br />
** When prompted, enter the password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directory that will store the genome files of the assembly hub and download the sequence into it:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Write the contents of the hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Write the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* Get the .2bit file from the FASTA file:<br />
 <font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
 <font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Get a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
 <font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
* Once the AGP file has been built (see "Track hub configuration"), check that it matches the sequence in the .2bit file, adjusting the paths to where the files are stored:<br />
 <font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
 <font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
<br />
== Track hub configuration ==<br />
<br />
* Create the contents of trackDb.txt (the track name must not contain spaces or dots and must start with a letter; shortLabel <= 17 chars; longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
 bigDataUrl tracks/assembly/eboVir3-assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
<br />
The first track that we will configure is the assembly track, which draws a block for every stretch of sequence other than 'N'. In other words, the gaps are the holes in the track. Build an AGP file from the FASTA file, marking all N's as gaps, using the hgFakeAgp command:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> mkdir input/ output/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../genome/eboVir3.fasta input/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 input/eboVir3.fasta output/eboVir3.agp<br />
<br />
* Build the assembly track directly from the AGP file:<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" output/eboVir3.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
 <font color=green>browser@browserbox:$></font> ln -s ../../../genome/eboVir3-chromSizes-sorted.txt input/<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 output/assembly.bed input/eboVir3-chromSizes-sorted.txt eboVir3-assembly.bb<br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA queries and for translated (protein) queries:<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42420 -stepSize=5 -log=/var/log/gfServer-eboVir3.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42421 -trans -log=/var/log/gfServer-eboVir3-trans.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the blat and transBlat lines:<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the FASTA file has all of its nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the chromosome names are very long, shorten them:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track in our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy or link this file to the correct place:<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Build the gap track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes eboVir3-gap.bb<br />
Finally, the hubCheck command will run without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
 <B>UCSC Genome Browser assembly ID:</B> eboVir3<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* To force the configuration files to be loaded again every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
 <BODY><br />
 <P><br />
 Ebola virus genome assembly and track hub.<br />
 <UL><br />
 <LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
 NCBI genome/4887 (Ebola virus)</A></LI><br />
 </UL><br />
 </P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be applied to the BED file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file used to create the .2bit file was soft-masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
 A track hub can also be attached by adding the hub's URL directly to the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
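<br />
As a rough sketch of the URL construction described above, the snippet below assembles an hgTracks address that attaches a hub through hubUrl and also sets udcTimeout=1. The host, port, CGI path and hub location are assumptions based on the GBiB addresses used on this page; replace them with your own.<br />

```shell
# Hypothetical values: adjust to your own GBiB address and hub location.
base='http://127.0.0.1:1234/cgi-bin/hgTracks'
hub='http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt'

# udcTimeout=1 goes right after "hgTracks?"; hubUrl attaches the hub.
url="${base}?udcTimeout=1&hubUrl=${hub}"
printf '%s\n' "$url"
```

Quote the resulting URL when passing it to a command-line tool, since an unquoted "&" would be interpreted by the shell.<br />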
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand=0,0,50 0,50,0 useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Since Kent's tools (like bedToBigBed) expect their input to be sorted in case-sensitive order, we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc on your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
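<br />
To illustrate why the "C" locale matters, here is a minimal sketch using only the standard sort command. The sequence names are made up for the example. Under the "C" locale, sorting follows byte order, so uppercase names come before lowercase ones.<br />

```shell
# Made-up sequence names mixing upper and lower case.
sorted=$(printf 'chrb\nChrA\nchra\nChrB\n' | LC_ALL=C sort)

# Byte order places every uppercase letter before every lowercase letter.
printf '%s\n' "$sorted"
```

With other locales the same input may come out in a different order, which is the kind of mismatch that tools expecting case-sensitive ordering can reject.<br />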
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23931GBiB: From download to BLAT at assembly hubs2017-04-06T13:03:56Z<p>David da Silva Pires: /* Track hub configuration */ Making a symbolic link for the eboVir3-chromSizes.txt file.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically install GBiB on a server with only a text interface, with specific firewall ports enabled to restrict access to the data to your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from the UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check whether you agree with the terms and conditions in the box for GBiB.<br />
** Check whether your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive.<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give the user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait for the first update to finish (it can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh, for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill in the contents of the hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
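<br />
The label limits listed above can be checked before running hubCheck. The following is only a sketch using standard shell tools: it writes a copy of the example hub.txt to a temporary file (a stand-in for the real path) and measures both labels.<br />

```shell
# Temporary stand-in for /folders/sf_work/virusNetwork/hub.txt.
hub_file=$(mktemp)
cat > "$hub_file" << 'EOI'
hub virusNetwork
email admin@virus.edu
shortLabel Virus Network
longLabel Virus Network Hub for Ebola virus
genomesFile genomes.txt
EOI

# Extract each label (text after the key) and measure its length.
short_label=$(sed -n 's/^shortLabel //p' "$hub_file")
long_label=$(sed -n 's/^longLabel //p' "$hub_file")
short_len=${#short_label}
long_len=${#long_label}

if [ "$short_len" -le 17 ]; then echo "shortLabel OK ($short_len chars)"; else echo "shortLabel too long"; fi
if [ "$long_len" -le 80 ]; then echo "longLabel OK ($long_len chars)"; else echo "longLabel too long"; fi
rm -f "$hub_file"
```

The same two sed expressions can be pointed at the real hub.txt once it exists.<br />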
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill in the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
* Get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Create a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
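<br />
The Ebola assembly has a single sequence, so the sort above has nothing to reorder. As a small illustration of what it does on a multi-sequence genome, the sketch below applies the same kind of numeric reverse sort to made-up names and sizes.<br />

```shell
# Made-up chromosome sizes: name <TAB> length.
sizes=$(printf 'chrB\t500\nchrA\t12000\nchrC\t3000\n')

# -k2,2nr sorts on the second column only, numerically, largest first.
sorted=$(printf '%s\n' "$sizes" | sort -k2,2nr)
printf '%s\n' "$sorted"
```

Restricting the key to "2,2" keeps the comparison on the size column alone.<br />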
<br />
== Track hub configuration ==<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/assembly/assembly.bb<br />
<br />
EOI<br />
 The name of each track ("track" field) must be unique within the entire file.<br />
<br />
The first track that we will configure is the assembly track, which shows a block wherever the sequence contains bases other than 'N'. In other words, the gaps appear as holes in the track. Build an AGP file from the FASTA file with the hgFakeAgp command, which marks runs of N's as gaps.<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> mkdir input/ output/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../genome/eboVir3.fasta input/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 input/eboVir3.fasta output/eboVir3.agp<br />
<br />
* Construction of the assembly track directly from the AGP file:<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" output/eboVir3.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2-1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../genome/eboVir3-chromSizes.txt input/<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 output/assembly.bed input/eboVir3-chromSizes.txt assembly.bb<br />
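<br />
To make the AGP-to-BED conversion easier to follow, here is a minimal sketch on a made-up three-line AGP (two components and one gap). The contig names and coordinates are invented for illustration. AGP coordinates are 1-based while BED start coordinates are 0-based, so the sketch subtracts 1 from the start column.<br />

```shell
# Made-up AGP: component (W), gap (N), component (W); columns are tab-separated.
agp=$(printf 'KM034562v1\t1\t100\t1\tW\tcontig1\t1\t100\t+\nKM034562v1\t101\t150\t2\tN\t50\tscaffold\tyes\tpaired-ends\nKM034562v1\t151\t300\t3\tW\tcontig2\t1\t150\t+\n')

# Keep non-gap lines and emit BED6: chrom, start (0-based), end, name, score, strand.
bed=$(printf '%s\n' "$agp" | awk 'BEGIN {OFS="\t"} $5 != "N" {print $1, $2-1, $3, $6, 0, $9}')
printf '%s\n' "$bed"
```

The gap on lines 101-150 of the toy AGP is dropped, which is what leaves a hole between the two blocks of the assembly track.<br />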
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA (blat) and translated protein (transBlat) searches:<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42420 -stepSize=5 -log=/var/log/gfServer-eboVir3.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42421 -trans -log=/var/log/gfServer-eboVir3-trans.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat:<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the FASTA file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as entirely masked. We can use the change_case command from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
 You will see an error message stating that the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki> could not be opened. At least one track must be configured in the track hub for the assembly hub to work.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
 Now you will see an error message stating that the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki> could not be opened. This file has to be copied (or linked) to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Construction of the gap track directly from the AGP file:<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2-1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes.txt eboVir3-gap.bb<br />
 Now the hubCheck command will run without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
 <B>UCSC Genome Browser assembly ID:</B> eboVir3<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* To force the configuration files to be loaded again every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
 <BODY><br />
 <P><br />
 Ebola virus genome assembly and track hub.<br />
 <UL><br />
 <LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
 NCBI genome/4887 (Ebola virus)</A></LI><br />
 </UL><br />
 </P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be applied to the BED file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file used to create the .2bit file was soft-masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
 A track hub can also be attached by adding the hub's URL directly to the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand=0,0,50 0,50,0 useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Since Kent's tools (like bedToBigBed) expect to find a system that provides a case-sensitive sort, we have to set the environment variables relative to locale to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc at your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23930GBiB: From download to BLAT at assembly hubs2017-04-06T12:58:34Z<p>David da Silva Pires: /* Track hub configuration */ Now eboVir3.agp file is in the output/ directory.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically install GBiB on a server with a text-only interface and open only specific firewall ports, so that the data can be used only from within your network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder at your machine to place the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click on "Login / Register".<br />
** Read the terms and conditions in the GBiB box and check that you agree with them.<br />
** Check that your hardware and software meet the basic requirements.<br />
** Click on "Add to cart".<br />
** Click on "Cart (1)" in the menu.<br />
** Click on "Proceed to checkout".<br />
** Click on "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive.<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
<br />
== GBiB configuration ==<br />
<br />
* Click on "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on menu at left.<br />
** Click at "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
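<br />
The length limits above can be checked before running hubCheck. The following is a minimal sketch in plain shell, assuming the labels are held in two hypothetical variables (short_label and long_label) rather than parsed from hub.txt:<br />

```shell
# Hypothetical variables holding the labels used in hub.txt above.
short_label="Virus Network"
long_label="Virus Network Hub for Ebola virus"

# ${#var} expands to the number of characters in the variable.
if [ "${#short_label}" -le 17 ]; then
    echo "shortLabel fits (${#short_label} characters)"
else
    echo "shortLabel is too long (${#short_label} characters)"
fi

if [ "${#long_label}" -le 80 ]; then
    echo "longLabel fits (${#long_label} characters)"
else
    echo "longLabel is too long (${#long_label} characters)"
fi
```

This only checks lengths; hubCheck remains the authoritative validation of the hub files.<br />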
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
* Get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Create a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
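<br />
To illustrate what the sort step does, here is a small sketch that runs on made-up data instead of twoBitInfo output. The sequence names and sizes are hypothetical, and the key is written as -k2,2nr so that only the second column is compared:<br />

```shell
# Hypothetical name/size pairs in the two-column layout printed by twoBitInfo.
sizes="$(printf 'chrB\t500\nchrA\t1500\nchrC\t20\n')"

# Sort numerically on the second column, largest first.
sorted_sizes="$(printf '%s\n' "$sizes" | sort -k2,2nr)"
printf '%s\n' "$sorted_sizes"
```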
<br />
== Track hub configuration ==<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/assembly/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
<br />
The first track that we will configure is the assembly track, which shows a block wherever the sequence contains bases other than 'N'. In other words, the gaps are the holes in the track. Build an AGP file from the FASTA file, marking all runs of N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> mkdir input/ output/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../genome/eboVir3.fasta input/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 input/eboVir3.fasta output/eboVir3.agp<br />
<br />
* Construction of the assembly track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" output/eboVir3.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 output/assembly.bed ../../genome/eboVir3-chromSizes-sorted.txt assembly.bb<br />
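<br />
The grep/awk filter used for the assembly track can be tried on a tiny AGP file before running it on real data. The sketch below uses a made-up three-row AGP (two components separated by one gap); the file contents and the variable names are hypothetical, and the awk expression copies the AGP coordinates through unchanged, as in the command above:<br />

```shell
# Hypothetical AGP: columns are object, start, end, part number, type, ...
agp="$(mktemp)"
printf 'chr1\t1\t100\t1\tD\tcontig1\t1\t100\t+\n'  >  "$agp"
printf 'chr1\t101\t110\t2\tN\t10\tcontig\tno\n'     >> "$agp"
printf 'chr1\t111\t200\t3\tD\tcontig2\t1\t90\t+\n' >> "$agp"

# Keep the rows whose fifth column is not "N" (the components) and print
# object, start, end, component name, a score of 0 and the orientation.
assembly_bed="$(grep -v '^#' "$agp" \
    | awk '$5 != "N" {printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2, $3, $6, $9}' \
    | sort -k1,1 -k2,2n)"
printf '%s\n' "$assembly_bed"

rm -f "$agp"
```

The gap row is dropped and the two component rows remain, which is why the gaps appear as holes in the assembly track.<br />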
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the assembly hub ports that will be used to query the DNA sequence and the translated (amino acid) sequence:<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42420 -stepSize=5 -log=/var/log/gfServer-eboVir3.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42421 -trans -log=/var/log/gfServer-eboVir3-trans.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat:<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
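<br />
For reference, a sketch of how the complete genomes.txt stanza might look once the two lines are added. It is written to a temporary file (the genomes_file variable is hypothetical) so that the real hub file is not touched; the blat and transBlat lines are assumed to belong to the same stanza as the genome they serve, before the blank line that ends it:<br />

```shell
# Write the example stanza to a scratch file instead of the real genomes.txt.
genomes_file="$(mktemp)"
cat > "$genomes_file" << 'EOI'
genome eboVir3
organism Ebola virus
defaultPos KM034562v1:1-18,957
twoBitPath eboVir3/genome/eboVir3.2bit
trackDb eboVir3/trackDb.txt
blat 127.0.0.1 42420
transBlat 127.0.0.1 42421

EOI

# Show the two lines that point the hub at the gfServer ports.
grep -E '^(blat|transBlat) ' "$genomes_file"
```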
<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the FASTA file is written entirely in lowercase nucleotides, convert all letters to uppercase so that the genome is not treated as if it were fully masked. We can use the change_case command from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track at our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy (link?) this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Construction of the gap track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes eboVir3-gap.bb<br />
At last, the hubCheck command will run without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Forcing configuration files to be loaded again every time that the page is reloaded (instead of after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] at URL.<br />
** To disable this feature, click at "clear" on the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
 <BODY><br />
 <P><br />
 Ebola virus genome assembly and track hub.<br />
 <UL><br />
 <LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
 NCBI genome/4887 (Ebola virus)</A></LI><br />
 </UL><br />
 </P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the bed file of the track:<br />
 $> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was soft-masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added by appending the hub's URL to the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
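<br />
The hubUrl and udcTimeout parameters can be combined in a single address. The sketch below only assembles and prints such a URL for a local GBiB. It assumes that the browser CGI is served at /cgi-bin/hgTracks on port 1234 and that the hub is reachable under /folders/sf_hubs, as in the hubCheck messages quoted earlier; the variable names are hypothetical:<br />

```shell
# Base address of the local GBiB web server (assumed, see text above).
gbib="http://127.0.0.1:1234"

# Location of hub.txt as seen by the GBiB web server (assumed path).
hub_url="${gbib}/folders/sf_hubs/virusNetwork/hub.txt"

# udcTimeout=1 forces the hub files to be re-read on every page reload.
browser_url="${gbib}/cgi-bin/hgTracks?udcTimeout=1&hubUrl=${hub_url}"
echo "${browser_url}"
```

Paste the printed address into the web browser on the host machine; quote it if you pass it to a command-line tool, since it contains an ampersand.<br />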
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand=0,0,50 0,50,0 useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Since Kent's tools (like bedToBigBed) expect to find a system that provides a case-sensitive sort, we have to set the environment variables relative to locale to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc at your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
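<br />
The effect of the "C" locale on sorting can be seen with a short experiment. This sketch uses made-up sequence names and sets LC_ALL only for the sort command, so it does not depend on the .bashrc changes above:<br />

```shell
# Hypothetical sequence names that differ only in letter case.
names="$(printf 'chrb\nchrA\nchra\nchrB\n')"

# In the C locale, sort compares raw byte values, so every uppercase
# letter is ordered before every lowercase letter.
sorted="$(printf '%s\n' "$names" | LC_ALL=C sort)"
printf '%s\n' "$sorted"
```

With other locales the same input may come out with upper and lower case interleaved, which is the ordering that bedToBigBed rejects.<br />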
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23929GBiB: From download to BLAT at assembly hubs2017-04-06T12:57:38Z<p>David da Silva Pires: /* Track hub configuration */ The assembly.bed file should also go to the output/ directory.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically install GBiB on a server with a text-only interface and open only specific firewall ports, so that the data can be used only from within your network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder at your machine to place the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click on "Login / Register".<br />
** Read the terms and conditions in the GBiB box and check that you agree with them.<br />
** Check that your hardware and software meet the basic requirements.<br />
** Click on "Add to cart".<br />
** Click on "Cart (1)" in the menu.<br />
** Click on "Proceed to checkout".<br />
** Click on "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive.<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
<br />
== GBiB configuration ==<br />
<br />
* Click on "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on menu at left.<br />
** Click at "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Write the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
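* Check whether the AGP file matches the genome sequence (run this check only after both the .2bit file and the AGP file exist; the AGP file is built with hgFakeAgp in the "Track hub configuration" section):<br />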
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
* Get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Create a file with the name and the size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
<br />
== Track hub configuration ==<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/assembly/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
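<br />
Duplicated track names can be spotted with standard tools. A minimal sketch, assuming a POSIX shell; duplicate_tracks is a hypothetical helper name and only looks at lines whose first word is "track":<br />

```shell
# duplicate_tracks FILE: print every track name that occurs more than once in a trackDb.txt.
# Hypothetical helper, not part of the Kent tools.
duplicate_tracks() {
    awk '$1 == "track" { print $2 }' "$1" | LC_ALL=C sort | uniq -d
}
```

An empty output means that every track name in the file is unique.<br />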
<br />
The first track that we will configure is the assembly track, which shows a block wherever the sequence contains bases other than 'N'. In other words, the gaps appear as holes in the track. Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> mkdir input/ output/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../genome/eboVir3.fasta input/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 input/eboVir3.fasta output/eboVir3.agp<br />
<br />
* Build the assembly track directly from the AGP file (AGP start coordinates are 1-based while BED start coordinates are 0-based, hence "$2-1"):<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" output/eboVir3.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2-1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 output/assembly.bed ../../genome/eboVir3-chromSizes-sorted.txt assembly.bb<br />
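<br />
To see what the AGP to BED conversion produces, it can be tried on a tiny input first. This is only a sketch: the AGP rows below are made up for a hypothetical sequence "chrDemo" (two contigs separated by a 50-base gap) and are not hgFakeAgp output. The sketch subtracts 1 from the AGP start, because AGP coordinates are 1-based and BED starts are 0-based:<br />

```shell
# Made-up AGP rows for a hypothetical sequence "chrDemo".
agp=$(mktemp)
printf 'chrDemo\t1\t100\t1\tD\tchrDemo_1\t1\t100\t+\n'   >  "$agp"
printf 'chrDemo\t101\t150\t2\tN\t50\tcontig\tno\n'        >> "$agp"
printf 'chrDemo\t151\t400\t3\tD\tchrDemo_2\t1\t250\t+\n' >> "$agp"

# Every non-gap row becomes a BED6 row:
# name = component ID ($6), score = 0, strand = orientation ($9).
assembly_bed=$(grep -v "^#" "$agp" \
    | awk -v OFS='\t' '$5 != "N" { print $1, $2 - 1, $3, $6, 0, $9 }' \
    | LC_ALL=C sort -k1,1 -k2,2n)
printf '%s\n' "$assembly_bed"
rm -f "$agp"
```

The two contig rows are kept and the gap row (component type "N") is dropped, which is why the gap shows up as a hole in the assembly track.<br />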
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA queries and for translated (amino acid) queries:<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42420 -stepSize=5 -log=/var/log/gfServer-eboVir3.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42421 -trans -log=/var/log/gfServer-eboVir3-trans.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat:<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
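<br />
The two lines must stay inside the stanza of the genome they refer to, that is, before the blank line that ends it. A minimal sketch of doing this without an editor, assuming a POSIX shell with awk; add_blat_lines is a hypothetical helper that assumes the stanza has a trackDb line and does not yet contain blat settings:<br />

```shell
# add_blat_lines GENOME HOST DNA_PORT TRANS_PORT FILE
# Print FILE (a genomes.txt) with blat/transBlat lines added right after the
# trackDb line of the stanza for GENOME. Hypothetical helper; writes to stdout.
add_blat_lines() {
    awk -v g="$1" -v h="$2" -v p="$3" -v t="$4" '
        $1 == "genome" { in_target = ($2 == g) }   # a new stanza starts here
        { print }
        in_target && $1 == "trackDb" {
            print "blat " h " " p
            print "transBlat " h " " t
        }
    ' "$5"
}
```

For example, "add_blat_lines eboVir3 127.0.0.1 42420 42421 genomes.txt > genomes.new" writes the updated file, which can then replace genomes.txt after inspection.<br />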
<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the fasta file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
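<br />
If seq_crumbs is not available, the same conversion can be sketched with awk. This is an alternative under the assumption that the file is a plain-text FASTA; fasta_to_upper is a hypothetical helper name:<br />

```shell
# fasta_to_upper: read FASTA on stdin, write it to stdout with every sequence
# line upper-cased; header lines (starting with ">") pass through unchanged.
fasta_to_upper() {
    awk '/^>/ { print; next } { print toupper($0) }'
}
```

Usage: "fasta_to_upper < eboVir3.fasta > eboVir3-uppercase.fasta". Note that this discards any existing soft-masking, so it is only appropriate when the whole file is lowercase.<br />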
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track in our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy (or link) this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Build the gap track directly from the AGP file (1 is subtracted from the 1-based AGP start to obtain the 0-based BED start):<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N" {printf "%s\t%d\t%d\t%s\n", $1, $2-1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes-sorted.txt eboVir3-gap.bb<br />
At last, the hubCheck command will run without reporting any error. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* To force the configuration files to be loaded again every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click on "clear" in the message that appears at the top of the page.<br />
* Create the HTML description page for the hub:<br />
 $> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><br />
 <HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
 <BODY><br />
 <P><br />
 Ebola virus genome assembly and track hub.<br />
 <UL><br />
 <LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
 NCBI genome/4887 (Ebola virus)</A></LI><br />
 </UL><br />
 </P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be done in the bed file of the track:<br />
 $> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
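<br />
The plain sed substitution above replaces the first match anywhere on a line, including item names and sequence lines. A more defensive sketch limits the renaming to where chromosome names live: FASTA header lines and the first BED column. The helper names rename_fasta and rename_bed are hypothetical, and the old name is assumed to contain no regular-expression metacharacters:<br />

```shell
# rename_fasta OLD NEW: rename only a leading ">OLD" on FASTA header lines (stdin to stdout).
rename_fasta() {
    sed "/^>/ s/^>$1/>$2/"
}

# rename_bed OLD NEW: rename only a leading OLD in the first column of a
# tab-separated BED stream (stdin to stdout); other columns are left untouched.
rename_bed() {
    awk -v old="$1" -v new="$2" 'BEGIN { FS = OFS = "\t" } { sub("^" old, new, $1); print }'
}
```

Usage: "rename_bed Schisto_mansoni Sm < smps.bed | LC_ALL=C sort -k1,1 -k2,2n > smps-shortChromNames-sorted.bed" renames and sorts in one pass.<br />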
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the fasta file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that the gfServer daemons start at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be loaded by adding its URL directly to the browser URL: if you append hubUrl=[URL] to your hgTracks URL, the hub is added directly to the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Since Kent's tools (like bedToBigBed) expect their input to be sorted in a case-sensitive order, we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc in your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
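<br />
The effect of the "C" locale on sorting can be seen with a small sketch (the chromosome names are made up for illustration). Under "C", sort compares bytes, so all uppercase letters come before all lowercase ones:<br />

```shell
# Sort four made-up names under the C locale (byte order).
c_sorted=$(printf 'chrb\nchrA\nchrB\nchra\n' | LC_ALL=C sort)
printf '%s\n' "$c_sorted"
# prints: chrA, chrB, chra, chrb (one per line)
```

Many other locales interleave uppercase and lowercase letters, which is the ordering that tools such as bedToBigBed reject as unsorted.<br />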
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23928GBiB: From download to BLAT at assembly hubs2017-04-06T12:55:21Z<p>David da Silva Pires: /* Track hub configuration */ Organizing the files in subdirectories.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other architectures would involve installing GBiB on a server with a text-only interface and opening only specific ports on the firewall, so that the data can be used only from within your network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to place the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from the UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click on "Login / Register".<br />
** Check whether you agree with the terms and conditions in the box for GBiB.<br />
** Check whether your hardware and software meet the basic requirements.<br />
** Click on "Add to cart".<br />
** Click on "Cart (1)" in the menu.<br />
** Click on "Proceed to checkout".<br />
** Click on "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give the user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in the background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on the speed of your internet connection).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
<br />
== GBiB configuration ==<br />
<br />
* Click on "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click on "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh, for faster access:<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directory that will store the genome files of the assembly hub, then download and uncompress the genome sequence:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Write the contents of the hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Write the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
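* Check whether the AGP file matches the genome sequence (run this check only after both the .2bit file and the AGP file exist; the AGP file is built with hgFakeAgp in the "Track hub configuration" section):<br />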
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
* Get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Create a file with the name and the size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
<br />
== Track hub configuration ==<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/assembly/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
<br />
The first track that we will configure is the assembly track, which shows a block wherever the sequence contains bases other than 'N'. In other words, the gaps appear as holes in the track. Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> mkdir input/ output/<br />
<font color=green>browser@browserbox:$></font> ln -s ../../../genome/eboVir3.fasta input/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 input/eboVir3.fasta output/eboVir3.agp<br />
<br />
* Build the assembly track directly from the AGP file (AGP start coordinates are 1-based while BED start coordinates are 0-based, hence "$2-1"):<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" output/eboVir3.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2-1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 output/assembly.bed ../../genome/eboVir3-chromSizes-sorted.txt assembly.bb<br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA queries and for translated (amino acid) queries:<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42420 -stepSize=5 -log=/var/log/gfServer-eboVir3.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42421 -trans -log=/var/log/gfServer-eboVir3-trans.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat:<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the fasta file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track in our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy (or link) this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Build the gap track directly from the AGP file (1 is subtracted from the 1-based AGP start to obtain the 0-based BED start):<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N" {printf "%s\t%d\t%d\t%s\n", $1, $2-1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes-sorted.txt eboVir3-gap.bb<br />
At last, the hubCheck command will run without reporting any error. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* To force the configuration files to be loaded again every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click on "clear" in the message that appears at the top of the page.<br />
* Create the HTML description page for the hub:<br />
 $> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><br />
 <HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
 <BODY><br />
 <P><br />
 Ebola virus genome assembly and track hub.<br />
 <UL><br />
 <LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
 NCBI genome/4887 (Ebola virus)</A></LI><br />
 </UL><br />
 </P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be done in the bed file of the track:<br />
 $> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML description page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was soft-masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that the servers start at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be attached by adding the hub's URL directly to the browser URL. If you append hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Kent's tools (such as bedToBigBed) expect their input to be sorted in case-sensitive order, so the locale-related environment variables must be set to "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc at your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23927GBiB: From download to BLAT at assembly hubs2017-04-06T12:49:43Z<p>David da Silva Pires: /* Preparing the data */ Simplifying the build of the chromSizes-sorted file.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically install GBiB on a server with a text-only interface and open only specific firewall ports, so that the data can be used only from within your network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check whether you agree with the terms and conditions in the box for GBiB.<br />
** Check whether your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it, and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click "Start".<br />
* Test that everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access:<br />
** Open a terminal, such as "konsole".<br />
** When prompted, enter the password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directory that will store the genome sequence, then download and uncompress the FASTA file:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Write the contents of the hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Write the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
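<br />
A quick way to catch a forgotten line in genomes.txt is to list the keys that are absent. This is a rough sketch under stated assumptions: missing_genome_keys is a hypothetical helper, the list of keys simply mirrors the stanza written above, and the file is assumed to hold a single genome stanza.<br />

```shell
# missing_genome_keys FILE: print each expected key that is absent from genomes.txt.
# Hypothetical helper; assumes FILE contains exactly one genome stanza.
missing_genome_keys() {
    for key in genome trackDb twoBitPath organism defaultPos; do
        # A key counts as present when a line starts with it followed by whitespace.
        grep -q "^${key}[[:space:]]" "$1" || echo "$key"
    done
}
```

An empty output means that every key used in the stanza above is present.<br />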
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
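Once the AGP file exists (it is built with hgFakeAgp in the track hub configuration below) and the .2bit file has been created, check that they match:<br />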
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
* Get the .2bit file from the FASTA file:<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
<font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Build a file with the name and size of every chromosome of the genome of interest, sorted from largest to smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo eboVir3.2bit stdout | sort -k2nr > eboVir3-chromSizes-sorted.txt<br />
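<br />
The sizes reported by twoBitInfo can be cross-checked against the FASTA file itself. The sketch below is an illustration, not part of the UCSC tools: fasta_sizes is a hypothetical function, and it assumes that each sequence name is the first word of its header line.<br />

```shell
# fasta_sizes FILE: print "name<TAB>length" for every sequence, largest first.
# Hypothetical helper; the name is taken as the first word after ">".
fasta_sizes() {
    awk '
        /^>/ {
            if (name != "") printf "%s\t%d\n", name, len   # flush the previous sequence
            name = substr($1, 2)                            # header word without ">"
            len = 0
            next
        }
        { len += length($0) }                               # accumulate sequence length
        END { if (name != "") printf "%s\t%d\n", name, len }
    ' "$1" | sort -k2,2nr
}
```

Comparing its output for eboVir3.fasta with eboVir3-chromSizes-sorted.txt (for example with diff) should show no differences if the .2bit file was built from that FASTA file.<br />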
<br />
== Track hub configuration ==<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/assembly/assembly.bb<br />
<br />
EOI<br />
The name of each track (the "track" field) must be unique within the entire file.<br />
<br />
The first track that we will configure is the assembly track, which shows a block for every stretch of sequence that is not 'N'; the gaps appear as holes in the track. Build an AGP file from the FASTA file, marking all N's as gaps, using the hgFakeAgp command:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 ../../genome/eboVir3.fasta eboVir3.agp<br />
<br />
* Build the assembly track directly from the AGP file (AGP coordinates are 1-based, whereas BED start coordinates are 0-based, hence $2-1):<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2-1, $3, $6, $9}' | sort -k1,1 -k2,2n > assembly.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 assembly.bed ../../genome/eboVir3-chromSizes-sorted.txt assembly.bb<br />
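<br />
The AGP-to-BED step can be tried on a few hand-written lines before running it on a real genome. The sketch below wraps the same grep, awk and sort pipeline in a hypothetical function, agp_to_assembly_bed. It assumes tab-separated AGP input in which gap rows have "N" in column 5, and it subtracts 1 from the start because AGP is 1-based and inclusive while BED is 0-based and half-open.<br />

```shell
# agp_to_assembly_bed FILE: convert the non-gap rows of an AGP file to BED6.
# Hypothetical helper; assumes tab-separated AGP with "N" marking gap rows.
agp_to_assembly_bed() {
    grep -v '^#' "$1" \
        | awk -F'\t' 'BEGIN { OFS = "\t" }
              # columns: object, start, end, part, type, component id, ..., orientation
              $5 != "N" { print $1, $2 - 1, $3, $6, 0, $9 }' \
        | LC_ALL=C sort -k1,1 -k2,2n
}
```

With a component spanning bases 1 to 100 of chr1, the function emits a BED row from 0 to 100, which covers the same 100 bases.<br />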
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, one for DNA queries and one for translated (amino acid) queries, specifying the ports that the assembly hub will use:<br />
<font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42420 -stepSize=5 -log=/var/log/gfServer-eboVir3.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42421 -trans -log=/var/log/gfServer-eboVir3-trans.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat:<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
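<br />
Because the ports in genomes.txt must match the ports given to gfServer, it can help to list what the hub actually declares. This is a small sketch with a hypothetical function name, list_blat_servers; it assumes the "blat host port" and "transBlat host port" line format shown above.<br />

```shell
# list_blat_servers FILE: print "genome kind host port" for each blat/transBlat line.
# Hypothetical helper; reads a genomes.txt in the format used on this page.
list_blat_servers() {
    awk '
        $1 == "genome" { genome = $2 }                       # remember the current stanza
        $1 == "blat" || $1 == "transBlat" { print genome, $1, $2, $3 }
    ' "$1"
}
```

Each host and port pair it prints can then be probed with gfServer status, for example gfServer status 127.0.0.1 42420.<br />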
<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the FASTA file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to shorten them:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track at our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy (link?) this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Build the gap track directly from the AGP file (again converting the 1-based AGP start to a 0-based BED start):<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2-1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-chromSizes-sorted.txt eboVir3-gap.bb<br />
Finally, hubCheck runs without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic description page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> eboVir3<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* To force the configuration files to be reloaded every time the page is reloaded (instead of being cached for at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML description page for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
<HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
</BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the BED file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track must be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from BED to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Build the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes-sorted.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML description page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was soft-masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that the servers start at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be attached by adding the hub's URL directly to the browser URL. If you append hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Kent's tools (such as bedToBigBed) expect their input to be sorted in case-sensitive order, so the locale-related environment variables must be set to "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc at your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
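<br />
The effect of the "C" locale can be seen with a few sample names, and the same setting can be used to verify that a BED file is in the expected order before conversion. This is a minimal sketch: the sample names are made up, and bed_is_sorted is a hypothetical helper, not a UCSC tool.<br />

```shell
# Byte-order (C locale) sorting puts all uppercase names before lowercase ones.
printf 'chrb\nChrA\nchra\nChrB\n' | LC_ALL=C sort
# ChrA
# ChrB
# chra
# chrb

# bed_is_sorted FILE: succeed only if FILE is ordered by chromosome name and
# then by numeric start coordinate, using byte-order comparison.
bed_is_sorted() {
    LC_ALL=C sort -c -k1,1 -k2,2n "$1" 2>/dev/null
}
```

The function only checks the order and never rewrites the file, so it is safe to run on the sorted BED files produced earlier on this page.<br />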
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23926GBiB: From download to BLAT at assembly hubs2017-04-06T12:43:47Z<p>David da Silva Pires: /* Preparing the data */ Simplifying the build of the .2bit file.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically install GBiB on a server with a text-only interface and open only specific ports on the firewall, so that the data is available just to your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder at your machine to place the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check if you agree with the terms and conditions at the box relative to GBiB.<br />
** Check if your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" on the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" on the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on the menu at the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh, for faster access:<br />
** Open a terminal, such as "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directory that will store the genome sequence, then download and uncompress the FASTA file:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill in the contents of the hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
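<br />
These three rules can be checked mechanically before running hubCheck. The sketch below is only an illustration: it writes the same hub.txt content to a temporary file and uses awk to measure the labels. The temporary file and the wording of the messages are assumptions of this example, not part of GBiB.<br />
<br />
```shell
# Sample hub.txt written to a temporary file (same fields as above).
hub_file=$(mktemp)
cat > "$hub_file" << 'EOI'
hub virusNetwork
email admin@virus.edu
shortLabel Virus Network
longLabel Virus Network Hub for Ebola virus
genomesFile genomes.txt
EOI

# Report every rule that is broken; print "OK" if none is.
report=$(awk '
    $1 == "hub" && NF != 2 { print "hub name must not contain spaces"; bad = 1 }
    $1 == "shortLabel" { sub(/^shortLabel /, ""); if (length($0) > 17) { print "shortLabel too long"; bad = 1 } }
    $1 == "longLabel"  { sub(/^longLabel /, "");  if (length($0) > 80) { print "longLabel too long";  bad = 1 } }
    END { if (!bad) print "OK" }
' "$hub_file")
echo "$report"
rm -f "$hub_file"
```

To check a real hub, point the awk command at /folders/sf_work/virusNetwork/hub.txt instead of the temporary file.<br />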
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill in the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* Get the .2bit file from the FASTA file:<br />
 <font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome/<br />
 <font color=green>browser@browserbox:$></font> faToTwoBit eboVir3.fasta eboVir3.2bit<br />
* Build a file with the name and size of every chromosome of the genome of interest, sorted from largest to smallest:<br />
 <font color=green>browser@browserbox:$></font> twoBitInfo /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit stdout | sort -k2nr > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-chromSizes-sorted.txt<br />
* Once the AGP file exists (it is built with hgFakeAgp in the "Track hub configuration" section), check that it matches the sequence. Run these commands from the directory that holds eboVir3.agp:<br />
 <font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
 <font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp ../../genome/eboVir3.2bit<br />
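<br />
The chromosome-size sorting used in this section can be tried on a small sample first. In the sketch below the three name/size pairs are invented; on a real hub the input comes from twoBitInfo.<br />
<br />
```shell
# Invented two-column chrom.sizes content (name<TAB>size), for illustration only.
sizes=$(printf 'chrB\t500\nchrA\t18957\nchrC\t42\n')

# Sort numerically on the second column, largest first.
sorted_sizes=$(printf '%s\n' "$sizes" | LC_ALL=C sort -k2,2nr)
printf '%s\n' "$sorted_sizes"

# The first line now holds the longest sequence.
largest=$(printf '%s\n' "$sorted_sizes" | head -n 1 | cut -f1)
```

The key "-k2,2nr" limits the comparison to the size column; "-k2nr", as used with twoBitInfo, gives the same order for a two-column file.<br />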
<br />
== Track hub configuration ==<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/assembly/assembly.bb<br />
<br />
EOI<br />
The name of each track (the "track" field) must be unique within the entire file.<br />
<br />
The first track that we will configure is the assembly track, which draws a block for every stretch of sequence that is not 'N'; in other words, the gaps appear as holes in the track. Build an AGP file from the FASTA file, marking all runs of N as gaps, with the hgFakeAgp command:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 ../../genome/eboVir3.fasta eboVir3.agp<br />
<br />
* Build the assembly track directly from the AGP file (the output name must match the bigDataUrl given in trackDb.txt):<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2, $3, $6, $9}' | sort -k1,1 -k2,2n > assembly.bed<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 assembly.bed ../../genome/eboVir3-chromSizes-sorted.txt assembly.bb<br />
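<br />
To see what the AGP-to-BED conversion does, the sketch below runs it on three invented AGP records (two sequence blocks and one gap). One detail is an assumption of this example rather than part of the command above: AGP coordinates are 1-based and inclusive while BED starts are 0-based, so the sketch subtracts 1 from the start column. Treat that as a suggested adjustment to verify against your own data.<br />
<br />
```shell
# Invented AGP records, for illustration only (columns are tab-separated).
agp=$(
    printf '# sample AGP\n'
    printf 'seqA\t1\t100\t1\tD\tseqA_1\t1\t100\t+\n'
    printf 'seqA\t101\t110\t2\tN\t10\tcontig\tno\n'
    printf 'seqA\t111\t300\t3\tD\tseqA_2\t1\t190\t+\n'
)

# Drop comments and gap lines (type "N"), then emit BED6:
# chrom, start (0-based), end, name, score, strand.
bed=$(printf '%s\n' "$agp" | grep -v '^#' \
    | awk 'BEGIN { FS = OFS = "\t" } $5 != "N" { print $1, $2 - 1, $3, $6, 0, $9 }' \
    | LC_ALL=C sort -k1,1 -k2,2n)
printf '%s\n' "$bed"
```

The gap record is absent from the result, which is why gaps show up as holes in the assembly track.<br />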
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer processes, one for DNA queries and one for translated (amino acid) queries, giving each the port that the assembly hub will use:<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42420 -stepSize=5 -log=/var/log/gfServer-eboVir3.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42421 -trans -log=/var/log/gfServer-eboVir3-trans.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat:<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
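<br />
The two lines belong to the eboVir3 stanza, so they have to come before the blank line that ends it. The sketch below shows one possible way to add them without opening an editor. It works on a temporary copy of genomes.txt, and placing the lines right after the trackDb line is a choice made for this example.<br />
<br />
```shell
# Temporary copy of genomes.txt with the same stanza as above.
genomes_file=$(mktemp)
cat > "$genomes_file" << 'EOI'
genome eboVir3
organism Ebola virus
defaultPos KM034562v1:1-18,957
twoBitPath eboVir3/genome/eboVir3.2bit
trackDb eboVir3/trackDb.txt

EOI

# Add the two BLAT lines right after the trackDb line,
# unless the file already has a blat line.
if ! grep -q '^blat ' "$genomes_file"; then
    awk '{ print }
         $1 == "trackDb" { print "blat 127.0.0.1 42420"; print "transBlat 127.0.0.1 42421" }' \
        "$genomes_file" > "$genomes_file.new" && mv "$genomes_file.new" "$genomes_file"
fi

blat_lines=$(grep -c -E '^(blat|transBlat) ' "$genomes_file")
cat "$genomes_file"
rm -f "$genomes_file"
```

Running the same commands a second time leaves the file unchanged, because the grep test finds the existing blat line.<br />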
<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the FASTA file is written entirely in lowercase, convert all nucleotides to uppercase so that the genome is not treated as fully masked. We can use the change_case command from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the chromosome names are very long, shorten them:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track at our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy (link?) this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Construction of the gap track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes eboVir3-gap.bb<br />
At last the hubCheck command will run without any error being identified. But, if you point your browser to http://127.0.0.1:1234 and go to: Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
 <B>UCSC Genome Browser assembly ID:</B> eboVir3<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* To force the configuration files to be reloaded every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><br />
 <HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
 <BODY><br />
 <P><br />
 Ebola virus genome assembly and track hub.<br />
 <UL><br />
 <LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
 NCBI genome/4887 (Ebola virus)</A></LI><br />
 </UL><br />
 </P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the bed file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was masked (i.e., it has nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added by appending the hub's URL to the browser URL: if you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Since Kent's tools (such as bedToBigBed) expect input sorted in case-sensitive, byte-value order, the locale-related environment variables must be set to "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc on your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23925GBiB: From download to BLAT at assembly hubs2017-04-06T12:11:48Z<p>David da Silva Pires: Removing one pipe since awk can do the comparison and the printing in just one command. Simplifying the output filename.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically install GBiB on a server with a text-only interface and open only specific ports on the firewall, so that the data is available just to your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder at your machine to place the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check if you agree with the terms and conditions at the box relative to GBiB.<br />
** Check if your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" on the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" on the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on the menu at the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh, for faster access:<br />
** Open a terminal, such as "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directory that will store the genome sequence, then download and uncompress the FASTA file:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* Get the .2bit file from the FASTA file:<br />
 <font color=green>browser@browserbox:$></font> faToTwoBit /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit<br />
* Build a file with the name and size of every chromosome of the genome of interest, sorted from largest to smallest:<br />
 <font color=green>browser@browserbox:$></font> twoBitInfo /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit stdout | sort -k2nr > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-chromSizes-sorted.txt<br />
* Once the AGP file exists (it is built with hgFakeAgp in the "Track hub configuration" section), check that it matches the sequence. Run these commands from the directory that holds eboVir3.agp:<br />
 <font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
 <font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp ../../genome/eboVir3.2bit<br />
<br />
== Track hub configuration ==<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/assembly/assembly.bb<br />
<br />
EOI<br />
The name of each track (the "track" field) must be unique within the entire file.<br />
<br />
The first track that we will configure is the assembly track, which draws a block for every stretch of sequence that is not 'N'; in other words, the gaps appear as holes in the track. Build an AGP file from the FASTA file, marking all runs of N as gaps, with the hgFakeAgp command:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 ../../genome/eboVir3.fasta eboVir3.agp<br />
<br />
* Construction of the assembly track directly from the AGP file (AGP coordinates are 1-based while BED start coordinates are 0-based, hence the subtraction):<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > assembly.bed<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 assembly.bed ../../genome/eboVir3-chromSizes-sorted.txt assembly.bb<br />
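The AGP to BED conversion can be tried on a few hand-written rows before running it on the real file. The sketch below wraps the pipeline in a function (agp_to_assembly_bed is a hypothetical name) and assumes the nine-column AGP layout with 1-based inclusive coordinates, so the start is decremented by one to obtain a 0-based BED start.<br />

```shell
# Sketch: turn the component rows of an AGP file (column 5 different from "N")
# into six-column BED: chrom, start, end, component ID, score 0, orientation.
agp_to_assembly_bed() {
    grep -v '^#' "$1" |
        awk -v OFS='\t' '$5 != "N" { print $1, $2 - 1, $3, $6, 0, $9 }' |
        sort -k1,1 -k2,2n
}
```

Gap rows are skipped, so a sequence with one internal gap yields two BED blocks with a hole between them.<br />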
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer processes, one for DNA queries and one for translated (amino acid) queries, each on its own port:<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42420 -stepSize=5 -log=/var/log/gfServer-eboVir3.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42421 -trans -log=/var/log/gfServer-eboVir3-trans.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the blat and transBlat lines:<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
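Instead of editing genomes.txt by hand, the two lines can be added with a small awk filter. This is only a sketch: the function name add_blat_lines is hypothetical, and it assumes that stanzas in genomes.txt are separated by blank lines, as in the file created earlier.<br />

```shell
# Hypothetical helper: print genomes.txt with blat/transBlat lines appended to
# the stanza of one genome. A stanza that already has a blat line is unchanged.
# $1 = genomes.txt, $2 = genome name, $3 = host, $4 = DNA port, $5 = translated port
add_blat_lines() {
    awk -v g="$2" -v host="$3" -v dna="$4" -v trans="$5" '
        function emit_stanza(   i) {
            for (i = 1; i <= n; i++) print buf[i]
            if (target && !has_blat) {
                print "blat " host " " dna
                print "transBlat " host " " trans
            }
            n = 0; target = 0; has_blat = 0
        }
        /^[ \t]*$/ { emit_stanza(); print; next }   # a blank line closes the stanza
        {
            buf[++n] = $0
            if ($1 == "genome" && $2 == g) target = 1
            if ($1 == "blat") has_blat = 1
        }
        END { emit_stanza() }
    ' "$1"
}
```

A possible use is "add_blat_lines genomes.txt eboVir3 127.0.0.1 42420 42421 > genomes.new && mv genomes.new genomes.txt".<br />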
<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the fasta file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
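If seq_crumbs is not installed, the same conversion can be sketched with awk alone. The function name fasta_to_upper is hypothetical; header lines are copied unchanged so that sequence names keep their case.<br />

```shell
# Sketch of an awk-only alternative: lines starting with ">" are headers and
# are printed as they are, every other line is converted to uppercase.
fasta_to_upper() {
    awk '/^>/ { print; next } { print toupper($0) }' "$1"
}
```

A possible use is "fasta_to_upper eboVir3.fasta > eboVir3-uppercase.fasta".<br />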
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
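A slightly more defensive variant restricts the substitution to header lines, so that sequence data can never be altered. This is a sketch: shorten_fasta_names is a hypothetical name, and the pattern and replacement are assumed to contain no slashes or regular expression metacharacters.<br />

```shell
# Sketch: replace text only on fasta header lines (those starting with ">").
# $1 = fasta file, $2 = text to replace, $3 = replacement
shorten_fasta_names() {
    sed "/^>/ s/$2/$3/" "$1"
}
```

A possible use is "shorten_fasta_names eboVir3.fasta Ebola_virus Ev > eboVir3-shortChromNames.fasta".<br />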
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track at our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy (link?) this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Construction of the gap track directly from the AGP file:<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes.txt eboVir3-gap.bb<br />
Now the hubCheck command runs without reporting any error. However, if you point your browser to http://127.0.0.1:1234 and go to: Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Forcing configuration files to be loaded again every time that the page is reloaded (instead of after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] at URL.<br />
** To disable this feature, click at "clear" on the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><br />
 <HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
 <BODY><br />
 <P><br />
 Ebola virus genome assembly and track hub.<br />
 <UL><br />
 <LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
 NCBI genome/4887 (Ebola virus)</A></LI><br />
 </UL><br />
 </P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the bed file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
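Before converting to bigBed, it can help to confirm that a bed file really is in the required order. The sketch below does this with awk; bed_is_sorted is a hypothetical name, and chromosome names are compared byte by byte, as in the "C" locale.<br />

```shell
# Sketch: succeed only if the file is ordered by chromosome name (compared as
# plain strings) and, within a chromosome, by ascending start coordinate.
bed_is_sorted() {
    LC_ALL=C awk '
        NR > 1 && (($1 "") < (chrom "") || (($1 "") == (chrom "") && $2 + 0 < start)) { exit 1 }
        { chrom = $1; start = $2 + 0 }
    ' "$1"
}
```

A possible use is "bed_is_sorted smps-shortChromNames-sorted.bed && echo sorted".<br />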
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the fasta file that was used to create the .2bit file was soft-masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local, so that they run at boot, writing them just before the final "exit" line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be loaded by adding its URL to the browser URL: if you append hubUrl=[URL] to your hgTracks URL, the hub is attached directly in the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosomes coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand=0,0,50 0,50,0 useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Since Kent's tools (like bedToBigBed) expect their input to be sorted in a case-sensitive order, we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc at your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
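The effect of the "C" locale on sorting can be seen with two made-up chromosome names. In this locale sort compares raw bytes, so every uppercase letter comes before every lowercase one.<br />

```shell
# Sketch: byte-wise sorting puts "chrB" before "chra" because "B" (66) has a
# lower byte value than "a" (97). The sample names are invented.
printf 'chra\nchrB\n' | LC_ALL=C sort
```

Many UTF-8 locales would list chra first instead, which is the kind of ordering that bedToBigBed rejects.<br />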
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23924GBiB: From download to BLAT at assembly hubs2017-04-06T11:59:32Z<p>David da Silva Pires: Building the AGP file only when needed: at assembly track configuration.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups may involve installing GBiB on a server with only a text interface, with specific firewall ports enabled to restrict access to the data to your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder at your machine to place the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check if you agree with the terms and conditions at the box relative to GBiB.<br />
** Check if your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" on the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" on the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
<br />
== GBiB configuration ==<br />
<br />
* Click at "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on menu at left.<br />
** Click at "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Login using ssh, for a faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
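The three rules above can be checked with a short awk filter before running hubCheck. This is a sketch only; the function name check_hub_txt is hypothetical, and the limits are the ones quoted on this page.<br />

```shell
# Hypothetical lint for hub.txt: prints one line per violated rule and returns
# a non-zero status if any rule is broken.
check_hub_txt() {
    awk '
        {
            key = $1
            value = $0
            sub(/^[ \t]*[^ \t]+[ \t]*/, "", value)   # drop the keyword
            sub(/[ \t]+$/, "", value)                # drop trailing blanks
        }
        key == "hub"        && value ~ /[ \t]/    { print "hub name contains spaces"; bad = 1 }
        key == "shortLabel" && length(value) > 17 { print "shortLabel longer than 17 characters"; bad = 1 }
        key == "longLabel"  && length(value) > 80 { print "longLabel longer than 80 characters"; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}
```

A possible use is "check_hub_txt /folders/sf_work/virusNetwork/hub.txt && echo OK".<br />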
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* Build the .2bit file from the fasta file:<br />
 <font color=green>browser@browserbox:$></font> faToTwoBit /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit<br />
* Write the name and size of every chromosome of the genome of interest to a file, sorted from the largest to the smallest:<br />
 <font color=green>browser@browserbox:$></font> twoBitInfo /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit stdout | sort -k2nr > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-chromSizes-sorted.txt<br />
* Once an AGP file exists (it is built with hgFakeAgp during the track hub configuration), check that it matches the sequence:<br />
 <font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
 <font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
<br />
== Track hub configuration ==<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/assembly/assembly.bb<br />
<br />
EOI<br />
The name of each track (the "track" field) must be unique within the entire file.<br />
<br />
The first track that we will configure is the assembly track, which draws a block for every stretch of sequence that is not 'N'; the gaps therefore appear as holes in the track. Build an AGP file from the fasta file, marking all runs of N as gaps, with the hgFakeAgp command:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/assembly/<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 ../../genome/eboVir3.fasta eboVir3.agp<br />
<br />
* Construction of the assembly track directly from the AGP file (AGP coordinates are 1-based while BED start coordinates are 0-based, hence the subtraction):<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 != "N"' | awk '{printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > assembly.bed<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 assembly.bed ../../genome/eboVir3-chromSizes-sorted.txt assembly.bb<br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer processes, one for DNA queries and one for translated (amino acid) queries, each on its own port:<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42420 -stepSize=5 -log=/var/log/gfServer-eboVir3.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42421 -trans -log=/var/log/gfServer-eboVir3-trans.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the blat and transBlat lines:<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the fasta file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track at our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy (link?) this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Construction of the gap track directly from the AGP file:<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes.txt eboVir3-gap.bb<br />
Now the hubCheck command runs without reporting any error. However, if you point your browser to http://127.0.0.1:1234 and go to: Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/description.html</nowiki>"<br />
<br />
Let's compose a basic page to our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Forcing configuration files to be loaded again every time that the page is reloaded (instead of after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] at URL.<br />
** To disable this feature, click at "clear" on the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><br />
 <HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
 <BODY><br />
 <P><br />
 Ebola virus genome assembly and track hub.<br />
 <UL><br />
 <LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
 NCBI genome/4887 (Ebola virus)</A></LI><br />
 </UL><br />
 </P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the bed file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page to our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the fasta file that was used to create the .2bit file was soft-masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added by appending the hub's URL to the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Kent's tools (such as bedToBigBed) expect their input to be sorted in a case-sensitive order, so we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc at your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
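The effect of the "C" locale on sorting can be seen with a small sketch. The names below are made up for illustration; only the use of LC_ALL=C comes from the settings above.<br />

```shell
# Mixed-case names, made up for illustration.
names=$(printf 'chr10\nChr2\nchr1\n')

# Under the C locale, sort compares raw bytes, so every uppercase
# letter is ordered before every lowercase letter.
c_sorted=$(printf '%s\n' "$names" | LC_ALL=C sort)
printf '%s\n' "$c_sorted"
```

Other locales may interleave uppercase and lowercase names, which is the ordering that tools expecting a case-sensitive sort can reject.<br />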
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=23923GBiB: From download to BLAT at assembly hubs2017-04-06T11:41:18Z<p>David da Silva Pires: Changing to a better order the lines in the initial genomes.txt</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically install GBiB on a server with a text-only interface, with only specific ports opened on the firewall so that the data can be used only from your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder at your machine to place the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click on "Login / Register".<br />
** Check if you agree with the terms and conditions at the box relative to GBiB.<br />
** Check if your hardware and software meet the basic requirements.<br />
** Click on "Add to cart".<br />
** Click on "Cart (1)" in the menu.<br />
** Click on "Proceed to checkout".<br />
** Click on "My products" in the menu.<br />
** Copy the address of download (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the zip file:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour to finish the update process, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
<br />
== GBiB configuration ==<br />
<br />
* Click on "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on menu at left.<br />
** Click on "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh, for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
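The label limits above can be checked before running hubCheck. What follows is only a minimal sketch, not part of the UCSC tools: the function name check_labels is hypothetical, and it assumes each setting sits on its own line in the form "key value".<br />

```shell
# Hypothetical helper (not a UCSC tool): report a shortLabel longer than
# 17 characters or a longLabel longer than 80 characters in a hub.txt-style file.
check_labels() {
    awk '
        $1 == "shortLabel" || $1 == "longLabel" {
            limit = ($1 == "shortLabel") ? 17 : 80
            value = $0
            sub(/^[ \t]*[^ \t]+[ \t]+/, "", value)   # drop the key, keep the label text
            if (length(value) > limit) {
                printf "%s too long (%d > %d): %s\n", $1, length(value), limit, value
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }                   # non-zero status if any label is too long
    ' "$1"
}

# Usage: check_labels /folders/sf_work/virusNetwork/hub.txt
```

The same check can be pointed at a trackDb.txt file, since the track labels follow the same limits.<br />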
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
trackDb eboVir3/trackDb.txt<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* Get the .2bit file from this fasta:<br />
<font color=green>browser@browserbox:$></font> faToTwoBit /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit<br />
* Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command:<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Check if the new AGP file matches the .2bit file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
* Create a file with the name and the size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit stdout | sort -k2nr > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-chromSizes-sorted.txt<br />
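To see what the sort step does, here is a small sketch on made-up data. Only KM034562v1 and its length come from this page; the other names and sizes are invented for illustration.<br />

```shell
# Made-up two-column input (name, size), tab-separated like twoBitInfo output.
sizes=$(printf 'scaffoldB\t500\nKM034562v1\t18957\nscaffoldC\t42\n')

# Sort numerically on the second column, largest first.
sorted=$(printf '%s\n' "$sizes" | sort -k2,2nr)
printf '%s\n' "$sorted"
```

The largest sequence ends up on the first line and the smallest on the last.<br />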
<br />
== Track hub configuration ==<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/assembly/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
<br />
* Construction of the assembly track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 != "N"' | awk '{printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2, $3, $6, $9}' | sort -k1,1 -k2,2n > eboVir3-assembly.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-assembly.bed eboVir3-chromSizes.txt eboVir3-assembly.bb<br />
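The AGP-to-BED step can be tried on a few made-up rows. This sketch is a variant of the command above, not the same command: it subtracts 1 from the AGP start, on the understanding that AGP coordinates are 1-based while BED starts are 0-based. The contig names and coordinates are invented for illustration.<br />

```shell
# Made-up AGP rows: two sequence components (type W) around one gap (type N).
agp=$(printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    KM034562v1 1 100 1 W contigA 1 100 + \
    KM034562v1 101 110 2 N 10 scaffold yes paired-ends \
    KM034562v1 111 200 3 W contigB 1 90 -)

# Keep non-gap rows and emit BED6: chrom, start (0-based), end, name, score, strand.
bed=$(printf '%s\n' "$agp" |
    grep -v '^#' |
    awk '$5 != "N" {printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' |
    sort -k1,1 -k2,2n)
printf '%s\n' "$bed"
```

The gap row is dropped, and the two remaining rows become six-column BED lines sorted by chromosome and start, which is the order bedToBigBed expects.<br />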
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA searches and for translated (amino acid) searches:<br />
<font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42420 -stepSize=5 -log=/var/log/gfServer-eboVir3.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42421 -trans -log=/var/log/gfServer-eboVir3-trans.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat:<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the fasta file is written with all nucleotides in lowercase, convert all letters to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track at our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy (link?) this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Construction of the gap track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes.txt eboVir3-gap.bb<br />
At last the hubCheck command will run without any error being identified. But, if you point your browser to http://127.0.0.1:1234 and go to: Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic page to our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Forcing configuration files to be loaded again every time that the page is reloaded (instead of after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] at URL.<br />
** To disable this feature, click at "clear" on the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
<HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
</BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be done in the bed file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML description page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the fasta file that was used to create the .2bit file was soft-masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added by appending the hub's URL to the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Kent's tools (such as bedToBigBed) expect their input to be sorted in a case-sensitive order, so we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc at your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=22810GBiB: From download to BLAT at assembly hubs2015-07-21T12:02:01Z<p>David da Silva Pires: Setting locale environment variables in order to provide a case-sensitive sort.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically install GBiB on a server with a text-only interface, with only specific ports opened on the firewall so that the data can be used only from your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder at your machine to place the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click on "Login / Register".<br />
** Check if you agree with the terms and conditions at the box relative to GBiB.<br />
** Check if your hardware and software meet the basic requirements.<br />
** Click on "Add to cart".<br />
** Click on "Cart (1)" in the menu.<br />
** Click on "Proceed to checkout".<br />
** Click on "My products" in the menu.<br />
** Copy the address of download (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the zip file:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give your user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot the GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access:<br />
** Open a terminal, such as "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Write the contents of the hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
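These rules can be checked before running hubCheck. The following is a minimal sketch, assuming a hub.txt laid out as in the example above; the function name check_hub_labels is hypothetical and is not part of the Kent tools.

```shell
#!/bin/sh
# check_hub_labels FILE: hypothetical helper, not a Kent tool.
# Prints one line per violated rule and returns non-zero if any rule is violated.
check_hub_labels() {
    awk '
        {
            # Split the line into its key (first word) and its value (the rest).
            key = $1
            value = $0
            sub(/^[^ \t]+[ \t]*/, "", value)
            sub(/[ \t]+$/, "", value)
        }
        key == "hub" && value ~ /[ \t]/ { print "hub: name contains spaces"; bad = 1 }
        key == "shortLabel" && length(value) > 17 { print "shortLabel: longer than 17 characters"; bad = 1 }
        key == "longLabel" && length(value) > 80 { print "longLabel: longer than 80 characters"; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}
```

For example: check_hub_labels /folders/sf_work/virusNetwork/hub.txt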
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Write the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
trackDb eboVir3/trackDb.txt<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
<br />
EOI<br />
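Before going further, we can confirm that genomes.txt contains every setting used in the example above. This is only a sketch: the function name missing_genome_fields is hypothetical, it assumes a file with a single genome stanza, and the list of keys is simply the five used on this page, not an official list of required settings.

```shell
#!/bin/sh
# missing_genome_fields FILE: hypothetical helper, not a Kent tool.
# Prints each of the settings used in the example that does not appear in FILE.
missing_genome_fields() {
    for key in genome trackDb twoBitPath organism defaultPos; do
        # A setting is present when a line starts with the key followed by whitespace.
        grep -q "^$key[[:space:]]" "$1" || echo "$key"
    done
}
```

An empty output from missing_genome_fields /folders/sf_work/virusNetwork/genomes.txt means that all five settings were found.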
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* Build the .2bit file from the fasta file:<br />
 <font color=green>browser@browserbox:$></font> faToTwoBit /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit<br />
* Write a file with the name and the size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
 <font color=green>browser@browserbox:$></font> twoBitInfo /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit stdout | sort -k2nr > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-chromSizes-sorted.txt<br />
* We can also build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command:<br />
 <font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Check that the new AGP file matches the sequence in the .2bit file:<br />
 <font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
 <font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
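The chromosome sizes file can be cross-checked against the fasta file with standard tools only. The sketch below is not a replacement for twoBitInfo: fasta_sizes is a hypothetical helper, and it assumes that the sequence name is the header text up to the first blank.

```shell
#!/bin/sh
# fasta_sizes FILE: hypothetical helper, not a Kent tool.
# Prints "name<TAB>length" for each sequence of a fasta file, largest first.
fasta_sizes() {
    awk '
        /^>/ {
            # A new header: report the previous sequence, then start counting again.
            if (name != "") printf "%s\t%d\n", name, len
            name = substr($1, 2)
            len = 0
            next
        }
        { len += length($0) }
        END { if (name != "") printf "%s\t%d\n", name, len }
    ' "$1" | sort -k2,2nr
}
```

Its output for eboVir3.fasta can then be compared with eboVir3-chromSizes-sorted.txt, for instance with diff.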
<br />
== Track hub configuration ==<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/assembly/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
<br />
* Build the assembly track directly from the AGP file (AGP coordinates are 1-based, while BED start coordinates are 0-based, so 1 is subtracted from the start):<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 != "N"' | awk '{printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2-1, $3, $6, $9}' | sort -k1,1 -k2,2n > eboVir3-assembly.bed<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-assembly.bed eboVir3-chromSizes-sorted.txt eboVir3-assembly.bb<br />
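The AGP to BED conversion can be tried on a small file before running it on the real assembly. The sketch below wraps the same idea in a hypothetical function, agp_to_assembly_bed. Two details are assumptions of this sketch: it subtracts 1 from the AGP start (AGP is 1-based, BED starts are 0-based), and it also skips gap lines of type "U", which the AGP format allows in addition to "N".

```shell
#!/bin/sh
# agp_to_assembly_bed FILE: hypothetical helper, not a Kent tool.
# Prints a 6-column BED line for every non-gap AGP line, sorted for bedToBigBed.
agp_to_assembly_bed() {
    grep -v '^#' "$1" \
        | awk -v OFS='\t' '$5 != "N" && $5 != "U" { print $1, $2 - 1, $3, $6, 0, $9 }' \
        | LC_ALL=C sort -k1,1 -k2,2n
}
```

For example: agp_to_assembly_bed eboVir3.agp > eboVir3-assembly.bed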
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use to query the DNA sequence and the translated (amino acid) sequence:<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42420 -stepSize=5 -log=/var/log/gfServer-eboVir3.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42421 -trans -log=/var/log/gfServer-eboVir3-trans.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat:<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
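The two lines have to end up inside the stanza of the genome, not after the blank line that closes it. Editing by hand is fine; as an alternative, the sketch below rewrites genomes.txt with awk. The function name add_blat_lines is hypothetical, and the sketch assumes that each stanza starts with its "genome" line and ends at a blank line.

```shell
#!/bin/sh
# add_blat_lines FILE GENOME HOST DNA_PORT TRANS_PORT: hypothetical helper.
# Prints FILE with blat/transBlat lines placed right after "genome GENOME".
add_blat_lines() {
    awk -v genome="$2" -v host="$3" -v dna="$4" -v trans="$5" '
        $1 == "genome" { in_stanza = ($2 == genome) }
        NF == 0 { in_stanza = 0 }    # a blank line closes the stanza
        # Drop old blat/transBlat lines of this stanza, so reruns do not duplicate them.
        in_stanza && ($1 == "blat" || $1 == "transBlat") { next }
        { print }
        $1 == "genome" && in_stanza {
            print "blat " host " " dna
            print "transBlat " host " " trans
        }
    ' "$1"
}
```

For example: add_blat_lines genomes.txt eboVir3 127.0.0.1 42420 42421 > genomes.new && mv genomes.new genomes.txt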
<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the fasta file is written with all nucleotides in lowercase, convert all letters to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track in our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy this file and link it from the correct place:<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Build the gap track directly from the AGP file (again subtracting 1 from the 1-based AGP start to obtain the 0-based BED start):<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2-1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes-sorted.txt eboVir3-gap.bb<br />
Finally, the hubCheck command will run without reporting any error. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Force the configuration files to be reloaded every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML description page for the hub:<br />
 $> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
 <BODY><br />
 <P><br />
 Ebola virus genome assembly and track hub.<br />
 <UL><br />
 <LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
 NCBI genome/4887 (Ebola virus)</A></LI><br />
 </UL><br />
 </P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the bed file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
 <font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes-sorted.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the fasta file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that the daemons are started at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added by appending its URL to the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand=0,0,50 0,50,0 useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
=== Setting locale ===<br />
<br />
Since Kent's tools (like bedToBigBed) expect their input to be sorted in a case-sensitive order, we have to set the locale environment variables to the value "C".<br />
<br />
Put the following lines at the bottom of ~browser/.bashrc in your GBiB:<br />
<br />
# Define custom locale settings.<br />
export LANG="C" <br />
export LANGUAGE="C" <br />
export LC_MESSAGES="C" <br />
export LC_CTYPE="C" <br />
export LC_NUMERIC="C" <br />
export LC_TIME="C" <br />
export LC_COLLATE="C" <br />
export LC_MONETARY="C" <br />
export LC_PAPER="C" <br />
export LC_NAME="C" <br />
export LC_ADDRESS="C" <br />
export LC_TELEPHONE="C" <br />
export LC_MEASUREMENT="C" <br />
export LC_IDENTIFICATION="C" <br />
export LC_ALL="C"<br />
<br />
After that, load .bashrc again by doing:<br />
<br />
$> . ~browser/.bashrc<br />
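With the "C" locale, sort compares bytes, so names starting with an uppercase letter come before names starting with a lowercase letter (for example, Chr2 sorts before chr1). Whether a bed file is already in that order can be checked without rewriting it. The sketch below uses a hypothetical helper name, bed_is_sorted, and relies only on the standard "-c" (check) option of sort. Lines with the same chromosome and start are additionally compared as whole lines, so the check can be slightly stricter than necessary.

```shell
#!/bin/sh
# bed_is_sorted FILE: hypothetical helper, not a Kent tool.
# Succeeds when FILE is already in the order produced by
# "sort -k1,1 -k2,2n" under the C locale; fails otherwise.
bed_is_sorted() {
    LC_ALL=C sort -c -k1,1 -k2,2n "$1" 2>/dev/null
}
```

For example: bed_is_sorted smps-shortChromNames-sorted.bed || echo "not sorted"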
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=22739GBiB: From download to BLAT at assembly hubs2015-06-04T01:03:01Z<p>David da Silva Pires: /* Additional configuration */ Incrementing the HTML description of a track.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups may involve installing GBiB on a server with a text-only interface, with only specific ports opened on the firewall so that the data can be used only within your network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from the UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Read the terms and conditions in the GBiB box and check that you agree with them.<br />
** Check that your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give your user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot the GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access:<br />
** Open a terminal, such as "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Write the contents of the hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Write the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
trackDb eboVir3/trackDb.txt<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* Build the .2bit file from the fasta file:<br />
 <font color=green>browser@browserbox:$></font> faToTwoBit /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit<br />
* Write a file with the name and the size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
 <font color=green>browser@browserbox:$></font> twoBitInfo /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit stdout | sort -k2nr > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-chromSizes-sorted.txt<br />
* We can also build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command:<br />
 <font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Check that the new AGP file matches the sequence in the .2bit file:<br />
 <font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
 <font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
<br />
== Track hub configuration ==<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/assembly/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
<br />
* Build the assembly track directly from the AGP file (AGP coordinates are 1-based, while BED start coordinates are 0-based, so 1 is subtracted from the start):<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 != "N"' | awk '{printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2-1, $3, $6, $9}' | sort -k1,1 -k2,2n > eboVir3-assembly.bed<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-assembly.bed eboVir3-chromSizes-sorted.txt eboVir3-assembly.bb<br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use to query the DNA sequence and the translated (amino acid) sequence:<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42420 -stepSize=5 -log=/var/log/gfServer-eboVir3.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42421 -trans -log=/var/log/gfServer-eboVir3-trans.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat:<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the fasta file is written with all nucleotides in lowercase, convert all letters to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track in our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki> could not be opened. We have to copy (or link) this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Construction of the gap track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2-1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes.txt eboVir3-gap.bb<br />
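As a rough illustration of the AGP-to-BED step above, the sketch below runs the same kind of filter on a tiny, made-up AGP file. The object name chrToy, the contig names and all coordinates are invented for the example. It assumes the standard AGP layout, in which gap lines carry "N" in column 5 and coordinates are 1-based, so the start is decremented to obtain a 0-based BED start.<br />

```shell
#!/bin/sh
# Toy AGP file: chrToy, ctg1, ctg2 and all coordinates are hypothetical.
agp_to_gap_bed() {
    # $1: path to an AGP file; prints gap lines (column 5 == "N") as BED4.
    # AGP starts are 1-based, BED starts are 0-based, hence "$2 - 1".
    grep -v "^#" "$1" \
        | awk -F'\t' '$5 == "N" {printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' \
        | sort -k1,1 -k2,2n
}

tmp_agp=$(mktemp)
printf '# toy AGP\n' > "$tmp_agp"
printf 'chrToy\t1\t100\t1\tW\tctg1\t1\t100\t+\n' >> "$tmp_agp"
printf 'chrToy\t101\t150\t2\tN\t50\tscaffold\tyes\tpaired-ends\n' >> "$tmp_agp"
printf 'chrToy\t151\t300\t3\tW\tctg2\t1\t150\t+\n' >> "$tmp_agp"

gap_bed=$(agp_to_gap_bed "$tmp_agp")
printf '%s\n' "$gap_bed"
rm -f "$tmp_agp"
```

Only the single gap line survives the filter; the two component ("W") lines are dropped. The resulting BED file would then be handed to bedToBigBed as shown above.<br />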
At last, the hubCheck command runs without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Force the configuration files to be reloaded every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
<HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
</BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the BED file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track must be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
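The sorting requirement above can be checked before running bedToBigBed. The following is a minimal sketch on invented BED records (the names Sm1, Sm2 and geneA to geneC are placeholders); LC_ALL=C is an assumption of the sketch, used so that chromosome names are ordered byte-wise and reproducibly.<br />

```shell
#!/bin/sh
# Toy BED records: chromosome names, coordinates and gene names are hypothetical.
tmp_bed=$(mktemp)
printf 'Sm2\t500\t900\tgeneC\n' > "$tmp_bed"
printf 'Sm1\t1000\t1500\tgeneB\n' >> "$tmp_bed"
printf 'Sm1\t200\t400\tgeneA\n' >> "$tmp_bed"

# Sort by chromosome name, then numerically by start coordinate.
sorted_bed=$(LC_ALL=C sort -k1,1 -k2,2n "$tmp_bed")
printf '%s\n' "$sorted_bed"

# sort -c exits non-zero if its input is not already in the requested order.
if printf '%s\n' "$sorted_bed" | LC_ALL=C sort -c -k1,1 -k2,2n; then
    sort_status=ok
else
    sort_status=unsorted
fi
echo "$sort_status"
rm -f "$tmp_bed"
```

Running the same `sort -c` check on an existing BED file is a cheap way to find out whether it needs to be re-sorted at all.<br />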
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification (Validation)</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify (validate) the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
Other interesting information: background information, display conventions, and acknowledgments.<br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local, writing them just before the "exit" command on the last line, so that the gfServer daemons are started at boot:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
You can also add a track hub by appending its URL to the browser URL: if you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
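The length limits in the rules above are easy to overlook, so a small check can help before pasting a track line into the browser. The sketch below is only an illustration: the function name check_track_labels is hypothetical and it tests nothing beyond the two length limits stated above.<br />

```shell
#!/bin/sh
# Limits taken from the rules above: name <= 15 chars, description <= 60 chars.
check_track_labels() {
    # $1: track name, $2: track description (both supplied by the caller).
    name=$1
    description=$2
    if [ "${#name}" -gt 15 ]; then
        echo "name too long (${#name} > 15)"
        return 1
    fi
    if [ "${#description}" -gt 60 ]; then
        echo "description too long (${#description} > 60)"
        return 1
    fi
    echo "ok"
}

check_track_labels "Track label" "Chromosome coordinates list"
check_track_labels "A track name that is far too long" "Short description" || true
```

The first call passes both limits; the second reports that the name is too long and returns a non-zero status.<br />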
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs], [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=22738GBiB: From download to BLAT at assembly hubs2015-06-04T00:42:33Z<p>David da Silva Pires: More additional configuration to genomes.txt.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups might involve installing GBiB on a server with a text-only interface and opening only specific firewall ports, so that the data can be used only within your network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from the UCSC Genome Store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Read the terms and conditions in the GBiB box and check that you agree with them.<br />
** Check that your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive.<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (the update process can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" on menu at left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill in the contents of the hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
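The three rules above can be checked mechanically before running hubCheck. The sketch below is a hedged illustration, not part of GBiB: check_hub_txt is a hypothetical helper that only looks at the hub, shortLabel and longLabel lines and ignores everything else in the file.<br />

```shell
#!/bin/sh
# check_hub_txt FILE: report violations of the hub.txt rules listed above.
check_hub_txt() {
    awk '
        {
            key = $1
            value = $0
            sub(/^[^ \t]+[ \t]+/, "", value)   # keep everything after the first field
        }
        key == "hub" && value ~ /[ \t]/ { print "hub name contains spaces"; bad = 1 }
        key == "shortLabel" && length(value) > 17 { print "shortLabel longer than 17 characters"; bad = 1 }
        key == "longLabel" && length(value) > 80 { print "longLabel longer than 80 characters"; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Same settings as the hub.txt shown above, written to a temporary file.
tmp_hub=$(mktemp)
cat > "$tmp_hub" << EOI
hub virusNetwork
email admin@virus.edu
shortLabel Virus Network
longLabel Virus Network Hub for Ebola virus
genomesFile genomes.txt
EOI

if check_hub_txt "$tmp_hub"; then hub_status=ok; else hub_status=invalid; fi
echo "$hub_status"
rm -f "$tmp_hub"
```

The helper exits with a non-zero status when any rule is violated, so it can be chained in front of hubCheck in a script.<br />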
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill in the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
trackDb eboVir3/trackDb.txt<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
We can also build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
* Build the .2bit file from this FASTA file:<br />
<font color=green>browser@browserbox:$></font> faToTwoBit /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit<br />
* Create a file with the name and size of every chromosome of the genome of interest, sorted from largest to smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit stdout | sort -k2nr > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-chromSizes-sorted.txt<br />
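To see what the sort in the command above does, here is a small sketch on invented name/size pairs (chrA, chrB and chrC and their lengths are placeholders, not real twoBitInfo output). It writes the key as -k2,2nr, which restricts the comparison to the second column; with two-column input this should behave the same as -k2nr.<br />

```shell
#!/bin/sh
# Toy chrom.sizes content: sequence names and lengths are hypothetical.
sizes_sorted=$(printf 'chrB\t500\nchrA\t18957\nchrC\t1200\n' | sort -k2,2nr)
printf '%s\n' "$sizes_sorted"

# The first line now holds the largest sequence.
largest=$(printf '%s\n' "$sizes_sorted" | head -n 1 | cut -f1)
echo "$largest"
```

The resulting two-column file is the chromosome sizes file that bedToBigBed and wigToBigWig take as their second argument.<br />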
<br />
== Track hub configuration ==<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/assembly/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
<br />
* Construction of the assembly track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 != "N"' | awk '{printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2-1, $3, $6, $9}' | sort -k1,1 -k2,2n > eboVir3-assembly.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-assembly.bed eboVir3-chromSizes.txt eboVir3-assembly.bb<br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA queries and for translated (amino acid) queries:<br />
<font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42420 -stepSize=5 -log=/var/log/gfServer-eboVir3.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42421 -trans -log=/var/log/gfServer-eboVir3-trans.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat:<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
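The two lines above belong inside the stanza of the genome they serve, that is, before the blank line that ends the stanza. Editing by hand is fine; the sketch below merely illustrates one way to script it. The helper add_blat_lines is hypothetical, prints the updated file to standard output and assumes that stanzas are separated by blank lines.<br />

```shell
#!/bin/sh
# add_blat_lines FILE GENOME HOST DNA_PORT TRANS_PORT
# Prints FILE with blat/transBlat lines added to GENOME's stanza if missing.
add_blat_lines() {
    awk -v genome="$2" -v host="$3" -v dna="$4" -v trans="$5" '
        function flush_stanza() {
            if (in_target && !has_blat)  print "blat " host " " dna
            if (in_target && !has_trans) print "transBlat " host " " trans
            in_target = 0; has_blat = 0; has_trans = 0
        }
        $1 == "genome"    { flush_stanza(); in_target = ($2 == genome) }
        $1 == "blat"      { if (in_target) has_blat = 1 }
        $1 == "transBlat" { if (in_target) has_trans = 1 }
        NF == 0           { flush_stanza() }   # a blank line ends the stanza
        { print }
        END { flush_stanza() }
    ' "$1"
}

# Temporary genomes.txt with the settings used on this page.
tmp_genomes=$(mktemp)
printf 'genome eboVir3\ntrackDb eboVir3/trackDb.txt\ntwoBitPath eboVir3/genome/eboVir3.2bit\n\n' > "$tmp_genomes"
updated=$(add_blat_lines "$tmp_genomes" eboVir3 127.0.0.1 42420 42421)
printf '%s\n' "$updated"
rm -f "$tmp_genomes"
```

Because the helper skips lines that are already present, running it again on its own output should leave the file unchanged.<br />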
<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the FASTA file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
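A variation on the command above restricts the substitution to FASTA header lines, which guards against accidental matches in the sequence itself when the pattern consists only of nucleotide letters. This is a sketch on invented data: the helper shorten_fasta_names is hypothetical and assumes that neither argument contains slashes or regular-expression metacharacters.<br />

```shell
#!/bin/sh
# shorten_fasta_names LONG SHORT: reads FASTA on stdin, writes it to stdout.
shorten_fasta_names() {
    # The /^>/ address limits the substitution to header lines.
    sed "/^>/s/$1/$2/"
}

# Toy FASTA: the sequence name and bases are hypothetical.
short_fasta=$(printf '>GATTACA_1\nACGATTACAGT\n' | shorten_fasta_names GATTACA G)
printf '%s\n' "$short_fasta"
```

Here the header is shortened while the sequence line, which also contains GATTACA, is left untouched.<br />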
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki> could not be opened. We have to configure at least one track in our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki> could not be opened. We have to copy (or link) this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Construction of the gap track directly from the AGP file:<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2-1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes.txt eboVir3-gap.bb<br />
At last, the hubCheck command runs without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Force the configuration files to be reloaded every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
<HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
</BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the BED file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The BED file of the track must be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/track-description.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the FASTA file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local, writing them just before the "exit" command on the last line, so that the gfServer daemons are started at boot:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
You can also add a track hub by appending its URL to the browser URL: if you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups eboVir3/groups.txt<br />
description Ebola virus version 3<br />
orderKey 1<br />
htmlPath eboVir3/description<br />
scientificName Ebola<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=22737GBiB: From download to BLAT at assembly hubs2015-06-04T00:30:59Z<p>David da Silva Pires: /* Additional configuration */ Additional configuration to genomes.txt.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would involve installing GBiB on a server with a text-only interface and opening specific ports on the firewall so that the data can be used only from within your network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from the UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** In the GBiB box, check that you agree with the terms and conditions.<br />
** Check that your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on the speed of your internet connection).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access:<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
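<br />
The label limits above can be checked mechanically before running hubCheck. The following is a minimal sketch, not part of GBiB: the function name check_label_lengths is hypothetical, and it assumes one "key value" setting per line, as in the hub.txt shown above.<br />

```shell
#!/bin/sh
# Hypothetical helper (not a GBiB tool): report shortLabel/longLabel values
# that exceed the 17- and 80-character limits stated above.
check_label_lengths() {
    # $1 = path to a hub.txt or trackDb.txt file
    awk '
        $1 == "shortLabel" || $1 == "longLabel" {
            limit = ($1 == "shortLabel") ? 17 : 80
            key = $1
            text = $0
            # Drop leading blanks and the key itself, keep only the label text.
            sub(/^[ \t]*[A-Za-z]+[ \t]+/, "", text)
            if (length(text) > limit)
                printf "%s too long (%d > %d): %s\n", key, length(text), limit, text
        }
    ' "$1"
}
```

Calling check_label_lengths /folders/sf_work/virusNetwork/hub.txt prints nothing when every label fits, and one line per offending label otherwise.<br />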
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
trackDb eboVir3/trackDb.txt<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* Get the .2bit file from the fasta file:<br />
<font color=green>browser@browserbox:$></font> faToTwoBit /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit<br />
* Get a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit stdout | sort -k2nr > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-chromSizes-sorted.txt<br />
* We can also build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command:<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Check that the new AGP file matches the sequence:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
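<br />
To confirm that a chromosome sizes file really is ordered from the largest to the smallest sequence, a small check like the one below can be used. It is only a sketch: the function name sizes_sorted_desc is hypothetical, and it assumes the two-column "name size" layout written by twoBitInfo.<br />

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) only when the sizes in
# column 2 never increase from one line to the next.
sizes_sorted_desc() {
    # $1 = path to a "name<TAB>size" file
    awk '
        NR > 1 && ($2 + 0) > (prev + 0) { bad = 1 }   # a larger size after a smaller one
        { prev = $2 }
        END { exit bad }
    ' "$1"
}
```

For example, sizes_sorted_desc eboVir3-chromSizes-sorted.txt && echo sorted prints "sorted" only when the order is correct.<br />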
<br />
== Track hub configuration ==<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/assembly/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the file.<br />
<br />
* Construction of the assembly track directly from the AGP file (AGP start coordinates are 1-based while BED start coordinates are 0-based, hence the "$2 - 1"):<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 != "N"' | awk '{printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > eboVir3-assembly.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-assembly.bed eboVir3-chromSizes-sorted.txt eboVir3-assembly.bb<br />
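<br />
The AGP-to-BED step can be tried on a tiny hand-written AGP file before running it on real data. The sketch below wraps the conversion in a hypothetical function, agp_to_assembly_bed. It assumes a tab-separated AGP file, shifts the start coordinate by one because AGP is 1-based and BED is 0-based, and also skips "U" (unknown-size gap) rows, which is an addition to the command shown above.<br />

```shell
#!/bin/sh
# Hypothetical helper: convert the non-gap rows of an AGP file into
# BED6 lines (chrom, start, end, name, score, strand) on standard output.
agp_to_assembly_bed() {
    # $1 = path to a tab-separated AGP file
    grep -v '^#' "$1" |
    awk -F'\t' -v OFS='\t' '
        # Component types N and U are gaps; everything else is sequence.
        $5 != "N" && $5 != "U" { print $1, $2 - 1, $3, $6, 0, $9 }
    ' |
    sort -k1,1 -k2,2n
}
```

The output can then be passed to bedToBigBed exactly as in the step above.<br />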
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use to search the DNA sequence and the translated (amino acid) sequence:<br />
<font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42420 -stepSize=5 -log=/var/log/gfServer-eboVir3.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42421 -trans -log=/var/log/gfServer-eboVir3-trans.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the blat and transBlat lines:<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
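<br />
The two lines must sit inside the stanza of the genome they belong to, because a blank line in genomes.txt ends a stanza. The sketch below shows one way to add them with awk instead of an editor. The function name add_blat_lines is hypothetical, the result is written to standard output rather than in place, and it assumes the stanza does not already contain blat settings.<br />

```shell
#!/bin/sh
# Hypothetical helper: print genomes.txt with "blat" and "transBlat"
# lines appended to the end of the stanza of one genome.
add_blat_lines() {
    # $1 = genomes.txt, $2 = genome name, $3 = host,
    # $4 = DNA gfServer port, $5 = translated gfServer port
    awk -v g="$2" -v h="$3" -v p="$4" -v t="$5" '
        function emit_blat() {
            if (inside && !done) {
                print "blat " h " " p
                print "transBlat " h " " t
                done = 1
            }
            inside = 0
        }
        $1 == "genome" { emit_blat(); inside = ($2 == g) }   # a new stanza starts
        NF == 0        { emit_blat() }                         # a blank line ends a stanza
        { print }
        END            { emit_blat() }                         # file ended inside the stanza
    ' "$1"
}
```

For example: add_blat_lines genomes.txt eboVir3 127.0.0.1 42420 42421 > genomes.new, then review genomes.new before replacing the original file.<br />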
<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the fasta file has all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track at our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy (link?) this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Construction of the gap track directly from the AGP file (AGP start coordinates are 1-based while BED start coordinates are 0-based, hence the "$2 - 1"):<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes-sorted.txt eboVir3-gap.bb<br />
At last, the hubCheck command runs without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Forcing configuration files to be loaded again every time that the page is reloaded (instead of after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] at URL.<br />
** To disable this feature, click at "clear" on the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
<HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
</BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the bed file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes-sorted.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page to our track:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/track-description.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the fasta file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be attached by adding the hub's URL directly to the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded straight into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
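<br />
Such a link can be assembled in the shell, which helps when the same hub is tested against several browsers. The sketch below uses a hypothetical function, hub_link. It combines the hubUrl parameter with the udcTimeout=1 setting mentioned earlier on this page, and the addresses in the usage line are placeholders.<br />

```shell
#!/bin/sh
# Hypothetical helper: print an hgTracks link that attaches a hub and
# forces its configuration files to be re-read on every page load.
hub_link() {
    # $1 = browser base URL, $2 = URL of hub.txt
    printf '%s/cgi-bin/hgTracks?udcTimeout=1&hubUrl=%s\n' "$1" "$2"
}
```

For example, hub_link http://127.0.0.1:1234 http://127.0.0.1:1234/folders/sf_work/virusNetwork/hub.txt prints a link for the local GBiB.<br />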
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
Additional configuration to genomes.txt:<br />
groups groups.txt<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
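<br />
The length rules for name and description can be checked on a track line before it is pasted into the browser. This is only a sketch: the function name check_track_line is hypothetical, and it handles only values written between double quotes.<br />

```shell
#!/bin/sh
# Hypothetical helper: succeed only when the quoted name is at most 15
# characters long and the quoted description is at most 60.
check_track_line() {
    # $1 = a custom track "track" line
    name=$(printf '%s\n' "$1" | sed -n 's/.*name="\([^"]*\)".*/\1/p')
    desc=$(printf '%s\n' "$1" | sed -n 's/.*description="\([^"]*\)".*/\1/p')
    [ "${#name}" -le 15 ] && [ "${#desc}" -le 60 ]
}
```

For example, check_track_line 'track name="Track label" description="Chromosome coordinates list"' succeeds, because both values are within the limits.<br />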
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=22736GBiB: From download to BLAT at assembly hubs2015-06-04T00:20:29Z<p>David da Silva Pires: /* Additional configuration */ Additional configuration to hub.txt.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would involve installing GBiB on a server with a text-only interface and opening specific ports on the firewall so that the data can be used only from within your network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from the UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** In the GBiB box, check that you agree with the terms and conditions.<br />
** Check that your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on the speed of your internet connection).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access:<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine:<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
trackDb eboVir3/trackDb.txt<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* Get the .2bit file from the fasta file:<br />
<font color=green>browser@browserbox:$></font> faToTwoBit /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit<br />
* Get a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit stdout | sort -k2nr > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-chromSizes-sorted.txt<br />
* We can also build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command:<br />
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Check that the new AGP file matches the sequence:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
<br />
== Track hub configuration ==<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/assembly/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the file.<br />
<br />
* Construction of the assembly track directly from the AGP file (AGP start coordinates are 1-based while BED start coordinates are 0-based, hence the "$2 - 1"):<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 != "N"' | awk '{printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > eboVir3-assembly.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-assembly.bed eboVir3-chromSizes-sorted.txt eboVir3-assembly.bb<br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA queries and for translated (protein) queries:<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42420 -stepSize=5 -log=/var/log/gfServer-eboVir3.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42421 -trans -log=/var/log/gfServer-eboVir3-trans.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the blat and transBlat lines:<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
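The two lines have to sit inside the stanza of the genome they serve, before the blank line that closes it. A minimal sketch of doing this without a text editor follows; the helper add_blat_lines and the temporary copy of genomes.txt are assumptions made for illustration, not GBiB tools.<br />

```shell
#!/bin/sh
workdir=$(mktemp -d)
# Demo copy of the genomes.txt stanza used on this page.
cat > "$workdir/genomes.txt" << 'EOI'
genome eboVir3
trackDb eboVir3/trackDb.txt
twoBitPath eboVir3/genome/eboVir3.2bit
organism Ebola virus
defaultPos KM034562v1:1-18,957

EOI

# Append blat/transBlat lines at the end of the stanza of one genome,
# unless that stanza already contains a blat line.
# $1 = genomes.txt, $2 = genome, $3 = host, $4 = DNA port, $5 = translated port
add_blat_lines() {
    awk -v g="$2" -v host="$3" -v p1="$4" -v p2="$5" '
        function close_stanza() {
            if (inStanza && !hasBlat) {
                print "blat " host " " p1
                print "transBlat " host " " p2
            }
            inStanza = 0; hasBlat = 0
        }
        $1 == "genome" { close_stanza(); inStanza = ($2 == g) }
        /^[ \t]*$/     { close_stanza() }
        $1 == "blat"   { if (inStanza) hasBlat = 1 }
        { print }
        END { close_stanza() }
    ' "$1"
}

add_blat_lines "$workdir/genomes.txt" eboVir3 127.0.0.1 42420 42421 > "$workdir/genomes.new"
mv "$workdir/genomes.new" "$workdir/genomes.txt"
cat "$workdir/genomes.txt"
```

Running the helper a second time leaves the file unchanged, so it is safe to call from a setup script.<br />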
<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the fasta file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
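If seq_crumbs is not installed, the same conversion can be approximated with standard tools. This is only a sketch: demo.fasta and the helper fasta_to_upper are hypothetical, and the approach assumes a plain, uncompressed fasta file.<br />

```shell
#!/bin/sh
workdir=$(mktemp -d)
# Hypothetical FASTA with lowercase and mixed-case sequence lines.
printf '>seq_demo lower-case header stays as is\nacgtnacgt\nACgt\n' > "$workdir/demo.fasta"

# Uppercase the sequence lines only; header lines (starting with ">") are untouched.
fasta_to_upper() {
    awk '/^>/ { print; next } { print toupper($0) }' "$1"
}

fasta_to_upper "$workdir/demo.fasta" > "$workdir/demo-uppercase.fasta"
cat "$workdir/demo-uppercase.fasta"
```

Headers are left alone on purpose, because changing their case would change the chromosome names used by every other hub file.<br />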
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track in our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy or link this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Build the gap track directly from the AGP file, subtracting 1 from each start because AGP coordinates are 1-based while BED starts are 0-based:<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes-sorted.txt eboVir3-gap.bb<br />
At last, the hubCheck command runs without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to: Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Force the configuration files to be reloaded every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the bed file of the track:<br />
 $> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
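The renaming and sorting steps can also be combined and tried on a small sample first. The sketch below rests on assumptions: demo.bed, its chromosome names and the helper shorten_and_sort_bed are hypothetical, the prefix is replaced only in column 1 (unlike the sed command, which touches the first match anywhere on the line), and LC_ALL=C is set so that the byte-wise order is the same on every system.<br />

```shell
#!/bin/sh
workdir=$(mktemp -d)
# Hypothetical BED3 rows with long chromosome names, deliberately out of order.
{
    printf 'Schisto_mansoni.Chr_2\t500\t900\n'
    printf 'Schisto_mansoni.Chr_1\t2000\t2500\n'
    printf 'Schisto_mansoni.Chr_1\t100\t400\n'
} > "$workdir/demo.bed"

# Shorten the prefix in column 1 only, then sort by chromosome and numeric start.
# $1 = BED file, $2 = long prefix, $3 = short prefix
shorten_and_sort_bed() {
    awk -F '\t' -v OFS='\t' -v from="$2" -v to="$3" '
        index($1, from) == 1 { $1 = to substr($1, length(from) + 1) }
        { print }' "$1" |
        LC_ALL=C sort -k1,1 -k2,2n
}

shorten_and_sort_bed "$workdir/demo.bed" Schisto_mansoni Sm > "$workdir/demo-sorted.bed"
cat "$workdir/demo-sorted.bed"
```

The same short names must be used in the fasta file and in the chromosome sizes file, otherwise the bigBed conversion cannot match the rows to the assembly.<br />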
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/track-description.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the fasta file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added by appending its URL to the browser URL: if you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
<br />
Additional configuration to hub.txt:<br />
descriptionUrl description<br />
<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
 track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand=0,0,50 0,50,0 useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=22733GBiB: From download to BLAT at assembly hubs2015-06-03T13:42:15Z<p>David da Silva Pires: /* Additional configuration */ Adding an assembly hub through its URL.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically involve installing GBiB on a server with a text-only interface, opening only specific ports on the firewall so that the data can be used only within your network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from the UCSC Genome Browser virtual store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Check whether you agree with the terms and conditions in the GBiB box.<br />
** Check whether your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it, and delete the archive.<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on your internet connection speed).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub configuration files:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
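These three rules are easy to check before running hubCheck. The sketch below is an assumption-laden illustration: check_hub_labels is a hypothetical helper, it covers only the rules listed above, and it does not replace hubCheck.<br />

```shell
#!/bin/sh
workdir=$(mktemp -d)
# Demo copy of the hub.txt shown on this page.
cat > "$workdir/hub.txt" << 'EOI'
hub virusNetwork
email admin@virus.edu
shortLabel Virus Network
longLabel Virus Network Hub for Ebola virus
genomesFile genomes.txt
EOI

# Print one "ERROR ..." line per violated rule; print nothing when all rules hold.
check_hub_labels() {
    awk '
        $1 == "hub" && NF != 2 { print "ERROR hub name must not contain spaces" }
        $1 == "shortLabel" { v = $0; sub(/^shortLabel[ \t]+/, "", v)
                             if (length(v) > 17) print "ERROR shortLabel longer than 17 characters" }
        $1 == "longLabel"  { v = $0; sub(/^longLabel[ \t]+/, "", v)
                             if (length(v) > 80) print "ERROR longLabel longer than 80 characters" }
    ' "$1"
}

check_hub_labels "$workdir/hub.txt"
```

For the hub.txt above the helper prints nothing, since "Virus Network" has 13 characters and the long label is well under 80.<br />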
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
trackDb eboVir3/trackDb.txt<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
We can also build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.<br />
 <font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
Check if the new AGP file matches the fasta file:<br />
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
* Get the .2bit file from this fasta:<br />
<font color=green>browser@browserbox:$></font> faToTwoBit /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit<br />
* Generate a file with the name and size of every chromosome of the genome of interest, sorted from largest to smallest:<br />
<font color=green>browser@browserbox:$></font> twoBitInfo /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit stdout | sort -k2nr > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-chromSizes-sorted.txt<br />
<br />
== Track hub configuration ==<br />
<br />
* Create the contents of trackDb.txt (the track name must contain no spaces or dots and must start with a letter; shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/assembly/assembly.bb<br />
<br />
EOI<br />
The name of each track (the "track" field) must be unique within the entire file.<br />
<br />
* Build the assembly track directly from the AGP file, subtracting 1 from each start because AGP coordinates are 1-based while BED starts are 0-based:<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 != "N"' | awk '{printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > eboVir3-assembly.bed<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-assembly.bed eboVir3-chromSizes-sorted.txt eboVir3-assembly.bb<br />
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA queries and for translated (protein) queries:<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42420 -stepSize=5 -log=/var/log/gfServer-eboVir3.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42421 -trans -log=/var/log/gfServer-eboVir3-trans.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the blat and transBlat lines:<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the fasta file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track in our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy or link this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Build the gap track directly from the AGP file, subtracting 1 from each start because AGP coordinates are 1-based while BED starts are 0-based:<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes-sorted.txt eboVir3-gap.bb<br />
At last, the hubCheck command runs without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to: Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Force the configuration files to be reloaded every time the page is reloaded (instead of only after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the bed file of the track:<br />
 $> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/track-description.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the fasta file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
A track hub can also be added by passing the hub's URL directly in the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).<br />
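<br />
The hubUrl mechanism can be sketched as a small shell helper. The function name hub_link and the local addresses are assumptions made for illustration; the db value in particular may need the internal hub_N_ prefix shown in the example URL, and the sketch does no URL-encoding.<br />

```shell
#!/bin/sh
# Hypothetical helper: compose an hgTracks URL that loads a hub via hubUrl.
# $1 = browser base URL, $2 = db value, $3 = URL of hub.txt
hub_link() {
    printf '%s/cgi-bin/hgTracks?db=%s&hubUrl=%s\n' "$1" "$2" "$3"
}

# Example with the local GBiB address used on this page (placeholder paths).
hub_link "http://127.0.0.1:1234" "eboVir3" \
    "http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt"
```

The printed address can be pasted into the web browser; whether the hub loads still depends on hub.txt being reachable at that URL.<br />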
<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=22732GBiB: From download to BLAT at assembly hubs2015-06-03T11:14:28Z<p>David da Silva Pires: Moving "Custom track configuration" to "Additional configuration".</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically install GBiB on a server with a text-only interface and open only specific ports on the firewall, restricting use of the data to your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from the UCSC Genome Browser store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Read the terms and conditions in the GBiB box and make sure you agree with them.<br />
** Check that your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on the speed of your internet connection).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub files and download the genome sequence into them:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
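<br />
The label limits listed above can be checked before running hubCheck. The following is only a sketch: check_hub_labels is a hypothetical helper, not a UCSC tool, and it is demonstrated on a temporary copy of the hub.txt contents shown earlier.<br />

```shell
#!/bin/sh
# check_hub_labels FILE
# Prints every shortLabel/longLabel whose text exceeds 17 or 80 characters
# and returns a non-zero status if any such label was found.
check_hub_labels() {
    awk '
        $1 == "shortLabel" || $1 == "longLabel" {
            limit = ($1 == "shortLabel") ? 17 : 80
            label = $0
            sub(/^[ \t]*[^ \t]+[ \t]+/, "", label)   # strip the field name
            if (length(label) > limit) {
                printf "%s is %d characters (limit %d): %s\n", $1, length(label), limit, label
                bad = 1
            }
        }
        END { exit bad + 0 }
    ' "$1"
}

# Demonstration on a temporary file with the hub.txt contents from this page.
demo=$(mktemp)
printf '%s\n' \
    'hub virusNetwork' \
    'email admin@virus.edu' \
    'shortLabel Virus Network' \
    'longLabel Virus Network Hub for Ebola virus' \
    'genomesFile genomes.txt' > "$demo"
check_hub_labels "$demo" && echo "labels OK"
rm -f "$demo"
```

The same function can be pointed at a trackDb.txt file, since the track labels follow the same limits.<br />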
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
trackDb eboVir3/trackDb.txt<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* Get the .2bit file from the fasta file:<br />
 <font color=green>browser@browserbox:$></font> faToTwoBit /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit<br />
* We can also build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command:<br />
 <font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Check that the new AGP file matches the sequence in the .2bit file:<br />
 <font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
 <font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
* Create a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
 <font color=green>browser@browserbox:$></font> twoBitInfo /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit stdout | sort -k2nr > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-chromSizes-sorted.txt<br />
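<br />
The chromosome sizes sort can be tried on a small made-up input before running it on real twoBitInfo output. This is a sketch under assumptions: the scaffold names and sizes are invented, and the explicit tab separator and name tie-break are additions to the sort options used on this page.<br />

```shell
#!/bin/sh
# Sort "name<TAB>size" lines by size, largest first; ties are broken by name.
sort_chrom_sizes() {
    sort -t "$(printf '\t')" -k2,2nr -k1,1
}

# Made-up input standing in for the output of twoBitInfo.
printf 'scaffold_2\t500\nscaffold_1\t18957\nscaffold_3\t500\n' | sort_chrom_sizes
```

Breaking ties by name keeps the file stable between runs, which makes later diffs of the sizes file easier to read.<br />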
<br />
== Track hub configuration ==<br />
<br />
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/assembly/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
<br />
* Construction of the assembly track directly from the AGP file (AGP coordinates are 1-based, so the start is decremented to obtain the 0-based BED start):<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 != "N"' | awk '{printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2-1, $3, $6, $9}' | sort -k1,1 -k2,2n > eboVir3-assembly.bed<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-assembly.bed eboVir3-chromSizes-sorted.txt eboVir3-assembly.bb<br />
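<br />
The AGP to BED conversion can be checked on a tiny example. AGP coordinates are 1-based and inclusive while BED starts are 0-based, so this sketch subtracts one from the start column. The helper name agp_to_bed and the three AGP lines are made up for illustration; they only imitate the kind of output hgFakeAgp writes.<br />

```shell
#!/bin/sh
# agp_to_bed FILE: print component (non-gap) AGP lines as BED6.
# AGP columns used: 1 object, 2 start (1-based), 3 end, 5 type,
# 6 component id, 9 strand.
agp_to_bed() {
    grep -v '^#' "$1" \
        | awk -F '\t' -v OFS='\t' '$5 != "N" { print $1, $2 - 1, $3, $6, 0, $9 }' \
        | sort -k1,1 -k2,2n
}

# Made-up AGP: two contigs separated by a 10-base gap.
agp=$(mktemp)
printf 'KM034562v1\t1\t100\t1\tD\tKM034562v1_1\t1\t100\t+\n'   >  "$agp"
printf 'KM034562v1\t101\t110\t2\tN\t10\tcontig\tno\n'          >> "$agp"
printf 'KM034562v1\t111\t300\t3\tD\tKM034562v1_2\t1\t190\t+\n' >> "$agp"
agp_to_bed "$agp"
rm -f "$agp"
```

Only the two contig lines are printed; the gap line is dropped because its fifth column is "N".<br />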
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServers, specifying the ports that the assembly hub will use for DNA queries and for translated (protein) queries:<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42420 -stepSize=5 -log=/var/log/gfServer-eboVir3.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
 <font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42421 -trans -log=/var/log/gfServer-eboVir3-trans.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat:<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
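<br />
The blat and transBlat lines must sit inside the genome stanza, with no blank line before them. A minimal sketch of adding them without a text editor follows; with_blat_lines is a hypothetical helper that assumes a single-genome genomes.txt, writes the result to standard output, and is not idempotent.<br />

```shell
#!/bin/sh
# with_blat_lines GENOMES HOST PORT TRANS_PORT
# Prints GENOMES with blat/transBlat lines added after each twoBitPath line.
with_blat_lines() {
    awk -v host="$2" -v port="$3" -v tport="$4" '
        { print }
        $1 == "twoBitPath" {
            print "blat " host " " port
            print "transBlat " host " " tport
        }
    ' "$1"
}

# Demonstration on a temporary copy of the genomes.txt from this page.
demo=$(mktemp)
printf '%s\n' \
    'genome eboVir3' \
    'trackDb eboVir3/trackDb.txt' \
    'twoBitPath eboVir3/genome/eboVir3.2bit' \
    'organism Ebola virus' \
    'defaultPos KM034562v1:1-18,957' > "$demo"
with_blat_lines "$demo" 127.0.0.1 42420 42421
rm -f "$demo"
```

Redirect the output to a new file and review it before replacing the original genomes.txt.<br />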
<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the fasta file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track at our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy this file and link it in the correct place:<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Construction of the gap track directly from the AGP file (the start is decremented because AGP coordinates are 1-based and BED starts are 0-based):<br />
 <font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2-1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
 <font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes-sorted.txt eboVir3-gap.bb<br />
Finally, the hubCheck command will run without reporting any errors. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's write a basic description page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Forcing configuration files to be loaded again every time that the page is reloaded (instead of after at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] at URL.<br />
** To disable this feature, click at "clear" on the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
 <HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
 <BODY><br />
 <P><br />
 Ebola virus genome assembly and track hub.<br />
 <UL><br />
 <LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
 NCBI genome/4887 (Ebola virus)</A></LI><br />
 </UL><br />
 </P><br />
 </BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the bed file of the track:<br />
 $> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
* Construction of the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
 <font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes-sorted.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's write an HTML description page for our track:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/track-description.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
* If the fasta file that was used to create the .2bit file was soft-masked (i.e., it had nucleotides written in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
<br />
=== Custom track configuration ===<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pireshttps://genomewiki.ucsc.edu/index.php?title=GBiB:_From_download_to_BLAT_at_assembly_hubs&diff=22731GBiB: From download to BLAT at assembly hubs2015-06-03T10:46:06Z<p>David da Silva Pires: /* Additional configuration */ Moving some additional configuration to Blat to the right place.</p>
<hr />
<div>== Introduction ==<br />
<br />
Genome Browser in a Box (GBiB) has some obvious advantages when compared to other options we have while working with genomic data:<br />
<br />
* It is a genome browser with a lot of features and tools that do not exist on other available genome browsers.<br />
* It is much easier to install, configure and maintain when compared with a full mirror of [http://genome.ucsc.edu UCSC Genome Browser web site].<br />
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.<br />
<br />
Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would typically install GBiB on a server with a text-only interface and open only specific ports on the firewall, restricting use of the data to your own network.<br />
<br />
<br />
== GBiB installation ==<br />
<br />
* Create a folder on your machine to hold the installation files:<br />
<font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib<br />
* Download GBiB from the UCSC Genome Browser store:<br />
** Go to the [http://genome-store.ucsc.edu Genome Store].<br />
** Click "Login / Register".<br />
** Read the terms and conditions in the GBiB box and make sure you agree with them.<br />
** Check that your hardware and software meet the basic requirements.<br />
** Click "Add to cart".<br />
** Click "Cart (1)" in the menu.<br />
** Click "Proceed to checkout".<br />
** Click "My products" in the menu.<br />
** Copy the download address (let's call it <download_link>).<br />
** Download GBiB to /usr/local/src/gbib, uncompress it and delete the archive:<br />
<font color=blue>user@host:$></font> cd /usr/local/src/gbib<br />
<font color=red>user@host:$></font> sudo wget <download_link><br />
<font color=red>user@host:$></font> sudo unzip gbib.zip<br />
<font color=red>user@host:$></font> sudo rm gbib.zip<br />
* Give user sufficient access to the three uncompressed files and to the folder:<br />
<font color=red>user@host:$></font> sudo chmod o+rw /usr/local/src/gbib/*<br />
<font color=red>user@host:$></font> sudo chmod o+w /usr/local/src/gbib<br />
* Install VirtualBox and start it in background:<br />
<font color=red>user@host:$></font> sudo apt-get install virtualbox<br />
<font color=blue>user@host:$></font> virtualbox &<br />
* Add GBiB to VirtualBox and boot it for the first time:<br />
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start<br />
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on the speed of your internet connection).<br />
** Close GBiB terminal window.<br />
** Select "Send the shutdown signal".<br />
** Confirm by clicking "OK".<br />
<br />
<br />
== GBiB configuration ==<br />
<br />
* Click "Settings".<br />
** General ---> Description: Ebola virus genome assembly and track hubs.<br />
** System ---> Motherboard ---> Base Memory: 4096 MB.<br />
** System ---> Processor ---> Processor(s): 2.<br />
** Display ---> Video ---> Video Memory: 32 MB.<br />
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.<br />
* Boot GBiB virtual machine:<br />
** Select "browserbox" in the menu on the left.<br />
** Click "Start".<br />
* Test if everything is working at the following URLs:<br />
** [http://127.0.0.1:1234 http://127.0.0.1:1234]<br />
** [http://127.0.0.1:1234/folders http://127.0.0.1:1234/folders]<br />
* Log in using ssh for faster access.<br />
** Open a terminal, like "konsole".<br />
** Password: browser<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
* Install the tools that allow file manipulation:<br />
<font color=green>browser@browserbox:$></font> gbibAddTools<br />
* Turn off every kind of automatic update:<br />
<font color=green>browser@browserbox:$></font> gbibAutoUpdateOff<br />
* Do not allow users to mirror tracks:<br />
<font color=green>browser@browserbox:$></font> gbibMirrorTracksOff<br />
* Turn on the offline mode:<br />
<font color=green>browser@browserbox:$></font> gbibOffline<br />
* Reboot the virtual machine<br />
<font color=brown>browser@browserbox:$></font> sudo shutdown -r now<br />
<br />
<br />
== Assembly hub configuration ==<br />
<br />
* Log in again using ssh:<br />
<font color=blue>user@host:$></font> ssh browser@localhost -p 1235<br />
<br />
<br />
=== Downloading the raw data ===<br />
<br />
* Create the directories that will store the assembly hub files and download the genome sequence into them:<br />
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome<br />
<font color=green>browser@browserbox:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .<br />
<font color=green>browser@browserbox:$></font> gunzip KM034562v1.fa.gz<br />
<font color=green>browser@browserbox:$></font> ln -s KM034562v1.fa eboVir3.fasta<br />
<br />
<br />
=== Creating a basic hub.txt file ===<br />
<br />
<br />
* Fill the contents of hub.txt file:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/hub.txt << EOI<br />
hub virusNetwork<br />
email admin@virus.edu<br />
shortLabel Virus Network<br />
longLabel Virus Network Hub for Ebola virus<br />
genomesFile genomes.txt<br />
<br />
EOI<br />
* The following rules must be obeyed:<br />
** hub: name without spaces.<br />
** shortLabel: limited to 17 characters.<br />
** longLabel: limited to 80 characters.<br />
<br />
Check the integrity of your hub with the command hubCheck:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
<br />
<br />
=== Creating a basic genomes.txt file ===<br />
<br />
<br />
* Fill the contents of genomes.txt:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/genomes.txt << EOI<br />
genome eboVir3<br />
trackDb eboVir3/trackDb.txt<br />
twoBitPath eboVir3/genome/eboVir3.2bit<br />
organism Ebola virus<br />
defaultPos KM034562v1:1-18,957<br />
<br />
EOI<br />
<br />
<br />
=== Preparing the data ===<br />
<br />
<br />
* Get the .2bit file from the fasta file:<br />
 <font color=green>browser@browserbox:$></font> faToTwoBit /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit<br />
* We can also build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command:<br />
 <font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fasta eboVir3.agp<br />
* Check that the new AGP file matches the sequence in the .2bit file:<br />
 <font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp<br />
 <font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit<br />
* Create a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:<br />
 <font color=green>browser@browserbox:$></font> twoBitInfo /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit stdout | sort -k2nr > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-chromSizes-sorted.txt<br />
<br />
== Track hub configuration ==<br />
<br />
* Create trackDb.txt. The track name must start with a letter and contain no spaces or dots; shortLabel is limited to 17 characters and longLabel to 80 characters:<br />
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI<br />
track assembly<br />
shortLabel Assembly<br />
longLabel Assembly<br />
type bigBed 6<br />
bigDataUrl tracks/assembly/assembly.bb<br />
<br />
EOI<br />
The name of each track ("track" field) must be unique within the entire file.<br />
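A quick way to spot repeated track names is to list every "track" value and keep only those that occur more than once. This is a small sketch; duplicate_tracks is a hypothetical helper name.<br />

```shell
# duplicate_tracks FILE
# Hypothetical helper: prints every track name that appears more than once
# in a trackDb.txt file. No output means all names are unique.
duplicate_tracks() {
    awk '$1 == "track" { print $2 }' "$1" | sort | uniq -d
}

# Possible use:
#   duplicate_tracks /folders/sf_work/virusNetwork/eboVir3/trackDb.txt
```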
<br />
* Build the assembly track directly from the AGP file (AGP coordinates are 1-based, so 1 is subtracted from the start to obtain the 0-based BED start):<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 != "N"' | awk '{printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > eboVir3-assembly.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-assembly.bed eboVir3-chromSizes.txt eboVir3-assembly.bb<br />
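The AGP-to-BED conversion can be wrapped in a function and tried on a tiny hand-made AGP file. The sketch below rests on two assumptions: AGP coordinates are 1-based and inclusive while BED starts are 0-based, so the start is decremented, and gap lines of type "U" are skipped along with type "N". The helper name agp_to_assembly_bed is hypothetical.<br />

```shell
# agp_to_assembly_bed FILE
# Hypothetical helper: turns the component lines of an AGP file into BED6
# (object, start, end, component id, score 0, orientation).
agp_to_assembly_bed() {
    grep -v '^#' "$1" \
        | awk 'BEGIN { FS = OFS = "\t" }
               # Columns: 1 object, 2 begin, 3 end, 5 type, 6 id, 9 orientation.
               $5 != "N" && $5 != "U" { print $1, $2 - 1, $3, $6, 0, $9 }' \
        | sort -k1,1 -k2,2n
}

# Demo on a three-line AGP: two components separated by a 10-base gap.
tmp=$(mktemp)
printf 'chr1\t1\t100\t1\tD\tctg1\t1\t100\t+\n'         >  "$tmp"
printf 'chr1\t101\t110\t2\tN\t10\tscaffold\tyes\tna\n' >> "$tmp"
printf 'chr1\t111\t200\t3\tD\tctg2\t1\t90\t-\n'        >> "$tmp"
agp_to_assembly_bed "$tmp"
rm -f "$tmp"
```

In the demo the first component spans bases 1 to 100 in the AGP file and becomes the BED interval 0 to 100, which has the same length of 100 bases.<br />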
<br />
== Blat configuration ==<br />
<br />
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use for DNA queries and for translated (amino acid) queries:<br />
<font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42420 -stepSize=5 -log=/var/log/gfServer-eboVir3.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
<font color=green>browser@browserbox:$></font> gfServer start 127.0.0.1 42421 -trans -log=/var/log/gfServer-eboVir3-trans.log /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.2bit &<br />
* Edit the genomes.txt file of the assembly hub to include the lines for blat and transBlat:<br />
blat 127.0.0.1 42420<br />
transBlat 127.0.0.1 42421<br />
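Both lines have to sit inside the stanza of the genome they serve, that is, before the blank line that ends the stanza. Instead of editing by hand, the insertion can be sketched with awk as below. The helper name add_blat_lines is hypothetical, and the sketch does not check whether the stanza already contains blat lines.<br />

```shell
# add_blat_lines FILE GENOME HOST DNA_PORT TRANS_PORT
# Hypothetical helper: prints genomes.txt with "blat" and "transBlat" lines
# appended to the end of the stanza whose "genome" field equals GENOME.
add_blat_lines() {
    awk -v g="$2" -v h="$3" -v p="$4" -v t="$5" '
        function emit() {
            if (inside) {
                print "blat " h " " p
                print "transBlat " h " " t
                inside = 0
            }
        }
        $1 == "genome" { emit(); inside = ($2 == g) }  # a new stanza starts
        /^[ \t]*$/     { emit() }                      # a blank line ends it
        { print }
        END { emit() }                                 # file ended in the stanza
    ' "$1"
}

# Possible use (writes to a new file, the original is left untouched):
#   add_blat_lines genomes.txt eboVir3 127.0.0.1 42420 42421 > genomes.new.txt
```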
<br />
== Custom track configuration ==<br />
<br />
browser position chr22:20,100,000-20,100,900<br />
browser hide all<br />
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand=0,0,50 0,50,0 useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb<br />
* The following rules must be obeyed:<br />
** name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.<br />
** description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.<br />
** visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.<br />
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.<br />
<br />
<br />
== GBiB maintenance ==<br />
<br />
* Update all software and data:<br />
$> gbibOnline<br />
$> gbibAutoUpdateOn<br />
$> updateBrowser<br />
$> gbibAddTools<br />
$> gbibAutoUpdateOff<br />
$> gbibOffline<br />
<br />
== Additional configuration ==<br />
<br />
<br />
* If the fasta file is written with all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:<br />
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/virusNetwork/eboVir3/genome/eboVir3.fasta<br />
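If seq_crumbs is not installed, the same conversion can be sketched with awk alone. The helper name fasta_to_upper is hypothetical; header lines are copied unchanged and only sequence lines are converted.<br />

```shell
# fasta_to_upper FILE
# Hypothetical helper: prints the fasta file with every sequence line in
# uppercase. Header lines (starting with ">") are left as they are.
fasta_to_upper() {
    awk '/^>/ { print; next } { print toupper($0) }' "$1"
}

# Possible use:
#   fasta_to_upper eboVir3.fasta > eboVir3-uppercase.fasta
```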
* If the names of the chromosomes are very long, we need to make them shorter:<br />
<font color=green>browser@browserbox:$></font> sed s/Ebola_virus/Ev/ /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3.fasta > /folders/sf_work/virusNetwork/eboVir3/genome/eboVir3-shortChromNames.fasta<br />
* Check if everything is OK with the hub:<br />
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/virusNetwork/hub.txt<br />
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track in our track hub in order to have a working assembly hub.<br />
* Check again if everything is OK with the hub:<br />
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki><br />
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy or link this file to the correct place.<br />
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3<br />
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3<br />
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb<br />
* Build the gap track directly from the AGP file (AGP coordinates are 1-based, so 1 is subtracted from the start to obtain the 0-based BED start):<br />
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed<br />
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes.txt eboVir3-gap.bb<br />
The hubCheck command will now finish without reporting any error. However, if you point your browser to http://127.0.0.1:1234 and go to Genomes ---> group = Virus Network, you will see the following error:<br />
<br />
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"<br />
<br />
Let's compose a basic page for our organism of interest:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
<P><br />
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR><br />
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html<br />
</P></nowiki><br />
EOI<br />
<br />
* Force configuration files to be reloaded every time the page is reloaded (instead of being cached for at least 300 seconds):<br />
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] in the URL.<br />
** To disable this feature, click "clear" in the message that appears at the top of the page.<br />
* Create the HTML page description for the hub:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki><br />
<HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD><br />
<BODY><br />
<P><br />
Ebola virus genome assembly and track hub.<br />
<UL><br />
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank"><br />
NCBI genome/4887 (Ebola virus)</A></LI><br />
</UL><br />
</P><br />
</BODY></HTML></nowiki><br />
<br />
EOI<br />
* Include an image of the organism.<br />
* The same substitution has to be made in the bed file of the track:<br />
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed<br />
* The bed file of the track has to be sorted first by chromosome name and then by start coordinate:<br />
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed<br />
* Convert from bed to bigBed:<br />
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb<br />
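bedToBigBed can only convert a bed file whose chromosome names all appear in the chromosome sizes file, so it is worth checking that the renaming was applied to both. A minimal sketch follows; bed_names_missing is a hypothetical helper name.<br />

```shell
# bed_names_missing BED_FILE CHROM_SIZES_FILE
# Hypothetical helper: prints each chromosome name used in the bed file
# that is absent from the chromosome sizes file (one name per line).
bed_names_missing() {
    awk 'FILENAME == ARGV[1] { known[$1] = 1; next }       # sizes file
         !($1 in known) && !seen[$1]++ { print $1 }        # bed file
    ' "$2" "$1"
}

# Possible use; no output means every name was found:
#   bed_names_missing smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt
```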
* Build the GC content track:<br />
<font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \<br />
eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz<br />
<font color=green>browser@browserbox:$></font> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw<br />
* Contents of groups.txt:<br />
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI<br />
name user<br />
label Custom<br />
priority 1<br />
defaultIsClosed 1<br />
<br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
<br />
name genes<br />
label Genes<br />
priority 3<br />
defaultIsClosed 0<br />
<br />
name mrna<br />
label mRNA<br />
priority 4<br />
defaultIsClosed 1<br />
<br />
name regulation<br />
label Regulation<br />
priority 5<br />
defaultIsClosed 1<br />
<br />
name comparative<br />
label Comparative<br />
priority 6<br />
defaultIsClosed 1<br />
<br />
name varRep<br />
label Variation<br />
priority 7<br />
defaultIsClosed 0<br />
<br />
name x<br />
label Experimental<br />
priority 8<br />
defaultIsClosed 1<br />
<br />
EOI<br />
* Let's compose an HTML page for our track:<br />
$> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/track-description.html << EOI<nowiki><br />
<H2>Description</H2><br />
<P><br />
Replace this text with a summary describing the<br />
concepts or analysis represented by your data.<br />
<br />
<H2>Methods</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to generate and analyze the data.<br />
<br />
<H2>Verification</H2><br />
<P><br />
Replace this text with a description of the methods<br />
used to verify the data.<br />
<br />
<H2>Credits</H2><br />
<P><br />
Replace this text with a list of the individuals <br />
and/or organizations who contributed to the collection<br />
and analysis of the data.<br />
<br />
<H2>References</H2><br />
<P><br />
Replace this text with a list of relevant literature<br />
references and/or websites that provide background<br />
or supporting information about the data.</nowiki><br />
<br />
EOI<br />
<br />
visibility full<br />
html schMan2-description<br />
boxedCfg on<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
dataVersion Dec. 2011 <em>Sanger 5.2</em><br />
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s<br />
iframeUrl https://www.google.com.br/search?q=$$<br />
iframeOptions height='400' width='640' scrolling='yes'<br />
priority 100<br />
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$<br />
urlLabel NCBI Details:<br />
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"<br />
<br />
track roche454-blat<br />
bigDataUrl roche454-blat.bb<br />
shortLabel Roche 454 Trinity<br />
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat<br />
type bigBed 12<br />
searchIndex name<br />
visibility full<br />
color 64,0,96<br />
altColor 64,32,128<br />
<br />
track assembly<br />
longLabel Assembly<br />
shortLabel Assembly<br />
priority 10<br />
visibility pack<br />
colorByStrand 150,100,30 230,170,40<br />
color 150,100,30<br />
altColor 230,170,40<br />
bigDataUrl eboVir3-assembly.bb<br />
type bigBed 6<br />
html trackDescriptions/assembly<br />
url http://www.ncbi.nlm.nih.gov/nuccore/$$<br />
urlLabel NCBI Nucleotide database<br />
group map<br />
<br />
track gap<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl eboVir3-gap.bb<br />
type bigBed 4<br />
group map<br />
html trackDescriptions/gap<br />
<br />
track gc5Base<br />
shortLabel GC Percent<br />
longLabel GC Percent in 5-Base Windows<br />
group map<br />
priority 23.5<br />
visibility full<br />
autoScale Off<br />
maxHeightPixels 128:36:16<br />
graphTypeDefault Bar<br />
gridDefault OFF<br />
windowingFunction Mean<br />
color 0,0,0<br />
altColor 128,128,128<br />
viewLimits 30:70<br />
type bigWig 0 100<br />
bigDataUrl eboVir3-gc5Base.bw<br />
html trackDescriptions/gc5Base<br />
<br />
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.<br />
<br />
EOI<br />
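Every "group" value used in trackDb.txt should match a "name" entry in groups.txt. The check can be sketched as follows; undefined_groups is a hypothetical helper name.<br />

```shell
# undefined_groups GROUPS_FILE TRACKDB_FILE
# Hypothetical helper: prints each group referenced in trackDb.txt that has
# no matching "name" entry in groups.txt (one group per line).
undefined_groups() {
    awk 'FILENAME == ARGV[1] { if ($1 == "name") defined[$2] = 1; next }
         $1 == "group" && !($2 in defined) && !seen[$2]++ { print $2 }
    ' "$1" "$2"
}

# Possible use; no output means every group is defined:
#   undefined_groups groups.txt trackDb.txt
```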
* If the fasta file that was used to create the .2bit file was soft-masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":<br />
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &<br />
* Add these commands to /etc/rc.local so that they run at boot, writing them just before the "exit" command on the last line:<br />
$> sudo su -<br />
$> vim /etc/rc.local<br />
@vim $><br />
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.<br />
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &<br />
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &<br />
<br />
== References ==<br />
See also:<br />
* [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hubs User Guide]<br />
* [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hub Wiki]<br />
* [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html Track Database Definitions]<br />
* [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib Starting a Blat enabled Assembly Hub on GBiB]<br />
* Quick Start Guides to [http://genome.ucsc.edu/goldenPath/help/hubQuickStart.html Basic Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartGroups.html Organizing Hubs] and [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Assembly Hubs]<br />
[[Category:GBiB]]</div>David da Silva Pires