ABRF2010 Tutorial
Tutorial: Viewing sequencing data in the UCSC Genome Browser
Presented March 20, 2010 as part of the workshop Tools to Facilitate Management, Analysis and Visualization of 2nd Generation Sequencing Data at ABRF 2010.
This page serves as a step-by-step guide to the demo of UCSC Genome Browser tools presented at the workshop.
Genome Graphs with .sgr data
Some of the result files from the ChIPSeq toolset that Charlie Nicolet showed us earlier are in the .sgr format, i.e. tab-separated text that specifies chromosome, position and a numeric signal value. This format can be uploaded to UCSC's Genome Graphs tool for a genome-at-a-glance view of the data.
- Go to genome.ucsc.edu.
- Click the 7th link down on the left, "Genome Graphs".
- A picture of the chromosomes appears, with no data yet.
- upload: paste this URL: ...
- You should see a confirmation screen; click OK.
- Genome Graphs doesn't show the data until you select it. (session link)
- Now the data appear for chr21 only. Click on a peak.
- This takes you to a 1,000,000-base region of the genome corresponding to the location you clicked. (session link)
- Note the track at the top labeled "redbin 1" -- that is the data shown in Genome Graphs. Peaks correspond to upstream regions of genes, which makes sense for RNA polymerase II ChIP-seq.
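To make the .sgr layout described above concrete, here is a minimal Python sketch that reads the three tab-separated columns (chromosome, position, signal value) and reports the highest data point per chromosome, roughly the peak you would click on in Genome Graphs. The function names and the sample values are made up for illustration; they are not from the workshop data.

```python
import io


def read_sgr(handle):
    """Yield (chrom, position, value) tuples from tab-separated .sgr lines."""
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        chrom, pos, value = line.split("\t")
        yield chrom, int(pos), float(value)


def highest_point(records):
    """Return {chrom: (position, value)} for the largest value on each chromosome."""
    best = {}
    for chrom, pos, value in records:
        if chrom not in best or value > best[chrom][1]:
            best[chrom] = (pos, value)
    return best


# Hypothetical sample data, not taken from chr21_extended.txt_redbin.sgr.
sample = "chr21\t9719801\t3\nchr21\t9719831\t12\nchr21\t9719861\t7\n"
print(highest_point(read_sgr(io.StringIO(sample))))
```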
Transforming .sgr into bedGraph for Genome Browser viewing
The Genome Browser can draw a somewhat nicer display of .sgr data if we first transform it into a similar format called bedGraph. bedGraph has both start and end genomic coordinates, so scores can be drawn over regions as opposed to single bases. Here is a command that translates .sgr to bedGraph, expanding each data point to cover a 30-base region:
perl -wpe '@w=split("\t"); $_ = "$w[0]\t" . ($w[1]-1) . "\t" . ($w[1]+29) . "\t$w[2]";' \
    chr21_extended.txt_redbin.sgr > redbinGt10__.bed
If you would like to show only data points at or above a certain threshold, say 5, you can make the command slightly more complicated:
perl -wpe '@w=split("\t"); $_ = "$w[0]\t" . ($w[1]-1) . "\t" . ($w[1]+29) . "\t$w[2]";
    if ($w[2] < 5) {$_ = "";}' \
    chr21_extended.txt_redbin.sgr > redbinGt5.bed
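For readers who prefer Python to Perl, the same transformation can be sketched as follows. This is only an illustrative equivalent of the one-liners above, not part of the original demo. The function name and its `width` and `min_score` parameters are assumptions, and the sample lines are made up.

```python
def sgr_to_bedgraph(lines, width=30, min_score=None):
    """Convert .sgr lines (chrom, 1-based position, score) to bedGraph lines.

    Each point becomes a region of `width` bases starting at the 0-based
    position. Points scoring below `min_score` are dropped when it is given.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        chrom, pos, score = fields[0], int(fields[1]), fields[2]
        if min_score is not None and float(score) < min_score:
            continue
        start = pos - 1        # bedGraph starts are 0-based
        end = start + width    # same as position + 29 when width is 30
        yield f"{chrom}\t{start}\t{end}\t{score}"


# Hypothetical input lines, not taken from the workshop file.
sample = ["chr21\t9719801\t3\n", "chr21\t9719831\t12\n"]
for out in sgr_to_bedgraph(sample, min_score=5):
    print(out)
```

To convert a real file, pass an open file handle as `lines` and write each yielded line, followed by a newline, to the output file.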
This line can be added to the beginning of redbinGt5.bed, in order to add labels to the track display (without the line break used for readability here):
track name=redbinGt5 description="redbin items with score >= 5, expanded to 30 bases" \
    type=bedGraph autoScale=off
(To do: add settings for height 16 and vertical range 0-100.)
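If you would rather not edit the file by hand, the track line can be prepended programmatically. The sketch below is one possible way to do it; the helper names are hypothetical, and only the settings shown in the track line above (name, description, type, autoScale) are used.

```python
def track_line(name, description, **settings):
    """Build a single-line bedGraph track header from the given settings."""
    parts = [f"track name={name}", f'description="{description}"', "type=bedGraph"]
    parts += [f"{key}={value}" for key, value in settings.items()]
    return " ".join(parts)


def add_track_line(bedgraph_text, header):
    """Return the bedGraph text with the track header as its first line."""
    return header + "\n" + bedgraph_text


header = track_line(
    "redbinGt5",
    "redbin items with score >= 5, expanded to 30 bases",
    autoScale="off",
)
# Hypothetical one-line bedGraph body, for illustration only.
print(add_track_line("chr21\t9719830\t9719860\t12\n", header))
```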
You can download the file here. Right-click the link and choose "Copy link location".
- In the Genome Browser, click the "manage custom tracks" button below the image.
- Paste the link that you just copied (plaintext URL) into the first box on the form and click submit.
- You should see a confirmation page like this (photo) (session link)
- Click on "genome browser" in the top blue bar (does "Go to Genome Browser" do the right thing?)
- Now the new custom track shows grayscale representations of the score. You can click on the track title ("redbin items with score >= 5, expanded to 30 bases") to expand the display to a bar graph.
zoom in?
Comparing with UCSC-hosted data
Now we will take a look at some ChIP-seq data from the ENCODE project.
- In the Genome Browser, scroll down below the image to the "Regulation" section. Click "+" to expand.
- Click on the link titled "Yale TFBS". This will take you to the track controls.
- In the large table under "Select subtracks by cell line and factor":
- Hit the "-" button in the upper left corner to deselect all subtracks. Note that the HeLa-S3 cell line is in the 3rd column of checkboxes.
- Scroll down to factor Pol2 and select the 3rd checkbox, so that only HeLa/Pol2 is selected for display.
- Scroll back up to the top of the page. Change visibility to "pack" and click submit.
- In the Genome Browser, HeLa/Pol2 peaks and signal appear below "Spliced ESTs". (session link)
Viewing alignments in BAM format
- samtools conversion -- provide link to bam
- view ~1kB of bam with alignment qualities in grayscale
- base-level view of bam, w/base qualities in grayscale