ABRF2010 Tutorial: Difference between revisions

From genomewiki
Jump to navigationJump to search
(→‎Viewing alignments in BAM format: <nowiki> for fyi-URL; bold for something they should copy-paste.)
Line 62: Line 62:
# In the Genome Browser, HeLa/Pol2 peaks and signal appear below "Spliced ESTs". ''session link''
# In the Genome Browser, HeLa/Pol2 peaks and signal appear below "Spliced ESTs". ''session link''


== Viewing alignments in BAM format ==
=== Viewing alignments in BAM format ===


* samtools conversion -- provide link to bam
Charlie Nicolet has shared the ELAND export file that accompanies chr21_extended.txt, [http://genome-test.cse.ucsc.edu/ABRF2010/chr21_export.txt chr21_export.txt].  This can be transformed into [http://samtools.sourceforge.net/ BAM] for viewing in the UCSC Genome Browser.  The [http://samtools.sourceforge.net/ samtools] package includes a script that converts ELAND export files into SAM (the text version of BAM).  Export files often have fasta names instead of sequence names, e.g. "chr21.fa" instead of "chr21", so the ".fa" must be removed:


     export2sam.pl --read1=chr21_export.txt \
     export2sam.pl --read1=chr21_export.txt \
     | sed -re 's/(chr.*)\.fa/\1/' \
     | perl -wpe 's/(chr.*)\.fa/$1/' \
       > chr21.sam
       > chr21.sam


Add these header lines to the beginning of the file (note: the words must be separated by tab characters, not spaces):
Edit chr21.sam to add these header lines at the beginning (note: the words must be separated by tab characters, not spaces):


  @HD VN:1.0 GO:none SO:coordinate
  @HD VN:1.0 GO:none SO:coordinate
  @SQ SN:chr21 LN:46944323
  @SQ SN:chr21 LN:46944323


Use samtools to convert SAM to BAM, sort the BAM, and index the sorted BAM:
Use samtools to convert SAM to BAM, sort the BAM, and index the sorted BAM (see http://genome.ucsc.edu/goldenPath/help/bam.html for more info):


     samtools view -bSo chr21.unsorted.bam chr21.sam
     samtools view -bSo chr21.unsorted.bam chr21.sam
Line 81: Line 81:
     samtools index chr21.bam
     samtools index chr21.bam


Now the sorted chr21.bam and its index file chr21.bam.bai must be put on a web server (ftp, http, or https).  For this example, we will use <nowiki>http://genome-test.cse.ucsc.edu/ABRF2010/chr21.bam</nowiki> .
Now the sorted chr21.bam and its index file chr21.bam.bai must be put on a web server (ftp, http, or https).  For example, if you create files my.bam and my.bam.bai, and you have a personal web directory <nowiki>http://my.edu/mylab/~me/</nowiki>, you can copy or link the files to <nowiki>http://my.edu/mylab/~me/my.bam</nowiki> and <nowiki>http://mylab/~me/my.bam.bai</nowiki>.  Then you would use only the first URL (<nowiki>http://my.edu/mylab/~me/my.bam</nowiki>) in a track definition line like the one below.  For this example, we will use <nowiki>http://genome-test.cse.ucsc.edu/ABRF2010/chr21.bam</nowiki> .
 
====Steps====


# In the Genome Browser, click the "manage custom tracks" buttom below the image.   
# In the Genome Browser, click the "manage custom tracks" buttom below the image.   
# Click the "add custom tracks" button.   
# Click the "add custom tracks" button.   
# Paste this text into the box:
# Paste this text into the box:
  '''track name=redbin type=bam bigDataUrl=<nowiki>http://genome-test.cse.ucsc.edu/ABRF2010/chr21.bam</nowiki> visibility=pack'''
  '''track name=redbin type=bam bigDataUrl=<nowiki>http://genome-test.cse.ucsc.edu/ABRF2010/chr21.bam</nowiki> visibility=pack'''
 
:4. Click submit.  Click "Go to Genome Browser".
# Click submit.  Click "Go to Genome Browser".


* view ~1kB of bam with alignment qualities in grayscale
* view ~1kB of bam with alignment qualities in grayscale

Revision as of 17:11, 19 March 2010

Tutorial: Viewing sequencing data in the UCSC Genome Browser

Presented March 20, 2010 as part of the workshop Tools to Facilitate Management, Analysis and Visualization of 2nd Generation Sequencing Data at ABRF 2010.

This page serves as a step-by-step guide to the demo of UCSC Genome Browser tools presented at the workshop.

Genome Graphs with .sgr data

Some of the result files from the ChIPSeq toolset that Charlie Nicolet showed us earlier are in the .sgr format, i.e. tab-separated text that specifies chromosome, position and a numeric signal value. This format can be uploaded to UCSC's Genome Graphs tool for genome-at-a-glance view of the data.

  1. In a new web browser window or tab, go to http://genome.ucsc.edu/.
  2. On the left side of the page, there is a blue sidebar. Click on "Genome Graphs", the 7th link from the top of the sidebar.
  3. In the assembly menu, change the selection to "Mar. 2006 (NCBI36/hg18)".
  4. Click the upload button.
  5. In the input labeled "name of data set", type or paste "redbin".

other fields

  1. Paste this URL: http://genome-test.cse.ucsc.edu/ABRF2010/chr21_extended.txt_redbin.sgr in the box labeled "Paste URLs or data", and click submit.
  2. You should see confirmation screen; click OK to return to the Genome Graphs display.
  3. Now you should see a layout of human chromosomes. Genome Graphs doesn't show the data until you select it. Select "redbin 1" from the menu on the left. (session link)
  4. Now the data appear -- in this case, for chr21 only. Click on a peak.
  5. Click on a peak to jump to the Genome Browser, displaying a 1,000,000-base region of the genome corresponding to the location you clicked. (session link)
  6. Note the track at the top labeled "redbin 1" -- that is the data shown in Genome Graphs. Peaks correspond to upstream regions of genes, which makes sense for RNA polymerase II ChIP-seq.

Transforming .sgr into bedGraph for Genome Browser viewing

The Genome Browser can draw a somewhat nicer display of .sgr data, if we do a simple transform of the .sgr data into a similar format called bedGraph. bedGraph has both start and end genomic coordinates, so scores can be drawn over regions as opposed to single bases. Here is a command that translates .sgr to bedGraph, expanding each data point to cover a 30-base region:

   perl -wpe '@w=split("\t"); $_ = "$w[0]\t" . ($w[1]-1) . "\t" . ($w[1]+29) . "\t$w[2]";' \
     chr21_extended.txt_redbin.sgr > redbinGt10__.bed

If you would like to show only datapoints above a certain threshold, say 5, you can make the command slightly more complicated:

   perl -wpe '@w=split("\t"); $_ = "$w[0]\t" . ($w[1]-1) . "\t" . ($w[1]+29) . "\t$w[2]"; \
              if ($w[2] < 5) {$_ = "";}' \
     chr21_extended.txt_redbin.sgr > redbinGt5.bed

This line can be added to the beginning of redbinGt5.bed, in order to add labels to the track display (without the line break used for readability here):

   track name=redbinGt5 description="redbin items with score >= 5, expanded to 30 bases" \
      type=bedGraph autoScale=off
   add settings: height 16, vertical range 0-100

The file has been saved here: http://genome-test.cse.ucsc.edu/ABRF2010/redbinGt5.bed .

  1. In the Genome Browser, click the "manage custom tracks" button below the image.
  2. Click on "add custom tracks"
  3. Paste http://genome-test.cse.ucsc.edu/ABRF2010/redbinGt5.bed into the first box on the form and click submit.
  4. You should see a confirmation page like this (photo) (session link)
  5. Click on "genome browser" in the top blue bar (does "Go to Genome Browser" do the right thing?)
  6. Now the new custom track shows grayscale representations of the score. You can click on the track title ("redbin items with score > 5") to expand the display to bar graph.

zoom in?

Comparing with UCSC-hosted data

Now we will take a look at some ChIP-seq data from the ENCODE project.

  1. In the Genome Browser, scroll down below the image to the "Regulation" section. Click "+" to expand.
  2. Click on the link titled "Yale TFBS". This will take you to the track controls.
  3. In the large table under "Select subtracks by cell line and factor",
  4. Hit the "-" button in upper left corner to deselect all subtracks. Note that HeLa-S3 cell line is in 3rd column of checkboxes.
  5. Scroll down to factor Pol2. select the 3rd check box. Only HeLa/Pol2 is selected for display.
  6. Scroll back up to the top of the page. Change visibility to "pack" and click submit.
  7. In the Genome Browser, HeLa/Pol2 peaks and signal appear below "Spliced ESTs". session link

Viewing alignments in BAM format

Charlie Nicolet has shared the ELAND export file that accompanies chr21_extended.txt, chr21_export.txt. This can be transformed into BAM for viewing in the UCSC Genome Browser. The samtools package includes a script that converts ELAND export files into SAM (the text version of BAM). Export files often have fasta names instead of sequence names, e.g. "chr21.fa" instead of "chr21", so the ".fa" must be removed:

   export2sam.pl --read1=chr21_export.txt \
   | perl -wpe 's/(chr.*)\.fa/$1/' \
     > chr21.sam

Edit chr21.sam to add these header lines at the beginning (note: the words must be separated by tab characters, not spaces):

@HD	VN:1.0	GO:none	SO:coordinate
@SQ	SN:chr21	LN:46944323

Use samtools to convert SAM to BAM, sort the BAM, and index the sorted BAM (see http://genome.ucsc.edu/goldenPath/help/bam.html for more info):

   samtools view -bSo chr21.unsorted.bam chr21.sam
   samtools sort chr21.unsorted.bam chr21
   samtools index chr21.bam

Now the sorted chr21.bam and its index file chr21.bam.bai must be put on a web server (ftp, http, or https). For example, if you create files my.bam and my.bam.bai, and you have a personal web directory http://my.edu/mylab/~me/, you can copy or link the files to http://my.edu/mylab/~me/my.bam and http://mylab/~me/my.bam.bai. Then you would use only the first URL (http://my.edu/mylab/~me/my.bam) in a track definition line like the one below. For this example, we will use http://genome-test.cse.ucsc.edu/ABRF2010/chr21.bam .

Steps

  1. In the Genome Browser, click the "manage custom tracks" buttom below the image.
  2. Click the "add custom tracks" button.
  3. Paste this text into the box:
track name=redbin type=bam bigDataUrl=http://genome-test.cse.ucsc.edu/ABRF2010/chr21.bam visibility=pack
4. Click submit. Click "Go to Genome Browser".
  • view ~1kB of bam with alignment qualities in grayscale
  • base-level view of bam, w/base qualities in grayscale

Where to learn more

  • Help, FAQ
  • genome@soe.ucsc.edu and list archives
  • genomewiki (this site)
  • OpenHelix