Conservation Track QA

Revision as of 23:44, 14 January 2008

QAing Conservation and Most Conserved tracks for a new assembly

(using the ponAbe2 (orangutan) assembly as an example throughout)

Make a list of tables:

  • multiz8way
  • multiz8wayFrames
  • multiz8waySummary
  • phastCons8way
  • phastConsElements8way
  • seq
  • extFile

Make a list of files to go to hgnfs1:

  • /gbdb/ponAbe2/multiz8way/phastCons8way.wib
  • /gbdb/ponAbe2/multiz8way/anno/maf/*

Make a list of files to go to hgdownload:

  • /usr/local/apache/htdocs/goldenPath/ponAbe2/phastCons8way/*.*
  • /usr/local/apache/htdocs/goldenPath/ponAbe2/phastCons8way/phastConsScores/*
  • /usr/local/apache/htdocs/goldenPath/ponAbe2/multiz8way/*.*
  • /usr/local/apache/htdocs/goldenPath/ponAbe2/multiz8way/maf/*

Make a list of all organisms in the Conservation track:

orangutan    Pongo pygmaeus abelii      July 2007   ponAbe2
human        Homo sapiens               Mar 2006    hg18
chimpanzee   Pan troglodytes            Mar 2006    panTro2
rhesus       Macaca mulatta             Jan 2006    rheMac2
marmoset     Callithrix jacchus         June 2007   calJac1
mouse        Mus musculus               July 2007   mm9
opossum      Monodelphis domestica      Jan 2006    monDom4
platypus     Ornithorhynchus anatinus   Mar 2007    ornAna1

Make a list of all organisms for which there are nets & chains:

(put in order of furthest from this species to closest)

  • ornAna1
  • monDom4
  • mm9
  • rheMac2
  • panTro2
  • hg18


Check the following in the files:

Check annotated maf files for overlapping blocks:

[hgwdev:/gbdb/ponAbe2/multiz8way/anno/maf>

foreach f (*.maf)
  echo -n "${f}: "
  mafFilter -overlap -minRow=1 $f > /dev/null
end

If there are 'rejected blocks', contact the developer.
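
As a cross-check on what "overlapping blocks" means, the sketch below looks for overlaps on the reference row only. It is not a replacement for mafFilter. The function name maf_ref_overlaps is hypothetical, and it assumes that the first "s" line after each "a" line is the reference row and that blocks for a given sequence appear in ascending start order. It is written for bash/sh rather than the csh shown at the prompt.

```shell
# Report reference-row overlaps in a MAF stream read from stdin.
# Assumption: the first "s" line after each "a" line is the reference row.
maf_ref_overlaps() {
  awk '
    $1 == "a" { want_ref = 1; next }      # new block: the next "s" line is the reference row
    $1 == "s" && want_ref {
      want_ref = 0
      src = $2
      start = $3 + 0                      # MAF start is 0-based
      end = $3 + $4                       # $4 is the ungapped size
      if ((src in last_end) && start < last_end[src])
        printf "%s: block at %d overlaps previous block ending at %d\n", src, start, last_end[src]
      if (!(src in last_end) || end > last_end[src])
        last_end[src] = end               # remember the furthest end seen so far
    }
  '
}
```

For example: cat chrX.maf | maf_ref_overlaps. No output means no overlaps were found under the stated assumptions.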


Read both README files:

/goldenPath/ponAbe2/phastCons8way/README.txt
/goldenPath/ponAbe2/multiz8way/README.txt

Check upstream files to make sure that the species name doesn't appear in an "s" line:

[hgwdev:~/goldenPath/ponAbe2/multiz8way/maf> zcat upstream*.maf.gz | grep "s ponAbe2" | wc -l
0
If this is not zero, contact the developer.
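
The grep above matches the text "s ponAbe2" anywhere on a line. A slightly stricter variant, sketched here for bash/sh, compares only the source-name field of each "s" line. The function name count_species_rows is hypothetical, and treating both "ponAbe2" and "ponAbe2.<chrom>" as a match is an assumption about how the name would appear.

```shell
# Count "s" lines in a MAF stream (stdin) whose source field is the given
# assembly, either bare ("ponAbe2") or in db.chrom form ("ponAbe2.chrX").
count_species_rows() {
  awk -v db="$1" '
    $1 == "s" && ($2 == db || index($2, db ".") == 1) { n++ }
    END { print n + 0 }                   # print 0 rather than an empty line
  '
}
```

For example: zcat upstream*.maf.gz | count_species_rows ponAbe2. As with the grep, anything other than 0 is a reason to contact the developer.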

Check upstream files to make sure gene names haven't been truncated (to 9 chars):

[hgwdev:~/goldenPath/ponAbe2/multiz8way/maf> zcat upstream*.maf.gz | head

##maf version=1 scoring=zero
a score=0.000000
s NM_001017434 0 1000 + 1000 GTGAAGTGTCAGGGTGGAGAAGCAAATACAAACTCTTCACTAAGTGGCCCT

If the gene names are short (9 characters) contact the developer.
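
Looking at head only shows the first few names. The following bash/sh sketch summarises name lengths across the whole stream. The function name ref_name_length_summary is hypothetical, and it assumes the gene name is the source field of the first "s" line in each block. Some accessions are legitimately nine characters long, so treat the count as a hint: a file where nearly every name is exactly nine characters is the suspicious case.

```shell
# Summarise reference-row name lengths in an upstream MAF stream (stdin).
# Prints: <number of reference rows> <rows whose name is exactly len characters>
ref_name_length_summary() {
  awk -v len="${1:-9}" '
    $1 == "a" { want_ref = 1; next }      # new block: the next "s" line carries the gene name
    $1 == "s" && want_ref {
      want_ref = 0
      total++
      if (length($2) == len) hits++
    }
    END { print total + 0, hits + 0 }
  '
}
```

For example: zcat upstream*.maf.gz | ref_name_length_summary 9.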

Check one maf file:

[hgwdev:~/goldenPath/ponAbe2/multiz8way/maf> zcat chrX.maf.gz | head

##maf version=1 scoring=autoMZ.v1
a score=21236.000000
s ponAbe2.chrX     2 249 + 156195299 cagtggcatgatcacagatgactgcagcctcggcctccatagc
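
Beyond eyeballing the first lines, two invariants of the MAF "s" line (s src start size strand srcSize text) can be checked mechanically: the number of non-gap characters in the text should equal size, and start + size should not exceed srcSize. The bash/sh sketch below does this. The function name maf_row_check is hypothetical. The sample line shown above is cut off on the right, so feed the function the real file rather than the excerpt.

```shell
# Check every "s" line of a MAF stream (stdin) against two format invariants.
maf_row_check() {
  awk '
    $1 == "s" {
      text = $7
      gsub(/-/, "", text)                 # gap characters do not count towards size
      if (length(text) != $4)
        printf "line %d: size %d but %d bases in text\n", NR, $4, length(text)
      if ($3 + $4 > $6)
        printf "line %d: start+size %d exceeds srcSize %d\n", NR, $3 + $4, $6
    }
  '
}
```

For example: zcat chrX.maf.gz | maf_row_check. No output means every row passed both checks.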

Read through both description pages:

  • Conservation track:
    • Check image that displays on conservation details page.
    • Check "Gene tracks used for codon translation" table against make doc.
    • Make sure organisms are listed (in all places) in the correct phylogenetic order.
    • Make sure that this page includes all the extra sections (if the multiz alignments have been annotated).
    • Make sure there is a tree model available.
  • Most Conserved track:
    • Make sure the text refers to the correct species.

Check trackDb.ra file:

  • Conservation track:
    • Make sure there is a speciesCodonDefault entry (usually this species).
    • Make sure Jim has signed off on the species listed in the speciesDefaultOff entry.
  • Most Conserved track:


Figure out extFile and seq tables:

  • If they are standard maf files, there will be no entries in the seq table.
  • There may be more than one set of entries in the extFile table. Make sure you only push the set that pertains to the actual files you are pushing to hgnfs1 (e.g. /gbdb/ponAbe2/multiz8way/anno/maf/*)
  • These are the ones that will need pushing to beta:

mysql> select path from extFile where path like "%anno/maf%";
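
To confirm that the extFile rows selected above really correspond to files on disk, the paths can be piped through a small existence check. This is a bash/sh sketch. The function name missing_paths is hypothetical, and the hgsql invocation in the comment assumes the usual kent-tools wrapper with mysql's -N and -e options passed through.

```shell
# Read file paths from stdin (one per line) and print those that are not
# regular files on this host.
# Assumed usage (not taken from this page):
#   hgsql -N -e 'select path from extFile where path like "%anno/maf%"' ponAbe2 | missing_paths
missing_paths() {
  while IFS= read -r path; do
    [ -n "$path" ] || continue            # skip blank lines
    [ -f "$path" ] || printf '%s\n' "$path"
  done
}
```

An empty result means every listed path exists. Any path printed points at an extFile row that does not match the files being pushed.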


Test in the Genome Browser:

  • Zoom out past 1 Mbp (this tests the multiz*waySummary table)
  • Find example areas of all annotation types (check against the maf file for that location):
    • pale yellow bar
    • green square brackets
    • vertical blue bar
    • gaps
  • Check out codon translation for a few species.


Test in the tables:

  • joinerCheck
  • featureBits

[hgwdev:~/qa/tracks/conservation/ponAbe2> nice featureBits ponAbe2 multiz8way gap -bed=output.bed
162920397 bases of 3093572278 (5.266%) in intersection

  • countPerChrom.csh ponAbe2 multiz8way
  • Find out how phastCons was run (from the make doc). Check whether the species passed to the --not-informative option make sense. In this case, those species do not contribute to the phastCons wiggle.