ENCODE Hg18 Migration

From genomewiki
Latest revision as of 18:18, 10 March 2011

Here is the lift summary: Overall summary (http://docs.google.com/View?docID=dcw2pgjb_10q9qkpp&revision=_latest)

By track group:

Group                          Super-tracks  Tracks  Tables
-----------------------------------------------------------
Regions and Genes                         2      12      73
Transcription                             2      11      67
Chromatin Immunoprecipitation             8      28     349
Chromatin Structure                       2       8      51
-----------------------------------------------------------
Total                                    14      59     540
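The total row is derived from the four group rows, so it can be rechecked mechanically whenever the per-group lists change. Below is a minimal Python sketch with the counts copied from the table; the `GROUPS` and `column_totals` names are only for illustration. Summing the rows gives 14 super-tracks, 59 tracks and 540 tables.

```python
# Per-group counts from the summary table: (super-tracks, tracks, tables).
GROUPS = {
    "Regions and Genes": (2, 12, 73),
    "Transcription": (2, 11, 67),
    "Chromatin Immunoprecipitation": (8, 28, 349),
    "Chromatin Structure": (2, 8, 51),
}


def column_totals(groups):
    """Sum each of the three count columns across all track groups."""
    # zip(*rows) transposes the rows into columns.
    return tuple(sum(column) for column in zip(*groups.values()))


if __name__ == "__main__":
    print(column_totals(GROUPS))  # (14, 59, 540)
```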

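Lists of all tables, tracks and super-tracks in each track group, with the number of items migrated and dropped:

  • ENCODE Regions and Genes (http://docs.google.com/View?docID=dcw2pgjb_2cz82dz&revision=_latest)
  • ENCODE Transcription (http://docs.google.com/View?docID=dcw2pgjb_4ckrp99&revision=_latest)
  • ENCODE Chromatin Immunoprecipitation (http://docs.google.com/View?docID=dcw2pgjb_9dsr4hg&revision=_latest)
  • ENCODE Chromatin Structure (http://docs.google.com/View?docID=dcw2pgjb_6fsfcnr&revision=_latest)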

Background


The ENCODE Project is ready to move to hg18. To support this, UCSC needs to migrate the existing data tables from hg17 and set up an hg18 ENCODE browser page and downloads area.

As part of the migration, it would be highly desirable to reduce the impact of the ENCODE tracks on the browser as a whole, and to make the ENCODE data easier to access.

Tasks


1. Data lift
2. Browser frame page, static pages
3. Downloads
4. Track reorganization
5. Support for mixed-type composite tracks
6. Track description reorganization
7. Track group display enhancements
8. Track search tool

Items 5-8 are generic to the browser, not ENCODE-specific.

Task Details


1. Coordinate conversion and database loading of the hg17 ENCODE tables (~500 in total), excluding those in the Variation and Comp Geno groups, which must be regenerated by the data providers.

Notes:

liftOver from hg17 to hg18 in the ENCODE regions should be nearly perfect, except in one region that has always been problematic (ENm006 on chrX). Unmapped items in this region are not a cause for concern.

Documentation from the previous migration is in encodeHg17.txt.
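The ENm006 note implies a simple check after each lift: count the unmapped items per ENCODE region and flag losses anywhere else. The sketch below is hypothetical and makes two assumptions. First, liftOver's unmapped output is taken to list each unconverted BED item after a `#` comment line giving the reason. Second, the caller supplies a mapping from region names to hg17 coordinates. The function names and the toy coordinates in the example are made up; they are not the real region bounds.

```python
from collections import Counter


def region_of(chrom, start, end, regions):
    """Return the name of the region overlapping [start, end), or None.

    `regions` maps a region name to (chrom, start, end); the real hg17
    ENCODE region coordinates must be supplied by the caller.
    """
    for name, (r_chrom, r_start, r_end) in regions.items():
        if chrom == r_chrom and start < r_end and end > r_start:
            return name
    return None


def tally_unmapped(lines, regions):
    """Count unmapped BED items per region, skipping '#' reason lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        counts[region_of(chrom, start, end, regions)] += 1
    return counts


def unexpected_losses(counts, tolerated=("ENm006",)):
    """Regions that lost items and are not tolerated (None = outside every region)."""
    return {name: n for name, n in counts.items() if name not in tolerated}


if __name__ == "__main__":
    # Toy coordinates for illustration only, NOT the real ENCODE region bounds.
    toy_regions = {"ENm006": ("chrX", 1000, 2000), "ENm001": ("chr7", 5000, 6000)}
    # Sample lines in the assumed unmapped-file layout.
    unmapped = [
        "#Deleted in new",
        "chrX\t1200\t1300\titemA",
        "#Deleted in new",
        "chr7\t5100\t5200\titemB",
    ]
    print(unexpected_losses(tally_unmapped(unmapped, toy_regions)))  # {'ENm001': 1}
```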

2. ENCODE browser frame page

3. Downloads

All wiggle data needs to be posted in unencoded form (this wasn't done systematically for hg17).

Some hg17 downloads may not be liftable because they use data formats we don't support.
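Suppose "unencoded" here means plain-text wiggle, as opposed to the binary form the browser loads. A download file could then be written in the standard `variableStep` wiggle format. Below is a minimal Python sketch. The function name and the 0-based `(start, value)` input are assumptions about how the decoded data would be handed over.

```python
def to_variable_step(chrom, items, span=1):
    """Render (start, value) pairs as plain-text variableStep wiggle.

    `items` holds 0-based starts, as in browser tables; wiggle text
    positions are 1-based, so each start is shifted by one.
    """
    lines = [f"variableStep chrom={chrom} span={span}"]
    for start, value in sorted(items):
        # ":g" keeps up to 6 significant digits; widen the format if needed.
        lines.append(f"{start + 1} {value:g}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_variable_step("chrX", [(99, 1.5), (0, 0.25)], span=25), end="")
```

In practice, one such section would be written per chromosome and appended to the download file.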

4. Track restructuring to combine related datasets

A goal is to merge tracks and track groups. The aim is a single track for each group/experiment, e.g. one Yale ChIP-chip track instead of five. This would allow the ENCODE track groups to be merged, perhaps down to three. It will require some coding and documentation work (see below).

5. Support for mixed-type composite tracks

This feature would allow subtracks of different types within a single track. Each type could be configured separately: e.g. wiggle and BED subtracks could both be fully configurable by the user, with visibility and filtering properly managed. This will introduce a third level (the subtrack group) into our track organization.
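One way to picture the third level is a composite that files each subtrack under a subtrack group. Settings are held per group, so that, say, wiggle and BED subtracks are configured independently. This is a hypothetical Python sketch, not the browser's trackDb model. It groups subtracks by type only, and all class, field and table names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Subtrack:
    table: str
    track_type: str  # hypothetical type labels, e.g. "wig" or "bed"
    visible: bool = True


@dataclass
class SubtrackGroup:
    """Middle level: subtracks of one type that share one set of settings."""
    name: str
    track_type: str
    settings: dict = field(default_factory=dict)  # per-type options (filters, scaling, ...)
    subtracks: list = field(default_factory=list)

    def add(self, sub):
        if sub.track_type != self.track_type:
            raise ValueError(f"{sub.table} is {sub.track_type!r}, group expects {self.track_type!r}")
        self.subtracks.append(sub)


@dataclass
class CompositeTrack:
    """Top level: one track per group/experiment."""
    name: str
    groups: dict = field(default_factory=dict)  # track_type -> SubtrackGroup

    def add_subtrack(self, sub):
        """File a subtrack under the group for its type, creating the group on first use."""
        group = self.groups.get(sub.track_type)
        if group is None:
            group = self.groups[sub.track_type] = SubtrackGroup(sub.track_type, sub.track_type)
        group.add(sub)

    def visible_tables(self):
        return [s.table for g in self.groups.values() for s in g.subtracks if s.visible]


if __name__ == "__main__":
    chip = CompositeTrack("exampleChip")  # hypothetical track and table names
    chip.add_subtrack(Subtrack("exampleChipSignal1", "wig"))
    chip.add_subtrack(Subtrack("exampleChipSites1", "bed", visible=False))
    chip.groups["wig"].settings["scale"] = "auto"  # affects wiggle subtracks only
    print(chip.visible_tables())  # ['exampleChipSignal1']
```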

6. Track description reorganization

Methods will be associated with subtrack groups rather than with the whole composite, so that related datasets can be grouped flexibly. It would be best to remove specifics of experimental parameters (e.g. cell types used, transcription factors tested) from the descriptions at this time, and have this information filled in from metadata. We may have to wait for our metadata/DCC work for this, though.
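Filling in experiment specifics from metadata could be as simple as a methods template per subtrack group, with placeholders populated from the subtracks' metadata. Below is a minimal sketch using Python's `string.Template`. The template text, the metadata keys (`cell`, `factor`) and the values are all hypothetical.

```python
from string import Template

# Hypothetical methods text for one subtrack group; specifics come from metadata.
METHODS = Template("Cell lines: $cells. Factors assayed: $factors.")


def render_methods(template, subtracks):
    """Fill a methods template from per-subtrack metadata records."""
    # Deduplicate and sort so the text is stable regardless of subtrack order.
    cells = sorted({s["cell"] for s in subtracks})
    factors = sorted({s["factor"] for s in subtracks})
    return template.substitute(cells=", ".join(cells), factors=", ".join(factors))


if __name__ == "__main__":
    metadata = [  # made-up records
        {"cell": "CellB", "factor": "FactorX"},
        {"cell": "CellA", "factor": "FactorY"},
        {"cell": "CellB", "factor": "FactorY"},
    ]
    print(render_methods(METHODS, metadata))
    # Cell lines: CellA, CellB. Factors assayed: FactorX, FactorY.
```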

7. Track group display changes

To reduce clutter, make track group sections easy to collapse or expand. Perhaps use a different color on the group bar for ENCODE track groups to distinguish them.

8. Track search tool

Not essential for the migration, but it would add a lot of value.

Some initial thoughts:

  • Could have a search button on the configure page
  • Output would show matching subtracks, with checkboxes
  • Advanced search with checkboxes for cell lines, antibodies, platforms, labs
  • Would be nice to allow reordering of subtracks
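The advanced search could treat each checkbox list (cell lines, antibodies, platforms, labs) as a facet. A subtrack matches when, for every facet with something checked, its metadata value is among the checked values. Below is a hypothetical Python sketch over made-up metadata records.

```python
def search_subtracks(subtracks, **facets):
    """Return subtracks whose metadata matches every selected facet.

    Each keyword (e.g. cell, antibody, platform, lab) maps to the set of values
    checked in the search form; an empty set means "any value".
    """
    return [
        sub for sub in subtracks
        if all(not wanted or sub.get(key) in wanted for key, wanted in facets.items())
    ]


if __name__ == "__main__":
    subtracks = [  # made-up metadata records
        {"table": "t1", "cell": "cellA", "antibody": "abX", "lab": "lab1"},
        {"table": "t2", "cell": "cellB", "antibody": "abX", "lab": "lab1"},
        {"table": "t3", "cell": "cellA", "antibody": "abY", "lab": "lab2"},
    ]
    hits = search_subtracks(subtracks, cell={"cellA"}, antibody={"abX"}, lab=set())
    print([s["table"] for s in hits])  # ['t1']
```

Using AND across facets and OR within a facet mirrors the usual behavior of checkbox filters.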

Issues to consider


  • Maximum number of tables in a database
  • Maximum table name length
  • Effective limit on menu size (e.g. number of tables in the Table Browser)
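The table-name limit is concrete: MySQL, which the browser's databases use, allows table names of at most 64 characters. Names that encode lab, cell line, factor and data type could approach that limit. Below is a minimal sketch that screens proposed names before loading; the example names are hypothetical.

```python
MYSQL_MAX_TABLE_NAME = 64  # MySQL limit on table name length, in characters


def overlong_table_names(names, limit=MYSQL_MAX_TABLE_NAME):
    """Return (name, length) for each proposed table name longer than `limit`."""
    return [(name, len(name)) for name in names if len(name) > limit]


if __name__ == "__main__":
    proposed = [
        "encodeExampleChipSites",  # hypothetical names
        "encodeExampleLabChipSeqCellLineFactorReplicate1RawSignalAlignments",
    ]
    for name, length in overlong_table_names(proposed):
        print(f"{name}: {length} chars")
```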