ENCODE Hg18 Migration

From genomewiki

Here is the overall lift summary, by group:

Group                          Super-tracks  Tracks  Tables
-----------------------------------------------------------
Regions and Genes                         2      12      73
Transcription                             2      11      67
Chromatin Immunoprecipitation             8      28     349
Chromatin Structure                       2       8      51
-----------------------------------------------------------
Total                                    14      49     540

Here are lists of all tables/tracks/supertracks, with #items migrated, and #dropped, by track group:

Background


The ENCODE Project is ready to move to hg18. To support this, UCSC needs to migrate the existing data tables from hg17 and set up an hg18 ENCODE browser page and downloads area.

As part of the migration, it would be very desirable to reduce the impact of the ENCODE tracks on overall browser access and to make the ENCODE data easier to access.

Tasks


1. Data lift
2. Browser frame page, static pages
3. Downloads
4. Track reorganization
5. Support for mixed-type composite tracks
6. Track description reorganization
7. Track group display enhancements
8. Track search tool

Items 5-8 are generic to the browser, not ENCODE-specific.

Task Details


1. Coordinate conversion and database loading of the hg17 ENCODE tables (~500 total), excluding those in the Variation and Comp Geno (Comparative Genomics) groups, which must be regenerated by the data providers.

Notes:

liftOver from hg17 to hg18 in the ENCODE regions should be nearly perfect, except in one region that has always been problematic (ENm006 on chrX). If items fail to map in this region, we shouldn't worry.
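
As a rough illustration of how the per-table "#items migrated" and "#dropped" counts could be tallied, the sketch below counts records in the two files that liftOver writes (the lifted output and the unmapped output). It assumes one record per line and that the unmapped file interleaves '#'-prefixed reason lines with the original records. The function name and the sample records are hypothetical, not taken from the actual migration scripts.

```python
def lift_counts(mapped_lines, unmapped_lines):
    """Return (migrated, dropped) item counts for one lifted table.

    Assumes one record per non-blank line; lines starting with '#'
    in the unmapped output are reason comments, not records.
    """
    def is_record(line):
        stripped = line.strip()
        return bool(stripped) and not stripped.startswith("#")

    migrated = sum(1 for line in mapped_lines if is_record(line))
    dropped = sum(1 for line in unmapped_lines if is_record(line))
    return migrated, dropped


# Hypothetical sample data, for illustration only.
mapped = ["chr1\t100\t200\titemA\n", "chr1\t300\t400\titemB\n"]
unmapped = ["#Deleted in new\n", "chrX\t500\t600\titemC\n"]
print(lift_counts(mapped, unmapped))  # prints (2, 1)
```

In practice the two arguments could be open file handles, since the function only iterates over lines.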

Documentation from the previous migration is in encodeHg17.txt.

2. ENCODE browser frame page

3. Downloads

Need to post all wiggle data in unencoded form (this wasn't done systematically with hg17).

Some hg17 downloads may not be liftable (those in data formats we don't support).

4. Track restructuring to combine related datasets

A goal is to merge tracks and track groups. Attempt to have a single track for each group/experiment -- e.g. 1 Yale ChIP-chip track instead of 5. This would allow merging the ENCODE track groups -- perhaps down to 3. It will require some coding and documentation work (below).

5. Support for mixed-type composite tracks

This feature would allow handling of different types of subtracks within a single track. Separate configuration of different types would be allowed -- e.g. wiggles and beds could both be fully configurable by the user, with visibility and filtering properly managed. This will introduce a 3rd level (subtrack group) into our track organization.
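
One way to picture the proposed three-level organization (composite track, subtrack group, subtrack) is the sketch below. It is only a data-structure illustration under assumed names: the classes, the setting keys, and the example table names are all hypothetical and do not reflect the browser's actual code or trackDb settings.

```python
from dataclasses import dataclass, field


@dataclass
class Subtrack:
    table: str
    track_type: str  # e.g. "wig" or "bed"


@dataclass
class SubtrackGroup:
    track_type: str
    settings: dict = field(default_factory=dict)   # per-type configuration
    subtracks: list = field(default_factory=list)


@dataclass
class CompositeTrack:
    name: str
    groups: dict = field(default_factory=dict)     # track_type -> SubtrackGroup

    def add(self, subtrack):
        """Place a subtrack in the group for its type, creating it if needed."""
        group = self.groups.setdefault(
            subtrack.track_type, SubtrackGroup(subtrack.track_type)
        )
        group.subtracks.append(subtrack)

    def configure(self, track_type, **settings):
        """Update settings for one subtrack group without touching the others."""
        self.groups[track_type].settings.update(settings)


# Hypothetical composite mixing wiggle and bed subtracks.
composite = CompositeTrack("exampleChipChip")
composite.add(Subtrack("exampleSignal1", "wig"))
composite.add(Subtrack("examplePeaks", "bed"))
composite.add(Subtrack("exampleSignal2", "wig"))
composite.configure("wig", visibility="full")
composite.configure("bed", visibility="dense", min_score=500)
```

Keeping settings on the group rather than on the composite is what lets wiggles and beds be configured separately within one track.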

6. Track description reorganization

Methods will be associated with subtrack groups rather than with the whole composite, so related datasets can be flexibly grouped. It would be best to remove specifics of experimental parameters (e.g. cell types used, transcription factors tested) at this time and have this info interpolated from metadata. We may have to wait for our metadata/DCC work for this, though.
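
To make "interpolated from metadata" concrete, here is a minimal sketch using Python's standard string.Template. The template wording, the metadata field names (cell_type, factor), and the sample values are hypothetical placeholders; the real metadata schema would come out of the metadata/DCC work.

```python
from string import Template

# Hypothetical methods text with placeholders instead of hard-coded specifics.
METHODS_TEMPLATE = Template(
    "Experiments were performed in $cell_types cells, testing $factors."
)


def render_methods(template, subtracks):
    """Fill a methods template from the metadata of a subtrack group."""
    cell_types = sorted({s["cell_type"] for s in subtracks})
    factors = sorted({s["factor"] for s in subtracks})
    return template.substitute(
        cell_types=", ".join(cell_types),
        factors=", ".join(factors),
    )


# Hypothetical metadata records, one per subtrack.
metadata = [
    {"cell_type": "cellLineA", "factor": "factorA"},
    {"cell_type": "cellLineA", "factor": "factorB"},
    {"cell_type": "cellLineB", "factor": "factorA"},
]
description = render_methods(METHODS_TEMPLATE, metadata)
print(description)
```

With this approach, adding a dataset for a new cell type would update the description without anyone editing the methods text by hand.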

7. Track group display enhancements

To reduce clutter, have track group sections be easily collapsed or expanded. Perhaps use a different color on the group bar for ENCODE track groups to distinguish them.

8. Track search tool

Not essential for migration, but would add a lot of value.

Some initial thoughts:

  • Could have search button on configure page
  • Output would show suitable subtracks, with checkboxes
  • Advanced search with checkboxes for cell lines, antibodies, platforms, labs
  • Would be nice to allow reordering subtracks

Issues to consider


  • Max #tables in a database
  • Max table name size
  • Effective limit to menu size (e.g. #tables in the Table Browser)
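
For the table-name concern, a simple pre-load check could flag names that would exceed the database's limit. The sketch below assumes a MySQL backend, where table names are limited to 64 characters; the function name and the sample table names are hypothetical.

```python
MAX_TABLE_NAME_LEN = 64  # assumed limit: MySQL table names, in characters


def overlong_names(table_names, limit=MAX_TABLE_NAME_LEN):
    """Return the table names that are longer than the allowed limit."""
    return [name for name in table_names if len(name) > limit]


# Hypothetical table names, for illustration only.
names = ["encodeExampleShort", "encodeExample" + "X" * 60]
too_long = overlong_names(names)
print(too_long)
```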