Custom track database

From Genomewiki

Custom Track Database (Completed 2006-06-02)

Ideas:

  1. Conservative
    • Keep the existing system, merely add a new keyword to the track definition line, e.g. "db=<data type>", and handle it in a similar manner to how wiggle tracks are handled now.

  2. Moderate
    • Same as conservative, but generalize existing wiggle track code to handle multiple data types.
  3. Radical
    • Remove all existing code for custom track handling. A new system could perhaps function more like existing tracks and not need special handling all over the place.
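
As an illustration of the "db=<data type>" keyword idea, the following is a minimal sketch in Python of how a track definition line could be split into settings and checked for the new keyword. The function names `parse_track_line` and `wants_db_loading` are hypothetical and are not part of the existing custom track code; the example track line is made up.

```python
import shlex


def parse_track_line(line):
    """Split a track definition line into a dict of key=value settings.

    Quoted values such as name="my peaks" are kept together as one value.
    """
    tokens = shlex.split(line)
    if not tokens or tokens[0] != "track":
        raise ValueError("not a track definition line")
    settings = {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        settings[key] = value
    return settings


def wants_db_loading(settings):
    """True when the track line carries the proposed db=<data type> keyword."""
    return "db" in settings


# Made-up example line; "bed" stands in for <data type>.
settings = parse_track_line('track name="my peaks" description="ChIP peaks" db=bed')
print(settings["db"])              # bed
print(wants_db_loading(settings))  # True
```

Track lines without the keyword would continue down the existing path, which is what keeps this option conservative.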

How would it work:

  1. Conservative
    • The incoming data stream recognizes the "db=<data type>" keyword. Incoming data is sent to a separate file for DB loading. DB loading is done simply by exec'ing the appropriate hgLoad<data type> loader, which loads the data into a new table in the custom track DB. A trackDb entry is made in the custom track DB.

    • The DB loading could also be done merely by tossing the incoming data over to the submission mechanism mentioned below.
    • A single track line is left in the custom track trash file as a pointer to the DB entries.
    • Subsequent viewing is achieved by combining the custom track DB trackDb entries with the ordinary trackDb entries. An extra field has to be added to one of our structures so that the appropriate database can be used when processing each track, since tracks now come from different databases.
  2. Moderate
    • Same as conservative, but also support a batch submission mechanism (like that mentioned below), AKA the "automated track loader". This type of facility is required for the ENCODE project, for creating standard (non-custom) tracks, and it would be best to share functionality. The batch submission mechanism consists of:

    1. Web interface to submit data, track configuration, and track description and save them in a queue (in the filesystem)
    2. Back-end daemon that retrieves batch requests, processes them into tracks by using database loaders, then notifies the user by email that the track is complete. The web interface should be designed to support custom tracks or standard tracks. For small data sets, it could short-circuit the batch submission, load the data directly and call hgTracks (similar to the current custom track submission interface).
  3. Radical
    • Submission system as already developed by Andy and Mark could be used to submit custom tracks.
    • As mentioned in 1. above, simply combine the custom track trackDb entries with the ordinary trackDb entries to make all tracks appear the same to the processing cgi-bin programs. The only new bit of information needed would be a DB tag for each track so the appropriate database can be used for each track.
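
Two pieces of the flow above can be sketched briefly in Python: picking the hgLoad<data type> loader name, and combining the two sets of trackDb entries with a DB tag on each. This is only a sketch under assumptions. The helper names, the dict layout of a trackDb entry, and the database names "hg18" and "customTrackDb" are all placeholders, and the capitalization rule for the loader name is a guess at how <data type> would map to a program name.

```python
def loader_name(data_type):
    """Map a data type such as "bed" to a loader name such as "hgLoadBed".

    Assumption: the loader is named "hgLoad" plus the data type with its
    first letter capitalized.
    """
    if not data_type.isalnum():
        raise ValueError(f"unexpected data type: {data_type!r}")
    return "hgLoad" + data_type[0].upper() + data_type[1:]


def combine_track_dbs(standard_entries, custom_entries, genome_db, custom_db):
    """Return one list of trackDb entries, each tagged with its source DB.

    The input entries are not modified; each output entry is a copy with
    an added "db" key naming the database to query for that track.
    """
    combined = []
    for entry in standard_entries:
        combined.append({**entry, "db": genome_db})
    for entry in custom_entries:
        combined.append({**entry, "db": custom_db})
    return combined


# Placeholder entries and database names, for illustration only.
standard = [{"track": "knownGene", "type": "genePred"}]
custom = [{"track": "ct_myPeaks", "type": "bed"}]

print(loader_name("bed"))  # hgLoadBed
for entry in combine_track_dbs(standard, custom, "hg18", "customTrackDb"):
    print(entry["track"], entry["db"])
```

With the tag in place, code that draws a track would look up the database from the entry instead of assuming a single one, which is the extra piece of information both the conservative and the radical options call for.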

Gotchas

  1. It may be difficult to make tracks from the custom track DB appear as just another track that happens to come from a different database, because the concept of "database" as the source of tracks is programmed into all the CGIs as a single global variable. It isn't immediately obvious how this could easily be made switchable.
  2. It might be a lot easier to keep the old system in place, because the extra processing involved with a DB source of data could be handled by a new field in the customTrack structure to specify the DB source. Much of the special-case handling this new field would need is already in place, because this structure is already used everywhere to fetch data for a custom track.
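
The second point can be pictured with a small Python sketch. It is not the real customTrack structure: the class and its fields (`trash_file`, `db_source`, `db_table`) are hypothetical stand-ins showing how one optional field could select between the existing trash-file path and a DB-backed path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CustomTrack:
    """Stand-in for the customTrack structure, reduced to what matters here."""
    name: str
    trash_file: Optional[str] = None  # existing behavior: data in a trash file
    db_source: Optional[str] = None   # hypothetical new field: database name
    db_table: Optional[str] = None    # hypothetical: table within db_source


def data_location(track):
    """Return where the track's data lives.

    Tracks with db_source set are read from that database; all others
    fall through to the existing trash-file handling.
    """
    if track.db_source is not None:
        return ("database", track.db_source, track.db_table)
    return ("file", track.trash_file, None)


# Placeholder names and paths, for illustration only.
old_style = CustomTrack(name="ct_1", trash_file="trash/ct/ct_example.bed")
db_backed = CustomTrack(name="ct_2", db_source="customTrackDb", db_table="ct_2_bed")

print(data_location(old_style))
print(data_location(db_backed))
```

Because only code that already special-cases custom tracks would consult the new field, the global "database" variable used by the CGIs would not need to change under this approach.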

Other:
