Custom track database

From genomewiki
Revision as of 22:52, 27 April 2006 by Kate (talk | contribs)

Custom Track Database

Ideas:

  1. Conservative
    • Keep the existing system and add a new keyword to the track definition line,
    e.g. "db=", handled in much the same way as wiggle tracks are handled now.
  2. Moderate
    • Same as conservative, but generalize existing wiggle track code to handle multiple data types.
  3. Radical
    • Remove all existing code for custom track handling. A new system could perhaps function more like existing tracks and not need special handling all over the place.
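
To make the "db=" idea concrete, here is a minimal sketch of how a track definition line might be split into settings and checked for the new keyword. It is only an illustration in Python, not the browser's own parser: the function names, the example track line, and the value "customTracks" are all assumptions made up for this sketch.

```python
import shlex


def parse_track_line(line):
    """Split a track definition line into a dict of key=value settings."""
    # shlex handles quoted values such as description="My example track".
    tokens = shlex.split(line)
    if not tokens or tokens[0] != "track":
        raise ValueError("not a track definition line")
    settings = {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if sep:  # skip bare words that are not key=value pairs
            settings[key] = value
    return settings


def wants_db_loading(settings):
    """True when the (hypothetical) db= keyword appears on the track line."""
    return "db" in settings


# Hypothetical track line; the db value is an invented placeholder.
line = 'track name=myTrack description="My example track" db=customTracks'
settings = parse_track_line(line)
if wants_db_loading(settings):
    print("route data to DB loading, target:", settings["db"])
```

A track line without "db=" would simply fall through to the existing custom track handling, which is what keeps this option conservative.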

How would it work:

  1. Conservative
    • The "db=" keyword is recognized in the incoming data stream. Incoming data is sent to a separate file for DB loading. DB loading is done simply by exec'ing the appropriate
    hgLoad loader to load a new table in the custom track DB. A trackDb entry is made in the custom track DB.
    • The DB loading could also be done merely by tossing the incoming data over to the submission mechanism mentioned below.
    • A single track line is left in the custom track trash file as a pointer to the DB entries.
    • Subsequent viewing is achieved by combining the custom track DB trackDb entries with the ordinary trackDb entries. An extra field has to be added to one of our structures so that the appropriate database is used when processing each track, since tracks now come from different databases.
  2. Moderate
    • Same as conservative, but also support a batch submission mechanism (like that mentioned below), AKA the "automated track loader". This type of facility is required for the ENCODE project, for creating standard (non-custom) tracks, and it would be best to share functionality.
    • The batch submission mechanism consists of:
      1. A web interface to submit data, track configuration, and track description, and save them in a queue (in the filesystem).
      2. A back-end daemon that retrieves batch requests, processes them into tracks using database loaders, then notifies the user by email that the track is complete.
    • The web interface should be designed to support custom tracks or standard tracks. For small data sets, it could short-circuit the batch submission, load directly, and call hgTracks (similar to the current custom track submission interface).
  3. Radical
    • Submission system as already developed by Andy and Mark could be used to submit custom tracks.
    • As mentioned in 1. above, simply combine the custom track trackDb entries with the ordinary trackDb entries to make all tracks appear the same to the processing cgi-bin programs. The only new bit of information needed would be a DB tag for each track so the appropriate database can be used for each track.
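
The "combine the trackDb entries and tag each with its DB" step could look roughly like the sketch below. It is written in Python purely for illustration; the TrackDbEntry class, its field names, and the database and table names in the example are assumptions for this sketch, not the real trackDb structure.

```python
from dataclasses import dataclass


@dataclass
class TrackDbEntry:
    table_name: str
    short_label: str
    db: str = ""  # the new DB tag: which database holds this track's table


def merge_track_db(ordinary, custom, genome_db, custom_db):
    """Return one combined list, tagging every entry with its source database."""
    merged = []
    for entry in ordinary:
        merged.append(TrackDbEntry(entry.table_name, entry.short_label, genome_db))
    for entry in custom:
        merged.append(TrackDbEntry(entry.table_name, entry.short_label, custom_db))
    return merged


def qualified_table(entry):
    """Database-qualified table name to use when querying this track."""
    return f"{entry.db}.{entry.table_name}"


# Illustrative entries only; the names are placeholders.
ordinary = [TrackDbEntry("knownGene", "Known Genes")]
custom = [TrackDbEntry("ct_example_1", "My Track")]
merged = merge_track_db(ordinary, custom, "hg18", "customTracks")
for entry in merged:
    print(qualified_table(entry))
```

With the tag carried on each entry, code that processes the merged list never has to ask whether a track is "custom"; it only has to use the entry's own database when building a query.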

Gotchas

  1. It may be difficult to make tracks coming from the custom track DB simply appear as just another track, but coming from another database, because the concept of "database" for source of tracks is programmed into all the CGIs as a single global variable. It isn't immediately obvious how this could easily be made into a switch.
  2. It might be a lot easier to keep the old system in place, because the extra processing involved with a DB source of data could be handled by a new field in the customTrack structure to specify the DB source. Most of the exception handling needed for this new field is already in place, because this structure is already used everywhere to fetch data for a custom track.
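
A rough sketch of gotcha 2, assuming a much simplified stand-in for the customTrack structure: one optional field names the DB source, and the existing fetch path branches on it. This is Python for illustration only; the class, the field names, and the two reader callbacks are hypothetical and do not reflect the real structure's layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CustomTrack:
    name: str
    trash_file: str
    db_source: Optional[str] = None  # hypothetical new field; None means file-based


def load_items(track, read_file, read_table):
    """Fetch a custom track's data from the trash file or from the DB."""
    if track.db_source is None:
        # Existing behaviour: data lives in the custom track trash file.
        return read_file(track.trash_file)
    # New behaviour: data lives in a table in the named database.
    return read_table(track.db_source, track.name)


# Stub readers standing in for the real file and DB access code.
def read_file(path):
    return ["file:" + path]


def read_table(db, table):
    return ["table:" + db + "." + table]


file_track = CustomTrack("ct_1", "trash/ct_1.bed")
db_track = CustomTrack("ct_2", "trash/ct_2.bed", db_source="customTracks")
print(load_items(file_track, read_file, read_table))
print(load_items(db_track, read_file, read_table))
```

The point of the design is that callers keep passing the same structure around; only the one place that fetches data needs to know about the new field.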

Other: