HgTracks multi-region changes

From Genecats

This page provides an overview of the changes made to hgTracks and the position cart variable by Galt when implementing multi-region (a.k.a. "exon-mostly") display mode. It began as part of code review #16514 by Angie. Instead of adding these notes to the Redmine ticket, I am putting them in the wiki so that Galt can correct any of my misunderstandings, and hopefully this will serve as a rough guide for any developers who will need to interact with this code in the future.

The big merge of code changes for the initial implementation of multi-region display is 8c908f948b; see also http://genecats.cse.ucsc.edu/git-reports-history/v327/review/user/galt/index.html

hgTracks has always operated on the assumption that the user is viewing a single region of some reference assembly sequence. There are global variables for that region and global variables for layout parameters of a single region image. These global variables are referenced throughout all tracks' drawing code. Track loaders fetch data from a single region. Galt implemented multi-region display in a way that barely touched the bulk of hgTracks code -- quite an accomplishment! -- partly by introducing new structures and loops, and partly by carefully overwriting the pre-existing global variables. Developers working on loading and drawing code in the future can mostly ignore those changes and continue using the old globals in the same way as before. However, there may be cases in which track handlers will have to make use of data from all regions (for example, wiggle auto-scale). Developers who add a new global variable will have to decide whether it needs to be updated every time the current region changes. But hopefully they'll not add any more globals.  :)

Multi-region display UI

There is a new button underneath the browser image, labeled 'multi-region'. Clicking it produces a pop-up with options for multi-region display. In the initial version there is a radio button set as shown below, followed by a checkbox that controls whether the regions are visually separated by thin red lines (the default) or by highlighting alternating regions.

(*) Show single-chromosome view (default)

( ) Show exons or ( ) genes using GENCODE v22. Use padding of: [6] bases.

( ) Custom regions from BED URL: [ ]

( ) Show one alternate haplotype, placed on its chromosome, using ID: [ chr1_KI270762v1_alt ]

[ ] Highlight alternating regions in multi-region view

The 'Show exons or genes' line appears only when a suitable gene table can be identified. The 'Show one alternate haplotype' option appears only when there is an altLocations sql table.

Within the four main options, there are additional parameters:

  • a fifth mode option to omit regions between genes instead of between exons; default unselected
  • the padding option for exon or gene mode; default 6
  • the URL for a BED file (with optional multi-region-specific settings) provided by the user; default empty
  • an alternate haplotype ID; default is "chr6_cox_hap2" or if not applicable, the first item in the altLocations table

There are also two new keyboard bindings: 'e v' changes the mode to 'Show exons' and 'd v' changes the mode to single-region.

The code includes several additional modes not shown above, mostly "demo" modes for testing during development. Since those are not reachable by users, I won't describe those.

Changes to the position cart variable (and hence other CGIs)

Before multi-region, the position cart variable contained either seq:start-end or a search term that must be resolved into seq:start-end. seq was always a chromosome/scaffold name that could be found in the current db's chromInfo table, and had start and end coordinates relative to that chromosome/scaffold. Several other CGIs (such as the Table Browser, Gene Sorter, DI, VAI) assumed this formatting of position. Now, when viewing custom regions from the user or an alt haplotype, position's value begins with virt and its start and end coordinates are for a "virtual chromosome" constructed by joining all regions.

The virt:start-end form does not (yet) make sense to any other CGI besides hgTracks, so some code has been added to the beginning of each CGI that uses the position variable to detect virt: at the beginning; if found, it swaps in the value of a new cart variable, nonVirtPosition.
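The swap described above can be sketched as follows. This is only an illustration of the idea, not the code that was added to the CGIs: the function name positionForNonVirtCgi is hypothetical, and the real code reads both values from the cart rather than taking them as arguments.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the kent source): choose the position string
 * that a CGI other than hgTracks should use.  If position begins with "virt:"
 * and a nonVirtPosition value is available, return nonVirtPosition;
 * otherwise return position unchanged. */
const char *positionForNonVirtCgi(const char *position, const char *nonVirtPosition)
{
    static const char prefix[] = "virt:";
    if (position != NULL
        && strncmp(position, prefix, sizeof prefix - 1) == 0
        && nonVirtPosition != NULL && nonVirtPosition[0] != '\0')
        return nonVirtPosition;
    return position;
}
```

For example, a position of "virt:100-200" with a nonVirtPosition of "chr1:500-900" yields "chr1:500-900", while an ordinary "chr2:1-10" passes through untouched.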

Code changes

Needless to say, changing a paradigm in hgTracks involves adding and changing a lot of code. This is a high-level overview, not a comprehensive list.

New cart variables

virtMode

Boolean: true if we are not in single-region mode.

virtModeType

One of {"default", "exonMostly", "geneMostly", "customUrl", "singleAltHaplo" and some others not accessible by UI}. This stores the user's choice of mode in the multi-region pop-up.

multiRegionsBedUrl

The user's URL for custom regions (user sets in pop-up)

singleAltHaploId

The name of the alternate haplotype sequence used in alt-haplo mode (user sets in pop-up)

emAltHighlight

If true, alternating regions are differentiated by highlighting instead of red lines. (checkbox in pop-up)

emPadding

The number of bases to add on each side of each exon/gene when making the region list in exonMostly/geneMostly mode (user sets in pop-up)
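As a rough illustration of what the padding does to one exon or gene, here is a minimal sketch. It assumes 0-based half-open coordinates and assumes the padded region is clamped to the chromosome bounds; the struct and function names are hypothetical and the real region-building code may differ (for example in how it handles regions that overlap after padding).

```c
/* Hypothetical stand-in for one padded region; not an hgTracks struct. */
struct paddedRegion
    {
    int start;   /* 0-based, inclusive */
    int end;     /* 0-based, exclusive */
    };

/* Extend [start,end) by padding bases on each side, clamped to [0,chromSize).
 * The clamping is an assumption made for this sketch. */
struct paddedRegion padRegion(int start, int end, int padding, int chromSize)
{
    struct paddedRegion r;
    r.start = start - padding;
    if (r.start < 0)
        r.start = 0;
    r.end = end + padding;
    if (r.end > chromSize)
        r.end = chromSize;
    return r;
}
```

With the default padding of 6, an exon at 100-200 becomes the region 94-206.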

emGeneTable

The table used to make regions in exonMostly/geneMostly mode. Currently this is set by the code, not the user.

lastVirtModeType

Stored in cart with the same value as virtModeType, for detecting changes to virtModeType.

lastVirtModeExtraState

CGI-encoded sequence of several mode-specific parameters for detecting changes:

  • singleAltHaploId (see above)
  • multiRegionsBedUrl: the name and file modification date of the user's URL for custom regions
  • singleTransId: ignoring here because the singleTrans mode is not UI-accessible

virtModeShortDescr

Typically one word like "exons" or "genes"; this appears in the ideogram. It is inferred from the mode, except in customUrl mode, where the user may specify it.

nonVirtPosition

When in multi-region mode, the start and end of all regions on the first region's chromosome. For use by other CGIs when position begins with "virt".

oldPosition

For detecting changes in position that somehow are not caught by pre-existing cart var lastPosition.

position.db

Not just a position -- this stores pretty much all of the above, CGI-encoded, in case we change db and then return to this db.

Additions to struct track

struct track is fundamental to hgTracks: it encompasses track data, track metadata and track methods for loading, drawing, labeling, mapping etc. hgTracks has always had a single global trackList, iterating over it at loading time and at every stage of building up the main image and mapbox (actually makeActiveImage builds flatTracks from trackList and uses that instead of trackList, but still, it has always used a one-dimensional list).

Since each struct track in the list includes loading and drawing functions that rely on global variables for genomic coordinates and pixel coordinates, the least invasive change to the code was to make a separate struct track per region for each track -- so in effect we have a two-dimensional array of struct track, implemented as lists using separate pointers. track->next still points to the next track in trackList -- but trackList is different for every region. struct track now has two new members, nextWindow and prevWindow, which connect the per-region instances of a track.

In addition, Kate's GTEx track draws a fixed-pixel-width bar graph above the scaled-width transcript, so track code needs a way to draw on the image in a way that may extend into other regions' windows. The packing code also needs to be aware of the possibly larger pixel width taken up. So two new methods were added, nonPropPixelWidth for use by the packing code and nonPropDrawItemAt for the drawing code.

struct track *nextWindow, *prevWindow

These point to the same track's instance of struct track for the region to the right (if any) and the region to the left (if any). They are populated in doTrackForm. nextWindow is used in simpleTracks.c, hgTracks.c and gtexTracks.c (for variable height calculation) to iterate over regions for a single track. As of 1/8/16, prevWindow seems to be used only in a disabled section of doTrackForm marked "// TEMP HACK GALT REMOVE" (disabled by loadHack = FALSE).
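A minimal sketch of how nextWindow can be walked to combine a per-window value for one track, in the spirit of the variable height calculation mentioned above. The struct miniTrack and its height member are pared-down stand-ins invented for this example; the real struct track has many more members and the real loops do more work.

```c
#include <stddef.h>

/* Pared-down, hypothetical stand-in for struct track. */
struct miniTrack
    {
    struct miniTrack *next;        /* next track in this window's trackList */
    struct miniTrack *nextWindow;  /* same track, region to the right (or NULL) */
    struct miniTrack *prevWindow;  /* same track, region to the left (or NULL) */
    int height;                    /* hypothetical per-window height in pixels */
    };

/* Starting from the first window's instance of a track, return the largest
 * height found among all of that track's per-window instances. */
int maxHeightAcrossWindows(struct miniTrack *firstWindowTrack)
{
    int maxHeight = 0;
    struct miniTrack *tg;
    for (tg = firstWindowTrack; tg != NULL; tg = tg->nextWindow)
        {
        if (tg->height > maxHeight)
            maxHeight = tg->height;
        }
    return maxHeight;
}
```

The outer dimension (all tracks in one window) is still walked with next, so the two pointers together give the two-dimensional arrangement described earlier.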

int (*nonPropPixelWidth)(struct track *tg, void *item)

Currently populated only by the GTEx track, this returns the width in pixels of the non-scaled part (e.g. GTEx bar graph) that is drawn for each item. If non-NULL, packCountRowsOverflow calls it when computing ranges to pass into the spaceSaver.

void (*nonPropDrawItemAt)(struct track *tg, void *item, struct hvGfx *hvg, ...)

Currently populated only by the GTEx track, this draws the non-scaled part of an item. If non-NULL, it is called with a clipping rectangle set to the whole image instead of to the current region/window by genericDrawOverflowItem and genericDrawItem.

Additions to spaceSaver module

Multi-region brings new challenges in item packing: the per-region track instances can independently load different slices of the same item, but those pieces must be joined back together on the same row. That applies even to tracks with visibility=full now; before, the spaceSaver was not invoked for tracks in full mode, but now spaceSaver handles full mode with a special case to automatically hop down to the next row for a new item. (The unification of different slices of the same item is handled by simpleTracks.c's packCountRowsOverflow.)

struct spaceSaver

Two new members are added to struct spaceSaver:

    int vis;                  /* Remember visibility used */
    void *window;             /* Remember window used */

I can't find any functional effect of the new member window. It is not used in spaceSaver.c, and the only use in hgTracks code is to make sure it's the same as currentWindow (which in turn is compared to windows to make sure we've been given the track instance for the first window). It also appears in a commented-out debugging warn statement. The new vis is used by the new spaceSaverAddOverflowMulti to detect full mode.

struct spaceNode

Two new members:

    struct spaceSaver *parentSs;  /* Useful for adding to parent spaceSaver with multiple windows */
    bool noLabel;                 /* Suppress label if TRUE in pack */

parentSs is used to add each node passed into the new spaceSaverAddOverflowMulti to its parent spaceSaver's nodeList. That allows packCountRowsOverflow to pass in a list of nodes to be packed together -- but that yet may have different spaceSavers... ?? TODO: why can they have different spaceSavers and what's up with findSpaceSaverAndFreeOldOnes ?? The spaceSaver code doesn't seem to use noLabel, it just stores it. In simpleTracks.c, noLabel is used to make sure that various things are done only for the first node of a multi-node item, e.g. invoking track->nonPropDrawItemAt, counting items for overflow, and making item left labels.

spaceSaverAddOverflowMulti

TODO

New data structures in hgTracks.h

struct virtRegion

This maps a particular region to its place in the reference assembly. Used only in hgTracks.c. See #Initialization_of_regions_in_tracksDisplay.

struct virtRegion
/* virtual chromosome structure */
   {
   struct virtRegion *next;
   char *chrom;
   int start;
   int end;
   char strand[2];	/* + or - for strand */
   };

struct virtChromRegionPos

In practice, this is one member of an array; the members of the array contain successive offsets within the virtual chromosome for each region. Using an array instead of a list enables binary search for regions within the current (possibly zoomed-in) position on the virtual chromosome. Used only in hgTracks.c; see functions makeVirtChrom, virtChromBinarySearch, makeWindowListFromVirtChrom, virtChromSearchForPosition.

struct virtChromRegionPos
/* virtual chromosome region position*/
   {
   long virtPos;
   struct virtRegion *virtRegion;
   };
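To show why an array of successive offsets makes binary search possible, here is a small sketch that finds the region containing a given virtual position. It is not the actual virtChromBinarySearch; the function name findRegionIndex and its signature are assumptions, and it assumes virtChrom[0].virtPos is 0 and the offsets are strictly increasing.

```c
#include <stddef.h>

struct virtRegion;   /* details not needed for this sketch */

struct virtChromRegionPos
    {
    long virtPos;                    /* offset of region start on the virtual chrom */
    struct virtRegion *virtRegion;
    };

/* Hypothetical sketch: return the index of the region containing virtual
 * position pos, or -1 if pos is outside [0, totalSize). */
int findRegionIndex(const struct virtChromRegionPos *virtChrom, int count,
                    long totalSize, long pos)
{
    int lo, hi;
    if (count <= 0 || pos < 0 || pos >= totalSize)
        return -1;
    lo = 0;
    hi = count - 1;
    /* Find the last element whose virtPos is <= pos. */
    while (lo < hi)
        {
        int mid = lo + (hi - lo + 1) / 2;   /* round up so the range always shrinks */
        if (virtChrom[mid].virtPos <= pos)
            lo = mid;
        else
            hi = mid - 1;
        }
    return lo;
}
```

With three regions starting at virtual offsets 0, 100 and 250 on a 400-base virtual chromosome, position 99 falls in region 0, position 100 in region 1, and position 399 in region 2.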

struct positionMatch

This is a (listy) range on the virtual chrom. Lists of these are used to translate genomic coords (chrom, start, end) into ranges on the virtual chromosome. Used only in hgTracks.c; see functions virtChromSearchForPosition and findNearestVirtMatch.

struct positionMatch
/* virtual chrom position that matches or overlaps search query chrom,start,end */
{
struct positionMatch *next;
long virtStart;
long virtEnd;
};
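The translation from genomic coordinates to virtual-chromosome ranges can be sketched with a simple linear scan over the region list. This is an illustration only, not virtChromSearchForPosition: the name findVirtMatches is hypothetical, strand is ignored (all regions are treated as forward), and coordinates are assumed to be 0-based half-open.

```c
#include <stdlib.h>
#include <string.h>

struct virtRegion
    {
    struct virtRegion *next;
    char *chrom;
    int start;
    int end;
    char strand[2];
    };

struct positionMatch
    {
    struct positionMatch *next;
    long virtStart;
    long virtEnd;
    };

/* Hypothetical sketch: return a list of virtual-chrom ranges covering the
 * parts of chrom:start-end that fall inside regions.  Caller frees the list. */
struct positionMatch *findVirtMatches(struct virtRegion *regionList,
                                      const char *chrom, int start, int end)
{
    struct positionMatch *head = NULL, **tail = &head;
    long virtOffset = 0;   /* virtual position where the current region begins */
    struct virtRegion *r;
    for (r = regionList; r != NULL; r = r->next)
        {
        if (strcmp(r->chrom, chrom) == 0 && start < r->end && end > r->start)
            {
            int s = (start > r->start) ? start : r->start;
            int e = (end < r->end) ? end : r->end;
            struct positionMatch *m = malloc(sizeof *m);
            if (m == NULL)
                break;
            m->next = NULL;
            m->virtStart = virtOffset + (s - r->start);
            m->virtEnd = virtOffset + (e - r->start);
            *tail = m;
            tail = &m->next;
            }
        virtOffset += r->end - r->start;
        }
    return head;
}
```

A query that spans the gap between two regions produces two matches, one per region, which is why the result is a list rather than a single range.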

struct window

A window is a (part of a) region that the user is currently viewing. For example, if the virtual region list encompasses all exons of the currently viewed gene, but then we zoom in to view just a couple exons, then windows are instantiated for those two exon regions. struct window contains the (viewed part of the) region's genomic coords, virtual chromosome coords, and pixel x coords (left offset and width). It has a flag used when highlighting alternate regions instead of drawing red lines between them. Its trackList contains the struct track instances that were created to load and draw this region's data (see also #Additions_to_struct_track). It is used by multiple track drawing routines (e.g. cds.c, cytoBandTrack.c, rmskJoinedTrack.c, simpleTracks.c). See hgTracks.c functions makeWindowListFromVirtChrom, setGlobalsFromWindow as well as various loops on windows in many drawing and position-calculating functions.

struct window  // window in multiwindow image
   {
   struct window *next;   // Next on list.

   // These two were experimental and will be removed soon:
   char *organism;        /* Name of organism */
   char *database;        /* Name of database */

   char *chromName;
   int winStart;           // in bases
   int winEnd;
   int insideX;            // in pixels
   int insideWidth;
   long virtStart;         // in bases on virt chrom
   long virtEnd;

   boolean regionOdd;      // window comes from odd region? or even? for window separator coloring

   struct track *trackList;   // track list for window
   };
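The relationship between regions, the viewed range on the virtual chromosome, and windows can be sketched as below. This is not makeWindowListFromVirtChrom: the function makeMiniWindows, the struct miniWindow and the fixed-size output array are assumptions for illustration, pixel layout (insideX, insideWidth) is left out, and strand is ignored.

```c
#include <stddef.h>

struct virtRegion
    {
    struct virtRegion *next;
    char *chrom;
    int start;
    int end;
    char strand[2];
    };

/* Hypothetical, pared-down stand-in for struct window (no pixel coords). */
struct miniWindow
    {
    const char *chromName;
    int winStart, winEnd;       /* genomic coords of the viewed part */
    long virtStart, virtEnd;    /* same part, on the virtual chrom */
    int regionOdd;              /* 1 if the region index is odd */
    };

/* Fill out[] with one window per region that overlaps the viewed range
 * [virtWinStart, virtWinEnd); return the number of windows written. */
int makeMiniWindows(struct virtRegion *regionList, long virtWinStart,
                    long virtWinEnd, struct miniWindow *out, int maxOut)
{
    int n = 0, regionIx = 0;
    long regionVirtStart = 0;
    struct virtRegion *r;
    for (r = regionList; r != NULL && n < maxOut; r = r->next, regionIx++)
        {
        long regionVirtEnd = regionVirtStart + (r->end - r->start);
        if (regionVirtStart >= virtWinEnd)
            break;   /* past the viewed range; later regions cannot overlap */
        if (regionVirtEnd > virtWinStart)
            {
            long vs = (virtWinStart > regionVirtStart) ? virtWinStart : regionVirtStart;
            long ve = (virtWinEnd < regionVirtEnd) ? virtWinEnd : regionVirtEnd;
            out[n].chromName = r->chrom;
            out[n].winStart = r->start + (int)(vs - regionVirtStart);
            out[n].winEnd = r->start + (int)(ve - regionVirtStart);
            out[n].virtStart = vs;
            out[n].virtEnd = ve;
            out[n].regionOdd = regionIx % 2;
            n++;
            }
        regionVirtStart = regionVirtEnd;
        }
    return n;
}
```

Zooming in simply narrows virtWinStart and virtWinEnd, so fewer regions overlap the range and fewer (and possibly partial) windows come out.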

struct convertRange

This is used by some extremely complicated code in simpleTracks.c's linkedFeaturesNextPrevExonFind and linkedFeaturesNextPrevItem. The gist of it is to find regular genomic coordinates for some exon(s) on the virtual chrom if possible. If you've never before encountered a for loop that contains a goto jumping backwards into the middle of a previous 60+ line while loop inside an if clause, which may then iterate some more... well, here's your chance. Also, bedTrack.c's simpleBedNextPrevEdge constructs one of these and calls linkedFeaturesNextPrevExonFind on it to get virt coords.

struct convertRange
   {
   struct convertRange *next;
   char *chrom;
   int start;
   int end;
   long vStart;
   long vEnd;
   boolean found;
   boolean skipIt;
   };

New global variables

Many new global variables were added, following the convention of declaring in hgTracks.h and initializing in simpleTracks.c. However, several of these are used only within hgTracks.c.

boolean virtMode

Cart variable, see above.

char *virtModeType

Cart variable, see above. Default: "default" (i.e. not multi-region)

char *lastVirtModeType

Cart variable, see above.

char *virtModeShortDescr

Cart variable, see above.

char *lastVirtModeExtraState

Cart variable, see above.

char *multiRegionsBedUrl

Cart variable, see above. Default: ""

boolean emAltHighlight

Cart variable, see above.

int emPadding

Cart variable, see above. Default: 6

char *emGeneTable

Cart variable, see above.

struct cart *lastDbPosCart

Not a cart! struct cart is used to CGI-decode the values that were encoded in lastVirtModeExtraState into a hash. Only the hash is used.

char *virtModeExtraState

Not a cart variable, but it becomes the next value of lastVirtModeExtraState, i.e. it encodes mode-specific parameters such as singleAltHaploId and multiRegionsBedUrl (plus mod date)

struct virtRegion *virtRegionList

List of regions that are joined to make the virtual chromosome

struct virtChromRegionPos *virtChrom

Array of successive regions and offsets into the virtual chromosome

int virtRegionCount

Number of regions in virtual chromosome and size of the array virtChrom

long virtSeqBaseCount

Number of bases in virtual chromosome

long virtWinStart, virtWinEnd

Start and end of the portion of the virtual chromosome that is currently displayed.

long virtWinBaseCount

The length of the currently displayed portion of the virtual chromosome. Set to virtWinEnd - virtWinStart every time virtWinEnd and virtWinStart are changed.

long defaultVirtWinStart, defaultVirtWinEnd

Used only when transitioning from default mode into singleAltHaplo mode: the start/end within the virtual chromosome of the alt haplo plus flanks of equal length on the main chromosome (trimmed to start/end of main chromosome). In singleAltHaplo mode, the virtual chromosome is constructed by concatenating all assembly sequences (!) and then replacing the main chromosome sequence with the part of the main chromosome preceding the alt haplo, then the alt haplo, and then the rest of the chromosome after the alt haplo.

char *virtChromName

When in multi-region mode, the literal string "virt"; otherwise just the good old chromName.

boolean virtChromChanged

True only when changing from one multi-region mode into another, or when a parameter change is detected using lastVirtModeExtraState; not used by C code, but is passed forward to JS code.

struct track *emGeneTrack

The struct track for the gene table used in exonMostly/geneMostly mode.

struct rgbColor vertWindowSeparatorColor

Constant: { 255, 220, 220} (light red for vertical lines between regions)

char *singleAltHaploId

Default: "chr6_cox_hap2"

struct window *windows

A list of the currently viewed (i.e. within the position range) portions of regions. The virtual chromosome contains all regions applicable to the current mode, but if the user zooms in, we display only the parts of regions that they are viewing. See also #struct_window. This list is created when tracksDisplay calls makeWindowListFromVirtChrom. Note: a couple functions have local variables with the same name (disguisePositionVirtSingleChrom, nonVirtPositionFromHighlightPos), not to be confused with this global list. (There are also several uses of makeWindowListFromVirtChrom for translating ranges other than the current position.)

struct window *currentWindow

The element of the list windows that we're working on right now. This is frequently used to test if we're working on the first window (if (currentWindow == windows)), for things that should be done only once, e.g. track labels, or by code that expects to be starting at the first window. rmskJoinedTrack.c and cds.c use it to keep their own global variables in sync with the hgTracks control code.
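The pattern of overwriting the old globals for each window, and doing once-only work when currentWindow == windows, can be sketched like this. Everything here is a simplified stand-in: struct miniWindow, setGlobalsFromMiniWindow, drawTrackInAllWindows and the two counters are hypothetical, and the real setGlobalsFromWindow updates more globals than shown.

```c
#include <stddef.h>

/* Hypothetical, pared-down stand-in for struct window. */
struct miniWindow
    {
    struct miniWindow *next;
    const char *chromName;
    int winStart, winEnd;       /* in bases */
    int insideX, insideWidth;   /* in pixels */
    };

/* Stand-ins for the long-standing single-region globals. */
const char *chromName;
int winStart, winEnd, insideX, insideWidth;

/* Stand-ins for the new multi-region globals. */
struct miniWindow *windows, *currentWindow;

/* Counters used only to make the sketch observable. */
int labelDrawCount;   /* times the once-only work ran */
long basesDrawn;      /* total bases handled by the "legacy" drawing step */

/* Overwrite the old globals so that legacy code sees just this window. */
void setGlobalsFromMiniWindow(struct miniWindow *w)
{
    currentWindow = w;
    chromName = w->chromName;
    winStart = w->winStart;
    winEnd = w->winEnd;
    insideX = w->insideX;
    insideWidth = w->insideWidth;
}

void drawTrackInAllWindows(void)
{
    struct miniWindow *w;
    for (w = windows; w != NULL; w = w->next)
        {
        setGlobalsFromMiniWindow(w);
        if (currentWindow == windows)
            labelDrawCount++;              /* e.g. track label: first window only */
        basesDrawn += winEnd - winStart;   /* legacy code reads only the globals */
        }
}
```

The point of the design is that the body of the loop can be pre-existing drawing code that knows nothing about windows, because it only ever reads the globals.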

bool trackLoadingInProgress

Flag to delay spaceSaver (ss) layout until all windows are ready.

int fullInsideX, fullInsideWidth

Full-image insideX and insideWidth, for the few tracks that need to know the offset/width of the whole image, not just the offset/width of the current region's slice.

char *singleTransId

Not described here because singleTrans mode is not UI-accessible

int demo2NumWindows, demo2WindowSize, demo2StepSize

Not described here because demo2 mode is not UI-accessible

Initialization of regions in tracksDisplay

Changes to makeActiveImage

nextPrevExon

hgTracks.js changes