ExonMostlyInitialDesignMeeting

From Genecats

On Sep 3, 2014, Jim, Galt and Angie met to discuss some implementation issues for the "exon-mostly" display. ExonMostlyInitialDesignMeetingWhiteboard has fuzzy snapshots of the whiteboard and a transcription of the text (as well as Angie could make out).

Performance considerations

When viewing a transcript with N exons, if each track's loading code makes a separate query for each exon region, then there will be N times as many queries as before. For hg19.knownGene, average N is ~9.

  • Will an order-of-magnitude increase in the number of mysql queries and/or bigFile queries cause performance problems?
  • What is the distance between regions at which it becomes more efficient to do one query over the regions and everything between than to do separate queries per region? This distance will most likely be different for mysql vs. bigFile, and for huge-table mysql (like hg19.snp138) vs. small-table mysql (like hg19.gap).
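A back-of-the-envelope way to frame the second question, under an assumed linear cost model (not something we have measured): if every query costs a fixed overhead plus a per-base cost for the span it covers, then merging two neighboring regions into one query saves one overhead but pays for the gap between them, so merging wins when gap × perBaseCost < overhead. The struct and function names below are hypothetical, and the two parameters would have to come from experiments, run separately for huge mysql tables, small mysql tables and bigFiles.

```c
#include <assert.h>

struct queryCostModel
/* Hypothetical linear cost model for one query, to be filled in from
 * measurements (separately for huge mysql tables, small ones, bigFiles). */
    {
    double overheadMs;   /* fixed cost of issuing one query */
    double msPerBase;    /* marginal cost per base of genome covered */
    };

double breakEvenGap(struct queryCostModel model)
/* Gap length in bases below which one query over two regions plus the gap
 * between them costs less than two separate queries:
 *   two queries: 2*overhead + perBase*(lenA + lenB)
 *   one query:   overhead + perBase*(lenA + gap + lenB)
 * so merging wins when gap < overhead / perBase. */
{
assert(model.msPerBase > 0);
return model.overheadMs / model.msPerBase;
}

int shouldMergeFetch(struct queryCostModel model, unsigned gapBases)
/* Return 1 if two regions gapBases apart should share one query under
 * this model, 0 otherwise. */
{
return (double)gapBases < breakEvenGap(model);
}
```

For example, with a 2 ms per-query overhead and 0.001 ms per base, gaps shorter than about 2000 bases would be worth merging. Real costs are unlikely to be this linear (index lookups, rows returned depending on feature density), so this is only a starting point for interpreting benchmark numbers.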

Galt has the most experience with the thread-unsafeness of mysql and the incompatibilities of pthreads and forking, and will do some experiments to characterize how many simultaneous threads, processes, mysql requests, etc. are optimal for performance.

Display

We will use some kind of vertical marks to show the boundaries of regions. We most likely will want to show a few bases of padding on either side of exons, to see splice sites and have a little visual separation.

  • Would a user sometimes want to see 500 bp upstream of TSS too? (or 2kb??).
  • What is the upper limit for how many distinct regions we can display? If the image (ignoring left label) is 1000 px wide, and each separator is 3px wide, at 167 regions about half of the width is taken up by separators. Extreme cases: hg19.knownGene item uc031qqx.1 (antibody parts) on chr14 has exonCount=5065! hg19.refGene's current max exonCount is 363 (NM_001267550/TTN on chr2).
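To put numbers on the separator budget: N displayed regions need N - 1 separators, so the fraction of the image lost to separators is separatorWidth × (N - 1) / pixelWidth. A minimal sketch, with hypothetical function names:

```c
#include <assert.h>

double separatorFraction(unsigned regionCount, unsigned separatorWidth,
                         unsigned imageWidth)
/* Fraction of the image width (excluding the left label area) consumed by
 * separators when regionCount regions are shown side by side, with one
 * separator between each adjacent pair. Can exceed 1.0 (doesn't fit). */
{
assert(imageWidth > 0);
if (regionCount < 2)
    return 0.0;
return (double)separatorWidth * (regionCount - 1) / imageWidth;
}

unsigned maxRegionsForFraction(double maxFraction, unsigned separatorWidth,
                               unsigned imageWidth)
/* Largest number of regions whose separators take up no more than
 * maxFraction of the image width. */
{
unsigned separators;
assert(separatorWidth > 0 && maxFraction >= 0);
separators = (unsigned)(maxFraction * imageWidth / separatorWidth);
return separators + 1;
}
```

With 3px separators and a 1000px image, 167 regions lose 498px to separators (about half), and 167 is also the largest region count that keeps separators at or below half the width. uc031qqx.1's 5065 exons would need 15,192px of separators alone, so in that case exons would have to be merged, or the view clipped to a zoomed subset.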

Regions, regions, regions

There may be several incarnations of "the region list", and different parts of the code will have to be sure to use the right one:

  1. user/logical regions: exons (or if we're ambitious, unconstrained genomic regions)
  2. displayed regions: we might want to pad exons with a few bases on either side, and then merge regions that overlap (or that are so close to each other that the separator would waste space), and possibly clip to a zoomed subset of user/logical regions
  3. fetched regions: some displayed regions may be close enough so that it would be more efficient to do one mysql query covering multiple regions instead of per-region mysql queries
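Incarnations 2 and 3 could plausibly share a single pad-and-merge routine with different parameters. Here is a minimal sketch, assuming regions are half-open [start, end) intervals on one chromosome; struct region and padAndMerge are hypothetical names, not existing library functions. For displayed regions, pad would be the few bases of splice-site context and minGap roughly the number of bases a separator's width is worth at the current scale; for fetched regions, pad would be 0 and minGap the break-even distance from the performance experiments.

```c
#include <assert.h>
#include <stdlib.h>

struct region
/* Hypothetical half-open genomic interval on a single chromosome. */
    {
    unsigned start, end;
    };

static int regionCmpStart(const void *va, const void *vb)
/* qsort comparator: order regions by start coordinate. */
{
const struct region *a = va, *b = vb;
if (a->start != b->start)
    return a->start < b->start ? -1 : 1;
return 0;
}

int padAndMerge(struct region *regions, int count, unsigned pad,
                unsigned minGap, unsigned chromSize)
/* Pad each region by pad bases on both sides (clipped to the chromosome),
 * sort by start, then merge regions that overlap, abut, or are separated by
 * at most minGap bases. Works in place; returns the new region count. */
{
int i, out = 0;
assert(count >= 0);
for (i = 0; i < count; i++)
    {
    regions[i].start = (regions[i].start > pad) ? regions[i].start - pad : 0;
    regions[i].end = (regions[i].end + pad < chromSize) ? regions[i].end + pad : chromSize;
    }
qsort(regions, count, sizeof(regions[0]), regionCmpStart);
for (i = 0; i < count; i++)
    {
    if (out > 0 && regions[i].start <= regions[out-1].end + minGap)
        {
        /* Close enough to the previous output region: extend it. */
        if (regions[i].end > regions[out-1].end)
            regions[out-1].end = regions[i].end;
        }
    else
        regions[out++] = regions[i];
    }
return out;
}
```

For example, exons at 100-200, 205-250 and 300-400 padded by 5 bases become 95-255 and 295-405: the first two now overlap and merge, while the third stays separate. Running the routine again on a copy of the displayed regions with pad 0 and a larger minGap yields fetched regions that each cover whole displayed regions.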

Text

If it weren't for left item labels, we could just render the entire transcript region and then display only some vertical slices of that image. However, we still need those labels and they may extend into region(s) to the left of an item. (Consider a gene's left label, or a SNP that falls near the beginning of an exon.) Therefore rendering of text, and packing of items, must be done in post-slice pixel coordinates.

For labels that appear outside the item, it would be good to have a function that takes chromStart, chromEnd, and text, and then translates chromStart (or chromEnd) into a post-slice pixel offset and draws the text relative to that offset. For labels that appear inside an item, a similar function could center the text on the post-slice pixel center between post-slice pixel offsets for chromStart and chromEnd.
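A minimal sketch of those two label helpers, assuming the displayed regions are kept as a sorted, non-overlapping array and searched linearly (a real version might use the range tree mentioned under "Regions to pixels"). All names are hypothetical, and the text width is passed in as pixels rather than measured, since font metrics are outside the scope of the sketch.

```c
#include <assert.h>
#include <limits.h>

#define NOT_DISPLAYED INT_MIN  /* hypothetical sentinel: position is hidden */

struct region
/* Hypothetical half-open genomic interval. */
    {
    unsigned start, end;
    };

struct displayLayout
/* Hypothetical description of how displayed regions are laid out. */
    {
    const struct region *regions;  /* sorted, non-overlapping displayed regions */
    int count;
    double pixelsPerBase;          /* same scale in every region */
    int separatorWidth;            /* pixels between adjacent regions */
    };

int posToPixel(struct displayLayout layout, unsigned pos)
/* Post-slice pixel offset of a chromosome position, or NOT_DISPLAYED if pos
 * is not inside (or at the end of) any displayed region. */
{
double x = 0;
int i;
assert(layout.pixelsPerBase > 0);
for (i = 0; i < layout.count; i++)
    {
    const struct region *r = &layout.regions[i];
    if (pos >= r->start && pos <= r->end)
        return (int)(x + (pos - r->start) * layout.pixelsPerBase);
    x += (r->end - r->start) * layout.pixelsPerBase + layout.separatorWidth;
    }
return NOT_DISPLAYED;
}

int leftLabelX(struct displayLayout layout, unsigned chromStart,
               int textWidth, int padding)
/* X at which to start drawing a label that sits just left of an item. */
{
int itemX = posToPixel(layout, chromStart);
if (itemX == NOT_DISPLAYED)
    return NOT_DISPLAYED;
return itemX - padding - textWidth;
}

int centeredLabelX(struct displayLayout layout, unsigned chromStart,
                   unsigned chromEnd, int textWidth)
/* X at which to start drawing a label centered between the post-slice
 * pixel offsets of chromStart and chromEnd. */
{
int x1 = posToPixel(layout, chromStart);
int x2 = posToPixel(layout, chromEnd);
if (x1 == NOT_DISPLAYED || x2 == NOT_DISPLAYED)
    return NOT_DISPLAYED;
return (x1 + x2 - textWidth) / 2;
}
```

With displayed regions 100-200 and 300-400 at 0.5 pixels per base and a 3px separator, position 300 maps to pixel 53 (50px for the first region plus the separator). A 20px label with 2px of padding to the left of an item starting at 300 therefore begins at x=31, inside the first region's pixels, which is exactly the case that forces text to be drawn in post-slice coordinates. An item whose start falls in a hidden intron gets NOT_DISPLAYED here; a fuller version would probably fall back to the item's first visible pixel.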

Regions to pixels

Displayed regions will have a well-defined mapping to pixel X coordinate ranges. This mapping could be implemented efficiently using a chromosome-range tree structure. Any given genomic position range could map to 0 pixel ranges (not in any displayed region), one pixel range when the position range is a subset of one displayed region, or more than one pixel range when the position range spans multiple displayed regions. For example, if an assembly contig spans all exons of a gene, then it would be rendered in all displayed regions / pixel ranges.
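A sketch of the range-to-pixel-ranges mapping. It swaps in a linear scan over sorted displayed regions for the range tree, which is simpler to show; a tree or binary search would start to matter with hundreds of regions. Struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct region
/* Hypothetical half-open genomic interval. */
    {
    unsigned start, end;
    };

struct pixelRange
/* Hypothetical half-open pixel range [x1, x2). */
    {
    int x1, x2;
    };

struct displayLayout
/* Hypothetical description of how displayed regions are laid out. */
    {
    const struct region *regions;  /* sorted, non-overlapping displayed regions */
    int count;
    double pixelsPerBase;          /* same scale in every region */
    int separatorWidth;            /* pixels between adjacent regions */
    };

int rangeToPixelRanges(struct displayLayout layout, unsigned chromStart,
                       unsigned chromEnd, struct pixelRange *out, int maxOut)
/* Clip [chromStart, chromEnd) to each displayed region it overlaps and
 * convert each clipped piece to a pixel range. Writes up to maxOut ranges
 * into out (which may be NULL) and returns the total number of pieces:
 * 0 if the range is entirely hidden, more than 1 if it spans regions. */
{
double x = 0;
int i, n = 0;
assert(chromStart <= chromEnd);
for (i = 0; i < layout.count; i++)
    {
    const struct region *r = &layout.regions[i];
    unsigned s = (chromStart > r->start) ? chromStart : r->start;
    unsigned e = (chromEnd < r->end) ? chromEnd : r->end;
    if (s < e)
        {
        if (out != NULL && n < maxOut)
            {
            out[n].x1 = (int)(x + (s - r->start) * layout.pixelsPerBase);
            out[n].x2 = (int)(x + (e - r->start) * layout.pixelsPerBase);
            }
        n++;
        }
    /* Advance past this region's pixels and the following separator. */
    x += (r->end - r->start) * layout.pixelsPerBase + layout.separatorWidth;
    }
return n;
}
```

With displayed regions 100-200 and 300-400, a contig spanning 50-450 yields two pixel ranges, one per region, while an item lying entirely within the intron (210-290) yields none and is simply not drawn.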

The same pixel-scaling factor should be used in all displayed regions so all items are drawn at the same scale even if the regions are of different sizes. With separatorWidth being the width of whatever vertical separator we draw between displayed regions, and pixelWidth being the width of the image excluding the left label area:

 uint totalSeparatorWidth = separatorWidth * (slCount(displayedRegions) - 1);
 uint totalBasesInRegions = sumLengths(displayedRegions);
 /* Convert to double before dividing: with uint operands C does integer division and gets 0. */
 double pixelsPerBase = (double)(pixelWidth - totalSeparatorWidth) / totalBasesInRegions;

So for example, if our separator is 3px wide, the image area is 1000 pixels wide, and there are 10 displayedRegions that sum to 25000 bases, then pixelsPerBase works out like this:

 totalSeparatorWidth = 3 * (10 - 1) = 27;
 totalBasesInRegions = 25000;
 pixelsPerBase = (1000 - 27) / 25000 = 0.038920;

  • Will we need to account for rounding at edges of displayed regions? Small items at the end of the position range sometimes fall past the rightmost pixel in hgTracks and don't appear in the image; could we have the same problem here?
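One possible mitigation, sketched under assumptions: pixel coordinates are non-negative, and the caller already knows which displayed region an item overlaps and that region's starting pixel. The idea is to round an item's start down and its end up, keep it at least 1 pixel wide, and clamp it to the region's own pixels so it cannot spill into a separator or past the right edge. itemPixels and struct pixelSpan are hypothetical names; floor and ceil are done with integer casts, which is valid only because the coordinates are non-negative.

```c
#include <assert.h>

struct pixelSpan
/* Hypothetical half-open pixel span [x1, x2). */
    {
    int x1, x2;
    };

static int floorNonNeg(double x)
/* floor() for non-negative x: casting to int truncates toward zero. */
{
return (int)x;
}

static int ceilNonNeg(double x)
/* ceil() for non-negative x. */
{
int c = (int)x;
return ((double)c < x) ? c + 1 : c;
}

struct pixelSpan itemPixels(double regionX, unsigned regionStart,
                            unsigned regionEnd, unsigned itemStart,
                            unsigned itemEnd, double pixelsPerBase)
/* Pixels for the part of [itemStart, itemEnd) inside the displayed region
 * [regionStart, regionEnd), whose first pixel is at regionX. The item must
 * overlap the region. */
{
unsigned s = (itemStart > regionStart) ? itemStart : regionStart;
unsigned e = (itemEnd < regionEnd) ? itemEnd : regionEnd;
int regionX2 = ceilNonNeg(regionX + (regionEnd - regionStart) * pixelsPerBase);
struct pixelSpan span;
assert(s < e && regionX >= 0);
span.x1 = floorNonNeg(regionX + (s - regionStart) * pixelsPerBase);
span.x2 = ceilNonNeg(regionX + (e - regionStart) * pixelsPerBase);
if (span.x2 <= span.x1)    /* guard against a zero-width span */
    span.x2 = span.x1 + 1;
if (span.x2 > regionX2)    /* never draw past the region's last pixel */
    span.x2 = regionX2;
if (span.x1 >= span.x2)    /* keep at least 1 pixel after clamping */
    span.x1 = span.x2 - 1;
return span;
}
```

Compared with plain truncation of both ends, rounding outward means an item is never shrunk to zero width; the trade-off is that tiny items get slightly exaggerated, which may or may not be acceptable.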

What we didn't talk about

User interface

As Braney and Max pointed out to Angie after the meeting, we can totally handle the changes to the loading and drawing code, but the UI of hgTracks has become rather messy as new features have been added (e.g. the on-by-default dialog box for zoom/highlight). The UI of this new display should get some attention, ideally with input from a few selected biologists such as UCSC biologists and/or SAB members.

  • How does the user enter exon-mostly mode?
    • From hgTracks view, with whatever exons are in view, assuming there are exons?
    • From gene details page?
    • From a gateway that auto-suggests genes?
  • If we someday support display of arbitrary regions, how do we support definitions of those regions?
    • Something like TB's user-defined regions? [How about fusion events? Structural variants / rearrangements?? Those would imply strandful regions... scary]
  • Do we keep the page layout of hgTracks with 10 action buttons below the image, or can we do some redesign, for example having a toolbar or menu(s) above the image? [Especially when we're thinking about adding support for user-editable groups of tracks... what's the UI for that?]

Javascript

Angie would like to rewrite the image display using React, a Javascript library that supports simplified reasoning about UI/DOM state changes while having great rendering performance and handling browser inconsistencies back to IE 8. Paired with a library like immutable-js, we could even have a lightweight undo/redo stack (remember when it was all server-side rendering with forms, and the web browser back button worked so nicely?).