Visualizing Coordinates

From genomewiki
Revision as of 20:24, 13 February 2014 by Galt

There are generally two ways to specify coordinates on a strand:

  1. Specify the coordinates on the positive strand, and then after the fact, note whether the feature is actually on the negative strand. We use this one most often, probably because it makes it easier to compare coordinates, especially if you don't care what strand it is on.

  2. Specify the strand first, and then use the coordinates of that strand.

Both are in use, in different places. If #2 is used and the feature is on the negative strand, people say that it is in "negative strand coordinates."

The cases I can remember that use #2 are the chain files. Also, bizarrely enough, the psl format mixes the two: the main start and end coordinates are in positive strand coords (probably to allow rapid coordinate comparisons while looking for overlaps at the whole-gene level), but for a negative-strand alignment the actual block starts, and their order, are in negative strand coordinates.
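For the psl case, a rough sketch of turning negative-strand query block starts back into positive strand intervals might look like the following. It assumes the psl fields qSize, qStarts and blockSizes have already been parsed into Python values, that the query strand is "-", and that this is a plain nucleotide alignment (not a translated one). The function name is hypothetical. Each block [s, e) in negative strand coordinates maps to [qSize - e, qSize - s) on the positive strand.

```python
def psl_query_blocks_positive(q_size, q_starts, block_sizes):
    """Map negative-strand query blocks to zero-based half-open
    positive strand intervals, returned in positive strand order."""
    blocks = []
    for q_start, size in zip(q_starts, block_sizes):
        neg_end = q_start + size  # block end in negative strand coords
        blocks.append((q_size - neg_end, q_size - q_start))
    # The blocks were listed in negative strand order, so reverse them.
    blocks.reverse()
    return blocks

# Made-up example: a 30-base query with two blocks on the negative strand.
print(psl_query_blocks_positive(30, [4, 20], [10, 8]))  # [(2, 10), (16, 26)]
```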

To convert from #1 to #2, you generally take:

  start2 = chromSize - end1
  end2 = chromSize - start1
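As a minimal sketch of this conversion, assuming zero-based half-open coordinates (the function name `flip_strand` is made up for illustration), the same formula works in both directions, so applying it twice gets you back where you started:

```python
def flip_strand(start, end, chrom_size):
    """Convert a zero-based half-open interval to the coordinates
    of the opposite strand."""
    if not 0 <= start <= end <= chrom_size:
        raise ValueError("need 0 <= start <= end <= chrom_size")
    return chrom_size - end, chrom_size - start

# Made-up example: a 100-base chromosome.
start2, end2 = flip_strand(10, 25, 100)
print(start2, end2)                     # 75 90
print(flip_strand(start2, end2, 100))   # (10, 25)
```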

To make my graph easier to draw in text, let's say that S and E are start1 and end1 in pos strand coords, and s and e are the start and end in neg strand coords.

                        e      s                 ...210  (neg strand coords)
                         YYYYYYY
eziSmorhc=Cnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
           ppppppppppppppppppppppppppppppppppppppppppppC=chromSize
                         XXXXXXX
           012...        S      E                        (pos strand coords)

With our zero-based half-open coordinates, positions on the positive strand run from 0 to chromSize-1, that is [0,chromSize), which covers the same bases as the closed range [0,chromSize-1]. Negative strand coordinates have the same range, counted along the negative strand, of course.

So:

  s = C - E
  e = C - S

With form #1, we say it is at S,E but by the way, it is really on the neg strand (-). With form #2, we say it is on the negative strand (-), at coordinates s,e.
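The two forms can be sketched as two small records. The class and field names here are hypothetical, chosen only to mirror the text, and the coordinates are assumed to be zero-based half-open:

```python
from typing import NamedTuple

class Form1(NamedTuple):
    """Positive strand coordinates, with the strand noted afterwards."""
    start: int   # S
    end: int     # E
    strand: str  # "+" or "-"

class Form2(NamedTuple):
    """Strand first, then coordinates on that strand."""
    strand: str  # "+" or "-"
    start: int   # s when strand is "-"
    end: int     # e when strand is "-"

def form1_to_form2(f, chrom_size):
    if f.strand == "+":
        return Form2("+", f.start, f.end)
    return Form2("-", chrom_size - f.end, chrom_size - f.start)

def form2_to_form1(f, chrom_size):
    if f.strand == "+":
        return Form1(f.start, f.end, "+")
    return Form1(chrom_size - f.end, chrom_size - f.start, "-")

# Made-up values: C = 44, S = 14, E = 21, feature on the negative strand.
neg = form1_to_form2(Form1(14, 21, "-"), 44)
print(neg)                      # Form2(strand='-', start=23, end=30)
print(form2_to_form1(neg, 44))  # Form1(start=14, end=21, strand='-')
```

Positive-strand features pass through unchanged, which is part of why form #1 is convenient when you don't care about the strand.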

So, do you want the coordinates first, or the strand? Either way can work.


Note that if you use one-based closed coordinates, the coordinate range on both strands is [1,chromSize], and the picture looks like this:

                         e     s                ...321  (neg strand coords)
eziSmorhc=C              YYYYYYY
          nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
          pppppppppppppppppppppppppppppppppppppppppppp
                         XXXXXXX                     C=chromSize
          123...         S     E                        (pos strand coords)

  s = C - E + 1
  e = C - S + 1

So in these coordinates, there is usually some extra +1 or -1 that is needed in coordinate calculations.
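A small sketch of where the extra +1 shows up, assuming one-based closed intervals [start, end]; the function names are made up for illustration:

```python
def flip_strand_one_based(start, end, chrom_size):
    """Convert a one-based closed interval to the opposite strand."""
    if not 1 <= start <= end <= chrom_size:
        raise ValueError("need 1 <= start <= end <= chrom_size")
    return chrom_size - end + 1, chrom_size - start + 1

def length_one_based(start, end):
    # Closed interval: both endpoints are counted.
    return end - start + 1

def length_half_open(start, end):
    # Zero-based half-open interval: the end is not counted.
    return end - start

# Made-up values: C = 44, S = 16, E = 22 (a 7-base feature).
s, e = flip_strand_one_based(16, 22, 44)
print(s, e)                     # 23 29
print(length_one_based(s, e))   # 7
print(length_half_open(15, 22)) # 7, the same feature in zero-based half-open coords
```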