Coordinate Transforms

From genomewiki
 
See also:
* [https://genome.ucsc.edu/FAQ/FAQtracks#tracks1 FAQ: UCSC Coordinate System]
* [http://genome.ucsc.edu/blog/the-ucsc-genome-browser-coordinate-counting-systems/ Blog post: The UCSC Genome Browser Coordinate Counting Systems]
 

Latest revision as of 15:25, 9 April 2017

The coordinate system used by most of the world (e.g. NCBI) is one-based, fully closed -- when a range is given, both the starting point and ending point are in the range. The size of the range is end-start+1. Our handy but uncommon coord system is zero-based, half-open -- the start coord is closed and the end coord is open. This simplifies the arithmetic for size computations, which our code does often (size = end-start), but it complicates other things.

We often refer to the coordinate system as "0-based start, 1-based end" which is not totally accurate but gets the idea across well enough for position->bed conversion. The end is actually 0-based, but not in the range (open) -- it is the 0-based coordinate just after the last point in the range. (So the exon end coord is also the start coord of the intron after it, another nice property in addition to the lack of +1 adjustments in the arithmetic.)

The notation for closed is [] and open is (), so we can write a usual 1-based range like [1,10] -- our corresponding range is [0,10). (Zero-based, fully closed would be [0,9], but then one would still have the +1's that go with fully closed range calculations.)
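
The conversion between the two conventions can be sketched in a few lines of Python. The function names below are hypothetical, chosen only for illustration; they do not come from any UCSC code.

```python
def one_based_to_ucsc(one_start, one_end):
    """Convert 1-based fully closed [one_start, one_end]
    to 0-based half-open [start, end)."""
    return one_start - 1, one_end


def ucsc_to_one_based(start, end):
    """Convert 0-based half-open [start, end)
    to 1-based fully closed [start + 1, end]."""
    return start + 1, end


def one_based_size(one_start, one_end):
    """Size of a 1-based fully closed range: needs the +1."""
    return one_end - one_start + 1


def ucsc_size(start, end):
    """Size of a 0-based half-open range: plain subtraction."""
    return end - start
```

With these, [1,10] converts to [0,10) and back, and both size functions give 10 for that range.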

Here is an example of converting 1-based closed coords to the opposite strand's coordinates. This is often necessary when mapping coordinates given on the reverse strand to the forward strand, for comparison with other annotations on the forward strand.

We start with 1-based [oneStart, oneEnd]. We translate that range into our system: [oneStart-1, oneEnd). Once we're in our system, we simply subtract those endpoints from chromSize.

ucscRevStart = chromSize - oneEnd
ucscRevEnd   = chromSize - (oneStart - 1)

-- but those coords are still in our system! They are 0-based half-open [ucscRevStart, ucscRevEnd). We add 1 to the start coord to convert the range back to 1-based, fully closed:

[oneRevStart, oneRevEnd] = [ucscRevStart+1, ucscRevEnd]
  = [chromSize-oneEnd+1, chromSize-oneStart+1]

Both start and end have a +1 in the 1-based coordinate reversal.
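
As a rough check of the derivation above, here is a Python sketch that does the reversal both ways: directly in 1-based coords, and by the detour through our 0-based half-open system. The function names and the snake_case variable names are assumptions made for this example only.

```python
def reverse_one_based(one_start, one_end, chrom_size):
    """Reverse a 1-based fully closed range directly:
    both endpoints pick up a +1."""
    return chrom_size - one_end + 1, chrom_size - one_start + 1


def reverse_one_based_via_ucsc(one_start, one_end, chrom_size):
    """Same reversal, going through 0-based half-open coords."""
    # 1-based closed -> 0-based half-open
    start, end = one_start - 1, one_end
    # Reverse within the half-open system: subtract from chrom_size
    # and swap the roles of start and end.
    ucsc_rev_start = chrom_size - end
    ucsc_rev_end = chrom_size - start
    # 0-based half-open -> 1-based closed
    return ucsc_rev_start + 1, ucsc_rev_end
```

For chromSize 10, the range [3,5] reverses to [6,8] by either route.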

I find it harder to do arithmetic within the 1-based coordinate system now that I am accustomed to 0-based, half-open, and have to count on my fingers. Here is a visual aid for 1-based coordinate reversal, where chromSize is 10:

1  2  3  4  5  6  7  8  9 10
|  |  |  |  |  |  |  |  |  |
10 9  8  7  6  5  4  3  2  1

Pick a fully-closed range within that (put your hands over the numbers before and after) and you can see that both coords end up being transformed by (size - (coord - 1)) or (size + 1 - coord) depending on how you look at it.
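
The visual aid can also be reproduced programmatically. This is a small sketch, with a hypothetical helper name, that applies (size + 1 - coord) to every position and prints the two rows:

```python
def reverse_coord(coord, size):
    """Map a single 1-based coord to the opposite strand."""
    return size + 1 - coord


size = 10
forward = list(range(1, size + 1))
reverse = [reverse_coord(c, size) for c in forward]

# Print the forward row above the reversed row, column-aligned.
print(" ".join(f"{c:2d}" for c in forward))
print(" ".join(f"{c:2d}" for c in reverse))
```

Applying the same transform twice returns the original coord, which is a quick sanity check that the +1 is in the right place.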