BLAT-Strands-And-Frames

From genomewiki
Revision as of 07:05, 19 September 2018 by Galt

hgBlat

Most people first meet BLAT through hgBlat on the UCSC Genome Browser. hgBlat talks to gfServer, picking either the dna server or the translated server; both are defined in the blatServers table.

BLAT

Strands and Frames and Searches
Type  QStrands  QFrames  TStrands  TFrames  Searches
dna   2         1        1         1        2
prot  1         1        2         3        6
rnax  1         3        2         3        18
dnax  2         3        2         3        36

For the BLAT untranslated target index, only the + strand is indexed, in 1 frame.

For the BLAT translated target index, both the + and - strands are indexed, in 3 frames each.

For dna and dnax, it repeats the search after reverse-complementing the query.

For prot, it only has to deal with the frames and strands of the target. The query side is simple.

For rnax, it converts the query to rna in 3 frames, searching all 6 target frames (2 strands x 3 frames) each time.

For dnax, it does rnax, then reverse-complements the query, then does rnax again.
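
The Searches column in the table above is the product of the four strand and frame counts. A minimal Python sketch of that arithmetic, using only the values from the table; the dictionary and function names are illustrative and are not part of BLAT:

```python
# Strand and frame counts copied from the table above, in the order
# (QStrands, QFrames, TStrands, TFrames). Names are illustrative only.
SEARCH_SPACE = {
    "dna":  (2, 1, 1, 1),
    "prot": (1, 1, 2, 3),
    "rnax": (1, 3, 2, 3),
    "dnax": (2, 3, 2, 3),
}

def searches(query_type):
    """Number of query/target strand and frame combinations searched."""
    q_strands, q_frames, t_strands, t_frames = SEARCH_SPACE[query_type]
    return q_strands * q_frames * t_strands * t_frames

for qtype in SEARCH_SPACE:
    print(qtype, searches(qtype))
```

For example, dnax gives 2 x 3 x 2 x 3 = 36, which is twice the rnax count because the whole rnax search is repeated on the reverse-complemented query.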

Query Types
Type Query Type
dna DNA
prot Protein
rnax Translated RNA
dnax Translated DNA
? BLAT's Guess

BLAT's Guess only ever chooses dna or protein. It never guesses rnax or dnax.
It also looks only at the first sequence submitted,
and treats all the remaining sequences as the same type as that first one.


Strand Usage

q = query (what you are searching for)
t = target (usually a genome, what you are searching in)

Strand Notation
PSL  Actual DNA Strand  Query Types     Describes        Equivalent
+    +                  dna             qStrand          ++
-    -                  dna             qStrand          -+
++   +                  prot,rnax,dnax  qStrand tStrand
+-   -                  prot,rnax,dnax  qStrand tStrand
-+   -                  dnax            qStrand tStrand
--   +                  dnax            qStrand tStrand

For strand output that uses two characters (all types but dna),
the first character is the query strand (qStrand) and the second is the target strand (tStrand).

dna
For a dna query, the strand values really mean:

+ = ++
- = -+

This is because you are always searching on the positive target strand. For untranslated queries (dna), only the positive target strand is indexed.

The -+ notation shows that the query was flipped over, not the target: the reverse-complemented query is searched again against the positive target strand index.
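
The equivalence above can be sketched as a small helper that expands any PSL strand field into an explicit (qStrand, tStrand) pair. This is an illustrative Python sketch; the function name is hypothetical and is not part of BLAT or the kent source tree:

```python
def split_strand(psl_strand):
    """Return (qStrand, tStrand) for a PSL strand field.

    A one-character strand comes from an untranslated (dna) search, where
    the target is always searched on its + strand, so tStrand is "+".
    """
    if psl_strand not in ("+", "-", "++", "+-", "-+", "--"):
        raise ValueError("unexpected PSL strand: %r" % (psl_strand,))
    if len(psl_strand) == 1:
        return psl_strand, "+"
    return psl_strand[0], psl_strand[1]

print(split_strand("-"))   # dna hit found with the reverse-complemented query
print(split_strand("+-"))  # translated hit on the negative target strand
```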

dnax
Note that both -+ and -- are seen ONLY with type dnax, which reverse-complements the query and searches again.


POSITIVE STRAND COORDINATES
The qStart, qEnd and tStart, tEnd coordinates are ALWAYS flipped to positive strand coordinates, for both the PSL and hyperlink output types of hgBlat. Thus you are ALWAYS looking at positive strand coordinates for those, which also helps keep things simple and consistent for the user. BUT since one or both may really be on the other strand (if qStrand or tStrand is negative), sometimes those are really the end and start rather than the start and end.
qStart is always <= qEnd, and likewise tStart is always <= tEnd.

PSL COORDINATES ONLY:
When it says that the query or the target strand is negative (-), that means the qStarts or tStarts lists will be negative strand coordinates.

The qStarts and tStarts lists are always given in ascending coordinate order.

If you do need to flip the qStarts or tStarts, be careful to first add the corresponding blockSize to the coordinate.
If k is the number of blocks (or exons), and x stands for either q or t, then

xStart[i] is the i-th element of the k qStarts or tStarts in the PSL record
xEnd[i] = xStart[i] + blockSize[i]   # has to be calculated, not given

To flip to the opposite strand, the start and end swap places like this (assuming k blocks, for i = 0 to k-1):

xStartOpp[k-1-i] = chromSize - xEnd[i] = chromSize - (xStart[i] + blockSize[i])
  xEndOpp[k-1-i] = chromSize - xStart[i]

The order of the blockSizes list must also be flipped.

This is also why, if you flip the query side, you must also flip the target side: otherwise they could not share a single blockSizes list that goes in the same order for both query and target.
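
The flip described above can be sketched in Python as follows. This is a minimal illustration and not code from BLAT; the function name is hypothetical, and seq_size stands for chromSize in the formulas (tSize for the target side, and, as an assumption here, qSize when the same formulas are applied to the query side):

```python
def flip_blocks(starts, block_sizes, seq_size):
    """Flip PSL block starts to the opposite strand.

    starts      -- qStarts or tStarts, in ascending order
    block_sizes -- blockSizes, in the same order as starts
    seq_size    -- length of the sequence the starts refer to
    Returns (flipped_starts, flipped_block_sizes), both in ascending order.
    """
    if len(starts) != len(block_sizes):
        raise ValueError("starts and block_sizes must have the same length")
    # xEnd[i] = xStart[i] + blockSize[i]: ends are calculated, not given.
    ends = [s + b for s, b in zip(starts, block_sizes)]
    # xStartOpp[k-1-i] = seq_size - xEnd[i]; reversing keeps ascending order.
    flipped_starts = [seq_size - e for e in reversed(ends)]
    # The block sizes must be reversed to stay paired with their starts.
    flipped_sizes = list(reversed(block_sizes))
    return flipped_starts, flipped_sizes

# Two blocks on a sequence of length 100.
print(flip_blocks([10, 40], [5, 20], 100))  # ([40, 85], [20, 5])
```

Applying the function twice returns the original lists, which is a quick sanity check when converting strand coordinates in either direction.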


DETAILS LINK
Note that if you click the details link, it will show you the query on the positive strand ALWAYS.

If the strand is ++ or --, it just shows the target as positive strand too. If it is +- or -+, it shows the target chrom labeled (reversed).

So it is ONLY the target side that gets reversed on this display. You will never see the query get reversed on this display.

(In other words, it does not slavishly follow the +- qStrand tStrand pairs that BLAT shows.)
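
The display rule above amounts to: the target is shown reversed exactly when qStrand and tStrand differ. A small Python sketch of that rule, with a hypothetical function name; it assumes that single-character dna strands follow the + = ++ and - = -+ equivalence given in the Strand Notation table:

```python
def details_target_reversed(psl_strand):
    """True if the details page would show the target chrom as (reversed)."""
    if len(psl_strand) == 1:
        # Assumed: dna output "+" means "++" and "-" means "-+".
        psl_strand += "+"
    q_strand, t_strand = psl_strand
    # The query is always shown on the positive strand, so only a
    # mismatch between the two strands reverses the target.
    return q_strand != t_strand

for strand in ("++", "--", "+-", "-+"):
    print(strand, details_target_reversed(strand))
```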

BROWSER LINK
Note that the browser link position always uses the already-positive-strand coordinates, tChrom:tStart-tEnd.