Chains Nets

From genomewiki

Revision as of 17:18, 8 May 2006

Chains and nets are Jim Kent's brainchild, published here: [http://www.pnas.org/cgi/content/full/100/20/11484]

They used to be generated by a long manual process documented in some of our older make*.doc files, but are now generated by the script kent/src/utils/doBlastzChainNet.pl.

Here are some musings on the fine points of chains and nets -- these are from Angie's mental model of chains and nets and represent opinions which may be outdated or plain old incorrect. The source code, and the results that we get by running these programs on real data, are the ultimate source of truth about chains and nets.

Chains in a nutshell:

  • a chain is a sequence of gapless aligned blocks, where there must be no overlaps of blocks' target or query coords within the chain. Within a chain, target and query coords are monotonically non-decreasing (i.e. from one block to the next, each side either increases or, across a single-sided gap, stays flat).
  • double-sided gaps (a gap in both target and query at the same point) are a new capability (blastz can't do that) that allows extremely long chains to be constructed.
  • not just orthologs, but paralogs too, can result in good chains -- but that's useful!
  • chains should be symmetrical -- e.g. if you swap human-mouse chains to get mouse-human chains, you should get approximately the same chains as if you had swapped the blastz alignments to mouse-human first and then chained them.
  • chained blastz alignments are not single-coverage in either target or query unless some subsequent filtering (like netting) is done.
  • chain tracks can contain massive pileups when a piece of the target aligns well to many places in the query. Common causes of this include insufficient masking of repeats and high-copy-number genes (or paralogs).
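
The block and gap rules above can be checked mechanically. The following is a minimal sketch, not code from the kent source tree: it reads one chain in the standard chain file layout (a header line followed by "size dt dq" lines and a final "size" line), rebuilds the block coordinates, and tests the no-overlap / non-decreasing rule. The names `Block`, `parse_chain`, `is_valid_chain` and `gap_kinds`, and the example chain itself, are made up for illustration. Coordinates are used as given in the file, with no strand handling.

```python
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    """One gapless aligned block (hypothetical helper type)."""
    t_start: int
    q_start: int
    size: int


# Made-up example chain: four blocks separated by a target-only gap,
# a query-only gap, and a double-sided gap.
EXAMPLE_CHAIN = """\
chain 4900 chrT 1000 + 100 270 chrQ 900 + 50 215 1
50 10 0
40 0 5
30 20 20
20
"""


def parse_chain(text: str) -> Tuple[List[Block], List[Tuple[int, int]]]:
    """Return (blocks, gaps) for a single chain; gaps are (dt, dq) pairs."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    header = lines[0]
    if header[0] != "chain":
        raise ValueError("expected a chain header line")
    t_pos, t_end = int(header[5]), int(header[6])
    q_pos, q_end = int(header[10]), int(header[11])
    blocks, gaps = [], []
    for fields in lines[1:]:
        size = int(fields[0])
        blocks.append(Block(t_pos, q_pos, size))
        t_pos += size
        q_pos += size
        if len(fields) == 3:  # every line but the last also carries a gap
            dt, dq = int(fields[1]), int(fields[2])
            gaps.append((dt, dq))
            t_pos += dt
            q_pos += dq
    if (t_pos, q_pos) != (t_end, q_end):
        raise ValueError("blocks and gaps do not add up to the header span")
    return blocks, gaps


def is_valid_chain(blocks: List[Block]) -> bool:
    """True if no block overlaps its predecessor in target or query."""
    for prev, cur in zip(blocks, blocks[1:]):
        if prev.t_start + prev.size > cur.t_start:
            return False
        if prev.q_start + prev.size > cur.q_start:
            return False
    return True


def gap_kinds(gaps: List[Tuple[int, int]]) -> List[str]:
    """Label each gap as target-only, query-only, or double-sided."""
    kinds = []
    for dt, dq in gaps:
        if dt > 0 and dq > 0:
            kinds.append("double-sided")
        elif dt > 0:
            kinds.append("target-only")
        else:
            kinds.append("query-only")
    return kinds


blocks, gaps = parse_chain(EXAMPLE_CHAIN)
print(is_valid_chain(blocks), gap_kinds(gaps))
```

In this toy example the third gap is the kind that blastz alone cannot produce: both sequences skip ahead at the same point, and the chain continues on the far side.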

And nets:

  • a net is a hierarchical collection of chains, with the highest-scoring non-overlapping chains on top, and their gaps filled in where possible by lower-scoring chains, for several levels. I think a chain's qName also helps to determine which level it lands in, i.e. it makes a difference whether a chain's qName is the same as the top-level chain's qName or not, because the levels have meanings associated with them -- see details page.
  • a net is single-coverage for target but not for query.
  • because it's single-coverage in the target, it's no longer symmetrical.
  • the netter has two outputs, one of which we usually ignore: the target-centric net in query coordinates. The reciprocal best process uses that output: the query-referenced (but target-centric / target single-cov) net is turned back into component chains, and then those are netted to get single coverage in the query too; the two outputs of that netting are reciprocal-best in query and target coords. Reciprocal-best nets are symmetrical again.
  • nets do a good job of filtering out massive pileups by collapsing them down to (usually) a single level.
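
To make "single-coverage" concrete, here is a small sketch that measures the maximum alignment depth over a set of target intervals. It is only an illustration of the property and says nothing about how the netter itself works. The function name `max_depth` and the interval values are invented; intervals are half-open (start, end) pairs on one target sequence.

```python
from typing import Iterable, Tuple


def max_depth(intervals: Iterable[Tuple[int, int]]) -> int:
    """Largest number of intervals covering any single target base."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # an interval opens
        events.append((end, -1))    # an interval closes
    # At equal positions the close (-1) sorts before the open (+1),
    # so intervals that merely touch are not counted as overlapping.
    events.sort()
    depth = best = 0
    for _, delta in events:
        depth += delta
        best = max(best, depth)
    return best


# A pileup in a chain track: three chains cover part of the same target region.
pileup = [(100, 200), (150, 250), (180, 220)]
# What single coverage looks like: aligned pieces tile the target without overlap.
single = [(100, 200), (200, 250)]

print(max_depth(pileup), max_depth(single))
```

A depth above 1 on the target side is what a raw chain track can show; a net should never exceed 1 in the target, while the same check run on query coordinates may still find overlaps unless reciprocal-best filtering has been applied.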

"LiftOver chains" are actually chains extracted from nets, or chains filtered by the netting process. Same-species liftOver chains are generated by a series of scripts that Kate wrote, in kent/src/hg/makeDb/makeLoChain/, which use blat -fastMap as the alignment method. Cross-species liftOver chains are generated by doBlastzChainNet.pl.

Navigation: back to Implementation_Notes