The source tree

From genomewiki
Latest revision as of 08:00, 1 September 2018

The source tree is the collection of all code used by the genome browser group: the browser itself, plus the many other tools needed to generate genome annotations. Installation instructions are in the help pages (http://genome.ucsc.edu/admin/jk-install.html); notes specific to Debian/Ubuntu are in this wiki under "Source tree compilation on Debian/Ubuntu".

"If you can conceive of a bioinformatics job that could be done, it probably already has been done in the kent source tree." (Hiram)

"The truth is in the source" (slightly adapted from Angie's "Chains Nets"-article)

The intent of this page is just to give an impression of the source tree's layout. See Implementation_Notes for details about the programs.

  • Tools to explore the source tree:
    • you can use ctags to index the source tree
    • and use cscope to search in the source tree (any better ideas for searching?)
    • midnight commander and a rainy day to explore the source tree...
    • Hiram has a nice list created from the help infos
    • Use the bash construct "2>&1" if you want to pipe help messages to less. By default help goes to stderr, so it bypasses the pipe and scrolls off the screen quickly.
      Example: overlapSelect | less does not page the help text, overlapSelect 2>&1 | less does.
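
The stderr behaviour described in the last bullet can be reproduced without any kent tool installed. This is a minimal sketch, not taken from the source tree: fake_tool is a hypothetical stand-in that, like overlapSelect, prints its usage message to stderr, and cat stands in for less so the example runs non-interactively.

```shell
#!/bin/sh
# Hypothetical stand-in for a kent tool: like overlapSelect, it writes its
# usage message to stderr, not stdout.
fake_tool() {
    echo "usage: fake_tool in out" >&2
}

# Plain pipe: only stdout enters the pipe, so the reader gets nothing.
# (2>/dev/null just keeps the stray message off the terminal here.)
plain=$(fake_tool 2>/dev/null | cat)

# With 2>&1, stderr is merged into stdout, so the message goes through the pipe.
merged=$(fake_tool 2>&1 | cat)

echo "plain pipe captured: [$plain]"
echo "2>&1 pipe captured:  [$merged]"
```

With a real tool the second form is simply overlapSelect 2>&1 | less, as in the bullet above.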
src/java
You better hate Java if you're dealing with the UCSC genome browser. Repeat: "I love pointers!".
src/product
Very important info on how to install the gbdb on your own computer
src/utils
Various small tools, not especially related to a particular genome
src/hg
Almost all tools related to any kind of genome
src/hg/mousestuff
everything related to chaining, netting and whole-genome alignments
src/hg/makeDb
Tools and scripts (!) that were used to load data into the databases. Most of them are in subdirs named hgLoadx, with x standing for maf/bed/wiggle/axt/net/chain/out/etc. If you understand these, you can build your own genome browser. :-) The trackDb.ra files also live here: roughly a list of all tracks, their descriptions and instructions on how to display them.
src/parasol
Jim's parasol cluster system, with documentation (http://genecats.soe.ucsc.edu/eng/parasol.html)
kent/src/hg/psl
Everything that has to do with psls ("psl stands for ps-Layout, where ps stands for PatSpace (presumably from Pattern-Space) which comes from a sort of earlier predecessor of blat", Galt)
  • UCSC terms:
    • axt: some aligner, maybe called faAlign? Also the name of a file format for local alignments.
    • psl file: A format for local alignment hits used by blat. axt can be converted to psl with axtToPsl; BLAST output can be converted with blastToPsl (but not tblastx).
    • chaining: post-processing local alignments to check whether two aligned regions that lie close together on the query also lie close together on the subject. Here is more info: Chains Nets
    • chain file: positions of identical nucleotides (=an alignment, you can use pslToChain to create it), ->Chains Nets
    • netting: post-treating chains to group together many chains to see where regions do align (the alignment is lost), ->Chains Nets
    • net file: positions of alignable regions
    • Xa file: ? pslToXa ? What is Xa?
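
To make the psl entry above more concrete, here is a minimal sketch that pulls the main coordinates out of one psl record with awk. It assumes the standard 21-column, tab-separated psl layout (column 1 matches, 9 strand, 10 qName, 12 qStart, 13 qEnd, 14 tName, 16 tStart, 17 tEnd); the record itself is made up for illustration and does not come from any real alignment.

```shell
#!/bin/sh
# One made-up psl record with 21 tab-separated columns (illustrative values only).
psl_line=$(printf '95\t5\t0\t0\t0\t0\t0\t0\t+\tquery1\t100\t0\t100\tchr1\t1000\t200\t300\t1\t100,\t0,\t200,')

# Print query range, target range, strand and match count.
summary=$(printf '%s\n' "$psl_line" | awk -F'\t' '{
    printf "%s:%d-%d -> %s:%d-%d (%s), %d matches", $10, $12, $13, $14, $16, $17, $9, $1
}')

echo "$summary"
```

For real files, the tools under kent/src/hg/psl are the better choice; this only shows what a single record looks like.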


Cool Files:

  • /kent/src/oneShot/testScripters: cool collection of scripts to test the speed of various languages, result as expected: C with Jim's libs is much faster than anything else


Some of the tools are already compiled and can be found at http://hgdownload.soe.ucsc.edu/admin/exe/