https://genomewiki.ucsc.edu/api.php?action=feedcontributions&user=Max&feedformat=atomgenomewiki - User contributions [en]2024-03-29T06:05:47ZUser contributionsMediaWiki 1.38.4https://genomewiki.ucsc.edu/index.php?title=Debugging_slow_CGIs&diff=26026Debugging slow CGIs2023-09-26T09:39:06Z<p>Max: </p>
<hr />
<div>See also [[Debugging cgi-scripts]]<br />
<br />
This page is a collection of considerations and tips for troubleshooting slow CGI response, targeted toward engineers working on the Browser.<br />
<br />
* What does measureTiming say? If the slowdown does not show up in measureTiming, make sure to add a timing call somewhere in the relevant code once the problem is resolved, so that a similar slowdown shows up next time.<br />
* Does the problem occur on mirrors? If yes, it's unlikely to be hgnfs1.<br />
* Are the trash cleaners running? If not, disk slowness is expected because the trash/ directory quickly fills up with junk, and the file system struggles to work around having so many files in a single directory.<br />
* Does the problem also occur on hgwdev? If so, debugging is usually easier.<br />
** Note that hgwdev behaves differently from the RR: for example, trackDb caching is not active there, and bigDataUrl files and tables are checked, which the RR does not do.<br />
* If you have a query string for the CGI (like hgTracks) that you know will demonstrate the problem, try running "gdb --args hgTracks <querystring>" a few times: start the CGI with "run", press ctrl-c once it appears to be stuck, and type "bt" to get the backtrace. That can help you track down where the CGI is getting stuck.<br />
* If the CGI is exceptionally long-running and you're having trouble reproducing the issue on the command-line, you can attach to an actively running CGI process (owned by apache) with gdb. This requires sudo privileges for gdb, because normal users don't have permission to attach to a process owned by apache. First, find the process id of the problematic CGI with top, ps aux or ps fax. Next, run "sudo gdb -p <processId>", or start "sudo gdb" and run "attach <processId>" at the gdb prompt. This will interrupt the running CGI and set you up with its active callstack. From there you can run "bt" to get a backtrace, find out where you are, and continue debugging.<br />
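<br />
The attach-and-backtrace steps above can be scripted so that several backtraces are taken a few seconds apart; if the same function is at the top of most samples, that is a likely culprit. This is only a sketch: the function name sample_backtraces and the GDB override variable are made up for illustration, and it assumes a gdb that supports the standard -p, -batch and -ex options.<br />

```shell
# Take several backtraces of a running process, a few seconds apart.
# usage: sample_backtraces <pid> [count] [delaySeconds]
# GDB is a hypothetical override for the debugger command (default: sudo gdb).
sample_backtraces() (
    pid="$1"
    count="${2:-5}"
    delay="${3:-2}"
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== sample $i of $count (pid $pid) ==="
        # -batch makes gdb exit after running the -ex commands,
        # which also detaches from the process so it can continue.
        ${GDB:-sudo gdb} -p "$pid" -batch -ex "bt"
        sleep "$delay"
        i=$((i + 1))
    done
)
```

For example, "sample_backtraces 12345 3 5" would take three backtraces of process 12345, five seconds apart. The process id still has to be looked up first with top or ps.<br />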
<br />
Examples:<br />
<br />
* https://redmine.gi.ucsc.edu/issues/32267: MAF drawing code accidentally was always run, even when track was in hide. No indication in measureTiming for this.</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Chains_Nets&diff=26022Chains Nets2023-09-18T13:32:20Z<p>Max: </p>
<hr />
<div>Chains and nets are higher-level collections of basic pairwise sequence alignments. Cross-species nets are used to make a single-coverage (on the reference genome) collection of pairwise alignments that is the basis of our Multiz multi-species alignments in the Conservation track. The chain and net algorithms, as well as results from human-mouse alignments, were [http://www.pnas.org/cgi/content/full/100/20/11484 published] in 2003. Chains and nets are generated from genomic local alignments computed by [[Blastz]] (2002-2008) or [[Lastz]] (2008-), post-processed by a series of UCSC programs, most notably axtChain, chainNet and netFilter.<br />
<br />
The contents of this page are from [[User:AngieHinrichs|Angie]]'s mental model of chains and nets and represent opinions which may be outdated or plain old incorrect. The source code, and the results that we get by running these programs on real data, are the ultimate source of truth about chains and nets. <br />
<br />
Please keep in mind that the outputs of any alignment algorithm are not the final Truth about homology between sequences. The scoring system and other parameters of any alignment algorithm are designed to produce high scores for similarities that would likely result from some model of nucleotide-level evolution; tweaking a parameter can change the results significantly. The quality and completeness of the reference assemblies also affect alignment results. That said, chains and nets are powerful constructs for identifying similarities over very large regions of the genome, and inferring chromosomal rearrangements that may have occurred as the two sequences diverged from a common ancestral sequence.<br />
<br />
== Basic definitions ==<br />
<br />
In chain and net lingo, the '''target''' is the reference genome sequence and the '''query''' is some other genome sequence. For example, if you are viewing Human-Mouse alignments in the Human genome browser, human is the target and mouse is the query.<br />
<br />
A '''gapless block''' is a base-for-base alignment between part of the target and part of the query, possibly including mismatching bases. It has the same length in bases on the target and the query. This is the output of the most primitive alignment algorithms. <br />
<br />
A '''gap''' is a link between two gapless blocks, indicating that the target or the query has sequence that should be skipped over in order to make the best-scoring alignment. In other words, the scoring penalty for skipping over one or more bases is less than the penalty for continuing to align the sequences without skipping. <br />
<br />
A '''single-sided gap''' is a gap in which sequence in either target or query must be skipped over. A plausible explanation for needing to skip over a base in the target while not skipping a base in the query is that either the target has an inserted base or the query has a deleted base. Many alignment tools produce alignments with single-sided gaps between gapless blocks. <br />
<br />
A '''double-sided gap''' skips over sequence in both target and query because the sum of penalties for mismatching bases exceeds the penalty for extending a gap across them. This is possible only when the penalty for extending a gap is less than the penalty for creating a new gap and less than the penalty for a mismatch, and when the alignment algorithm is capable of considering double-sided gaps. <br />
<br />
== Chains in a nutshell ==<br />
<br />
A '''chain''' is a sequence of non-overlapping gapless blocks, with single- or double-sided gaps between blocks. Within a chain, target and query coords are monotonically non-decreasing (i.e. always increasing or flat). Chains are constructed by the axtChain program which finds pairwise alignments with the same target and query sequence, on the same strand, that can be merged if overlapping and joined into one longer alignment with a higher score under an affine gap-scoring system (progressively decreasing penalties for longer gaps).<br />
<br />
* double-sided gaps are a new capability (blastz can't do that) that allow extremely long chains to be constructed.<br />
* not just orthologs, but paralogs too, can result in good chains. That is useful, though!<br />
* chains should be symmetrical -- e.g. if you swap human-mouse chains to get mouse-human chains, you should get approximately the same chains as if you had chained the swapped (mouse-human) blastz alignments. However, [[Blastz]]'s dynamic masking is asymmetrical, so in practice those results are not exactly symmetrical. Also, dynamic masking in conjunction with changed chunk sizes can cause differences in results from one run to the next.<br />
* chained blastz alignments are not single-coverage in either target or query unless some subsequent filtering (like netting) is done. <br />
* chain tracks can contain massive pileups when a piece of the target aligns well to many places in the query. Common causes of this include insufficient masking of repeats and high-copy-number genes (or paralogs).<br />
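<br />
To make the block and gap definitions concrete, here is a small sketch that reads a file in the chain file format and labels the gap after each gapless block. It assumes the usual layout of that format: a header line starting with "chain" whose 13th field is the chain id, then lines of "size dt dq" (block size, gap in target, gap in query) and a final line with only the size of the last block. The function name classify_chain_gaps is made up for this example.<br />

```shell
# Label the gap that follows each gapless block of every chain on stdin.
# Output columns: chainId blockSize gapKind
classify_chain_gaps() {
    awk '
        # Header: chain score tName tSize tStrand tStart tEnd
        #         qName qSize qStrand qStart qEnd id
        $1 == "chain" { id = $13; next }
        # Block followed by a gap: size dt dq
        NF == 3 {
            kind = ($2 > 0 && $3 > 0) ? "double-sided" : "single-sided"
            print id, $1, kind
            next
        }
        # Last block of the chain: size only, no gap after it
        NF == 1 { print id, $1, "last-block" }
    '
}
```

Running "classify_chain_gaps < some.chain | sort | uniq -c" on a real file is not meaningful as is, because block sizes differ, but piping through "awk '{print $3}' | sort | uniq -c" gives a quick count of single- versus double-sided gaps.<br />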
<br />
== Chains and the strand ==<br />
<br />
Chains are unusual in that, when they are on the negative strand, positions are specified from the end of the chromosome rather than from the start. This has led to a lot of grief over the years and confuses most programmers when they start coding with chain files.<br />
<br />
Also, the "chain" database tables are not exactly like the chain file. As Brian put it in #19186: in our databases, chains are assumed to be on the + strand on the target/reference, that is, we don't save the target strand in the database schemas for chains. The chain file format, which is different from the schemas we use for the TWO tables that make up each chain track, explicitly stores the target strand, so it could theoretically be -.<br />
<br />
In both formats the blocks that make up a single chain are defined to have the same target and query strand.<br />
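<br />
As a sketch of the strand convention described above: a negative-strand interval counted from the end of the chromosome can be turned into positive-strand coordinates by subtracting both ends from the chromosome size and swapping them. The function name chain_to_plus is made up for this example, and coordinates are assumed to be zero-based, half-open.<br />

```shell
# Convert a chain interval to positive-strand coordinates.
# usage: chain_to_plus <chromSize> <strand> <start> <end>
chain_to_plus() (
    size="$1"; strand="$2"; start="$3"; end="$4"
    if [ "$strand" = "-" ]; then
        # Coordinates count from the chromosome end: flip and swap them.
        echo "$((size - end)) $((size - start))"
    else
        echo "$start $end"
    fi
)
```

For example, "chain_to_plus 1000 - 100 250" prints "750 900": the interval 100-250 counted from the end of a 1000-base sequence covers bases 750-900 counted from the start.<br />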
<br />
== Nets in a nutshell ==<br />
<br />
A '''net''' is a hierarchical collection of chains, with the highest-scoring non-overlapping chains on top, and their gaps filled in where possible by lower-scoring chains, which in turn may have gaps filled in by lower-level chains and so on. <br />
<br />
* I think a chain's qName also helps to determine which level it lands in, i.e. it makes a difference whether a chain's qName is the same as the top-level chain's qName or not, because the levels have meanings associated with them -- see details page. <br />
* a net is single-coverage for target but not for query, unless it has been filtered to be single-coverage on both target and query. By convention we add "rbest" to the net filename in that case.<br />
* because it's single-coverage in the target, it's no longer symmetrical.<br />
* the netter has two outputs, one of which we usually ignore: the target-centric net in query coordinates. The reciprocal best process uses that output: the query-referenced (but target-centric / target single-cov) net is turned back into component chains, and then those are netted to get single coverage in the query too; the two outputs of that netting are reciprocal-best in query and target coords. Reciprocal-best nets are symmetrical again. <br />
* nets do a good job of filtering out massive pileups by collapsing them down to (usually) a single level.<br />
<br />
"LiftOver chains" are actually chains extracted from nets, or chains filtered by the netting process. <br />
<br />
== History ==<br />
<br />
Chains and nets are [[User:Jimkent|Jim Kent]]'s brainchild, building on joint work with blastz author Scott Schwartz. <br />
<br />
Cross-species chains and nets used to be generated by a long manual process documented in some of our older makeDb/doc/*.txt files, but since ~2006 they have been generated by the script kent/src/hg/utils/automation/doBlastzChainNet.pl .<br />
<br />
Same-species liftOver chains use blat -fastMap as the alignment method, and are generated by kent/src/hg/utils/automation/doSameSpeciesLiftOver.pl, based on a series of scripts that [[User:Kate|Kate]] wrote in kent/src/hg/makeDb/makeLoChain/.<br />
<br />
== FAQs? ==<br />
<br />
The original publication describing chains and nets, Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes (Kent et al., PNAS September 30, 2003 100 (20) 11484-11489), might be helpful for understanding the rationale behind the process.<br />
<br />
> It’s somehow opposite with the alignment file, right? For example, the psl file records info of sequences from query assembly mapping against sequences <br />
> from target assembly.<br />
<br />
The PSL format lists query coords before target coords, but the UCSC convention is that the target is the reference genome (on which the Genome Browser displays) and the query is the 'other' (the query could be mRNA from the same species as the target/reference genome, or could be another species). The PSL format is the native output format of BLAT, which differs from BLAST in that BLAT indexes the target and scans the query while BLAST indexes the query and scans the target. Those are pretty esoteric differences, but I think they help explain why the UCSC view of target and query might be the opposite of some others.<br />
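<br />
To illustrate the column order, the following sketch prints the target interval before the query interval for each PSL record. It assumes the standard 21-column tab-separated PSL layout (strand in column 9, qName/qSize/qStart/qEnd in columns 10-13, tName/tSize/tStart/tEnd in columns 14-17) and skips header lines by requiring a numeric first column. The function name psl_target_query is made up for this example.<br />

```shell
# Print "target:start-end query:start-end strand" for each PSL record on stdin.
psl_target_query() {
    awk -F'\t' '
        # Data lines have at least 21 columns and start with the match count.
        NF >= 21 && $1 ~ /^[0-9]+$/ {
            print $14 ":" $16 "-" $17, $10 ":" $12 "-" $13, $9
        }
    '
}
```

Used as "psl_target_query < alignments.psl", this puts the reference (target) first, matching how the Genome Browser displays the alignment.<br />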
<br />
> And then collapse the chains to one.<br />
<br />
The chains are not collapsed to one; that would violate the constraint on a chain that it has monotonically increasing coordinates on the same target and query sequences in the same orientation. In the net, the two chains are retained at different levels (the primary alignment is at the top level and the secondary alignment is at a lower level). When the netChainSubset program extracts the liftOver chains from the net and full set of chains, it outputs the complete primary chain, but it outputs only a portion of the secondary chain.<br />
<br />
> I don’t understand why use the secondary alignment to fill the gap in the primary alignment.<br />
<br />
Chains and nets were designed to capture medium-to-large-scale rearrangements during evolution of species from a common ancestor: duplications, inversions, translocations. For example, in the case of an inversion, we would expect a gap in the top-level alignment to coincide with an alignment to the opposite strand of the same sequence (and similar breakpoints). With a translocation, a gap in the top-level alignment might be "filled" by an alignment to some other chromosome. When there is a duplication in the target, the target might have two (diverged) copies of the ancestral sequence that align to the same un-duplicated location in the query. In that case, the top-level chain would have a gap that is filled by an alignment to the same location on the query (single-coverage on the target, but multiple coverage on the query).<br />
<br />
> I suppose chain should reflect the true difference between two assembly. Say the contig a is actually corresponding to the primary hit region in <br />
> hg19. Here if the gap is filled as described above to generate a chain, wouldn’t that cause the gapped bases from hg19 being lifted to <br />
> a false corresponding region in contig?<br />
<br />
All bioinformatics algorithms are attempts to approximate the truth, and they fall short, especially when their assumptions and parameters are not tuned exactly right for the question at hand. If alignment parameters are overly sensitive for the actual divergence of target and query, then yes, spurious alignments will probably appear in the results. But there may be other explanations for unexpected alignments. (Assembly errors, sequencing errors, unexpected variation, some new discovery for you to make....) There is no simple answer that applies to all situations, and no substitute for trying different parameters and methods and carefully examining the results to see what works best for your particular application. It may help to make custom tracks using our bigChain format so you can examine and compare results in the Genome Browser.<br />
<br />
> I’ve recently discovered that we can use minimap2 to generate alignment file and convert to psl and use that psl to generate chain file. If I can <br />
> solve the multiple-mapping issue in the alignment file level, I don’t need to perform netChainSubset right?<br />
<br />
Perhaps -- that is up to you to determine. Best wishes for your research! If you know results of an evaluation using minimap2 versus lastz, please contact us at genome@soe.ucsc.edu.<br />
<br />
Navigation: back to [[Implementation_Notes]]<br />
<br />
[[Category:Technical FAQ]]<br />
[[Category:Comparative Genomics]]</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Graphviz_static_build&diff=26015Graphviz static build2023-06-26T14:39:59Z<p>Max: </p>
<hr />
<div>By default, the dot binary built by Graphviz is not very static: it links dynamically against many libraries. <br />
<br />
I first installed + compiled cairo-1.14.8 pango-1.40.5 libpng-1.6.29 fontconfig-2.12.1 harfbuzz-1.4.6 pcre and libgd-2.2.3<br />
<br />
The source code is all in ~max/software with log.txt sprinkled around with the commands I used. The result is in ~max/usr/lib.<br />
<br />
* libpng:<br />
** curl -LO https://download.sourceforge.net/libpng/libpng-1.6.39.tar.gz<br />
** tar xvzf libpng-1.6.39.tar.gz<br />
** cd libpng-1.6.39 <br />
** ./configure --prefix=${HOME}/usr --enable-static <br />
** make && make install<br />
* bz2:<br />
** change the Makefile and add -fpic to CFLAGS<br />
** make<br />
** make install PREFIX=/cluster/home/max/usr<br />
* pcre:<br />
** ./configure --prefix=${HOME}/usr --enable-static --enable-utf && make -j40 && make install<br />
* cairo:<br />
** ./configure --prefix=${HOME}/usr --enable-static --without-x --enable-xcb=no --enable-xcb-shm=no --enable-xlib-xcb=no --without-xlib-xrender --without-xlib --enable-xlib-xrender=no --enable-xblib=no --disable-xlib --disable-x 2>&1 | less<br />
* pango<br />
** ./configure --prefix=${HOME}/usr --enable-static --without-xft --with-pic CFLAGS=-fPIC CXXFLAGS=-fPIC --disable-shared<br />
* libgd<br />
** ./configure --prefix=${HOME}/usr --enable-static --without-x --without-xpm<br />
** need to add #include <limits.h> to gd_gd2.c, otherwise it doesn't compile on centos (release: 2.2.4 from early 2017)<br />
* fontconfig<br />
** change src/fcxml.c, add "return;" in line 560 at the beginning of the function FcConfigMessage<br />
** This shuts up the warnings. The format of font.conf has changed and on newer centos versions this leads to tons of warnings<br />
** ./configure --prefix=${HOME}/usr --enable-static --sysconfdir=/etc --localstatedir=/var --disable-docs --with-pic && make -j30 <br />
** make install does not work<br />
*** cp src/.libs/libfontconfig.a ~/usr/lib/libfontconfig.a<br />
*** cp src/.libs/libfontconfig.so ~/usr/lib/libfontconfig.so <br />
<br />
If you compile with only libgd, there is no antialiasing for some reason. This is why I added cairo.<br />
<br />
Download and extract graphviz 2.28.0<br />
<br />
./configure --prefix=${HOME}/usr --enable-static=yes --enable-ltdl=no --enable-swig=no --enable-tcl=no --enable-x=no --with-expat=no --with-visio=no --with-cgraph=no --with-fontconfig=no --disable-sharp --disable-guile --disable-io --disable-java --disable-lua --disable-ocaml --disable-perl --disable-php --disable-python --disable-ruby --disable-tcl --enable-shared=no --with-gtk=no --with-poppler=no --with-gdk-pixbuf=no --with-fontconfig=no --with-gtkgl=no --with-gtkglext=no --with-ann=no --with-glade=no --with-qt=no --with-x=no<br />
<br />
After a graphviz compile, cd cmd/dot and<br />
<br />
rm dot_static<br />
make CCLD="echo gcc"<br />
<br />
Based on the output, I constructed this GCC command:<br />
<br />
gcc -g -O2 -Wno-unknown-pragmas -Wstrict-prototypes -Wpointer-arith -Wall -ffast-math -o dot_static dot_static-dot.o dot_static-dot_builtins.o -L/cluster/home/max/usr/lib ../../plugin/dot_layout/.libs/libgvplugin_dot_layout_C.a ../../plugin/neato_layout/.libs/libgvplugin_neato_layout_C.a ../../plugin/core/.libs/libgvplugin_core_C.a ../../lib/gvc/.libs/libgvc_C.a ../../lib/pathplan/.libs/libpathplan_C.a ../../lib/cgraph/.libs/libcgraph_C.a ../../lib/xdot/.libs/libxdot_C.a ../../lib/cdt/.libs/libcdt_C.a -L/usr/lib64 ../../plugin/gd/.libs/libgvplugin_gd_C.a /cluster/home/max/usr/lib/libgd.a /cluster/home/max/usr/lib/libjpeg.a ../../plugin/pango/.libs/libgvplugin_pango_C.a /cluster/home/max/usr/lib/libpangocairo-1.0.a /cluster/home/max/usr/lib/libpangoft2-1.0.a /cluster/home/max/usr/lib/libharfbuzz.a /cluster/home/max/usr/lib/libpcre.a /cluster/home/max/usr/lib/libpango-1.0.a /cluster/home/max/usr/lib/libgthread-2.0.a /cluster/home/max/usr/lib/libcairo.a /cluster/home/max/usr/lib/libpixman-1.a /cluster/home/max/usr/lib/libfontconfig.a /cluster/home/max/usr/lib/libexpat.a /cluster/home/max/usr/lib/libfreetype.a /cluster/home/max/usr/lib/libbz2.a /cluster/home/max/usr/lib/libpng.a /cluster/home/max/usr/lib/libpng16.a -lz -lm /cluster/home/max/usr/lib/libgobject-2.0.a -lffi /cluster/home/max/usr/lib/libglib-2.0.a -lpthread -lrt -pthread -Wl,-rpath -Wl,/cluster/home/max/usr/lib -Wl,-rpath -Wl,/cluster/home/max/usr/lib<br />
<br />
Now it's much more static. Output of "ldd dot_static":<br />
<br />
linux-vdso.so.1 => (0x00007ffdd2571000)<br />
libpng12.so.0 => /usr/lib64/libpng12.so.0 (0x0000003ba8200000)<br />
libz.so.1 => /cluster/software/lib/libz.so.1 (0x00007f1a3c264000)<br />
libm.so.6 => /lib64/libm.so.6 (0x0000003ba3200000)<br />
libffi.so.5 => /usr/lib64/libffi.so.5 (0x00007f1a3c05c000)<br />
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003ba3600000)<br />
librt.so.1 => /lib64/librt.so.1 (0x0000003ba4200000)<br />
libc.so.6 => /lib64/libc.so.6 (0x0000003ba2e00000)<br />
/lib64/ld-linux-x86-64.so.2 (0x0000003ba2a00000)</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Genome_Browser_Software_Features&diff=25989Genome Browser Software Features2022-11-05T05:57:33Z<p>Max: </p>
<hr />
<div>This page contains a list of major new software features found in the UCSC Genome Browser. The most recent features are always at the top of the page. A more detailed list can be found at https://genecats.gi.ucsc.edu/builds/versions-all/v<versionNumber>.html<br />
<br />
= 2022 =<br />
<br />
== 18 October 2022, v438 ==<br />
<br />
* Added 'copy link' function to gateway description page on GenArk hubs<br />
<br />
== 27 September 2022, v437 ==<br />
<br />
* Adapted Angie's existing variant effects code to the new dbSnp, which uses SPDI and is provided as JSON-formatted data. Users can now see variant effects with dbSnp version 153 or later<br />
* Added a loading icon on hgGateway when a species or a position is searched<br />
* Updated OMIM build process to use ncbiRefSeq on hg19 and hg38 instead of refGene<br />
<br />
== 6 September 2022, v436 ==<br />
<br />
* Added support for monkeypox/hMPXV to hgPhyloPlace<br />
<br />
== 16 August 2022, v435 ==<br />
<br />
* Gene search on hgGateway now automatically goes to hgTracks position<br />
<br />
<br />
== 26 July 2022, v434 ==<br />
<br />
* Expanded support for psl and chain display<br />
* Added message to track long label if either doWiggle or maxWindowCoverage is active<br />
<br />
== 7 July 2022, v433 ==<br />
<br />
* Expanded chromAuthority support to more track hubs; it is now also available as an option in the Table Browser.<br />
* Copy link icon added to freshly made sessions as well as connected hubs. Likewise "My hubs" renamed to "Connected hubs".<br />
<br />
== 14 June 2022, v432 ==<br />
<br />
* Fixed bug where track hubs attached to assembly hubs were not being searched<br />
* Changed trackDb html search to not require files end in ".html"<br />
* New style RepeatMasker track display now added to GenArk hubs<br />
<br />
== 24 May 2022, v431 ==<br />
<br />
* SQL INJECTION Prevention Version 2 protects subclauses with new %-s behavior that enforces correctness<br />
* Hashtag '#' character now allowed in chrom names<br />
* Made a bedGraphToWig utility for people who have made bedGraphs that could be more densely represented as wigs<br />
* Added first draft code to support specifying a chromosome naming authority<br />
<br />
== 3 May 2022, v430 ==<br />
<br />
* Released new GenArk request page <br />
* Added support for chromAlias.bb to bedGraphToBigWig<br />
* GenArk hubs now supported in hgConvert and hgLiftOver if they are in hgcentral<br />
* Hi-C tracks now have pack and squish display modes to help limit how much screen space they consume. Pack behaves like the old dense display (1/2 height), squish is 1/4 height, and dense is now 1/8 height<br />
* Hi-C tracks can now filter on interaction distance<br />
* Added chromAlias support to snake display<br />
<br />
== 12 April 2022, v429 ==<br />
<br />
* Sessions must now have a description before they can be added to the Public Sessions listing<br />
* Added chromAlias support to BAM display<br />
* Enabled density coverage mode for PSL tracks<br />
* Allow big* tracks with only a bigDataUrl to be used in Extended DNA<br />
<br />
== 22 March 2022, v428 ==<br />
<br />
* Added barChartBarMinWidth and barChartBarMinPadding trackDb statements for barChart tracks in hubs to specify the minimum pixel width and padding of bars in hgTracks<br />
* Made change to allow trackDb.txt files to be 256K, up from 64K<br />
* Added support for using a bigBed as a chromAlias file in order to take advantage of bPlus tree indices on native chroms as well as aliases<br />
* Added support to bedToBigBed to pass in bigBed as chrom sizes so bigBeds can be built with any sequence names in the chromAlias file<br />
<br />
== 1 March 2022, v427 ==<br />
<br />
* Reduced size of title font on hgTracks<br />
* Added support for chromAlias table in big* files<br />
* Added comma separated output option to table browser when doing primary table or selected fields output for easy opening in excel<br />
* Committed work on GenArk request form<br />
* Fixed multi-threading issues caused by setenv<br />
* Fixed the last instance of back-quoting needed for the new keyword `offset` for MariaDb upgrade<br />
* Made cart rewrites on by default via hg.conf, previously the cartVersion var<br />
<br />
== 8 February 2022, v426 ==<br />
<br />
* Expanded support for chromAlias, now works with big* files and IGV-style format+chromToUcsc<br />
* Fixed compatibility for MariaDb 10.6 which has a new keyword 'offset'<br />
* Smaller hgTracks assembly titles<br />
<br />
== 18 January 2022, v425 ==<br />
<br />
* Updated hg19 kgXref, kgAlias, and search index files to have modern gene symbols<br />
* Removed incognito mode feature since it was proving confusing, and was not being used<br />
* Fixed message in needMem when signed int has overflowed. Example from hgLiftOver when user submitted a 3GB file<br />
* Fixed and updated GB dockerfile, including update to Ubuntu 20, fixing compatibility with MariaDb, and sprucing up the docs<br />
<br />
= 2021 =<br />
<br />
== 14 December 2021, v424 ==<br />
<br />
* Checked in Robert Hubley's support for bigRmsk track type <br />
* Moved the entire repo to a MIT-by-default license <br />
* Added "incognito" mode to hgTracks which means the browser ignores cookies <br />
* Major updates to the UniProt otto job, now running on 117 assemblies <br />
* Added new https features: certificate verification; error messages that work correctly with warn and errAbort and with the GUI in hgCustom and hgHubConnect; and logging of problem certificates, especially for CGIs. Also added a callback so that multiple levels can be supported, such as warn instead of abort. Certificate verification options are controlled by environment variables and, for CGIs that use the cart, by hg.conf variables [abort|log|warn|none]; the default is log. Added basic information about the new httpsCertCheck setting to various documents. Added the -httpsCertCheck=[abort|warn|none] command-line option to the hubCheck utility<br />
<br />
== 16 November 2021, v423 ==<br />
<br />
* Added timeout option to pipeline functions to allow killing of long-running pipelines, especially ones run from CGIs<br />
* Various UI and text changes to hgHubConnect to make it easier for newbie users<br />
* Added google analytics key to hgdownload and static pages on hgdownload<br />
<br />
== 26 October 2021, v422 ==<br />
<br />
* Added option to track search to also search tracks in unconnected hubs<br />
* Added ssl certificate verification with LETSENCRYPT-compatible trust_first flag to https.c <br />
<br />
== 5 October 2021, v421 ==<br />
<br />
* Changed the primary gene set on hg19 to Gencode V38lift37<br />
* botDelay no longer calls sleep() until the warnMs threshold is reached; from then on it sleeps and begins issuing the warning message<br />
* Highlight color now persists in the cart <br />
<br />
== 14 September 2021, v420 ==<br />
<br />
* Review of custom track bottleneck penalty policy. Added variable in hg.conf, customTracks.botCheckMult which is an integer to set the penalty for custom track submissions<br />
* Added small folder icons to composite and superTracks on hgTracks<br />
* Added RedMine help widget<br />
<br />
== 24 August 2021, v419 ==<br />
<br />
* Now allows matching on multiple transcripts instead of just the longest when a user enters a pseudo HGVS search term like Gene AminoAcidPos<br />
* hgTrackUi pop-up control on hgTracks is now scrollable. This prevents the pop-up window from extending past the bottom of the screen, which in turn fixes the previously hidden submit buttons<br />
* Finished Kate's change of the virtual chrom name from "virt" to "multi" in hgConvert, hgIntegrator, hgTables, hgVai <br />
* Various fixes in support of bigMaf tracks<br />
<br />
== 3 August 2021, v418 ==<br />
<br />
* Changed all calls to system() and popen() to use our pipeline library.<br />
* Added Google Analytics button click tracking to hgTracks.<br />
* Implemented mysql advisory locks to prevent duplications in hubStatus table.<br />
* The local cache directory for remote files (udcCache, which is used for file types like bigBed, bigWig, VCF, and bam) now defaults to $TMPDIR/udcCache. The previous default of /tmp/udcCache will still be used if $TMPDIR is undefined in the local environment.<br />
* Various updates to GBiC/GBiB - New push location, changes to GBiC install script to remove errors, no longer support Ubuntu 14, and support for local-only assemblies.<br />
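<br />
The udcCache default described above can be sketched as follows. This is a Python illustration only (the Browser itself is written in C); the function name is hypothetical, and treating an empty TMPDIR the same as an undefined one is an assumption.<br />

```python
import os
import posixpath


def default_udc_cache_dir(env=None):
    """Return $TMPDIR/udcCache when TMPDIR is set, else /tmp/udcCache."""
    env = os.environ if env is None else env
    tmpdir = env.get("TMPDIR")
    if tmpdir:  # assumption: an empty TMPDIR counts as undefined
        return posixpath.join(tmpdir, "udcCache")
    return "/tmp/udcCache"
```

Passing a dictionary instead of the real environment, for example <code>default_udc_cache_dir({"TMPDIR": "/data/tmp"})</code>, makes the fallback easy to check by hand.<br />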
<br />
== 13 July 2021, v417 ==<br />
<br />
* Added the hogExit function to CGIs hgGene, hgTrackUi, hgBlat, hgc, hgPcr, cartDump.<br />
* Changed udcTimeout warning text to a notification “bar” over the position box but under the menu bar.<br />
* singleList bigBed filters fixed, and default filterValues set to filter by multiple.<br />
<br />
== 22 June 2021, v416 ==<br />
<br />
* Added cart editing ability to address changing trackDb hierarchies.<br />
* Allow variable size data tables on hgc.<br />
* Moved warning box to be in the middle of the screen.<br />
<br />
== 1 June 2021, v415 ==<br />
<br />
* Table Browser UI improvements: add section headers, improve layout and add help on mouseover.<br />
* Enable dynamic blat service for all GenArk assembly hubs.<br />
* Add mouseOverFunction option to trackDb to turn off mouse over value display when the numbers become averages.<br />
<br />
== 11 May 2021, v414 ==<br />
<br />
* Auto-attach GenArk hubs if a track hub is attached that designates a non-native database found in GenArk.<br />
* Add db argument to hgc and hgTrackUi URLs so they do not trigger error messages when shared.<br />
* Escape dots in track names in hgTracks.js so track names support dots.<br />
<br />
== 20 April 2021, v413 ==<br />
<br />
* Improvements to dynamic BLAT servers, better error reporting, blatServersCheck support, and activation.<br />
* Make the use of FreeType fonts be the default.<br />
* GBiC now updates its hgcentral.<br />
* Make hgHubConnect use the pipeline library to run hubCheck instead of popen().<br />
<br />
== 30 March 2021, v412 ==<br />
* Add support for dynamic BLAT servers. See http://genomewiki.ucsc.edu/index.php/Running_your_own_gfServer.<br />
* Multi-region feature enhancements, using 'custom regions' mode for sparse tracks. Includes new trackDb setting "multiRegionsBedUrl".<br />
* Implement maxWindowCoverage, a trackDb variable to force the browser to put a track into density coverage mode when the size of the window is bigger than N bases.<br />
<br />
== 1 March 2021, v411 ==<br />
* New JSON output for hgBlat.<br />
* Facet counts now added to Tracks with Dropdown SubGroups.<br />
* Item and gene search refined to give fewer and better results.<br />
* Work on new barChartFacets tag to allow faceted selection on bar charts. <br />
* Add API support for tables with non-standard chrom names (e.g. tname).<br />
* Fixes to better display Hubs using dimensions such as from ENCODE portal.<br />
* Quality checks and improvements for liftOver.<br />
* Ongoing improvements to hgPhyloPlace.<br />
<br />
== 16 February 2021, v410 ==<br />
* Improvements to the Gateway page to help discover Public Assembly Hubs via the common name of the species for novel organisms.<br />
* Final changes for the new wiggle mouse-over data display feature.<br />
* Internal fixes and improvements on the process of data release to support data hosted from binary-indexed (/gbdb/) files and automatic track update procedures around formats like bigPsl.<br />
<br />
== 26 January 2021, v409 ==<br />
* Enhancements to the site, including but not limited to improved exon display, liftOver performance, multi-region display, and alignment functionality. <br />
* Ongoing work to improve bigBarChart displays.<br />
* Ongoing work for new wiggle mouseover pop-up with data values.<br />
* Release of new font options.<br />
<br />
= 2020 =<br />
<br />
== 15 December 2020, v408 ==<br />
* New isPcr setting to allow PCR searches on assembly hubs.<br />
* New barChart setting barChartCategoryUrl to specify a tab-separated file containing category (bar) labels and colors.<br />
* Ongoing work for new wiggle mouseover pop-up with data values.<br />
* Ongoing work for new font options.<br />
<br />
== 1 December 2020, v407 ==<br />
* New direct search results to Public Sessions, for example, http://genome.ucsc.edu/cgi-bin/hgPublicSessions?search=wuhCor1, where search=wuhCor1 will display all Public Sessions on the coronavirus wuhCor1 genome.<br />
* Improvements to VCF data displays and mouseovers in clinical data, removing default exon display in bigBeds and adding strand and accession.<br />
* Ongoing work for new features including new wiggle mouse over pop-up with data values.<br />
<br />
<br />
== 3 November 2020, v406 ==<br />
* Work to support clinical users including adding a mouseOver for the OMIM Gene track, and reworking the Gene Reviews track format to support disease info in mouseOver and having default accession displays for refGene.<br />
* Work to improve the speeds on the Track Setting page of large consortium hubs, so View, Facets, and Matrices display quickly. <br />
* Work to improve Blat servers and the Blat All function to avoid timeouts from external mirror queries.<br />
* Ongoing work for new features including new font options, new wiggle mouse over pop-up with data values.<br />
* Work to support VCF data displays formats relating to the coronavirus genome.<br />
<br />
== 13 October 2020, v405 ==<br />
* New display of chromosome alias names from the "view sequences" link on a genome's gateway page.<br />
* Various UI improvements such as small grey boxes around the filters so one can see where a filter starts and where it ends when multiple ones are offered and a new trackDb setting sampleColorFile for coloring a tree in VCF+tree display.<br />
* Enhancements to support clinical data tracks and the continuous updating of the data with automatic scripts.<br />
<br />
== 22 September 2020, v404 ==<br />
* A new chromAlias function enabled for bed file custom track input to map to alternative names. <br />
* Support for a new mouseOver trackDb setting, which allows constructing bigBed (and related) item mouseover's from a pattern using arbitrary text and fields in the data schema.<br />
* Improvements to the VCF trio display to color by function when variants are in an intergenic region.<br />
* Various fixes and improvements to highlights, shortcuts, hub support, and ongoing work on new features. <br />
<br />
<br />
== 1 September 2020, v403 ==<br />
* Enhancements to meet Chrome Browser asynchronous requests changes.<br />
* Enhancements to VCF functions, from details page to toleration of IUPAC ambiguous characters and handling of viewing monoploid data (such as SARS-CoV-2).<br />
* Further steps toward a new feature of Recommended Track Sets and initial work on a new Related Tracks feature.<br />
<br />
== 11 August 2020, v402 ==<br />
* New feature: Recommended Track Sets.<br />
* Added new trackDb options to turn off VCF filters.<br />
* Added two new color options for trio display - color by functional effect and de novo child variants.<br />
* Updated geoIp data to ensure that users are directed to the nearest mirror site.<br />
<br />
== 21 July 2020, v401 ==<br />
* Enhancements to new track and display type <code>vcfPhasedTrio</code> in development.<br />
* Enabling VCF tracks to color haplotype display by variant's effect on gene.<br />
* Speeding up VCF tree-parsing with tree files with thousands of nodes.<br />
* Documentation to aid users desiring to obtain new dbSnp table data format from the Table Browser. <br />
* Improvements to various tools such as genePredToGtf to correctly number exons on the negative strand.<br />
* Work to merge the display of GENCODE data involving building a new database (for knownGene tables) and supporting software presenting details about genes (hgGene). <br />
<br />
== 30 June 2020, v400 ==<br />
* New autocomplete in the position bar clicks the go button when selecting a gene search result.<br />
* Improved the ability to allow the copying and pasting of genomic coordinates in the main browser page.<br />
* Work to support new ENCODE Candidate cis-Regulatory Elements (cCRE) tracks for hg38 and mm10. <br />
* Improvements and fixes to support Public Hubs, Assembly Hubs and BLAT.<br />
<br />
== 9 June 2020, v399 ==<br />
* New super track display: to make the hierarchy more apparent, child members of supertracks now have a description and a track list of all super track children on the configuration page.<br />
* New track and display type <code>vcfPhasedTrio</code> in development.<br />
* Support of smaller genomes and RNA genomes from BLAT searches to VCF configuration options. <br />
* Improvements to Public Hub Search to avoid issues when remote hubs are non-functional.<br />
* Improvements of links to GnomAD, ExAC VCF details. <br />
<br />
== 19 May 2020, v398 ==<br />
* Increase in the default label character width from 17 characters to 20 characters. <br />
* New feature [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#mergeSpannedItems mergeSpannedItems setting] allows merging all track items that extend beyond both sides of the current viewing window into one bed item in the display. <br />
* Interface enhancements to new [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#hideEmptySubtracks hideEmptySubtracks setting].<br />
* Improvements to support RNA viral genomes such as searching UUU in the shortMatch track and VCF improvements to display a phylogenetic tree of virus strains and improvements to BLAT for smaller genomes.<br />
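<br />
The idea behind the mergeSpannedItems setting mentioned above can be sketched as below. This is a hypothetical Python illustration, not the Browser's implementation: the tuple layout, the strict comparisons, and the label and coordinates given to the merged item are all assumptions.<br />

```python
def merge_spanned_items(items, win_start, win_end):
    """Collapse items that extend beyond both sides of the window into one.

    items is a list of (start, end, name) tuples. An item "spans" the window
    when it starts before win_start and ends after win_end.
    """
    def spans(item):
        return item[0] < win_start and item[1] > win_end

    spanning = [it for it in items if spans(it)]
    if len(spanning) < 2:
        return list(items)  # nothing to merge

    others = [it for it in items if not spans(it)]
    merged = (
        min(it[0] for it in spanning),
        max(it[1] for it in spanning),
        f"{len(spanning)} merged items",  # assumed label, for illustration
    )
    return [merged] + others
```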
<br />
== 28 April 2020, v397 ==<br />
* Enhancements to the barChart format to take new [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#barChartSizeWindows <code>barChartSizeWindows</code>] setting allowing customizability of display.<br />
* Improvements to the ClinVar tracks for clinical use and enhancements to the VCF format for label display. <br />
* Other work to improve the automatic update of data in the Browser and the building of new data tracks.<br />
<br />
== 7 April 2020, v396 ==<br />
* New "Hub Development" tab enhanced with checkboxes to add "measureTiming" providing statistics about timing and "udcTimeout" allowing speedier checks of hub changes when developing. <br />
* Enhancements to VCF file format handling, such as new option to cluster variants as ordered in the original file. <br />
* Enhancements to new [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#hideEmptySubtracks hideEmptySubtracks setting] in regards to syntax and issues. <br />
* Enhancements to support Hi-C file format, and to GBiC setup script to specify specific assemblies and other changes.<br />
<br />
== 17 March 2020, v395 ==<br />
* Improvements across the site to better work with HiC data.<br />
* Enhanced error control for unusual submitted custom data, such as invalid group assignments. <br />
* Improvements to utilities to be more collaborative with external toolkits from better usage statements to the handling of chromosome name conversions. <br />
<br />
== 21 February 2020, v394 ==<br />
* Public sessions now sort by most recent addition.<br />
* Fix of links to the GTEx portal from the details page of an item in the GTEx track.<br />
* Improvements of the autoscale in Track Collections with add and subtract mode<br />
* Improvements to the filtering function now provided for Track Hubs<br />
* Improvements for using the Table Browser to extract data via identifiers for data in bigBeds such as Track Hubs<br />
* Improvements to the hubClone tool allowing users to duplicate a hub's structural text files as well as simultaneously use a download option to acquire all of the remote data underlying the hub.<br />
* Other fixes and enhancements, such as to automatic assembly hub building procedures and work to establish data archives. <br />
<br />
== 4 February 2020, v393 ==<br />
* Increased in hubs the number of subGroups to support large external consortium projects, such as the ENCODE DAC hub. <br />
* Various fixes to multiple levels of the code from VCF Filtering, enhancing recent new features and better supporting the GBiB mirroring operations.<br />
* Ongoing work to support quicker track display performance and to better support IPv6 on mixed IPv6/IPv4 systems (an issue discovered by a mirror site). <br />
<br />
== 14 January 2020, v392 ==<br />
* Changes to the new lollipop display to use color field rather than automatically being colored.<br />
* Changes to improve the new SNP data format.<br />
* Improvements to the new API in regards to DNA extraction.<br />
* Enhancements for future gnomAD track.<br />
* Work to improve speeds with a trackDb caching mechanism.<br />
<br />
= 2019 =<br />
<br />
== 10 December 2019, v391 ==<br />
* Extensive enhancements to allow complex filtering options on [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#bigBed bigBed] files in Track Hubs. <br />
* Improvements to hub connection performance and operation. <br />
* Addition of tools to help convert between NCBI/EBI chromosome names ([https://genome-source.gi.ucsc.edu/gitlist/kent.git/blob/master/src/utils/chromToUcsc/chromToUcsc?branch=master&file=src/utils/chromToUcsc/chromToUcsc#L68 chromToUcsc]) and various internal tools to add archiving tracks and test for hub configuration mistakes.<br />
* Improvement to interact details display and ongoing work incorporating ENCODE data tracks and documenting new [http://genome.ucsc.edu/goldenPath/help/hic.html Hi-C] track type. <br />
<br />
== 12 November 2019, v390 ==<br />
* Release of a new bigDbSnp track type to support the pending release of dbSnp152.<br />
* New "autoScale group" setting in trackDb for Track Hubs.<br />
* New display of how many items are filtered out for tracks using filters.<br />
* Improved performance for new Hi-C track data in using auto-scale mode.<br />
* Improvements to speed of discovering public hub search results. <br />
* Improvements for VCF filter operation and fixes for custom track interactions on the Data Integrator.<br />
<br />
== 22 October 2019, v389 ==<br />
* New trackDb setting for barChart tracks: barChartMaxHeight.<br />
* New trackDb setting for composite tracks: hideEmptySubtracks. <br />
* Further work toward the initial release of Hi-C track data.<br />
* Improvements around the working of filters on tracks and a note displaying when a filter is activated. <br />
* Improvements to API and various other tools.<br />
* Work to display the platinum genomes as assembly hubs.<br />
* Continued work to import the latest SNP data in JSON format. <br />
<br />
== 1 October 2019, v388 ==<br />
* Multiple Track Hubs can now be attached by repeating the hubUrl= parameter in the URL.<br />
* Work to enable Track Hubs to have the ability to attach to other existing Assembly Track Hubs.<br />
* New "group auto-scale" available for multiple tracks in a composite group enabling a group of signal tracks to scale in tandem to the highest auto-scaled track in the group.<br />
* New feature for interact tracks to open a multi-region view of the two ends of the interact data with a custom track to annotate the ends. <br />
* New code to attempt to thwart the abuse by bot-driven queries that could occasionally overwhelm the site. <br />
* Enhancements to the new JSON API code to better output data on cases such as schema details.<br />
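<br />
Attaching several hubs by repeating hubUrl=, as noted above, could be scripted along these lines. This is a sketch only: the function name and the example.com and example.org hub addresses are placeholders, and including a db parameter is an assumption about what a typical hgTracks link carries.<br />

```python
from urllib.parse import urlencode


def hgtracks_url_with_hubs(db, hub_urls,
                           base="http://genome.ucsc.edu/cgi-bin/hgTracks"):
    """Build an hgTracks URL that repeats hubUrl= once per hub."""
    # A list of pairs (not a dict) lets the same key appear several times.
    params = [("db", db)] + [("hubUrl", url) for url in hub_urls]
    return base + "?" + urlencode(params)


url = hgtracks_url_with_hubs(
    "hg38",
    [
        "https://example.com/hubA/hub.txt",  # placeholder hub
        "https://example.org/hubB/hub.txt",  # placeholder hub
    ],
)
```

The hub addresses are percent-encoded by urlencode, so each one survives as a single parameter value.<br />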
<br />
== 10 September 2019, v387 ==<br />
* Added new highlighted indication of being in the multi-region or reverse browsing modes.<br />
* Work to improve the checks of BLAT servers and work to improve the checking of Hubs.<br />
* Fixes for various links to external sites such as to the Mutation Position Imaging Toolbox (MuPIT) from gene details pages.<br />
<br />
== 20 August 2019, v386 ==<br />
* Improvements to new API to allow data fetching from gene models in bigGenePred format.<br />
* Improvements to sorting BLAT results upon clicking into the details page when Browsing results.<br />
* Enhancements to BLAT ability to better handle results with assemblies such as hg38 with many alt/fix patch sequences.<br />
* Enhancements across the site to support IPv6 for different tools from BLAT, parasol, and IP Geolocation.<br />
* Various fixes across the site to enhance tools, such as the Table Browser, or to further support ongoing work to handle Hi-C data.<br />
* Development to expand Track Hub checking when a hub is first connected.<br />
<br />
== 30 July 2019, v385 ==<br />
* A change to the code loading big* based custom tracks to append a "chr" if the data does not include it (1 becomes chr1). <br />
* Work to make new Hi-C data output from Table Browser work.<br />
* Ongoing work to handle sizeable snp152 involving new JSON downloads, new bigDbSnp track type.<br />
* Continued improvements on new JSON data API Interface.<br />
* Improvements to pipelines to handle changing input OMIM metadata.<br />
* Work to bring in ENCODE3 mouse data. <br />
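<br />
The "chr" handling for big* custom tracks noted above (1 becomes chr1) amounts to a rule like the following. This is a minimal Python sketch with a hypothetical function name; how the real loader treats special cases is not stated here, so none are handled.<br />

```python
def add_chr_prefix(name):
    """Prepend "chr" to a sequence name that does not already start with it."""
    return name if name.startswith("chr") else "chr" + name
```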
<br />
== 9 July 2019, v384 ==<br />
* Zebrafish icon added in the popular species gateway menu.<br />
* New trackDb setting <code>interactMultiRegion</code> allows interact track details' pages to activate multi-region link to view both ends simultaneously. <br />
* Work to improve pennantIcons display in supertracks and composites.<br />
* Further enhancements to the new JSON data API Interface.<br />
* Initial commit of Hi-C draw support for custom tracks. <br />
* Improvements across the site such as regarding custom tracks error_logs and handling of sessions data such as when multiRegionsBedUrl file is used. <br />
<br />
<br />
== 18 June 2019, v383 ==<br />
* Improvements for BAM custom tracks so that the file.bam.bai data can be named file.bai as well. <br />
* Continued work on new JSON data API Interface.<br />
* Continued work on lollipop display mode for variation & frequency data.<br />
* Further enhancements to storage preservation methods regarding sessions with custom data.<br />
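<br />
The two accepted BAM index names noted above (file.bam.bai and file.bai) can be sketched as a list of candidates to try. The function name is hypothetical, and the order in which the names are tried is an assumption.<br />

```python
def bam_index_candidates(bam_url):
    """Return possible index locations for a BAM file.

    Assumed order: file.bam.bai first, then file.bai.
    """
    candidates = [bam_url + ".bai"]
    if bam_url.endswith(".bam"):
        candidates.append(bam_url[: -len(".bam")] + ".bai")
    return candidates
```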
<br />
== 28 May 2019, v382 ==<br />
* Improvements to enable display of wide-ranging ENCODE3 TFBS data. <br />
* Continued work on new JSON data API Interface.<br />
* Initial commit of Hi-C draw support. <br />
* Improvements to often used utilities such as htmlCheck and bedToBigBed<br />
* Documentation enhancements to better explain the new BLAT All process.<br />
* Enhancements to bigBeds ability to handle filtering of data.<br />
<br />
== 7 May 2019, v381 ==<br />
* Improved the working of highlights and selecting regions while Browsing. <br />
* Work to improve the ability to create Track Hubs and Tracks with a matrix of up to 1000 items for large data sets, such as for ENCODE3.<br />
* Continued work on new JSON data API Interface.<br />
* Continued work on lollipop display mode for variation & frequency data.<br />
* Further enhancements to storage preservation methods regarding sessions with custom data.<br />
<br />
== 16 April 2019, v380 ==<br />
* Improved support for VCF with automatic detection if data should be type vcfTabix or vcf and support for VCF UI in Track Hubs. <br />
* Continued work on lollipop display mode for variation & frequency data.<br />
* Continued work on new JSON data API Interface.<br />
* Enhancements to storage preservation methods regarding sessions with custom data.<br />
<br />
== 26 March 2019, v379 ==<br />
* Work on new JSON data API Interface<br />
* Enhancements to [https://genome-source.gi.ucsc.edu/gitlist/kent.git/raw/master/src/utils/findMotif/findMotif.c findMotif utility] to allow 32 character motifs and specified number of mis-matches.<br />
* Improved error handling of Track Hubs when required type line missing and edge-case issue of custom tracks loaded onto disconnected assembly hubs.<br />
* Improvement on right-click UI pop-up configurations in regards to updating nested composite tracks.<br />
* Tools added to help build large track groups, [https://genome-source.gi.ucsc.edu/gitlist/kent.git/raw/master/src/utils/tdbRename tdbRename] and [https://genome-source.gi.ucsc.edu/gitlist/kent.git/raw/master/src/utils/tdbSort tdbSort] to sort and rename sections of large trackDb files. <br />
* New [https://genome-source.gi.ucsc.edu/gitlist/kent.git/raw/master/src/hg/utils/hubClone/hubClone.c hubClone] tool to clone remote hub text files to a local copy, fixing up bigDataUrls to aid in debugging or providing a template for new hubs. <br />
* Various fixes to browser performance with regards to memory, position and database information, display of individual items, links to external sites such as HGNC, and enhancements to new custom track backup feature. <br />
<br />
== 5 March 2019, v378 ==<br />
* Fixed twoBitMask script to work with 64-bit twoBit files.<br />
* Added <trackName>_hideKids=1 directive to URL parsing to turn off super track or composite children.<br />
* Added default score filtering to Interact type.<br />
* Added ability to backup Custom Tracks that have been saved in Sessions.<br />
* Added hubClone utility to precompiled apps available for downloading.<br />
* Added support for VCF 4.2 '*' ALT value.<br />
<br />
== 12 February 2019, v377 ==<br />
* Added new Projects dropdown menu.<br />
* Created hubClone utility for copying a remote hub to local and changing bigDataUrls to remote.<br />
* Fixed Table Browser position radio button so that it remains selected after a search.<br />
* Initial work on lollipop display mode.<br />
<br />
== 22 January 2019, v376 ==<br />
* Added format checking of uploaded session files.<br />
* Made hubCheck more resilient when trackDbHub spec file is mis-formatted.<br />
* Cleaned up some code warnings found by gcc-7 version.<br />
* ClinGen CNVs track items are now named '*_unk' instead of '*_not provided' when the variant origin is not provided.<br />
* Fixed bug in hgSession where session thumbnails were not generated when short links were present.<br />
* Fixed table schema for bigBed track tables to now show update time of the file, not the table.<br />
<br />
= 2018 =<br />
<br />
== 18 December 2018, v375 ==<br />
* Added feature to the Sessions tool to print short links to sessions.<br />
* Allowed GBiBs to download from genome-euro if they are closer to the European server.<br />
* Modified the GENCODE tracks on hg19 to link to the Ensembl GRCh37 browser.<br />
* Handled corner cases in variantProjector caused by unnecessary gaps introduced by bug in RefSeq transcript alignments.<br />
* Fixed a GBiC bug related to SELinux observed in Redhat Linux.<br />
* Added support for labelFields statements at the view level.<br />
<br />
== 27 November 2018, v374 ==<br />
* Added phylogenetic tree utility (binaryTree.pl) to convert NCBI Taxonomy polytomy trees to binary trees.<br />
* Made enhancements to hgTracks & hgc for PSL alignments to alt/fix sequences.<br />
* Fixed gff3ToPsl util: block-order bug when GFF3 strand is '+' and Target strand is '-'.<br />
* GBiB link to shared data folder is now relative so menubar works when GBiB is accessed from a real server.<br />
<br />
== 30 October 2018, v373 ==<br />
* Released new feature: BLAT ALL Genomes.<br />
* Completed interact track type feature enhancements: pack/squish modes, cluster view, flipped (inverted) display.<br />
* Fixed corner case: make multi-region config box ignore :start-end ranges after alt/fix sequence name.<br />
* Fixed bug in hgvsToVcf: bug triggered by inconsistent tx version between ncbiRefSeq and genbank tables.<br />
* Fixed bug in barChart track display when no items are in current range.<br />
* Added ability to access UDC sparse file using virtual memory.<br />
<br />
== 9 October 2018, v372 ==<br />
* Improvements to new interact track type.<br />
* Work to expand track database arguments to work with new data visualizations.<br />
* Improvements to barChart format and work on other tools used in generating data formats. <br />
<br />
== 18 September 2018, v371 ==<br />
* Enhanced External Tools clicks to CRISPOR for larger regions.<br />
* Made valid assembly links clickable on Public Hub page.<br />
* Fixed a bug in VAI for deletion across exon/intron splice on - strand.<br />
* Extract RefSeq Annotation Release from GFF header if present.<br />
* Fixed bamToPsl to work on huge BAMs.<br />
* On-going work on bigBed filtering.<br />
* Extended trackDb setting for interact track, to support offset source/target endpoints. <br />
<br />
== 28 August 2018, v370 ==<br />
* Enhancements to new Gene Interaction type to allow a filter to hide interactions missing at one or both endpoints in the window.<br />
* Security improvements to prevent cross-site scripting.<br />
* Fixes to allow the new interactions track type to work properly with track composites.<br />
<br />
== 7 August 2018, v369 ==<br />
* New keyboard shortcut "v s" to jump to view sequences. <br />
* An improvement to custom tracks error messages to check the assembly sequences when given invalid chromosome names.<br />
* A fix to a bug in the Variation Annotation Integrator that caused duplicated lines of output and other similar fixes to scripts and tools. <br />
* An improvement in the display color for ClinVar tracks. <br />
* Ongoing work on introducing numeric and filterBy support to bigBeds and Hubs.<br />
<br />
== 16 July 2018, v368 ==<br />
* Added support for url, urlLabel and extraFields trackDb settings for bigBarCharts format.<br />
* Fixes to ensure the gnomAD track now correctly provides links out to that resource.<br />
* Various work to support patch sequences and fixes to automated scripts to help improve site performance. <br />
<br />
== 26 June 2018, v367 ==<br />
* New documentation for coming [http://genome.ucsc.edu/goldenPath/help/interact.html new interact and bigInteract] track type. <br />
* Enhancements to barChart and bigBarChart data to display barplot on hgc instead of boxplot when there are a small number of samples.<br />
* Addition of links featuring [http://genome.ucsc.edu/gtexBodyMap.html GTEx body map page] in various relevant code locations across the site. <br />
* Various enhancements and fixes to the site to improve functionality across browser platforms, such as safari.<br />
<br />
== 5 June 2018, v366 ==<br />
* Improvement to enable bigBed searchIndex to work within composites.<br />
* Changes to support DECIPHER CNVs new data format and work on the coming new track.<br />
* Fix for the bedGraph format in assembly hubs and a fix for composite display in multi-exon mode and other browsing visualization improvements.<br />
* A fix for links to mirbase in the related track and fixes for input from [https://github.com/ucscGenomeBrowser/kent/issues/12 Martin Marcher from GitHub] on GBiC operation.<br />
<br />
== 15 May 2018, v365 ==<br />
* Increased tool support for Assembly Hubs using genomes that might include periods in names. <br />
* Increases in speed and performance around custom tracks and default track displays. <br />
* Improvements to error messages and data building scripts.<br />
* Fixes and developments around newest Track Collection Builder tool and work on new interactions type track.<br />
<br />
== 24 April 2018, v364 ==<br />
* Improvements to HGVS support for the Variant Annotation Integrator. <br />
* Work on new interaction track type to allow displaying long distance genomic connections. <br />
* Creation of pipelines to help users generate automated alignments, often requested in mailing list questions.<br />
<br />
== 3 April 2018, v363 ==<br />
* Support for HGVS ENST* and ENSP* terms from latest GENCODE in position/search. <br />
* Ongoing work for new interactions track type.<br />
* Ongoing work on new Track Collection Builder tool. <br />
* Further support for pipelines for NCBI RefSeq tracks and various fixes to tools and utilities.<br />
<br />
== 13 March 2018, v362 ==<br />
* Searching Public Hubs now includes results from metadata tags on tracks.<br />
* In GenBank mRNA tracks patent sequences are now off by default, and a new Track Setting enables turning them on.<br />
* Track hubs can now be structured to exist entirely in the hub.txt file when a "useOneFile on" setting is included in the first hub stanza and the hub is limited to one genome.<br />
* The file size limit of hub.txt file has been expanded from 256K to up to 16M.<br />
* Improvements to the Variant Annotation Integrator for detection of ambiguous regions and HGVS term support.<br />
* Various fixes across the site to improve security and ongoing work for a new collection tool.<br />
* Work to allow Assembly Hubs to have additional Track Hubs attached to them.<br />
<br />
== 20 February 2018, v361 ==<br />
* Improved Variant Annotation Integrator prediction results involving specific RefSeq annotations where reference genome and transcript sequence differ. <br />
* Added an additional recovery email address option when accounts are first created for sessions, and improved sorting of existing sessions.<br />
* Security improvements and url-encoding for parts of the site.<br />
* Various fixes to tools, such as the Table Browser, and improvements to pipelines to generate new data tracks.<br />
<br />
== 30 January 2018, v360 ==<br />
* Update of NCBI browser links from Map Viewer to new Genome Data Viewer.<br />
* Fixed Variant Annotation Integrator bug with custom track type pgSNP with custom hub assembly.<br />
* Fixed track details in assembly hubs around auto-generated tracks and for bigBarChart, pgSnp, VCF, bigMaf types.<br />
* Improved gateway assembly search to find newer public hubs.<br />
<br />
== 9 January 2018, v359 ==<br />
<br />
* Added Parasol binaries to list of downloadable apps<br />
* Improved the line-limit-exceeded error messages from hgLoadBed.<br />
* Enhanced new track type: longTabix.<br />
* Fixed problem with saving session to file after cart dump.<br />
* Fixed problem with loading assembly hubs with custom tracks.<br />
* Fixed bug in hgTables for bigDataUrl-only tracks<br />
* Fixed hgTablesTest blank line / no results corner case.<br />
<br />
= 2017 =<br />
<br />
== 5 December 2017, v358 ==<br />
<br />
* Work to enable new bigNarrowPeak format that can be used in Hubs, documentation pending release of code.<br />
* Enhancements to trackDb pennantIcon setting that can be used in Hubs to support arbitrary text (e.g. for 'New' tag).<br />
* Support for showing schema links for data hosted by bigDataUrls that can be used in Hubs.<br />
* Various improvements to support hubs such as fixes to new disconnect button, ensuring assembly hubs can work in the Variant Annotation Integrator, and saving hub sessions with custom data.<br />
* Addition of source code version printout to most utilities to help users track version number. <br />
* Enhancement to site to help display current database and search with en-dashes or em-dashes in coordinate ranges.<br />
* User session file support improvements around assembly hubs with custom track data and fixing files download code to enable backups. <br />
* Improvements to the RefSeq tracks to better support links and versioning display of data.<br />
<br />
== 7 November 2017, v357 ==<br />
* Enhancements to further improve HGVS search terms and search functionality and including HGVS output option in vai.pl script. <br />
* Fixes to improve stability of user custom data connected with saved sessions.<br />
* Enhancements to Genome Browser in the Cloud (GBiC) script to check response speeds and automatically select new genome-euro MySQL server on installation if faster.<br />
* Ongoing work to introduce a new tool to allow novel collections of tracks.<br />
* Various fixes and improvements across the Browser, including but not limited to: enhanced dataVersion to allow for better historical connection to track data revisions, updated display of version information on GENCODE gene track details pages, and updated links to external sites like the CDC HuGE database, MGI, and GAD on track data pages.<br />
<br />
== 17 October 2017, v356 ==<br />
* Support for new big* formats such as bigMaf, bigPsl, and others for use in the [http://genome.ucsc.edu/cgi-bin/hgIntegrator Data Integrator] tool.<br />
* Ongoing work to introduce a new tool to allow novel collections of tracks and code enhancements for coming GTEx eQTL tracks.<br />
* Various fixes and improvements across the Browser, including highlighting functionality, HGVS search results, and an infrequent empty-cache issue for data in custom tracks and track hubs.<br />
<br />
== 26 September 2017, v355 ==<br />
* Internal work to prepare for adding a MySQL server, genome-euro-mysql.soe.ucsc.edu, in Europe to improve GBiB performance.<br />
* Improvements to processes for creating NCBI RefSeq Genes automation scripts and other track data under development.<br />
* Ongoing work to introduce a new tool to allow novel collections of tracks. <br />
* Various fixes and improvements across the Browser, including highlighting functionality, clicks on detail description pages, and Table Browser output.<br />
<br />
== 5 September 2017, v354 ==<br />
* HGVS variant nomenclature output now available with [http://genome.ucsc.edu/cgi-bin/hgVai Variant Annotation Integrator] when RefSeq Genes selected for gene prediction source, allowing turning custom tracks or variant identifiers like rs2021974 into <code>NC_000020.10:g.15382919G>A</code> or <code>NM_001351661:c.541-29131G>A</code>.<br />
* New tool vcfToHgvs that produces HGVS using detailed alignments and sequence of RefSeq transcripts.<br />
* Ongoing work on new tracks using the new bigBarChart display, new tools under development, and other fixes and enhancements to source code.<br />
<br />
== 15 August 2017, v353 ==<br />
* Improvements to new Public Hubs search page, search terms expanded ("H3K4ME" matches H3K4ME1, H3K4ME2, and H3K4ME3).<br />
* Enhancements to [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#halSnake halSnake format] to allow display of the number of bases in inserts.<br />
* Fixes for Table Browser selection options and support for "chr" sequence names in HGVS genomic (g.) terms.<br />
* Fixes to the main Browser for the right-click track configuration option and for the display of highlights near the ends of chromosomes.<br />
* Fixes to enhance GBiB and CRISPR data and recent [http://genome.ucsc.edu/cgi-bin/hgGeneGraph Gene Interactions] tool.<br />
<br />
== 25 July 2017, v352 ==<br />
* Further support for HGVS position/search terms.<br />
* Improvements for the Public Hubs page to load a bit faster.<br />
* Enhancements to support the GBiB and GBiC products. <br />
* Various fixes for tools and new software to support tagStorm format.<br />
<br />
== 5 July 2017, v351 ==<br />
* Major enhancements to [http://genome.ucsc.edu/cgi-bin/hgHubConnect?#publicHubs Public Hub] searching feature, where search results can now be expanded, and right clicks provide direct links to matching tracks and assemblies.<br />
* New CGI tool, hgLinkIn, which translates external identifiers to assembly positions for UniProt genes (http://genome.ucsc.edu/cgi-bin/hgLinkIn?id=O95477&resource=uniProt)<br />
* Further support for HGVS position/search terms.<br />
* Various fixes and new software to support metadata processing around the tagStorm format.<br />
<br />
== 13 June 2017, v350 ==<br />
* New CGI tool: the gene interactions viewer, [http://genome.ucsc.edu/cgi-bin/hgGeneGraph hgGeneGraph]. Highlighted by new hg19 and hg38 [http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&g=interactions Protein Interactions from Curated Databases and Text-Mining tracks].<br />
* Final documentation of new [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#bigBarChart barChart and bigBarChart] track types that allow for [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&hideTracks=1&gtexGene=pack GTEx] like displays with custom data in custom tracks and track hubs. <br />
* Added support for HGVS m. and n. terms in position/search.<br />
* Added support to use optional Apache Basic Authentication instead of hgLogin.<br />
* Various fixes and enhancements to code to support tracks, tools, and metadata storage.<br />
<br />
== 22 May 2017, v349 ==<br />
* Enhanced multi-region feature now allows users [http://genome.ucsc.edu/goldenPath/help/multiRegionHelp.html#CustomRegions to directly enter custom BED coordinates], instead of requiring a URL to remotely hosted BED coordinates. <br />
* New [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#bigBarChart barChart and bigBarChart] track types that allow for [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&hideTracks=1&gtexGene=pack GTEx] like displays with custom data in custom tracks and track hubs. <br />
* New tools added such as expMatrixToBarchartBed to support the new bigBarChart type available in Users Apps package. <br />
* New CGI hgGeneGraph to allow for interactive protein interactions track viewer coming soon for hg19 and hg38.<br />
* New hg.conf option, login.relativeLink for firewalled genome browsers.<br />
* New hg.conf option, useBlatBigPsl=off, to turn off the now-default custom track output from BLAT searches.<br />
* Enhancements to [http://genome.ucsc.edu/goldenPath/newsarch.html#060116 density graph plot feature] available for BAMs and other track types so that graph is more specific to alignment regions.<br />
* Enhancements to HGVS search parameters, such as when they point to indel points or negative coordinates.<br />
* Fix for colors on bigGenePred track type.<br />
* Fix for VAI and background work for proper transcript HGVS VAI output.<br />
* Various other enhancements and background improvements to Genome Browser in the Cloud (GBiC) sessions, track hubs, log-in user support and style across the site.<br />
<br />
== 2 May 2017, v348 ==<br />
* New zoom dialog enhancements to allow multiple highlights including specification of custom colors, along with supporting shortcut keys. <br />
* New Variant Annotation Integrator (VAI) script, vai.pl, added to our userApps.<br />
* Enhancements for HGVS position search and highlight.<br />
* Addition of ability to add metadata to track hubs using a tab separated file.<br />
* Preliminary support for new barChart/bigBarChart track type to allow GTEx type display in hubs.<br />
* Ongoing development for BLAT custom track output. <br />
* Various improvements and fixes to improve user experience.<br />
<br />
== 11 April 2017, v347 ==<br />
* Added Proxy support for HTTPS and FTP.<br />
* Fixed udcCache to allow HIPAA-compliant Amazon Storage securely signed URL redirects.<br />
* Added "filter activated" to labels for some tracks if filters are configured. <br />
* Added HGVS terms as variant input option in Variant Annotation Integrator.<br />
* After you change your password, you are now automatically logged in.<br />
<br />
== 21 March 2017, v346 ==<br />
* New keyboard shortcuts <code>h then m</code> for highlight mark and <code>h then c</code> for highlight clear, and new ability to add more than a single highlight.<br />
* Changes to the default width of the screen to 950px.<br />
* Ongoing work to support custom track output from BLAT results.<br />
* Fix to allow Public Sessions on mirror sites including genome-euro and genome-asia. <br />
* Fixes and improvements to many tools after recent software changes in v345.<br />
<br />
== 28 Feb 2017, v345 ==<br />
* Release of restyled document pages to fit with the new homepage and genomes gateway page.<br />
* Introduced feature to allow arbitrary fields within bigBeds to be used as labels in hgTracks. For example, using <code>labelFields fieldName1, fieldName2</code> in trackDb.txt in a hub, the option to display those fields on the Browser will exist on that track's Track Settings page.<br />
* Fixes and enhancements to various CGIs such as improving the Table Browser to tolerate characters like "/" in output file names.<br />
<br />
== 7 Feb 2017, v344 ==<br />
* Changed restrictions on the maximum chromosome name length to 255 from 32 characters.<br />
* Added support of cytoBand Ideogram on assembly hubs.<br />
* Various bug fixes and ongoing work to fix bigMaf, bigPsl track types and bring tagStorm metadata support to hubs.<br />
<br />
== 17 Jan 2017, v343 ==<br />
* Introduced limited support for pasting DNA sequence into the position box to trigger a BLAT search.<br />
* Ongoing work to support custom track output from BLAT results.<br />
* Feature to allow track hub track visibilities to be set via the URL without the track's "hub_1234_" decoration (&trackName=pack versus &hub_1234_trackName=pack).<br />
* Improved documentation for the keyboard shortcuts, press "?" to see all shortcuts.<br />
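<br />
As an illustration of the URL feature above, the following Python sketch builds an hgTracks link that sets a hub track's visibility by its plain name. The hub URL and track name are hypothetical placeholders, and <code>hub_track_url</code> is a helper invented for this example rather than part of any UCSC tool.<br />
<br />
```python
from urllib.parse import urlencode

BASE = "http://genome.ucsc.edu/cgi-bin/hgTracks"


def hub_track_url(db, hub_url, visibilities):
    """Build an hgTracks URL that attaches a hub and sets track visibilities.

    `visibilities` maps undecorated track names (no "hub_1234_" prefix)
    to a visibility such as "pack", "dense" or "hide".
    """
    params = {"db": db, "hubUrl": hub_url}
    params.update(visibilities)
    return BASE + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical hub location and track name, for illustration only.
    print(hub_track_url("hg38",
                        "http://example.org/myHub/hub.txt",
                        {"myTrack": "pack"}))
```
<br />
Before this release the last parameter would have needed the hub-specific decoration, which changes from session to session and so could not be written into a stable link.<br />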
<br />
= 2016 =<br />
<br />
== 13 Dec 2016, v342 ==<br />
* New CGI [http://genome.ucsc.edu/cgi-bin/hgGtexTrackSettings?db=hg38&g=gtexGene hgGtexTrackSettings] that provides dynamic highlighting and selection of the 53 GTEx tissues on a human body illustration.<br />
* New track database (trackDb) setting [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#vcfTabix <tt>bigDataIndex</tt>] for vcfTabix and BAM files to allow the index file (bam.bai, vcf.tbi) to exist at an alternate URL.<br />
* New shortcut keys when browsing (1-6) that jump from 10 to 1,000,000 bp regions, press "?" to see all shortcuts. <br />
* Implementation of default display of codon-numbering when zoomed into the base view. (Shortcut "e v" allows exon view; "d v" returns to default view). <br />
* Features for Track Hubs: new tagStorm metadata format; ability to self-reference hub.txt, genomes.txt, trackDb.txt in one file. <br />
* Fixes and enhancements for handling split tables, hal snake tracks, and developmental code to allow BLAT custom track output and a command-line variant annotator tool.<br />
<br />
== 15 Nov 2016, v341 ==<br />
* New track database (trackDb) setting [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#darkerLabels <tt>darkerLabels</tt>] allows left labels on a track to have a darker display, useful where the track color might be too light for readable labels.<br />
* New link out to ExAC site for ExAC track detail pages like [http://exac.broadinstitute.org/variant/21-24451624-A-T <em>ExAC: 21:24451624 A/T</em>].<br />
* Enhancements for VCF tracks in the Browser and on a VCF details page.<br />
:: For TUMOR/NORMAL VCF, show tumor allele counts in mouseover.<br />
:: Infer genotypes from SGT tag for VCF from Strelka.<br />
* Table Browser enhancement to support intersection feature when data supplied via bigDataUrl tracks.<br />
* Error message improvement regarding track hubs created incorrectly with identical track names.<br />
<br />
== 25 October 2016, v340 ==<br />
* New disconnect button for track hubs from the main browser.<br />
* Improved HGVS position search.<br />
* VCF parser support for VCFv4.2.<br />
* Fix for urls using <code>hideTracks=1</code> to recognize and adjust super track visibilities.<br />
* Speed optimizations for MAF displays.<br />
<br />
== 4 October 2016, v339 ==<br />
* A new method to display UCSC Genes knownGene tracks in the main Browser through the bigBed bigGenePred format to increase the speed of displays.<br />
* Optimizations to increase the speed of codon drawing and other browser operations. <br />
* Renaming of the CGI phyloGif to phyloPng.<br />
* Fixes for fleeting network errors that displayed in Firefox and Safari, and fixes for new track data. <br />
* Improvements to code and infrastructure for future tracks releases such as for CRISPR data. <br />
<br />
== 13 September 2016, v338 ==<br />
* Introduce search support for Human Genome Variation Society (HGVS) simple substitutions such as "c." (Ex. hg38 search: <code>NM_007294.3(BRCA1):c.2231C>A</code>) and "p." (Ex. hg38 search: <code>NP_002993.1:p.Asp92Glu</code>), and pseudo-HGVS (gene symbol and protein change, ex. hg38 search: <code>BRCA1 Ala744Cys</code>) as position/search terms.<br />
* Added GTEx gene expression to [http://genome.ucsc.edu/cgi-bin/hgNear Gene Sorter] in 3 columns: GTEx (expression), GTEX ID (ENS), and GTEX Delta (distance). The GTEx expression column is now the default expression column; GNF Atlas 2 is still available but is no longer the default.<br />
* Various new command line tools, and enhancements to existing ones, around ongoing gene prediction work.<br />
* Fixes to existing bugs on Gene Details pages and other browser pages and tools.<br />
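<br />
The three kinds of search term introduced in this release can be told apart with a few regular expressions. The Python sketch below is illustrative only: the patterns cover just the simple substitutions shown in the examples above, and <code>classify_term</code> is a hypothetical helper, not the Browser's parser.<br />
<br />
```python
import re

ACCESSION = r"(?P<accession>[A-Z]{2}_\d+(?:\.\d+)?)"   # e.g. NM_007294.3
AMINO = r"[A-Z][a-z]{2}"                               # three-letter code

# Coding-sequence substitution, e.g. NM_007294.3(BRCA1):c.2231C>A
CODING = re.compile(
    "^" + ACCESSION + r"(?:\((?P<gene>[^)]+)\))?"
    r":c\.(?P<position>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$")

# Protein substitution, e.g. NP_002993.1:p.Asp92Glu
PROTEIN = re.compile(
    "^" + ACCESSION +
    r":p\.(?P<ref>" + AMINO + r")(?P<position>\d+)(?P<alt>" + AMINO + ")$")

# Pseudo-HGVS: gene symbol plus protein change, e.g. BRCA1 Ala744Cys
PSEUDO = re.compile(
    r"^(?P<gene>\S+)\s+(?P<ref>" + AMINO + r")(?P<position>\d+)"
    r"(?P<alt>" + AMINO + ")$")


def classify_term(term):
    """Return (kind, fields) for a recognized term, or None otherwise."""
    term = term.strip()
    for kind, pattern in (("c.", CODING), ("p.", PROTEIN), ("pseudo", PSEUDO)):
        match = pattern.match(term)
        if match:
            return kind, match.groupdict()
    return None


if __name__ == "__main__":
    for example in ("NM_007294.3(BRCA1):c.2231C>A",
                    "NP_002993.1:p.Asp92Glu",
                    "BRCA1 Ala744Cys"):
        print(classify_term(example))
```
<br />
Mapping a recognized term to a genomic position additionally requires transcript alignments, which this sketch does not attempt.<br />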
<br />
== 23 August 2016, v337 ==<br />
* Made hgSession/hgLogin use HTTPS for hgLogin, unless disabled with <code>login.https=off</code> in hg.conf file.<br />
* Fix to remove limit on number of sessions listed in Public Sessions.<br />
* Enhance track hub pennantIcon to take a URL.<br />
* Enhance hgTracks to draw multiple highlights from the cart variable via URL (not UI).<br />
* Fix for the ${ and $} variables to mean start and end location of clicked item.<br />
* Fix track hub problem where tracks in composites were not sorted in initial display. <br />
* Enhancements and fixes to bigBed, bigPsl, metadata display, and hubs.<br />
<br />
== 2 August 2016, v336 ==<br />
* Added support for CRAM in GBiBs.<br />
* Enhanced CRAM output choices to Table Browser.<br />
* Fixes to various parts of software (hgTracks, hgc, hgVai, hgTables) to improve links to data. <br />
* Improvements for new Public Sessions hub, including more configuration options for mirrors.<br />
* Making density graphs automatic for mirrors by removing requirement of hg.conf setting to display.<br />
* Introduced HTSlib library (open source code written, licensed and distributed by Genome Research Limited) to the kent source tree<br />
<br />
== 12 July 2016, v335 ==<br />
* Improvements with new redesigned homepage.<br />
* Improvements and further documentation to support new bigMaf, bigPsl, bigChain, bigGenePred and CRAM formats.<br />
* Ongoing work related to the move of Genbank metadata to hgFixed, such as hgVai fix for refSeqStatus txStatus info.<br />
* Genome Graphs fix to allow SNP queries again. <br />
* Improvements to multi-region customUrl feature allowing user defined regions.<br />
* Fix to maxIntron option in BLAT when non-default value is used in a protein query.<br />
* Keyboard shortcut for new Public Sessions page.<br />
<br />
== 21 June 2016, v334 ==<br />
* New CGI hgPublicSessions that allows users to submit their sessions to be viewed by all the public on a special gallery page.<br />
* Initial release of Chromatin Interaction display (longTabix).<br />
* Performance enhancement for ORegAnno tracks across wide regions.<br />
* Improvements to bigMaf, bigChain, and bigPsl support.<br />
* Table Browser noGenome support that can turn off genome-wide queries if needed for certain data types.<br />
* Support changes for mirrors and GBiB regarding phylo tree drawn on hgGateway.<br />
* Support changes for new genome-asia mirror.<br />
<br />
== 31 May 2016, v333 ==<br />
* Added 'RNA-Seq Expression' section to hgGene, that displays GTEx tissue-specific gene expression in a box-plot graph. Support default collapsed sections of hgGene. Set microarray section to default off (and change label to 'Microarray Expression').<br />
* Add settings to improve display of large composite tracks with metadata (hoverMetadata) and many or fixed track colors (darkerLabels).<br />
* Display metadata on details page directly (don't require click on metadata link).<br />
* Restored cytoBand fuzzy search.<br />
* Started implementing Chromatin interaction display mode; Libified ave program.<br />
* Use autoscale and maximum mode as defaults for BAM density graphs.<br />
* Support bigPsl, bigChain, and bigMaf as custom tracks.<br />
<br />
== 10 May 2016, v332 ==<br />
* New Gateway display added.<br />
* New keyboard shortcuts to the dropdown menu and the help menu added.<br />
* Enhancements to support a BAM density display feature.<br />
* Enhancements to improve multi-region support for pgSnp, certain hgc clicks and certain custom region custom track displays.<br />
* Fixes for GENCODE gene tracks regarding prefixes and ids when obtaining records.<br />
* Library changes to improve fileModTime error message, and relaxing check of metadata in VCF files.<br />
<br />
== 19 April 2016, v331 ==<br />
* Added calls to bottleneck server for more of Table Browser; botting for BED files will be delayed appropriately.<br />
* Changed Genbank procedures to allow pushing tables rather than updating them on each machine.<br />
* Fixed bug in hgc for SNP effect prediction of liftUp'd genePredExt tracks with empty exonFrames column. <br />
* Final changes to support GTEx Gene Tissue Expression track.<br />
* Fixed two issues in GTEx display when in multi-region mode: better accounting for label width in pack mode; gtexGene for limitedVis full->pack->squish->dense.<br />
* Fixed problem in pack mode with special exception for altGraphX spaceSavers.<br />
* Fixed a bug where next/prev exon link did not stop at coding region start/end in both directions.<br />
* Enhanced nextExon mouseover labels to more clearly explain what it does.<br />
<br />
== 29 March 2016, v330 ==<br />
* Created automated script to fetch CRAM reference sequences in conjunction with hgTracks.<br />
* Clarified mouse-over message on hgTracks image to more clearly explain that by pressing the double-headed arrow, you will move to the Next/Previous Exon Edge.<br />
* Fixed crash in custom tracks of type big* when user pressed nextItem.<br />
* Formatted bigBed extra fields as a table on hgc page.<br />
* Fixed a problem with SIG_PIPE in our https connection handling.<br />
* Fixed hgc position for multi-region modes with multiple different chromosomes on the screen, such as custom regions by url mode or alternate-haplotype mode.<br />
* Fixed a problem with chain lines disappearing when zoomed way in.<br />
* Fixed a bug in outputting sequence in the table browser for refSeq table.<br />
<br />
<br />
== 8 March 2016, v329 ==<br />
* Added "Apply" button to popup dialog box for configuring tracks on Tracks page.<br />
* Fixes and updates for OMIM data pipeline.<br />
* Fixed error handling for bigDataUrl/network tracks to be consistent and work across all regions/windows.<br />
* Fixed bug detected with clinGen track caused by missing chrom name in a structure.<br />
* Fixed bug in query index order for ensGeneTrack for exon/Gene-Mostly.<br />
* Added assembly hub fields organism, description and scientificName to index of hubCrawl.<br />
* Fixed string-escaping bug that broke the "BRCA" autocomplete for hg38 in hgSuggest.<br />
* Removed some human-specific tweaks so cytoBand search will support horse & other organisms.<br />
* Fixed bug in hgGateway search-box autocomplete.js.<br />
* Fixed problem where hub defaultPos is sometimes ignored when using hubUrl= and genome=.<br />
* Assume that the BAM index file is named fileName.bai.<br />
<br />
== 16 Feb 2016, v328 ==<br />
* Multi-region display. This feature allows users to "slice" their track viewing experience into a variety of different modes that focus the display on certain features: exon-only, gene-only, or user-defined BED coordinates. Only the portions of track annotations that fall within these displayed regions are shown; extraneous intergenic, intronic and otherwise unwanted regions are hidden from view.<br />
* Fixed VCF Custom Tracks loading problem over FTP: make lineFile code assume index is .tbi, and not check for .csi.<br />
* Fixed bug that occurred when users supplied backwards coordinate range (start > end) which caused both the start and end to be off by 1.<br />
* Ongoing work on GTEx gene expression track: track configuration page including popup version; added tissue selection via sortable table.<br />
* Ongoing work on GBiB: allow R for GTEx tracks; security updates; local track loading.<br />
* Ongoing work on supporting CRAM data type.<br />
<br />
== 19 Jan 2016, v327 ==<br />
* Multi-region display. This feature allows users the ability to "slice" up their normal track viewing experience into a variety of different modes: exon, gene, or user defined BED coordinates, and visualize the track annotations only in those regions, effectively removing the intergenic, intron or otherwise unwanted regions from the viewing window.<br />
* Added Server Name Indication (SNI) support for HTTPS with certificates for wild-card domains.<br />
* Ongoing work to support CRAM, BAM, and tabix using htslib.<br />
<br />
= 2015 =<br />
<br />
== 15 Dec 2015, v326 ==<br />
* Folded in support for htslib to get CRAM support. Also supports tabix and BAM.<br />
* Enlarged buffers to help prevent problems with long URLs.<br />
* Ongoing work on GTEx gene expression track display.<br />
* Added View DNA shortcut to View menu.<br />
* Added hg.conf option to suppress "very early errors" that display on the browser.<br />
* Added support to Data Integrator for adding in related tables and fields using all.joiner.<br />
<br />
== 24 Nov 2015, v325 ==<br />
* Started work to support bigChain (remotely-hosted chain file) file type in the browser.<br />
* Ongoing work to support bigMaf (use semi-colon instead of ^A as line separator). <br />
* Fixed a problem with BAM files that have no alignment. <br />
* Fixed bugs in Data Integrator: 'defined regions' was selected when there weren't regions anymore; autocomplete behavior didn't follow changes of db; added simple header (database, region, date) to hgIntegrator output. <br />
<br />
== 3 Nov 2015, v324 ==<br />
* Added drop-down menus to the Genomes and Genome Browser menus.<br />
* Initial work on supporting bigMaf (remotely-hosted multiple alignment file) data type in the browser.<br />
* Initial support for GTEx gene expression and related tracks.<br />
* Fixed loading bigWig files from local dir via hg.conf option.<br />
* Changed some absolute links into relative links for the benefit of mirror sites.<br />
* Fixed code so that relative links in CGIs are no longer confused by HTML base tags in included files.<br />
* When returning to a previously-visited assembly, take the user back to their old position when position="lastDbRef".<br />
* Provide a link back to the genome browser from BLAT results.<br />
* Fixed bug in Data Integrator that broke output field selection for subtracks. <br />
<br />
== 13 Oct 2015, v323 ==<br />
* Added "View -> In External Tool" menu which sends DNA in region to external tools.<br />
* Added functionality to pslCheck program: it now verifies q/tNumInsert and q/tBaseInsert.<br />
* Removed message and 2-second delay from cartReset.<br />
* Added support (and cache) for UDC redirects.<br />
* Ongoing additions and fixes to Data Integrator tool.<br />
<br />
== 22 Sept 2015, v322 ==<br />
* Flipped the switch to make GRCh38/hg38 the default human assembly browser. <br />
* Activated keyboard shortcuts.<br />
* Do not allow new default tracks to sneak into existing saved sessions; ensure that they maintain exact original track contents.<br />
* Added chainToPslBasic utility which quickly converts chain files to PSL files without the overhead of computing match and mismatch counts. <br />
* Added -max option to bigWigMerge.<br />
* Added utilities to the user apps: hgLoadMaf, hgLoadMafSummary and hgLoadChain.<br />
* Fixed phyloGif to accept spaces and special characters in tree node labels.<br />
* Changes to Data Integrator: Fixed bug where wiggle db values were overhanging edges of search region; Fixed bug where it was using out-of-range large-bin items if query limit was hit (affects very large mysql tables like hg19.phastCons100Way); Added context-specific Help menu item.<br />
* Fixed bug in hgTracks: lookupTrackHandler wasn't getting child tracks' trackHandler settings.<br />
* Fixed bug in hgCustom: was using hardcoded default form action instead of cart value.<br />
* Fixed bug in Table Browser: prevent mysql query for composite track with bigData type.<br />
* Fixed bug in hubPublicCheck in the case of missing descriptionUrl.<br />
* Fixed crash in snake track controls.<br />
* Added the ability to specify genome within assembly hub on URL.<br />
<br />
== 1 Sept 2015, v321 ==<br />
* Fixed the anno* libs to prevent region-average wiggle values from being split into multiple region averages.<br />
* Exclude custom MAF tracks (not just wigMaf) from the Data Integrator.<br />
* Added several tools to the Send To menu.<br />
* Allow empty name fields in bigBed files.<br />
* Turn on the Short Match track if anything is typed into the "motif" section of the hgTrackUi page.<br />
* Added link on gene details page to GTEx (Genotype Tissue Expression) for all human assemblies.<br />
* Added SSL Support to hg.conf, hg/lib/jksql.c and to the hgsql-and-family functions (#15751). <br />
* Fixed bug with PSL to genePred frame correction when there is target deletion (#15803).<br />
* Fixed bug converting GFF3 alignments to PSL when alignments start or end with indels.<br />
* Added pslClone utility.<br />
* Started work to support a new data type: bigPsl.<br />
<br />
== 11 Aug 2015, v320 ==<br />
* Changed hgMenubar to work properly with Apache error pages.<br />
* Final tweaks to the top menu bar color.<br />
* Fixed the order of species in the hgMirror list.<br />
* Added a returnTo option to the "sendto" menu to accommodate mirror sites.<br />
* Fixed the postscript/PDF output so that all of the text is properly escaped.<br />
* Added Data Integrator to hgCustom's 'manage custom tracks' page; changed UI from column of buttons to select + go button.<br />
* Removed obsolete 8-bit color support.<br />
* Fixed describe table schema and paste identifiers for assembly hubs.<br />
<br />
<br />
== 21 July 2015, v319 ==<br />
* Released version 1 of trackDbDoc for Track Hubs.<br />
* Upgraded hubCheck tool to verify versions.<br />
* Added an hg.conf option to specify the Galaxy instance to connect to.<br />
* Fixed searches in VisiGene tool.<br />
* Added new menus to static pages.<br />
* Ongoing work on VAI.<br />
* Added geneId to TransMap Genes tracks.<br />
<br />
== 30 June 2015, v318 ==<br />
* Created new CGI, hgMenubar, in support of adding new menus to all static pages.<br />
* Small tweaks to GBiB: support remote data tracks without an associated table; fix species sort order in hgMirror.<br />
* Added support for bigGenePred to hgVai and hgIntegrator.<br />
* Ongoing work on the Data Integrator.<br />
* Added support for v4.2 VCF files.<br />
* Fixed bug in Table Browser: for join results, look for commas in the 2nd column not 1st.<br />
* Fixed off-by-one bug in DAS for almost-BED tables (e.g. ctgPos).<br />
* Added left label for snake tracks in pack and dense modes.<br />
<br />
== 9 June 2015, v317 ==<br />
* Release of new CGI: Data Integrator (hgIntegrator).<br />
* Tolerate absence of big genbank tables on mirrors.<br />
* Fixed GBiB so that it does not overwrite local hg.conf file.<br />
* Ensured that chromosomes are sorted by case in bedToBigBed.<br />
* Fixed start/stop codon exon number in genePredToGtf.<br />
* Started work on Track Hub spec versioning support. This includes new options to the hubCheck utility.<br />
<br />
== 19 May 2015, v316 ==<br />
* Ongoing work on GBiB.<br />
* Ongoing work on Data Integrator (previously referred to as Annotation Integrator), hgIntegrator.<br />
* Fixed bug in BAM display to ignore mate-pairs that are on different chroms.<br />
* Fixed Table Browser: fixed bug that broke MAF output; fixed filterFields for bigDataUrl-only tracks.<br />
* Allowed hgLoadBed to load zero-length BEDs at the beginning of a chromosome.<br />
<br />
== 28 April 2015, v315 ==<br />
* Changed genome-euro redirect to be opt-in, rather than opt-out.<br />
* Allow ORFeome and MGC synthetic mRNAs even when there is no Genbank entry.<br />
* Added several new feature types to gff3ToGenePred (rRNA, ncRNA, primary_transcript). <br />
* Allow mirrors to specify a different userDb and/or sessionDb table name. <br />
* Changed UDC to allow: (1) load bigDataUrls from a specific local directory, configured in hg.conf. (2) deactivate the udc cache.<br />
* Handle tags in VCF's INFO column that appear to have |-separated columns in both description and data by formatting the contents as HTML table for readability.<br />
* Polished support for bigDataUrl-only tracks.<br />
* Added extra column for gene symbol for UCSC Genes output from hgVai and hgIntegrator. <br />
<br />
== 7 April 2015, v314 ==<br />
* Renamed hgAi CGI to hgIntegrator. <br />
* Added assembly hub support for hgIntegrator. <br />
* Added connections from Table Browser to the GREAT server for mm10.<br />
* Updated details page to distinguish older ENCODE tracks from newer ones.<br />
* Allow gff3ToPsl to work with query and target being different sets of sequences (e.g. to map between different genomes).<br />
* Made the hgLogin email site-specific.<br />
* In support of mirrors and GBiBs, made the dbDb, defaultDb, genomeClade, and clade table names configurable.<br />
<br />
== 17 March 2015, v313 ==<br />
* Allow selection of coalescent ancestor to HAL snake tracks.<br />
* Fixed menu duplication bug if there is a custom track on a track hub.<br />
* Allow zero-length blocks in the browser and in bigBeds.<br />
* Allow use of both itemRgb and codon coloring in bigGenePred types.<br />
* Initial implementation of schema and display code for Genotype-Tissue Expression (GTEx) tracks.<br />
* Added link to chromInfo page to $db.chrom.sizes files on hgdownload.<br />
* Added a more flexible display of options to hgAi.<br />
<br />
== 24 Feb 2015, v312 ==<br />
* Implemented changes to hgAi suggested in review of JS for new UI framework. <br />
* Fixed hgAi image-loading bug. <br />
* Fixed checkTableCoords to exit with an error status on error. <br />
* Added tolerance for hash table reaching its max size rather than exiting. <br />
* Added support for bigBed custom tracks. <br />
* Fixed a bug that caused split tables to crash in the browser.<br />
* Extended mouseover to show exon and intron numbers for linkedFeatures tracks, added new trackDb setting "exonNumbers off" that allows track creators to suppress this new feature. <br />
* Added command-line tools to inventory track settings in public hubs (hubTrackSettings) and check urls in a table (checkUrlsInTable).<br />
* Fixed memory leak in pslPosTarget.<br />
* Added genePredFilter program to discard invalid genePreds created by importing data from incorrect GFFs. <br />
* Added ability to suppress auto-updates from outside GBiB.<br />
<br />
== 3 Feb 2015, v311 ==<br />
* Created new UI framework based on ReactJS and ImmutableJS, with Annotation Integrator interface hgAi.<br />
* Fixed the stacked bar chart display to stack the wiggles in the order they are listed on the description page.<br />
* Fixed disappearing bigBed items when zoomed way in.<br />
* Added support for both '=' and 'X' in BAM cigar files.<br />
* Fixed bug in subtrack table sort. It will now correctly sort in columns besides the primary sort column.<br />
* Fixed menu link to hgTables so that it automatically selects the selected group track and table.<br />
* Added a "data last updated" value to the schema page in hgTables.<br />
<br />
== 13 Jan 2015, v310 ==<br />
* Added BLAT support to assembly hubs.<br />
* Released first version of hgBeacon CGI (in support of GA4GH) for LOVD and HGMD tracks on hg19.<br />
* Rearranged the way we compile in HAL libraries so testing can be done before installing the libraries. Upgraded to newest HAL library. <br />
* Enabled pslCDnaGenomeMatch.c to take 2bit files as input instead of nib files since 2bit format is more common. <br />
* Added hubPublicCheck to the list of utilities available to users.<br />
* Turned off min/max in wiggle track configuration when autoscale is on.<br />
* Added two new trackDb statements: labelOnFeature and linkIdInName.<br />
* Fixed a bug so that the currently-selected table launches when the Table Browser starts.<br />
* Fixed the instance where the chromosome position was displaying twice in the Web browser tab label.<br />
<br />
= 2014 =<br />
<br />
== 9 Dec 2014, v309 ==<br />
* Allowed for easier discovery of track hub track HTML.<br />
* Allow for missing hg.conf file.<br />
* Added support for bigGenePred in hubCheck and custom tracks.<br />
* Added hubClear URL variable.<br />
* Created beacon program for GA4GH.<br />
* Fixed hgTables to show example missing identifiers.<br />
* Added code for proteomics track support - PeptideAtlas.<br />
* Bolded canonical gene in search results page.<br />
* Fixed bug in hgVai that caused omission of many dbNSFP annotations.<br />
<br />
== 12 Nov 2014, v308 ==<br />
* Further refinement of the GBiB product.<br />
* Added support for GENCODE filter-by tag and highlight-by tag.<br />
* Fixed custom wiggle tracks on assembly hubs.<br />
* Started work on BAM density plots (turning BAMs into wiggles when they are too dense).<br />
<br />
== 21 Oct 2014, v307 ==<br />
* Added new program pslScore (written in both C and Perl) to calculate PSL scores.<br />
* Added MAF SNP view to MAF display.<br />
* Created new utility, mafToSnpBed, used in construction of tables used in the new SNP-oriented MAF display.<br />
* Fixed bug in chrom ideogram position-selection.<br />
* Fixed a problem in MAF display resulting in empty displays when the start address was 1 or 2.<br />
* Added an option to mafGene to output a unique character for every codon.<br />
* Completed GBiB product for initial release.<br />
<br />
== 30 Sept 2014, v306 ==<br />
* Created tool to lift UniProt annotations to a genome: uniprotLift.<br />
* Created a tool to fix overlapping bed blocks, created by dnax blat and pslMap on protein psl files: bedFixBlockOverlaps.<br />
* Corrected VCFs in hgTracks: detect haploid sequence properly instead of matching "chrY".<br />
* Fixed missing NULL check in hacTree.c.<br />
* Started work on program that makes a track hub for the hg38 DNAse data. <br />
* Ongoing work on parallelized version of hacTree. <br />
* Added "showSnp" mode to MAF display which shows red ticks for mismatches. <br />
* Modified gff3 library to tolerate missing phase keyword. <br />
* Modified hgHubConnect to not redirect to hgGateway if error on hub load.<br />
* Added exon frames if missing in genePredToBigGenePred.<br />
* Stripped tabs from custom track descriptions causing javascript crash.<br />
* Fixed bedToBigBed crash when trying to create an index on a non-existent field.<br />
<br />
== 9 Sept 2014, v305 ==<br />
* Added support for bigGenePred in track hubs.<br />
* Fixed problem that resulted in trackDb include statements bypassing UDC cache.<br />
* Added minMax option to bigWigAverageOverBed.<br />
* Implemented ga4ghToBed to test out GA4GH APIs at Google.<br />
* Created a multithreaded version of the hacTree code for clustering.<br />
* Fixed missing null check in hgTables, also avoid null by looking up trackDb for second table in intersection.<br />
* Fixed missing null check when loading user settings from URL.<br />
* Improved hgBlat error messages for sizes of queries with ignored characters like N.<br />
* Added option to pslCDnaFilter to save filtering statistics to file rather than writing to stderr.<br />
* Started work on schema and loader for GTEx (Genotype Tissue Expression) data tables.<br />
<br />
== 19 Aug 2014, v304 ==<br />
* Added code to distinguish IE-pre-version-11 from IE-post-11. <br />
* Fixed problem for IE11 so users can properly set sub-track visibility.<br />
* Added code to make 'spectrum' (aka 'useScore') actually work on any pair of color/altColor settings.<br />
* Added support to Table Browser: export to GenomeSpace.<br />
* Added support for track hubs with "." in the name.<br />
* Allow dash in GFF3 tags.<br />
* Added functionality to bigWigCorrelate so that it can work on a whole list rather than just a pair of bigWigs.<br />
* Added warning to hgTracks when udcTimeout variable is present, and provided link to remove from cart to enhance performance.<br />
* Extended regulation cluster tools to incorporate metadata into DNase clusters.<br />
* Added functionality to SNP display in hgTracks: option to show organism's alleles even when ortho alleles are not available; by default, show alleles on + strand or if user clicks 'reverse', all on - strand (with option to revert to showing dbSNP's strand).<br />
* Added functionality to details page for VCF data: if haplotype clustering is not enabled, don't show 'sorting order' option; show Hardy-Weinberg only when explicitly enabled since it's usually N/A.<br />
* Finished work on the new repeat masker display.<br />
<br />
== 29 July 2014, v303 ==<br />
* Changed all links in navigation bar and on static doc pages to relative links. This makes it so that it is no longer required to install the browser at the top level.<br />
* Removed udcTimeout from the cart at end of hgTracks. This prevents this debugging tag from persisting and slowing down tracks involving bigDataUrls, and track hubs indefinitely.<br />
* Created new program, expData, which takes in expression data and creates a binary tree using a hierarchical agglomerative clustering library. The output is a .json file type intended for d3 visualizations. <br />
* Started work adding new Repeat Masker nesting visualization display code from Robert Hubley.<br />
* Changed the sort order for assembly hubs. Databases are now sorted according to their orderKey value. The default is to list them in the order in genomes.txt.<br />
* Fixed a bug that listed unconnected unlisted hubs when a hub was broken.<br />
* Refactored hubCheck to support HAL library.<br />
* Fixed a problem where sessions on assembly hubs would not hold onto custom tracks.<br />
* Added "retry hub" error message to hgHubConnect so broken hubs can be disconnected.<br />
* Fixed bug where in some circumstances, users get a blank cookie value.<br />
* Fixed bug in TF ChIP-seq track: suppress display of a motif for a TF peak whose highest-scoring motif lies outside the viewing window.<br />
* Bug fixes for VAI tool: first character of symbolic alt alleles was sometimes skipped; trailing quote was not stripped from FILTER header lines' descriptions; incorrect calls for large deletion; was calling frameshift instead of stop_gained when insertion makes a stop codon.<br />
* Ongoing work on VAI tool: moved regulatory data into own section, suppressed regulatory details from protein-coding consequence lines because they're in regulatory feature lines.<br />
* Ongoing work on GBiB including installation helpers for Windows and OSX.<br />
<br />
== 8 July 2014, v302 ==<br />
* Added regulatory consequence-calling to VAI using ENCODE Clustered TFBS and DNase.<br />
* Fixed a bug in searching where supertracks were not displaying when a search term matched an item in the track.<br />
* Cleaned up JavaScript code based on run through jshint.<br />
* Fixed hgTrackUi to better support ranges in Filter settings. <br />
* Fixed off-by-one pixel height problem with BED files generated by hgGenome.<br />
* Fixed a bug where hiding a track in certain conditions resulted in an error.<br />
* Updated trackDbDoc to make it clear that '-' are not allowed in track names.<br />
* Added 2 new tokens for trackDb URL setting which are for clicked item start and end.<br />
* Sped up image-only reload as seen on Firefox when there are many (50) tracks displayed.<br />
* Added functionality to the chains display to make directional arrows visible more often. <br />
* Added descriptionUrl to hubPublicCheck -addHub command. <br />
* Enabled bigBed item search on native bigBed tracks. <br />
* Ongoing work on GBiB.<br />
<br />
== 17 June 2014, v301 ==<br />
* Enabled searching for tracks with Track Hubs. <br />
* Added support for Watson/Crick mode (i.e. negative values in wiggle data types).<br />
* Added copyright notices to several source files.<br />
* Made prototype Tukey plotter and bar grapher.<br />
* Created a BAM filtering program and added to bamFile.c library.<br />
* Created a convertor program: BAM to fastq.<br />
* Added support for details page of in-progress DNase Combined Sites track.<br />
* Created tool hgBedSources, to generate a bed file with id list and an id+name map file from input file of bed items + list of names.<br />
* Changed ENCODE peaks display to be full height when no peak summit ('point source') is defined (when there is peak summit, the peak is full height and the remainder of the item is half height).<br />
* Make-over for Track Hub Portal to make it more user-friendly.<br />
* Fixed bug in border-case drawing problem with snake tracks (HAL data type).<br />
* Added support for mafDot in trackDb to show "." if a base is identical to maf reference.<br />
* Fixed some bugs in splitFileByColumn.<br />
* Compiled source code on Mac OSX Mavericks and fixed issues.<br />
* Edited htmlCheck so that it now tolerates missing HIDDEN and CHECKBOX input types.<br />
* Added back missing dense mode left labels to pgSNP tracks.<br />
* Fixed VAI so it ignores "variants" with no observed variation.<br />
* Fixed VAI so that it stops trying to find a chrom that is not in the input file.<br />
* Added support for assemblies with many sequences (> 100) to checkTableCoords.c.<br />
<br />
== 27 May 2014, v300 ==<br />
* Added new feature: the ability to search track hub info pages.<br />
* Fixed bug in hgTables where BAM position range was including an extra base to the left.<br />
* Fixed duplicate labels in hgTracks when pgSnp and unclustered VCF were in full visibility mode.<br />
* Resolved new compiler warnings for Mac OS X Mavericks (10.9). <br />
* Added option to pslMap to limit mapping alignments to those prefixed with the corresponding input qName. <br />
* Reconfigured hgTracks visibility updates by drop downs in track controls.<br />
* Fixed a bug in the snakes display having to do with new compact mode.<br />
<br />
== 6 May 2014, v299 ==<br />
* Created new display method: stacked overlay mode for multiWigs.<br />
* Tuned performance of initial display of public hub list.<br />
* Compacted the snakes/HAL display.<br />
* Fixed multiple alignment tracks when viewed with the visibility set to squish.<br />
* Fixed bug in main display, where items were being drawn in the wrong place due to a calculation error.<br />
* Suppress "Variant Identifiers" option in VAI for assemblies without a SNP table.<br />
* Fixed bug in hgc where printCustomTrackUrl was called twice for VCF/tabix.<br />
* Fixed bug in hgc for pgSnp data types: don't look ahead to next exon from last exon.<br />
* Detect but tolerate extra tab at end of line from 1000Genomes phase1 VCF files.<br />
* Handle "chr"-less VCF files.<br />
* Ongoing work on Genome Browser in a Box (GBiB) including committing all configuration files.<br />
<br />
== 15 April 2014, v298 ==<br />
* Allow user to select which gene name to display in Ensembl Gene tracks.<br />
* Fixed link to NCBI for hg19.<br />
* Increased buffer size in hubConnect to deal with huge data sets.<br />
* Ongoing work on Genome Browser in a Box (GBiB) including tweaks to VAI, hgLiftOver, and nib file access.<br />
* Made the UI more user-friendly in the Hub Portal by making a click-able expansion of assembly lists.<br />
* Fixed bug in which a symbolic position could get stuck in the cart.<br />
<br />
== 25 March 2014, v297 ==<br />
* Added 100X zoom out button to main tracks image.<br />
* Improved error messages for directly uploaded bigData Custom Tracks.<br />
* Fixed broken VCF intersection in the Table Browser.<br />
* Fixed custom coloring of SNPs by exceptions in hgTracks.<br />
* Added support for (small) plain VCF custom tracks to the: Table Browser, Tracks page, details pages, VAI, and Custom Tracks.<br />
* Trim identical bases on right of indel alleles to prevent false overlap in VCF files in: Tracks page, details pages, VAI.<br />
* Fixed bug in GTF to genePred conversion where an assertion was triggered when the GTF had a spliced stop codon where the exon containing the first part of the stop codon only consisted of the stop codon bases.<br />
* Altered the way ajax return from update of tracks in the hgTracks image is parsed.<br />
* Added hgsid back into the hgTracks URL (as shortened by history plug-in).<br />
* Simple change to prevent top menu from wrapping, which broke the menu.<br />
* Added random session key for greater security of cart data.<br />
* Added bottleneck delay to cartDump.<br />
* Fixed GTF output from Table Browser with BAM Custom Tracks and hubs.<br />
* Fixed bug in bedDetail that prevented long HTML fields.<br />
* Ongoing work on Genome Browser in a Box (GBiB).<br />
<br />
== 4 March 2014, v296 ==<br />
* Fixed two Custom Track issues regarding multiple threads.<br />
* Fixed bug in lavToPsl that was clipping (#12727). Brian<br />
* Fixed crash in multiWig if individual bigWig is not found (#12644). Brian<br />
* Tweaked genePred reader to tolerate missing exon frames (#12674). Brian<br />
* Fixed bug in bigWig summary calculations (#12558). Brian<br />
* Fixed bug in drag-highlight that caused the gene name auto-complete dialog to appear beneath hgTracks images.<br />
* Fixed a Mac-Safari specific bug in the right-click disable of drag-highlight.<br />
* Fixed a problem with custom track bed detail files.<br />
* Allow hubs to use pennant icon.<br />
* Made improvements to hgc and hgTrackUi to support non-human SNPs.<br />
* Ongoing work on Genome Browser in a Box (GBiB) including fixing bug in reading MAFs from stdin; fixing bug with lineFile seeks when on top of udc.<br />
<br />
<br />
== 11 Feb 2014, v295 ==<br />
* New feature: highlight region in the main display.<br />
* Fixed a bug in the main display when there is no P/Q arm information available.<br />
* Automated part of our dbSNP process: locate NCBI assembly report file, use it to map RefSeq contig names to GenBank contig names.<br />
* Changed hgTracks "refresh" behavior to scroll to top of page.<br />
* Changed configure page to present all groups with unified column widths for cleaner look.<br />
* Fixed hgTracks problem when "ruler" track is hidden.<br />
* Fixed Firefox-only bug where back-button return to hgTracks page left visibility drop-downs disabled.<br />
* Added a feature to snake tracks: it now draws a pale yellow bar when there are N's in the query.<br />
* Fixed gff3ToPsl to handle NCBI's gff cigar format.<br />
* Fixed Table Browser's gtf output to work with split-tables.<br />
* Created special chromosome-ordering for hg38 which has many alternate chromosomes.<br />
* Allow an FTP URL with an unencoded "+" character.<br />
* Ongoing work on display of motifs in Transcription Factor cluster tracks.<br />
* Ongoing work on Genome Browser in a Box (GBiB).<br />
<br />
== 21 Jan 2014, v294 ==<br />
* Allow colons in both HTTP and FTP URLs.<br />
* In support of the back button, added the assembly, organism and position to the URL.<br />
* Fixed a bug where sort was alphabetical on numeric values. This caused tracks to jump to unexpected places in the display image.<br />
* Reversed table name check strategy: check normal name first, split tables last.<br />
* Fixed BAM display where color-by-tag was turning blue in large regions.<br />
* Fixed VCF reader to not require '=' in metadata lines in versions before 4.1.<br />
* Added paste/upload options to VAI: accepts rs# IDs.<br />
* Created new tool for bigWig manipulation: bigWigCat.<br />
* Sped up the Table Browser by fetching only the list of chromosomes used by the table instead of all chromosomes.<br />
* Changed Genbank's alignment strategy to remove recognition of repeat-masked sequence.<br />
* Ongoing work on Genome Browser in a Box (GBiB) including a new CGI: hgMirror.<br />
<br />
= 2013 =<br />
== 16 Dec 2013, v293 ==<br />
* Added support for back-button in hgTracks; it now supports history of "positions". Track configuration settings will only reflect current settings.<br />
* Allowed track hubs to be more forgiving of poorly-formatted lines in txt file.<br />
* Fixed a bug in maf frames builder where blocks of one base were being ignored.<br />
* Fixed net.c for FTP subtleties regarding url-encoding and the leading slash in the path requested.<br />
* Fixed a new problem in hgTables caused by a recent bugfix to gffOut. This ensures that the exonCount, exonStarts, and exonEnds match.<br />
* Fixed bug seen only in IE10: white-on-white text in site menus and black-on-blue group headers.<br />
* Minor change to support non-standard UWash Exome Variants VCF data in hgTracks display tool tip.<br />
* Fixed https access problem with ENCODE Experiment matrix.<br />
* Added build scripts to the source tree that were local files in the build account.<br />
* Added generic scripts for all the steps of trash cleaning for mirror site users.<br />
* Ongoing work on VAI: Fixed incorrect use of NMD_transcript_variant: it means 'variant in a transcript that is already subject to NMD', not that the variant causes NMD; Added ability to paste in rs# IDs as variants to annotate; Cleaned up logic for detecting/adding extra left base for indel in VCF; Added hgTrackUi links for tracks offered by hgVai; User request: don't truncate alt allele column so short; Made per-gene-track Artificial Example Variants (not only per-region); Added comment method to annoFormatters; Added comment about no items in region.<br />
* Enabled hyperlinks output for bedGraph and microarray tracks from Table Browser.<br />
* Extended base/difference-coloring code to support LRG Regions, LRG transcripts.<br />
* Ongoing work on Genome Browser in a Box (GBiB).<br />
<br />
== 19 Nov 2013, v292 ==<br />
* Added support for display of motifs in Transcription Factor cluster track, and other factorSource type tracks.<br />
* Set network default read write timeout for tcp connections to 2 minutes.<br />
* Fixed case where strand output in liftOver could be wrong when multiple flag was used with bed6 strands and it hit multiple chains.<br />
* Added IUPAC support to the oligoMatch utility.<br />
* Fixed the hgTables gtf-output frame column, at least for tables that have exonFrames column available like refGene and other genePredExt tables.<br />
* Fixed bug in genePredCheck to recognize case-sensitive chrom names.<br />
* Started work on fixing bugs in source tree so that it can be built on new Mac OSX compiler.<br />
* Ongoing work to support HAL and halSnakes in assembly hubs.<br />
<br />
== 29 Oct 2013, v291 ==<br />
* Fixed ajax code to be tolerant of ISPs that strip newlines from content.<br />
* Fixed a problem where hub assembly lists were not nicely wrapping on hub portal page.<br />
* Removed an invalid assert in Gene Haplotypes feature that assumed that if only one haplotype is found, it is the reference haplotype.<br />
* Rewrote some of makeTrackIndex: replaced mdb and cv access with the lib routines that are much more efficient.<br />
* Fixed a bug (and added tests) in pslOpen that would hang on files with only comments.<br />
* Changed "genes" field in gwasCatalog to a longblob from a varchar(255).<br />
* Updated build system to automatically detect MySQL libraries and build environment.<br />
* Fixed several bugs in anno* libs found by Case Western team.<br />
* Tweaked search position for Artificial Example Variants.<br />
* Updated links to new Galaxy server from Table Browser.<br />
* Ongoing work to support HAL and halSnakes in assembly hubs including adding a chromosome color mode for HAL snakes and adding MAF output in Table Browser for HAL data type. <br />
<br />
== 8 Oct 2013, v290 ==<br />
* Added db.neverLocal to hg.conf to help mirrors debug their mysql problems.<br />
* Fixed a problem with Ubuntu and other systems where load data LOCAL infile has been disabled by default for mysql clients.<br />
* Improved mysql setup documentation regarding both setting the default storage engine to myisam and turning on local-infile in my.cnf.<br />
* Improved paraFetch timeouts and retries.<br />
* Improved hdb.c and hgc to handle bed4 and bed9 when element has size=0 start==end such as a SNP insertion point.<br />
* Fixed a couple of small issues with the login code.<br />
* Added new options to pslCDnaFilter to ignore introns and repeat masked sequence.<br />
* Fixed bug in pslMap where mappings would be missed if multiple mapping alignments mapped to the same rangetree node.<br />
* Added support for version numbers in genbank tables.<br />
* Fixed bug in bigBed item search (bad allocation in bPlusTree code). <br />
* Ongoing improvements to the VAI. <br />
* Ongoing work to support HAL and halSnakes in assembly hubs.<br />
<br />
== 17 Sept 2013, v289 ==<br />
* Fixed item search code.<br />
* Allow hgVai to accept assembly hubs but bow out if no gene track exists.<br />
* Fixed crash in Human Proteins hgc page.<br />
* Fixed a table browser crash when switching to an assembly that didn't have a track group.<br />
* Changed dbList in hubStatus table to be a blob; 255 chars was not big enough.<br />
* Fixed hgLogin buffer overflow error.<br />
* Ongoing work to support HAL and halSnakes in assembly hubs.<br />
* Ongoing work on hgVai.<br />
<br />
== 26 Aug 2013, v288 ==<br />
* Fixed bug in hgLogin, set realName with same value as userName when creating account.<br />
* Fixed HTML encoding in hgLogin.<br />
* Reordered the fields the same way for display of bigBed and BED tracks.<br />
* Fixed noScoreFilter for bigBed to work properly.<br />
* Tuned the performance for the track and file search.<br />
* Fixed lack of base-difference colors in hub BAM tracks and even native BAM tracks that didn't have a whole bunch of extra trackDb settings.<br />
* Fixed hgVai when specifying region with VCF input.<br />
* Fixed hgVai to support assemblies with no SNP tracks. <br />
* Ongoing work on annoStreamDb: use mysql to sort incrementally-updated GenBank tables; fix to support assemblies with thousands of sequences.<br />
* Ongoing work to support HAL and halSnakes in assembly hubs.<br />
<br />
== 5 Aug 2013, v287 ==<br />
* Improved performance of hgTracks search and hgFiles search.<br />
* Added support for 'factorSource' file format.<br />
* Added support for displaying external html pages directly in details pages (iframeUrl and iframeOptions in trackDb settings).<br />
* Removed hard-coded http string from hgLogin and hgSession.<br />
* Changed hgGene to not set the chromosome position in the cart.<br />
* Ongoing work to allow visualization of nested Repeat Masker tracks.<br />
<br />
== 15 July 2013, v286 ==<br />
* Made the userApps easier to build.<br />
* Added Ensembl navigation link to assemblies as appropriate.<br />
* Fixed hgConvert so that it doesn't die if the other assembly is not in dbDb.<br />
* Released final version of the Variant Annotation Integrator (hgVai) and documentation.<br />
* Fixed text display for warning in hgVai and hgTables.<br />
* Used the new NCBI link to display GeneReviews article.<br />
* One more anti-spam trick for hgUserSuggestion.<br />
* Fixed buffer overflow in hgTracks.<br />
* Handle HTTPS for redirects between UCSC and other official mirror sites like genome-euro.<br />
* Fixed obscure bug in dumpstack to use _exit instead of exit so that the child cleanup will not close the mysql connections that are shared with the parent process.<br />
* Fixed a minor bug in pushCarefulMemHandler mutex handling. <br />
<br />
== 25 June 2013, v285 ==<br />
* Released new CGI to act as a Suggestion Box.<br />
* Released new CGI for Variant Annotation Integrator.<br />
* Released new Assembly Hubs feature.<br />
* Simplified process for users to download and/or compile utilities.<br />
* Allowed mutex to work with multiple threads.<br />
* Removed calls that caused duplicate warn handler pop underflow errors.<br />
* Performance enhancements for track search functionality.<br />
* Fixed gene details links to InterPro (#11078) and TreeFam.<br />
* Added new utility: bigWigCorrelate.<br />
* Added multi-view composite support for type pgSnp.<br />
* Ongoing work on bigBed Item Search.<br />
<br />
== 3 June 2013, v284 ==<br />
* Released Gene Haplotype Alleles section for gene details pages, hg19.<br />
* Added features for Publications track: color by topic, impact factor or year, sped it up, various IDs on main table (for VEP), allow OCRed images. <br />
* Added trix search and bigBed item search for track hubs.<br />
* Ongoing work on simplifying the build and number of utilities.<br />
* Ongoing work on Variant Annotation Integrator: hgVai.<br />
<br />
== 13 May 2013, v283 ==<br />
* Added support for HTTPS in hgSession. <br />
* Allowed CGI timing function to be controlled via options in hg.conf file. <br />
* Continued improvements to makefile system; destination binaries now depend upon their objects and the libraries. <br />
* Fixed wigColorBy bug hgTracks right-click update. <br />
* Fixed clipping error in anti-aliased lines that could cause crash. <br />
* Fixed bug in handling of right-click on a hub track. <br />
* Fixed bug in updating hubStatus table with longLabel. <br />
* Fixed a buffer overflow problem which caused a crash for bigWig custom tracks. <br />
* Fixed some problems with spaces in labels in bigBeds. <br />
* Sped up bigBed's handling of long records. <br />
* Fixed link from gene details page to Interpro. <br />
* Ongoing work on Variant Annotation Integrator. <br />
* Ongoing work on Assembly Hubs. <br />
* Ongoing work on Gene Haplotype Alleles feature. <br />
<br />
== 22 April 2013, v282 ==<br />
* Overhauled makefile system.<br />
* Created a secure and sharable sendmail utility: mailViaPipe (and used it for hgLogin).<br />
* Fixed bug with cookies in hgLogin.<br />
* Found and fixed a problem with excessive activity in custom tracks by adding a 1000 track limit per submission.<br />
* Added anti-alias lines to browser image.<br />
* Added udcDir option to twoBit utilities.<br />
* Created new utility (bedScore) that assigns (or transforms) score field in BED files, based on values in a selectable column of the file, using user choice of 4 algorithms.<br />
* Started work on a tag-storm format (hierarchical .ra) for use in storing metadata.<br />
* Ongoing work on annoGrator (AVI)<br />
* Refactored and extended gadPos program to use Gencode as another gene symbol lookup.<br />
* Ongoing work on euroNode.<br />
<br />
== 1 April 2013, v281 ==<br />
* Reduced memory use of pslMap by up to two orders of magnitude.<br />
* Added hints about how to improve browser images for publications.<br />
* Fixed bug where session custom tracks are clobbered when original hgsid session modifies CTs.<br />
* Added support for HGVS's chromosome range format with "_".<br />
* Changed the transparency so that it shows up in PDF, and also can be normalized when we add more cell lines.<br />
* Fixed a bug where sometimes zoom summaries would not be written out by bedGraphToBigWig.<br />
* Ongoing work on Variant Annotation Integrator tool: bugfixes, baseline VEP-formatted output.<br />
* Ongoing work on faux centromere generation code. <br />
* Ongoing work on assembly hubs. <br />
<br />
== 11 March 2013, v280 ==<br />
* Fixed problem in hub support for 'parent' setting when used by superTrack 'children'.<br />
* Fixed right-click getDna link to use session ID and database.<br />
* Several display and usage improvements to GENCODE track.<br />
* Started work on transparency code.<br />
* Ongoing work on Variant Annotation Integrator tool.<br />
* Support drag-select in ideogram even if it has only one bar.<br />
* Initial work on Assembly Hubs.<br />
<br />
== 19 Feb. 2013, v279 ==<br />
* Added maxWindowToDraw to dense VCF tracks' trackDb entries.<br />
* New error check in checkTableCoords to avoid segfault on bogus bed12.<br />
* Added missing implementation of TB paste/upload IDs for bigBed.<br />
* Smartened up hgFileUi to recognize when ENCODE restriction timestamps are in effect.<br />
* Enlarged buffer that overflowed with huge trackUi matrices.<br />
* Fixed sort by metadata ordering in the 'Assayed cells' section of details page of Clusters tracks.<br />
* Performance improvement to genbank dumps of all assembly table status dumps: added sleeps.<br />
* Fixed error in bedToPsl.<br />
* Fixed udc cache location problem related to saved sessions and early cart processing.<br />
* Fixed bigBedIntervalQuery and hgc to not drop 0-length items (insertions).<br />
* Fixed invalid assert in hgTrackUI JavaScript code that caused weird error messages for certain tracks.<br />
* Ongoing work on Variant Annotation Finder.<br />
* Ongoing work on Gene Alleles.<br />
* Ongoing work on Haplosufficiency Map.<br />
* Ongoing work on faux centromere generation code.<br />
<br />
== 29 Jan. 2013, v278 ==<br />
* Continued work on faux-centromere generation code.<br />
* Fixed bedGraphToBigWig to make it handle large input better.<br />
* Added CGI timing measurement for all CGIs to apache error_log.<br />
* When Apache kills timed-out CGIs, we now trap SIGTERM and call exit to allow atexit to clean up.<br />
* euroNode/mirrors menu: Added support for database table-based GUI menu items.<br />
* Fixed UniProt's links back to the Genome Browser.<br />
* Removed link to the no longer maintained Stanford SOURCE database from refGene and few other places.<br />
* Increased limit of trackDb.txt file to 64Mb.<br />
* Made Genbank tracks fail more gently when sequence is missing (ie. when updates are turned off).<br />
* Tweaks to VCF: tolerate Complete Genomics's AN=0 records.<br />
* Accommodate missing name/ID in DGV track details page.<br />
* Fixed Table Browser to display example identifiers on paste/upload page for bigBed data type.<br />
* Changed hgc peak cluster handler to sort on metadata.<br />
* Changed hgTrackUi subtrack list default to 'all' if there is no matrix.<br />
<br />
== 7 Jan. 2013, v277 ==<br />
* Continued work on Variant Annotation Finder (added bigWig support).<br />
* Added code to re-try a hub with an error message after a hg.conf configurable amount of time has passed.<br />
* Fixed a problem in the table browser when track hubs have tables with dots in them. <br />
* Used threads to process bigDataUrl customTracks in parallel.<br />
* Set CHARSET=iso-8859-1 for all hgLogin forms.<br />
* Fixed bug in searching for cytoBand identifiers on mouse browsers.<br />
<br />
<br />
Visit the [[Genome_Browser_Software_Features_%282008-2012%29]] page for features released in the 2008-2012 time frame.<br />
<br />
[[Category:Browser Development]]</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Chains_Nets&diff=25871Chains Nets2021-10-26T07:45:33Z<p>Max: /* FAQs? */</p>
<hr />
<div>Chains and nets are higher-level collections of basic pairwise sequence alignments. Cross-species nets are used to make a single-coverage (on the reference genome) collection of pairwise alignments that are the basis of our Multiz multi-species alignments in the Conservation track. The chain and net algorithms, as well as results from human-mouse alignments, were [http://www.pnas.org/cgi/content/full/100/20/11484 published] in 2003. Chains and nets are generated from genomic local alignments computed by [[Blastz]] (2002-2008) or [[Lastz]] (2008-), which are then post-processed by a series of UCSC programs, most notably axtChain, chainNet and netFilter.<br />
<br />
The contents of this page are from [[User:AngieHinrichs|Angie]]'s mental model of chains and nets and represent opinions which may be outdated or plain old incorrect. The source code, and the results that we get by running these programs on real data, are the ultimate source of truth about chains and nets. <br />
<br />
Please keep in mind that the outputs of any alignment algorithm are not the final Truth about homology between sequences. The scoring system and other parameters of any alignment algorithm are designed to produce high scores for similarities that would likely result from some model of nucleotide-level evolution; tweaking a parameter can change the results significantly. The quality and completeness of the reference assemblies also affect alignment results. That said, chains and nets are powerful constructs for identifying similarities over very large regions of the genome, and inferring chromosomal rearrangements that may have occurred as the two sequences diverged from a common ancestral sequence.<br />
<br />
== Basic definitions ==<br />
<br />
In chain and net lingo, the '''target''' is the reference genome sequence and the '''query''' is some other genome sequence. For example, if you are viewing Human-Mouse alignments in the Human genome browser, human is the target and mouse is the query.<br />
<br />
A '''gapless block''' is a base-for-base alignment between part of the target and part of the query, possibly including mismatching bases. It has the same length in bases on the target and the query. This is the output of the most primitive alignment algorithms. <br />
<br />
A '''gap''' is a link between two gapless blocks, indicating that the target or the query has sequence that should be skipped over in order to make the best-scoring alignment. In other words, the scoring penalty for skipping over one or more bases is less than the penalty for continuing to align the sequences without skipping. <br />
<br />
A '''single-sided gap''' is a gap in which sequence in either target or query must be skipped over. A plausible explanation for needing to skip over a base in the target while not skipping a base in the query is that either the target has an inserted base or the query has a deleted base. Many alignment tools produce alignments with single-sided gaps between gapless blocks. <br />
<br />
A '''double-sided gap''' skips over sequence in both target and query because the sum of penalties for mismatching bases exceeds the penalty for extending a gap across them. This is possible only when the penalty for extending a gap is less than the penalty for creating a new gap and less than the penalty for a mismatch, and when the alignment algorithm is capable of considering double-sided gaps. <br />
<br />
== Chains in a nutshell ==<br />
<br />
A '''chain''' is a sequence of non-overlapping gapless blocks, with single- or double-sided gaps between blocks. Within a chain, target and query coords are monotonically non-decreasing (i.e. always increasing or flat). Chains are constructed by the axtChain program which finds pairwise alignments with the same target and query sequence, on the same strand, that can be merged if overlapping and joined into one longer alignment with a higher score under an affine gap-scoring system (progressively decreasing penalties for longer gaps).<br />
<br />
* double-sided gaps are a new capability (blastz can't do that) that allow extremely long chains to be constructed.<br />
* not just orthologs, but paralogs too, can result in good chains -- but that's useful!<br />
* chains should be symmetrical -- e.g. swap human-mouse -> mouse-human chains, and you should get approx. the same chains as if you chain swapped mouse-human blastz alignments. However, [[Blastz]]'s dynamic masking is asymmetrical, so in practice those results are not exactly symmetrical. Also, dynamic masking in conjunction with changed chunk sizes can cause differences in results from one run to the next.<br />
* chained blastz alignments are not single-coverage in either target or query unless some subsequent filtering (like netting) is done. <br />
* chain tracks can contain massive pileups when a piece of the target aligns well to many places in the query. Common causes of this include insufficient masking of repeats and high-copy-number genes (or paralogs).<br />
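The block, gap and chain definitions above can be made concrete with a small C sketch. The names used here (<code>sketchBlock</code>, <code>gapBetween</code>, <code>chainIsMonotonic</code>) are hypothetical and are not the data structures of the kent source; coordinates are assumed to be 0-based, half-open, and already oriented to the chain's strand.

```c
#include <stddef.h>

/* Hypothetical gapless block: same length on target and query. */
struct sketchBlock
    {
    int tStart, tEnd;   /* target range, half-open */
    int qStart, qEnd;   /* query range, half-open */
    };

enum gapKind { gapNone, gapSingleSided, gapDoubleSided, gapInvalid };

/* Classify the gap between block a and the following block b. */
static enum gapKind gapBetween(const struct sketchBlock *a, const struct sketchBlock *b)
{
int tGap = b->tStart - a->tEnd;   /* skipped target bases */
int qGap = b->qStart - a->qEnd;   /* skipped query bases */
if (tGap < 0 || qGap < 0)
    return gapInvalid;            /* overlapping or out of order */
if (tGap == 0 && qGap == 0)
    return gapNone;               /* blocks are adjacent on both sides */
if (tGap == 0 || qGap == 0)
    return gapSingleSided;        /* sequence skipped on one side only */
return gapDoubleSided;            /* sequence skipped on both sides */
}

/* Return 1 if blocks are gapless (equal target and query length) and
 * their coordinates never go backwards on target or query, else 0. */
static int chainIsMonotonic(const struct sketchBlock *blocks, size_t count)
{
size_t i;
for (i = 0; i < count; ++i)
    {
    int tSize = blocks[i].tEnd - blocks[i].tStart;
    int qSize = blocks[i].qEnd - blocks[i].qStart;
    if (tSize <= 0 || tSize != qSize)
        return 0;
    if (i > 0 && gapBetween(&blocks[i-1], &blocks[i]) == gapInvalid)
        return 0;
    }
return 1;
}
```

This only checks the structural constraints of a chain; it says nothing about scoring, which is what axtChain actually optimizes.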
<br />
== Nets in a nutshell ==<br />
<br />
A '''net''' is a hierarchical collection of chains, with the highest-scoring non-overlapping chains on top, and their gaps filled in where possible by lower-scoring chains, which in turn may have gaps filled in by lower-level chains and so on. <br />
<br />
* I think a chain's qName also helps to determine which level it lands in, i.e. it makes a difference whether a chain's qName is the same as the top-level chain's qName or not, because the levels have meanings associated with them -- see details page. <br />
* a net is single-coverage for target but not for query, unless it has been filtered to be single-coverage on both target and query. By convention we add "rbest" to the net filename in that case.<br />
* because it's single-coverage in the target, it's no longer symmetrical.<br />
* the netter has two outputs, one of which we usually ignore: the target-centric net in query coordinates. The reciprocal best process uses that output: the query-referenced (but target-centric / target single-cov) net is turned back into component chains, and then those are netted to get single coverage in the query too; the two outputs of that netting are reciprocal-best in query and target coords. Reciprocal-best nets are symmetrical again. <br />
* nets do a good job of filtering out massive pileups by collapsing them down to (usually) a single level.<br />
<br />
"LiftOver chains" are actually chains extracted from nets, or chains filtered by the netting process. <br />
<br />
== History ==<br />
<br />
Chains and nets are [[User:Jimkent|Jim Kent]]'s brainchild, building on joint work with blastz author Scott Schwartz. <br />
<br />
Cross-species chains and nets used to be generated by a long manual process documented in some of our older makeDb/doc/*.txt files, but since ~2006 they have been generated by the script kent/src/hg/utils/automation/doBlastzChainNet.pl .<br />
<br />
Same-species liftOver chains use blat -fastMap as the alignment method, and are generated by kent/src/hg/utils/automation/doSameSpeciesLiftOver.pl, based on a series of scripts that [[User:Kate|Kate]] wrote in kent/src/hg/makeDb/makeLoChain/.<br />
<br />
== FAQs? ==<br />
<br />
The original publication describing chains and nets, Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes (Kent et al., PNAS September 30, 2003 100 (20) 11484-11489), might be helpful for understanding the rationale behind the process.<br />
<br />
> It’s somehow opposite with the alignment file, right? For example, the psl file records info of sequences from query assembly mapping against sequences <br />
> from target assembly.<br />
<br />
The PSL format lists query coords before target coords, but the UCSC convention is that the target is the reference genome (on which the Genome Browser displays) and the query is the 'other' (the query could be mRNA from the same species as the target/reference genome, or could be another species). The PSL format is the native output format of BLAT, which differs from BLAST in that BLAT indexes the target and scans the query while BLAST indexes the query and scans the target. Those are pretty esoteric differences, but I think they help explain why the UCSC view of target and query might be the opposite of some others.<br />
<br />
> And then collapse the chains to one.<br />
<br />
The chains are not collapsed to one; that would violate the constraint on a chain that it has monotonically increasing coordinates on the same target and query sequences in the same orientation. In the net, the two chains are retained at different levels (the primary alignment is at the top level and the secondary alignment is at a lower level). When the netChainSubset program extracts the liftOver chains from the net and full set of chains, it outputs the complete primary chain, but it outputs only a portion of the secondary chain.<br />
<br />
> I don’t understand why use the secondary alignment to fill the gap in the primary alignment.<br />
<br />
Chains and nets were designed to capture medium-to-large-scale rearrangements during evolution of species from a common ancestor: duplications, inversions, translocations. For example, in the case of an inversion, we would expect a gap in the top-level alignment to coincide with an alignment to the opposite strand of the same sequence (and similar breakpoints). With a translocation, a gap in the top-level alignment might be "filled" by an alignment to some other chromosome. When there is a duplication in the target, the target might have two (diverged) copies of the ancestral sequence that align to the same un-duplicated location in the query. In that case, the top-level chain would have a gap that is filled by an alignment to the same location on the query (single-coverage on the target, but multiple coverage on the query).<br />
<br />
> I suppose chain should reflect the true difference between two assembly. Say the contig a is actually corresponding to the primary hit region in <br />
> hg19. Here if the gap is filled as described above to generate a chain, wouldn’t that cause the gapped bases from hg19 being lifted to <br />
> a false corresponding region in contig?<br />
<br />
All bioinformatics algorithms are attempts to approximate the truth, and they fall short, especially when their assumptions and parameters are not tuned exactly right for the question at hand. If alignment parameters are overly sensitive for the actual divergence of target and query, then yes, spurious alignments will probably appear in the results. But there may be other explanations for unexpected alignments. (Assembly errors, sequencing errors, unexpected variation, some new discovery for you to make....) There is no simple answer that applies to all situations, and no substitute for trying different parameters and methods and carefully examining the results to see what works best for your particular application. It may help to make custom tracks using our bigChain format so you can examine and compare results in the Genome Browser.<br />
<br />
> I’ve recently discovered that we can use minimap2 to generate alignment file and convert to psl and use that psl to generate chain file. If I can <br />
> solve the multiple-mapping issue in the alignment file level, I don’t need to perform netChainSubset right?<br />
<br />
Perhaps -- that is up to you to determine. Best wishes for your research! If you know results of an evaluation using minimap2 versus lastz, please contact us at genome@soe.ucsc.edu.<br />
<br />
Navigation: back to [[Implementation_Notes]]<br />
<br />
[[Category:Technical FAQ]]<br />
[[Category:Comparative Genomics]]</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Cart_editing&diff=25836Cart editing2021-06-01T15:45:49Z<p>Max: Created page with "If we make a new superTrack and move an existing track into it, then we have a problem with old sessions, ones that predate this change: If you load an old saved session, the..."</p>
<hr />
<div>If we make a new superTrack and move an existing track into it, then we have a problem with old sessions, ones that predate this change:<br />
<br />
If a session was saved before the superTrack move with this track switched on, then loading that session after the move makes the track disappear, and no error is shown at all. <br />
<br />
This is because the superTrack is not set to "show" in the old cart.<br />
<br />
Cart editing solves this. Cart editing consists of two parts:<br />
<br />
* a version number for trackDb, always written into all saved carts, so we can distinguish old from new carts.<br />
* a piece of C code (in cartEdit.c, called from cart.c) that adds a new cart variable whenever a cart is loaded, by any CGI, and that cart is older than the current trackDb.<br />
<br />
So, before you move a track into a new superTrack you must do one thing:<br />
<br />
ask an engineer to add code to cartEdit.c<br />
<br />
Then wait until the code has gone out to the RR in the normal release cycle. At any time after the release, you can do these two things:<br />
<br />
1) increase the version number in the top-level trackDb.ra file, at the top<br />
2) add the superTrack and move the old track into it</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Bin_indexing_system&diff=25835Bin indexing system2021-05-27T09:59:27Z<p>Max: /* Python and Ruby */</p>
<hr />
<div>==Introduction==<br />
<br />
The binning index system used in the genome browser is a mechanism used in concert with MySQL indexes to speed up selection of MySQL rows for genome coordinate overlapping items. This type of search is sometimes called a ''range request''. The system as first used in the genome browser is described in: <em>"The Human Genome Browser at UCSC"</em><br />
[http://genome.cshlp.org/content/12/6/996.full Kent et al., Genome Research 2002, 12:996-1006], see [http://genome.cshlp.org/content/12/6/996/F7.expansion.html Figure 7], quote:<br />
<br />
We settled on a binning scheme suggested by Lincoln Stein and Richard Durbin. A simple version of this scheme is shown in<br />
 Figure 7. In the browser itself, we use five different sizes of bins: 128 kb, 1 Mb, 8 Mb, 64 Mb, and 512 Mb.<br />
<br />
That initial implementation has since been extended by an additional level of bins to allow items of up to 4 Gb in size (in practice only up to 2 Gb, given integer size limits). The old and new systems coexist: an item with a chromEnd coordinate less than or equal to 512 Mb gets a bin number from the old system, while an item with a chromEnd coordinate greater than 512 Mb gets a bin number from the new system.<br />
<br />
Since all of these bins are in sizes of powers of two, the calculation of the bin number is a simple matter of bit shifting of the chromStart and chromEnd coordinates. The C code for the bin calculation can be seen in the kent source tree in ''src/lib/binRange.c''.<br />
<br />
==Initial implementation==<br />
<br />
Used when chromEnd is less than or equal to 536,870,912 = 2<sup>29</sup><br />
<br />
<TABLE BORDER=1><TR><TD COLSPAN=2>&nbsp;</TD><TH COLSPAN=2>bin numbers</TH><TH>bin</TH></TR><br />
<TR><TH>level</TH><TH>#bins</TH><TH>start</TH><TH>end</TH><TH>size</TH></TR><br />
<TR><TD>0</TD><TD>1</TD><TD>0</TD><TD>0</TD><TD>512 Mb</TD></TR><br />
<TR><TD>1</TD><TD>8</TD><TD>1</TD><TD>8</TD><TD>64 Mb</TD></TR><br />
<TR><TD>2</TD><TD>64</TD><TD>9</TD><TD>72</TD><TD>8 Mb</TD></TR><br />
<TR><TD>3</TD><TD>512</TD><TD>73</TD><TD>584</TD><TD>1 Mb</TD></TR><br />
<TR><TD>4</TD><TD>4096</TD><TD>585</TD><TD>4680</TD><TD>128 kb</TD></TR><br />
</TABLE><br />
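The start and end columns of this table follow directly from the bin counts: each level has eight times as many bins as the level above it, and the bins of a level are numbered after all bins of the coarser levels. A minimal sketch of that arithmetic (the function names are illustrative and do not appear in binRange.c):

```c
/* First bin number at a level of the standard scheme: the sum of the
 * bin counts of all coarser levels (1 + 8 + 64 + ...). */
static int firstBinAtLevel(int level)
{
int offset = 0, binsAtLevel = 1, i;
for (i = 0; i < level; ++i)
    {
    offset += binsAtLevel;
    binsAtLevel <<= 3;      /* eight times more bins per finer level */
    }
return offset;
}

/* Last bin number at a level: first bin plus 8^level bins, minus one. */
static int lastBinAtLevel(int level)
{
return firstBinAtLevel(level) + (1 << (3 * level)) - 1;
}
```

For level 4 (the 128 kb bins) this gives 585 and 4680, matching the last row of the table.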
<br />
==Extended implementation==<br />
<br />
Used when chromEnd is greater than 536,870,912 = 2<sup>29</sup><br />
and less than 2,147,483,647 = 2<sup>31</sup> - 1<br />
<br />
<TABLE BORDER=1><TR><TD COLSPAN=2>&nbsp;</TD><TH COLSPAN=2>bin numbers</TH><TH>bin</TH></TR><br />
<TR><TH>level</TH><TH>#bins</TH><TH>start</TH><TH>end</TH><TH>size</TH></TR><br />
<TR><TD>0</TD><TD>1</TD><TD>4681</TD><TD>4681</TD><TD>2 Gb</TD></TR><br />
<TR><TD>1</TD><TD>8</TD><TD>4683</TD><TD>4685</TD><TD>512 Mb</TD></TR><br />
<TR><TD>2</TD><TD>64</TD><TD>4698</TD><TD>4721</TD><TD>64 Mb</TD></TR><br />
<TR><TD>3</TD><TD>512</TD><TD>4818</TD><TD>5009</TD><TD>8 Mb</TD></TR><br />
<TR><TD>4</TD><TD>4,096</TD><TD>5778</TD><TD>7313</TD><TD>1 Mb</TD></TR><br />
<TR><TD>5</TD><TD>32,768</TD><TD>13458</TD><TD>25745</TD><TD>128 kb</TD></TR><br />
</TABLE><br />
<br />
==Initial implementation C code==<br />
<br />
<PRE><br />
/* This file is copyright 2002 Jim Kent, but license is hereby<br />
* granted for all use - public, private or commercial. */<br />
<br />
static int binOffsets[] = {512+64+8+1, 64+8+1, 8+1, 1, 0};<br />
#define _binFirstShift 17 /* How much to shift to get to finest bin. */<br />
#define _binNextShift 3 /* How much to shift to get to next larger bin. */<br />
<br />
static int binFromRangeStandard(int start, int end)<br />
/* Given start,end in chromosome coordinates assign it<br />
* a bin. There's a bin for each 128k segment, for each<br />
* 1M segment, for each 8M segment, for each 64M segment,<br />
* and for each chromosome (which is assumed to be less than<br />
* 512M.) A range goes into the smallest bin it will fit in. */<br />
{<br />
int startBin = start, endBin = end-1, i;<br />
startBin >>= _binFirstShift;<br />
endBin >>= _binFirstShift;<br />
for (i=0; i<ArraySize(binOffsets); ++i)<br />
{<br />
if (startBin == endBin)<br />
return binOffsets[i] + startBin;<br />
startBin >>= _binNextShift;<br />
endBin >>= _binNextShift;<br />
}<br />
errAbort("start %d, end %d out of range in findBin (max is 512M)", start, end);<br />
return 0;<br />
}<br />
</PRE><br />
<br />
==Extended implementation C code==<br />
<br />
<PRE><br />
/* This file is copyright 2002 Jim Kent, but license is hereby<br />
* granted for all use - public, private or commercial. */<br />
<br />
/* add one new level to get coverage past chrom sizes of 512 Mb<br />
* effective limit is now the size of an integer since chrom start<br />
* and end coordinates are always being used in int's == 2Gb-1 */<br />
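#define _binOffsetOldToExtended 4681 /* Number of bins in the old scheme (0..4680); extended bins start here. */<br />
static int binOffsetsExtended[] =<br />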
{4096+512+64+8+1, 512+64+8+1, 64+8+1, 8+1, 1, 0};<br />
<br />
static int binFromRangeExtended(int start, int end)<br />
/* Given start,end in chromosome coordinates assign it<br />
* a bin. There's a bin for each 128k segment, for each<br />
* 1M segment, for each 8M segment, for each 64M segment,<br />
* for each 512M segment, and one top level bin for 4Gb.<br />
* Note, since start and end are int's, the practical limit<br />
* is up to 2Gb-1, and thus, only four result bins on the second<br />
* level.<br />
* A range goes into the smallest bin it will fit in. */<br />
{<br />
int startBin = start, endBin = end-1, i;<br />
startBin >>= _binFirstShift;<br />
endBin >>= _binFirstShift;<br />
for (i=0; i<ArraySize(binOffsetsExtended); ++i)<br />
{<br />
if (startBin == endBin)<br />
return _binOffsetOldToExtended + binOffsetsExtended[i] + startBin;<br />
startBin >>= _binNextShift;<br />
endBin >>= _binNextShift;<br />
}<br />
errAbort("start %d, end %d out of range in findBin (max is 2Gb)", start, end);<br />
return 0;<br />
}<br />
</PRE><br />
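The functions above assign a bin to an item. A range request needs the opposite: the set of bins that could hold an item overlapping a query range. The following is a minimal sketch for the standard scheme only, not the implementation in the kent source; <code>overlappingBins</code> and the <code>BIN_*</code> names are made up for this example, and the query range is assumed to be 0-based, half-open, and no larger than 512 Mb.

```c
#include <assert.h>

#define BIN_FIRST_SHIFT 17   /* finest bins are 128 kb = 2^17 bases */
#define BIN_NEXT_SHIFT 3     /* each coarser level is 8 times larger */
#define BIN_LEVELS 5

/* Bin number offsets per level, finest level first (as in binOffsets). */
static const int levelOffsets[BIN_LEVELS] = {512+64+8+1, 64+8+1, 8+1, 1, 0};

/* For the query range [start,end), fill firstBin[i]..lastBin[i] with the
 * range of bin numbers at each level i that the query overlaps. */
static void overlappingBins(int start, int end,
                            int firstBin[BIN_LEVELS], int lastBin[BIN_LEVELS])
{
int startBin, endBin, i;
assert(start >= 0 && start < end);
startBin = start >> BIN_FIRST_SHIFT;
endBin = (end - 1) >> BIN_FIRST_SHIFT;
for (i = 0; i < BIN_LEVELS; ++i)
    {
    firstBin[i] = levelOffsets[i] + startBin;
    lastBin[i] = levelOffsets[i] + endBin;
    startBin >>= BIN_NEXT_SHIFT;
    endBin >>= BIN_NEXT_SHIFT;
    }
}
```

A query would then restrict the <code>bin</code> column to these five ranges, in addition to the usual chromStart/chromEnd overlap test, so that MySQL can use the index on bin instead of scanning every row of the chromosome. A complete query would also have to consider the extended bins, since items ending beyond 512 Mb are stored there.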
<br />
== Unix Command Line, Perl, Python and Ruby ==<br />
<br />
A generic Perl script for any Unix command line, by Heng Li:<br />
<br />
http://lh3lh3.users.sourceforge.net/ucsc-mysql.shtml<br />
<br />
For Python, see:<br />
<br />
https://github.com/brentp/cruzdb/blob/master/cruzdb/__init__.py#L489<br />
<br />
For Ruby, see:<br />
<br />
https://github.com/misshie/bioruby-ucsc-api/blob/1915b710a9064209dcffd8eef39bd548ad199fc6/lib/bio-ucsc/ucsc_bin.rb<br />
<br />
[[Category:Technical FAQ]]</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Bin_indexing_system&diff=25834Bin indexing system2021-05-27T09:59:07Z<p>Max: /* Perl, Python, Ruby and Unix Command Line */</p>
<hr />
<div>==Introduction==<br />
<br />
The binning index system used in the genome browser is a mechanism used in concert with MySQL indexes to speed up selection of MySQL rows for genome coordinate overlapping items. This type of search is sometimes called a ''range request''. The system as first used in the genome browser is described in: <em>"The Human Genome Browser at UCSC"</em><br />
[http://genome.cshlp.org/content/12/6/996.full Kent et al., Genome Research 2002, 12:996-1006], see [http://genome.cshlp.org/content/12/6/996/F7.expansion.html Figure 7], quote:<br />
<br />
We settled on a binning scheme suggested by Lincoln Stein and Richard Durbin. A simple version of this scheme is shown in<br />
 Figure 7. In the browser itself, we use five different sizes of bins: 128 kb, 1 Mb, 8 Mb, 64 Mb, and 512 Mb.<br />
<br />
That initial implementation has since been extended by an additional level of bins to allow items of up to 4 Gb in size (in practice only up to 2 Gb, given integer size limits). The old and new systems coexist: an item with a chromEnd coordinate less than or equal to 512 Mb gets a bin number from the old system, while an item with a chromEnd coordinate greater than 512 Mb gets a bin number from the new system.<br />
<br />
Since all of these bins are in sizes of powers of two, the calculation of the bin number is a simple matter of bit shifting of the chromStart and chromEnd coordinates. The C code for the bin calculation can be seen in the kent source tree in ''src/lib/binRange.c''.<br />
<br />
==Initial implementation==<br />
<br />
Used when chromEnd is less than or equal to 536,870,912 = 2<sup>29</sup><br />
<br />
<TABLE BORDER=1><TR><TD COLSPAN=2>&nbsp;</TD><TH COLSPAN=2>bin numbers</TH><TH>bin</TH></TR><br />
<TR><TH>level</TH><TH>#bins</TH><TH>start</TH><TH>end</TH><TH>size</TH></TR><br />
<TR><TD>0</TD><TD>1</TD><TD>0</TD><TD>0</TD><TD>512 Mb</TD></TR><br />
<TR><TD>1</TD><TD>8</TD><TD>1</TD><TD>8</TD><TD>64 Mb</TD></TR><br />
<TR><TD>2</TD><TD>64</TD><TD>9</TD><TD>72</TD><TD>8 Mb</TD></TR><br />
<TR><TD>3</TD><TD>512</TD><TD>73</TD><TD>584</TD><TD>1 Mb</TD></TR><br />
<TR><TD>4</TD><TD>4096</TD><TD>585</TD><TD>4680</TD><TD>128 kb</TD></TR><br />
</TABLE><br />
<br />
==Extended implementation==<br />
<br />
Used when chromEnd is greater than 536,870,912 = 2<sup>29</sup><br />
and less than 2,147,483,647 = 2<sup>31</sup> - 1<br />
<br />
<TABLE BORDER=1><TR><TD COLSPAN=2>&nbsp;</TD><TH COLSPAN=2>bin numbers</TH><TH>bin</TH></TR><br />
<TR><TH>level</TH><TH>#bins</TH><TH>start</TH><TH>end</TH><TH>size</TH></TR><br />
<TR><TD>0</TD><TD>1</TD><TD>4681</TD><TD>4681</TD><TD>2 Gb</TD></TR><br />
<TR><TD>1</TD><TD>8</TD><TD>4683</TD><TD>4685</TD><TD>512 Mb</TD></TR><br />
<TR><TD>2</TD><TD>64</TD><TD>4698</TD><TD>4721</TD><TD>64 Mb</TD></TR><br />
<TR><TD>3</TD><TD>512</TD><TD>4818</TD><TD>5009</TD><TD>8 Mb</TD></TR><br />
<TR><TD>4</TD><TD>4,096</TD><TD>5778</TD><TD>7313</TD><TD>1 Mb</TD></TR><br />
<TR><TD>5</TD><TD>32,768</TD><TD>13458</TD><TD>25745</TD><TD>128 kb</TD></TR><br />
</TABLE><br />
<br />
==Initial implementation C code==<br />
<br />
<PRE><br />
/* This file is copyright 2002 Jim Kent, but license is hereby<br />
* granted for all use - public, private or commercial. */<br />
<br />
static int binOffsets[] = {512+64+8+1, 64+8+1, 8+1, 1, 0};<br />
#define _binFirstShift 17 /* How much to shift to get to finest bin. */<br />
#define _binNextShift 3 /* How much to shift to get to next larger bin. */<br />
<br />
static int binFromRangeStandard(int start, int end)<br />
/* Given start,end in chromosome coordinates assign it<br />
* a bin. There's a bin for each 128k segment, for each<br />
* 1M segment, for each 8M segment, for each 64M segment,<br />
* and for each chromosome (which is assumed to be less than<br />
* 512M.) A range goes into the smallest bin it will fit in. */<br />
{<br />
int startBin = start, endBin = end-1, i;<br />
startBin >>= _binFirstShift;<br />
endBin >>= _binFirstShift;<br />
for (i=0; i<ArraySize(binOffsets); ++i)<br />
{<br />
if (startBin == endBin)<br />
return binOffsets[i] + startBin;<br />
startBin >>= _binNextShift;<br />
endBin >>= _binNextShift;<br />
}<br />
errAbort("start %d, end %d out of range in findBin (max is 512M)", start, end);<br />
return 0;<br />
}<br />
</PRE><br />
<br />
==Extended implementation C code==<br />
<br />
<PRE><br />
/* This file is copyright 2002 Jim Kent, but license is hereby<br />
* granted for all use - public, private or commercial. */<br />
<br />
/* add one new level to get coverage past chrom sizes of 512 Mb<br />
* effective limit is now the size of an integer since chrom start<br />
* and end coordinates are always being used in int's == 2Gb-1 */<br />
static int binOffsetsExtended[] =<br />
{4096+512+64+8+1, 512+64+8+1, 64+8+1, 8+1, 1, 0};<br />
<br />
static int binFromRangeExtended(int start, int end)<br />
/* Given start,end in chromosome coordinates assign it<br />
* a bin. There's a bin for each 128k segment, for each<br />
* 1M segment, for each 8M segment, for each 64M segment,<br />
* for each 512M segment, and one top level bin for 4Gb.<br />
* Note, since start and end are int's, the practical limit<br />
* is up to 2Gb-1, and thus, only four result bins on the second<br />
* level.<br />
* A range goes into the smallest bin it will fit in. */<br />
{<br />
int startBin = start, endBin = end-1, i;<br />
startBin >>= _binFirstShift;<br />
endBin >>= _binFirstShift;<br />
for (i=0; i<ArraySize(binOffsetsExtended); ++i)<br />
{<br />
if (startBin == endBin)<br />
return _binOffsetOldToExtended + binOffsetsExtended[i] + startBin;<br />
startBin >>= _binNextShift;<br />
endBin >>= _binNextShift;<br />
}<br />
errAbort("start %d, end %d out of range in findBin (max is 2Gb)", start, end);<br />
return 0;<br />
}<br />
</PRE><br />
<br />
== Unix Command Line, Perl, Python and Ruby ==<br />
<br />
A generic Perl script for any Unix command line, by Heng Li:<br />
<br />
http://lh3lh3.users.sourceforge.net/ucsc-mysql.shtml<br />
<br />
For Python, see:<br />
<br />
https://github.com/brentp/cruzdb/blob/master/cruzdb/__init__.py#L489<br />
<br />
For Ruby, see:<br />
<br />
https://github.com/misshie/bioruby-ucsc-api/blob/1915b710a9064209dcffd8eef39bd548ad199fc6/lib/bio-ucsc/ucsc_bin.rb<br />
<br />
[[Category:Technical FAQ]]</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Bin_indexing_system&diff=25833Bin indexing system2021-05-27T09:54:14Z<p>Max: </p>
<hr />
<div>==Introduction==<br />
<br />
The binning index system used in the genome browser is a mechanism used in concert with MySQL indexes to speed up selection of MySQL rows for genome coordinate overlapping items. This type of search is sometimes called a ''range request''. The system as first used in the genome browser is described in: <em>"The Human Genome Browser at UCSC"</em><br />
[http://genome.cshlp.org/content/12/6/996.full Kent et al., Genome Research 2002, 12:996-1006], see [http://genome.cshlp.org/content/12/6/996/F7.expansion.html Figure 7], quote:<br />
<br />
We settled on a binning scheme suggested by Lincoln Stein and Richard Durbin. A simple version of this scheme is shown in<br />
 Figure 7. In the browser itself, we use five different sizes of bins: 128 kb, 1 Mb, 8 Mb, 64 Mb, and 512 Mb.<br />
<br />
That initial implementation has since been extended by an additional level of bins to allow items of up to 4 Gb in size (in practice only up to 2 Gb, given integer size limits). The old and new systems coexist: an item with a chromEnd coordinate less than or equal to 512 Mb gets a bin number from the old system, while an item with a chromEnd coordinate greater than 512 Mb gets a bin number from the new system.<br />
<br />
Since all of these bins are in sizes of powers of two, the calculation of the bin number is a simple matter of bit shifting of the chromStart and chromEnd coordinates. The C code for the bin calculation can be seen in the kent source tree in ''src/lib/binRange.c''.<br />
<br />
==Initial implementation==<br />
<br />
Used when chromEnd is less than or equal to 536,870,912 = 2<sup>29</sup><br />
<br />
<TABLE BORDER=1><TR><TD COLSPAN=2>&nbsp;</TD><TH COLSPAN=2>bin numbers</TH><TH>bin</TH></TR><br />
<TR><TH>level</TH><TH>#bins</TH><TH>start</TH><TH>end</TH><TH>size</TH></TR><br />
<TR><TD>0</TD><TD>1</TD><TD>0</TD><TD>0</TD><TD>512 Mb</TD></TR><br />
<TR><TD>1</TD><TD>8</TD><TD>1</TD><TD>8</TD><TD>64 Mb</TD></TR><br />
<TR><TD>2</TD><TD>64</TD><TD>9</TD><TD>72</TD><TD>8 Mb</TD></TR><br />
<TR><TD>3</TD><TD>512</TD><TD>73</TD><TD>584</TD><TD>1 Mb</TD></TR><br />
<TR><TD>4</TD><TD>4096</TD><TD>585</TD><TD>4680</TD><TD>128 kb</TD></TR><br />
</TABLE><br />
<br />
==Extended implementation==<br />
<br />
Used when chromEnd is greater than 536,870,912 = 2<sup>29</sup><br />
and less than 2,147,483,647 = 2<sup>31</sup> - 1<br />
<br />
<TABLE BORDER=1><TR><TD COLSPAN=2>&nbsp;</TD><TH COLSPAN=2>bin numbers</TH><TH>bin</TH></TR><br />
<TR><TH>level</TH><TH>#bins</TH><TH>start</TH><TH>end</TH><TH>size</TH></TR><br />
<TR><TD>0</TD><TD>1</TD><TD>4681</TD><TD>4681</TD><TD>2 Gb</TD></TR><br />
<TR><TD>1</TD><TD>8</TD><TD>4683</TD><TD>4685</TD><TD>512 Mb</TD></TR><br />
<TR><TD>2</TD><TD>64</TD><TD>4698</TD><TD>4721</TD><TD>64 Mb</TD></TR><br />
<TR><TD>3</TD><TD>512</TD><TD>4818</TD><TD>5009</TD><TD>8 Mb</TD></TR><br />
<TR><TD>4</TD><TD>4,096</TD><TD>5778</TD><TD>7313</TD><TD>1 Mb</TD></TR><br />
<TR><TD>5</TD><TD>32,768</TD><TD>13458</TD><TD>25745</TD><TD>128 kb</TD></TR><br />
</TABLE><br />
<br />
==Initial implementation C code==<br />
<br />
<PRE><br />
/* This file is copyright 2002 Jim Kent, but license is hereby<br />
* granted for all use - public, private or commercial. */<br />
<br />
static int binOffsets[] = {512+64+8+1, 64+8+1, 8+1, 1, 0};<br />
#define _binFirstShift 17 /* How much to shift to get to finest bin. */<br />
#define _binNextShift 3 /* How much to shift to get to next larger bin. */<br />
<br />
static int binFromRangeStandard(int start, int end)<br />
/* Given start,end in chromosome coordinates assign it<br />
* a bin. There's a bin for each 128k segment, for each<br />
* 1M segment, for each 8M segment, for each 64M segment,<br />
* and for each chromosome (which is assumed to be less than<br />
* 512M.) A range goes into the smallest bin it will fit in. */<br />
{<br />
int startBin = start, endBin = end-1, i;<br />
startBin >>= _binFirstShift;<br />
endBin >>= _binFirstShift;<br />
for (i=0; i<ArraySize(binOffsets); ++i)<br />
{<br />
if (startBin == endBin)<br />
return binOffsets[i] + startBin;<br />
startBin >>= _binNextShift;<br />
endBin >>= _binNextShift;<br />
}<br />
errAbort("start %d, end %d out of range in findBin (max is 512M)", start, end);<br />
return 0;<br />
}<br />
</PRE><br />
<br />
==Extended implementation C code==<br />
<br />
<PRE><br />
/* This file is copyright 2002 Jim Kent, but license is hereby<br />
* granted for all use - public, private or commercial. */<br />
<br />
/* add one new level to get coverage past chrom sizes of 512 Mb<br />
* effective limit is now the size of an integer since chrom start<br />
* and end coordinates are always being used in int's == 2Gb-1 */<br />
static int binOffsetsExtended[] =<br />
{4096+512+64+8+1, 512+64+8+1, 64+8+1, 8+1, 1, 0};<br />
<br />
static int binFromRangeExtended(int start, int end)<br />
/* Given start,end in chromosome coordinates assign it<br />
* a bin. There's a bin for each 128k segment, for each<br />
* 1M segment, for each 8M segment, for each 64M segment,<br />
* for each 512M segment, and one top level bin for 4Gb.<br />
* Note, since start and end are int's, the practical limit<br />
* is up to 2Gb-1, and thus, only four result bins on the second<br />
* level.<br />
* A range goes into the smallest bin it will fit in. */<br />
{<br />
int startBin = start, endBin = end-1, i;<br />
startBin >>= _binFirstShift;<br />
endBin >>= _binFirstShift;<br />
for (i=0; i<ArraySize(binOffsetsExtended); ++i)<br />
{<br />
if (startBin == endBin)<br />
return _binOffsetOldToExtended + binOffsetsExtended[i] + startBin;<br />
startBin >>= _binNextShift;<br />
endBin >>= _binNextShift;<br />
}<br />
errAbort("start %d, end %d out of range in findBin (max is 2Gb)", start, end);<br />
return 0;<br />
}<br />
</PRE><br />
<br />
== Python and Ruby ==<br />
<br />
For Python, see:<br />
https://github.com/brentp/cruzdb/blob/master/cruzdb/__init__.py#L489<br />
<br />
For Ruby, see:<br />
https://github.com/misshie/bioruby-ucsc-api/blob/1915b710a9064209dcffd8eef39bd548ad199fc6/lib/bio-ucsc/ucsc_bin.rb<br />
<br />
[[Category:Technical FAQ]]</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Chains_Nets&diff=25714Chains Nets2020-10-18T20:39:28Z<p>Max: /* History */</p>
<hr />
<div>Chains and nets are higher-level collections of basic pairwise sequence alignments. Cross-species nets are used to make a single-coverage (on the reference genome) collection of pairwise alignments that are the basis of our Multiz multi-species alignments in the Conservation track. The chain and net algorithms, as well as results from human-mouse alignments, were [http://www.pnas.org/cgi/content/full/100/20/11484 published] in 2003. They are generated from genomic local alignments computed by [[Blastz]] (2002-2008) or [[Lastz]] (2008-), post-processed by a series of UCSC programs, most notably axtChain, chainNet and netFilter.<br />
<br />
The contents of this page are from [[User:AngieHinrichs|Angie]]'s mental model of chains and nets and represent opinions which may be outdated or plain old incorrect. The source code, and the results that we get by running these programs on real data, are the ultimate source of truth about chains and nets. <br />
<br />
Please keep in mind that the outputs of any alignment algorithm are not the final Truth about homology between sequences. The scoring system and other parameters of any alignment algorithm are designed to produce high scores for similarities that would likely result from some model of nucleotide-level evolution; tweaking a parameter can change the results significantly. The quality and completeness of the reference assemblies also affect alignment results. That said, chains and nets are powerful constructs for identifying similarities over very large regions of the genome, and inferring chromosomal rearrangements that may have occurred as the two sequences diverged from a common ancestral sequence.<br />
<br />
== Basic definitions ==<br />
<br />
In chain and net lingo, the '''target''' is the reference genome sequence and the '''query''' is some other genome sequence. For example, if you are viewing Human-Mouse alignments in the Human genome browser, human is the target and mouse is the query.<br />
<br />
A '''gapless block''' is a base-for-base alignment between part of the target and part of the query, possibly including mismatching bases. It has the same length in bases on the target and the query. This is the output of the most primitive alignment algorithms. <br />
<br />
A '''gap''' is a link between two gapless blocks, indicating that the target or the query has sequence that should be skipped over in order to make the best-scoring alignment. In other words, the scoring penalty for skipping over one or more bases is less than the penalty for continuing to align the sequences without skipping. <br />
<br />
A '''single-sided gap''' is a gap in which sequence in either target or query must be skipped over. A plausible explanation for needing to skip over a base in the target while not skipping a base in the query is that either the target has an inserted base or the query has a deleted base. Many alignment tools produce alignments with single-sided gaps between gapless blocks. <br />
<br />
A '''double-sided gap''' skips over sequence in both target and query because the sum of penalties for mismatching bases exceeds the penalty for extending a gap across them. This is possible only when the penalty for extending a gap is less than the penalty for creating a new gap and less than the penalty for a mismatch, and when the alignment algorithm is capable of considering double-sided gaps. <br />
<br />
== Chains in a nutshell ==<br />
<br />
A '''chain''' is a sequence of non-overlapping gapless blocks, with single- or double-sided gaps between blocks. Within a chain, target and query coords are monotonically non-decreasing (i.e. always increasing or flat). Chains are constructed by the axtChain program which finds pairwise alignments with the same target and query sequence, on the same strand, that can be merged if overlapping and joined into one longer alignment with a higher score under an affine gap-scoring system (progressively decreasing penalties for longer gaps).<br />
<br />
* double-sided gaps are a new capability (blastz can't do that) that allow extremely long chains to be constructed.<br />
* not just orthologs, but paralogs too, can result in good chains. but that's useful!<br />
* chains should be symmetrical -- e.g. swap human-mouse -> mouse-human chains, and you should get approx. the same chains as if you chain swapped mouse-human blastz alignments. However, [[Blastz]]'s dynamic masking is asymmetrical, so in practice those results are not exactly symmetrical. Also, dynamic masking in conjunction with changed chunk sizes can cause differences in results from one run to the next.<br />
* chained blastz alignments are not single-coverage in either target or query unless some subsequent filtering (like netting) is done. <br />
* chain tracks can contain massive pileups when a piece of the target aligns well to many places in the query. Common causes of this include insufficient masking of repeats and high-copy-number genes (or paralogs).<br />
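The block and gap definitions above can be made concrete by walking one chain record. The sketch below assumes the UCSC chain text format (a header line followed by "size dt dq" lines and a final "size" line); the example chain, its names and its coordinates are made up for illustration.

```python
def chain_blocks(header, block_lines):
    """Expand one chain record into gapless blocks and the gaps between them.

    Returns (blocks, gaps): blocks are (tStart, tEnd, qStart, qEnd) tuples,
    gaps are "single-sided" or "double-sided" labels, one per gap.
    """
    # Header fields: chain score tName tSize tStrand tStart tEnd
    #                qName qSize qStrand qStart qEnd id
    fields = header.split()
    t_pos, q_pos = int(fields[5]), int(fields[10])
    blocks, gaps = [], []
    for line in block_lines:
        parts = [int(x) for x in line.split()]
        size = parts[0]
        # A gapless block has the same length on target and query.
        blocks.append((t_pos, t_pos + size, q_pos, q_pos + size))
        t_pos += size
        q_pos += size
        if len(parts) == 3:          # the last block line has no gap fields
            dt, dq = parts[1], parts[2]
            gaps.append("double-sided" if dt > 0 and dq > 0 else "single-sided")
            t_pos += dt
            q_pos += dq
    return blocks, gaps


def is_monotonic(blocks):
    """True if blocks never overlap or go backwards on target or query."""
    return all(a[1] <= b[0] and a[3] <= b[2] for a, b in zip(blocks, blocks[1:]))


# Hypothetical chain: three blocks, one single-sided and one double-sided gap.
example_header = "chain 1000 chr1 1000 + 100 260 chrA 900 + 50 205 1"
example_lines = ["50 10 0", "40 20 25", "40"]
example_blocks, example_gaps = chain_blocks(example_header, example_lines)
```

The last block should end at the tEnd and qEnd given in the header, which is a cheap sanity check when reading chain files by hand.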
<br />
== Nets in a nutshell ==<br />
<br />
A '''net''' is a hierarchical collection of chains, with the highest-scoring non-overlapping chains on top, and their gaps filled in where possible by lower-scoring chains, which in turn may have gaps filled in by lower-level chains and so on. <br />
<br />
* I think a chain's qName also helps to determine which level it lands in, i.e. it makes a difference whether a chain's qName is the same as the top-level chain's qName or not, because the levels have meanings associated with them -- see details page. <br />
* a net is single-coverage for target but not for query, unless it has been filtered to be single-coverage on both target and query. By convention we add "rbest" to the net filename in that case.<br />
* because it's single-coverage in the target, it's no longer symmetrical.<br />
* the netter has two outputs, one of which we usually ignore: the target-centric net in query coordinates. The reciprocal best process uses that output: the query-referenced (but target-centric / target single-cov) net is turned back into component chains, and then those are netted to get single coverage in the query too; the two outputs of that netting are reciprocal-best in query and target coords. Reciprocal-best nets are symmetrical again. <br />
* nets do a good job of filtering out massive pileups by collapsing them down to (usually) a single level.<br />
<br />
"LiftOver chains" are actually chains extracted from nets, or chains filtered by the netting process. <br />
<br />
== History ==<br />
<br />
Chains and nets are [[User:Jimkent|Jim Kent]]'s brainchild, building on joint work with blastz author Scott Schwartz. <br />
<br />
Cross-species chains and nets used to be generated by a long manual process documented in some of our older makeDb/doc/*.txt files, but since ~2006 they have been generated by the script kent/src/hg/utils/automation/doBlastzChainNet.pl .<br />
<br />
Same-species liftOver chains use blat -fastMap as the alignment method, and are generated by kent/src/hg/utils/automation/doSameSpeciesLiftOver.pl, based on a series of scripts that [[User:Kate|Kate]] wrote in kent/src/hg/makeDb/makeLoChain/.<br />
<br />
== FAQs? ==<br />
<br />
The original publication describing chains and nets, Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes (Kent et al., PNAS September 30, 2003 100 (20) 11484-11489), might be helpful for understanding the rationale behind the process.<br />
<br />
> It’s somehow opposite with the alignment file, right? For example, the psl file records info of sequences from query assembly mapping agains sequences <br />
> from target assembly.<br />
<br />
The PSL format lists query coords before target coords, but the UCSC convention is that the target is the reference genome (on which the Genome Browser displays) and the query is the 'other' (the query could be mRNA from the same species as the target/reference genome, or could be another species). The PSL format is the native output format of BLAT, which differs from BLAST in that BLAT indexes the target and scans the query while BLAST indexes the query and scans the target. Those are pretty esoteric differences, but I think they help explain why the UCSC view of target and query might be the opposite of some others.<br />
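To see the column order concretely, a PSL line can be split into named fields. This is a small sketch assuming the standard 21-column, tab-separated PSL layout without a header; the example line is made up (an mRNA-like query aligned to chr1).

```python
# Column names in PSL order: note that the query columns come before the target columns.
PSL_FIELDS = [
    "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
    "strand",
    "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
]


def parse_psl_line(line):
    """Return a dict of field name to string value for one PSL line."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(PSL_FIELDS):
        raise ValueError("expected %d columns, got %d" % (len(PSL_FIELDS), len(values)))
    return dict(zip(PSL_FIELDS, values))


# Hypothetical alignment: query "myMrna" (the 'other') against target chr1 (the reference).
example_psl = "\t".join([
    "90", "10", "0", "0", "0", "0", "1", "50", "+",
    "myMrna", "100", "0", "100",
    "chr1", "1000000", "5000", "5150",
    "2", "60,40,", "0,60,", "5000,5110,",
])
record = parse_psl_line(example_psl)
```

In this record the Genome Browser would draw the alignment on chr1 (the target), even though the query columns appear first in the file.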
<br />
> And then collapse the chains to one.<br />
<br />
The chains are not collapsed to one; that would violate the constraint on a chain that it has monotonically increasing coordinates on the same target and query sequences in the same orientation. In the net, the two chains are retained at different levels (the primary alignment is at the top level and the secondary alignment is at a lower level). When the netChainSubset program extracts the liftOver chains from the net and full set of chains, it outputs the complete primary chain, but it outputs only a portion of the secondary chain.<br />
<br />
> I don’t understand why use the secondary alignment to fill the gap in the primary alignment.<br />
<br />
Chains and nets were designed to capture medium-to-large-scale rearrangements during evolution of species from a common ancestor: duplications, inversions, translocations. For example, in the case of an inversion, we would expect a gap in the top-level alignment to coincide with an alignment to the opposite strand of the same sequence (and similar breakpoints). With a translocation, a gap in the top-level alignment might be "filled" by an alignment to some other chromosome. When there is a duplication in the target, the target might have two (diverged) copies of the ancestral sequence that align to the same un-duplicated location in the query. In that case, the top-level chain would have a gap that is filled by an alignment to the same location on the query (single-coverage on the target, but multiple coverage on the query).<br />
<br />
> I suppose chain should reflect the true difference between two assembly. Say the contig a is actually corresponding to the primary hit region in <br />
> hg19. Here if the gap is filled as described above to generate a chain, wouldn’t that cause the gapped bases from hg19 being lifted to <br />
> a false corresponding region in contig?<br />
<br />
All bioinformatics algorithms are attempts to approximate the truth, and they fall short, especially when their assumptions and parameters are not tuned exactly right for the question at hand. If alignment parameters are overly sensitive for the actual divergence of target and query, then yes, spurious alignments will probably appear in the results. But there may be other explanations for unexpected alignments. (Assembly errors, sequencing errors, unexpected variation, some new discovery for you to make....) There is no simple answer that applies to all situations, and no substitute for trying different parameters and methods and carefully examining the results to see what works best for your particular application. It may help to make custom tracks using our bigChain format so you can examine and compare results in the Genome Browser.<br />
<br />
> I’ve recently discovered that we can use minimap2 to generate alignment file and convert to psl and use that psl to generate chain file. If I can <br />
> solve the multiple-mapping issue in the alignment file level, I don’t need to perform netChainSubset right?<br />
<br />
Perhaps -- that is up to you to determine. Best wishes for your research!<br />
<br />
Navigation: back to [[Implementation_Notes]]<br />
<br />
[[Category:Technical FAQ]]<br />
[[Category:Comparative Genomics]]</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Where_is_the_annotation_data&diff=25634Where is the annotation data2020-06-17T13:59:56Z<p>Max: </p>
<hr />
<div>Sometimes you know the name of the track, e.g. clinvarMain or est, from clicking around in the UI, and you need the name of the database table or file. <br />
This is not as straightforward as it seems.<br />
<br />
1) First you can look at the trackDb files. In kent/src/hg/makeDb/trackDb, run "grep -r <trackName> *" to find the trackDb.ra file that defines your track. If it has a bigDataUrl line, then the data is in a bigBed/bigWig file referenced there. The path MUST be /gbdb/<db>/xxxxx; we have a habit of storing big files under /gbdb/<db>/bbi, but not always. Either way, use bigBedToBed or bigBedInfo -as (or the similar tools for bigWig) to look at the data.<br />
<br />
2) In all other cases, the data is in a database table with the same name as the track. You can use hgSql or 'mysql --no-defaults -h genome-mysql.cse.ucsc.edu -u genome -A' and then <br />
<br />
use hg19<br />
select * from clinvarMain;<br />
<br />
This either shows (A) the data itself as rows or (B) a pointer to the data file. If it shows the data itself, case 2A, you can go to Mysql, and do a "SELECT * from <tableName> limit 10" to get an idea of the data itself (appending \G if your screen is too small) or "DESCRIBE <tableName>" to get the table schema.<br />
<br />
Sometimes the table contains only a single row holding a file path, e.g. /gbdb/hg19/bbi/clinvar/clinvarMain.bb; this is case 2B. The data is then not in MySQL, but you know where to look: display it with "bigBedToBed /gbdb/hg19/bbi/clinvar/clinvarMain.bb" and use "bigBedInfo -as /gbdb/hg19/bbi/clinvar/clinvarMain.bb" to get the data schema.<br />
<br />
3) other cases are obscure and very very rare these days. E.g. in hg18 there are some tables that are split by chromosome. You can check with "SHOW TABLES LIKE '%tableName%'" in mysql if there is a table like "chr4_clinvarMain" (there isn't, for this example). For very few and very old tracks, the table name is just different from the track name and the table browser is your friend, or the source code or your trusted older colleagues.</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Where_is_the_annotation_data&diff=25633Where is the annotation data2020-06-17T13:12:36Z<p>Max: Created page with "Sometimes you know the name of the track, e.g. clinvarMain or est, from clicking around in the UI, and you need the name of the database table or file. This is not as straigh..."</p>
<hr />
<div>Sometimes you know the name of the track, e.g. clinvarMain or est, from clicking around in the UI, and you need the name of the database table or file. <br />
This is not as straightforward as it seems.<br />
<br />
1) First you can look at the trackDb files. in kent/src/hg/makeDb/trackDb, "grep -r <trackName> *" to find the trackDb.ra file that defines your track. If it has a bigDataUrl line, then the data is in a bigBed/bigWig file referenced here. The path MUST be /gbdb/<db>/xxxxx, we have a habit of storing big files under /gbdb/<db>/bbi, but not always<br />
<br />
2) In most cases, the data is in a database table with the same name as the track. You can use hgSql or 'mysql --no-defaults -h genome-mysql.cse.ucsc.edu -u genome -A' and then <br />
<br />
use hg19<br />
select * from clinvarMain;<br />
<br />
This either shows (1) the data itself as rows or (2) a pointer to the data file. If it shows the data itself, you can go to Mysql, and do a "SELECT * from <tableName> limit 10" to get an idea or "DESCRIBE <tableName>" to get the schema.<br />
<br />
Sometimes the table contains only a single row e.g. /gbdb/hg19/bbi/clinvar/clinvarMain.bb, this is case (2). So this is non-MySQL data, you know where to look and can display the data with "bigBedToBed /gbdb/hg19/bbi/clinvar/clinvarMain.bb" and "bigBedInfo -as /gbdb/hg19/bbi/clinvar/clinvarMain.bb" to get the data schema.<br />
<br />
3) other cases are obscure and very very rare these days. E.g. in hg18 there are some tables that are split by chromosome. You can check with "SHOW TABLES LIKE '%tableName%'" in mysql if there is a table like "chr4_clinvarMain" (there isn't). For very few and very old tracks, the table name different from the track name.</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Assembly_Hubs&diff=25550Assembly Hubs2020-03-09T17:12:18Z<p>Max: /* Troubleshooting BLAT servers for your hub */</p>
<hr />
<div>==Overview==<br />
<br />
The Assembly Hub function allows you to display your novel genome sequence using the UCSC Genome Browser.<br />
<br />
==Web Server==<br />
<br />
To display your novel genome sequence, use a web server at your institution (or a free service like [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html#Hosting Cyverse]) to supply your files to the UCSC Genome Browser. For usage behind a firewall, you can also load the files locally through [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib GBiB]. Note that hosting hub files over HTTP is highly recommended and much more efficient than FTP. You then establish a hierarchy of directories and files to host your novel genome sequence. For example:<br />
<pre><br />
myHub/ - directory to organize your files on this hub<br />
hub.txt – primary reference text file to define the hub, refers to:<br />
genomes.txt – definitions for each genome assembly on this hub<br />
newOrg1/ - directory of files for this specific genome assembly<br />
newOrg1.2bit – ‘2bit’ file constructed from your fasta sequence<br />
description.html – information about this assembly for users<br />
trackDb.txt – definitions for tracks on this genome assembly<br />
groups.txt – definitions for track groups on this assembly<br />
bigWig and bigBed files – data for tracks on this assembly<br />
external track hub data tracks can be displayed on this assembly<br />
</pre><br />
<br />
The URL to reference this hub would be: <nowiki>http://yourLab.yourInstitution.edu/myHub/hub.txt</nowiki><br />
<br />
<b>Note:</b> there is now a <code>useOneFile on</code> hub setting that allows the hub properties to be specified in a single file. More information about this setting can be found on the [https://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#UseOneFile Genome Browser User Guide].<br />
<br />
You can view a working example hierarchy of files at:<br />
[http://genome-test.gi.ucsc.edu/~hiram/hubs/Plants/ Plants]<br><br />
A smaller slice of this hub is represented in a [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Quick Start Guide to Assembly Hubs].<br />
<br />
== Linking to your assembly hub ==<br />
<br />
You can build direct links to the genome(s) in your assembly hub:<br />
<br />
* The hub connect page<br />
** http://genome.ucsc.edu/cgi-bin/hgHubConnect?hgHub_do_redirect=on&hgHubConnect.remakeTrackHub=on&hgHub_do_firstDb=1&hubUrl=http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/hub.txt<br />
* The genome gateway page<br />
** http://genome.ucsc.edu/cgi-bin/hgGateway?genome=araTha1&hubUrl=http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/hub.txt<br />
* Directly to the genome browser<br />
** http://genome.ucsc.edu/cgi-bin/hgTracks?genome=araTha1&hubUrl=http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/hub.txt<br />
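If you generate such links for many hubs, it can help to build them programmatically. The following is a minimal sketch using only the Python standard library; the function name hub_link is made up for this example, and only the genome and hubUrl parameters from the links above are handled.

```python
from urllib.parse import urlencode

BROWSER_CGI_BASE = "http://genome.ucsc.edu/cgi-bin/"


def hub_link(cgi, hub_url, genome=None):
    """Build a link to a browser CGI (e.g. hgTracks or hgGateway) for a hub."""
    params = {}
    if genome is not None:
        params["genome"] = genome
    params["hubUrl"] = hub_url
    # Keep ':' and '/' unescaped so the result looks like the links above.
    return BROWSER_CGI_BASE + cgi + "?" + urlencode(params, safe=":/")


example_hub = ("http://genome.ucsc.edu/goldenPath/help/examples/"
               "hubExamples/hubAssembly/plantAraTha1/hub.txt")
tracks_link = hub_link("hgTracks", example_hub, genome="araTha1")
gateway_link = hub_link("hgGateway", example_hub, genome="araTha1")
```

With the example hub, tracks_link and gateway_link reproduce the hgTracks and hgGateway links listed above.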
<br />
==hub.txt==<br />
<br />
The initial file [http://genome-test.gi.ucsc.edu/~hiram/hubs/Plants/hub.txt hub.txt] is the primary URL reference for your assembly hub. The format of the file:<br />
<pre><br />
hub hubName<br />
shortLabel genome<br />
longLabel Comment describing this hub contents<br />
genomesFile genomes.txt<br />
email contactEmail@institution.edu<br />
descriptionUrl aboutHub.html<br />
</pre><br />
<br />
The ''shortLabel'' is the name that will appear in the ''genome'' pull-down menu at the UCSC gateway page. Example: '''Plants'''<br />
<br />
The ''genomesFile'' is a reference to the next definition file in this chain that will describe the assemblies and tracks available<br />
at this hub. Typically '''genomes.txt''' is at the same directory level as this '''hub.txt''', however it can also be a relative path<br />
reference to a different directory level.<br />
<br />
The ''email'' address provides users a contact point for queries related to this assembly hub.<br />
<br />
The ''descriptionUrl'' provides a relative path or URL link to a webpage describing the overall hub.<br />
<br />
==genomes.txt==<br />
The [http://genome-test.gi.ucsc.edu/~hiram/hubs/Plants/genomes.txt genomes.txt] file provides the references to the<br />
genome assemblies and tracks available at this assembly hub. The example file indicates the typical contents:<br />
<pre><br />
genome ricCom1<br />
trackDb ricCom1/trackDb.txt<br />
groups ricCom1/groups.txt<br />
description July 2011 Castor bean<br />
twoBitPath ricCom1/ricCom1.2bit<br />
organism Ricinus communis<br />
defaultPos EQ973772:1000000-2000000<br />
orderKey 4800<br />
scientificName Ricinus communis<br />
htmlPath ricCom1/description.html<br />
</pre><br />
<br />
There can be multiple assembly definitions in this single file. Separate these stanzas with blank lines. The references to other files are relative path references. In this example there is a sub-directory here called '''ricCom1''' which contains the files for this specific assembly.<br />
<br />
* The ''genome'' name is equivalent to the UCSC database name. The genome browser displays this database name in title pages in the genome browser.<br />
* The ''trackDb'' refers to a file which defines the tracks to place on this genome assembly. The format of this file is described in the [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hub] help reference documentation.<br />
* The ''groups'' refers to a file which defines the track groups on this genome browser. Track groups are the sections of related tracks grouped together under the primary genome browser graphics display image.<br />
* The ''description'' will be displayed for user information on the gateway page and most title pages of this genome assembly browser. It is the name displayed in the ''assembly'' pull-down menu on the browser gateway page.<br />
* The ''twoBitPath'' refers to the '''.2bit''' file containing the sequence for this assembly. Typically this file is constructed from the original fasta files for the sequence using the kent program '''faToTwoBit'''. This line can also point to a URL, for example, if you are duplicating an existing Assembly Hub, you can use the original hub's 2bit file's URL location here.<br />
* The ''organism'' string is displayed along with the ''description'' on most title pages in the genome browser. Adjust your names in ''organism'' and ''description'' until they are appropriate. This example is very close to what the genome browser normally displays. This ''organism'' name is the name that appears in the ''genome'' pull-down menu on the browser gateway page.<br />
* The ''defaultPos'' specifies the default position the genome browser will open when a user first views this assembly. This is usually selected to highlight a popular gene or region of interest in the genome assembly.<br />
* The ''orderKey'' is used together with the other genome definitions at this hub to order the entries in the ''genome'' pull-down menu.<br />
* The ''htmlPath'' refers to an html file that is used on the gateway page to display information about the assembly.<br />
<br />
Note that it is strongly encouraged to give each of your genome stanzas a line for ''defaultPos, scientificName, organism, description'' (along with the other settings above) so that when your hub is attached it will load a specified default location and have text that is more easily searched from the Gateway page.<br />
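A quick way to check a genomes.txt file against this recommendation is to split it into stanzas and list the missing settings. This is only a sketch under simple assumptions (one "key value" pair per line, stanzas separated by blank lines); the function names are made up, and it is not a replacement for a full hub validation tool.

```python
RECOMMENDED_SETTINGS = ("defaultPos", "scientificName", "organism", "description")


def parse_stanzas(text):
    """Split hub-style text into a list of dicts, one per blank-line-separated stanza."""
    stanzas, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                stanzas.append(current)
                current = {}
            continue
        # The first word is the setting name, the rest of the line is its value.
        key, _, value = line.partition(" ")
        current[key] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def missing_recommended(stanza):
    """Return the recommended settings that a genome stanza lacks."""
    return [key for key in RECOMMENDED_SETTINGS if key not in stanza]


example_genomes_txt = """\
genome ricCom1
trackDb ricCom1/trackDb.txt
twoBitPath ricCom1/ricCom1.2bit
organism Ricinus communis
defaultPos EQ973772:1000000-2000000

genome newOrg1
trackDb newOrg1/trackDb.txt
twoBitPath newOrg1/newOrg1.2bit
"""
for stanza in parse_stanzas(example_genomes_txt):
    print(stanza["genome"], "is missing:", missing_recommended(stanza))
```

The second stanza (newOrg1) is a made-up example that deliberately omits all four recommended lines.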
<br />
==2bit file==<br />
The '''.2bit''' file is constructed from the fasta sequence for the assembly. The ''kent'' source program ''faToTwoBit'' is used to construct this file. Download the program from the [http://hgdownload.soe.ucsc.edu/admin/exe/ downloads] section of the Browser. <br />
For example:<br />
faToTwoBit ricCom1.fa ricCom1.2bit<br />
<br />
Use ''twoBitInfo'' to verify the sequences in this assembly and to create a ''chrom.sizes'' file. This file is not used in the hub, but is useful in later processing to<br />
construct the ''big*'' files:<br />
twoBitInfo ricCom1.2bit stdout | sort -k2rn > ricCom1.chrom.sizes<br />
<br />
The ''.2bit'' commands can function with the ''.2bit'' file at a URL:<br />
twoBitInfo -udcDir=. <nowiki>http://genome-test.gi.ucsc.edu/~hiram/hubs/Plants/ricCom1/ricCom1.2bit</nowiki> stdout | sort -k2nr > ricCom1.chrom.sizes<br />
<br />
Sequence can be extracted from the ''.2bit'' file with the ''twoBitToFa'' command, for example:<br />
twoBitToFa -seq=chrCp -udcDir=. <nowiki>http://genome-test.gi.ucsc.edu/~hiram/hubs/Plants/ricCom1/ricCom1.2bit</nowiki> stdout > ricCom1.chrCp.fa<br />
<br />
==groups.txt==<br />
<br />
The [http://genome-test.gi.ucsc.edu/~hiram/hubs/Plants/ricCom1/groups.txt groups.txt] file defines the grouping of track controls under the primary genome browser image display. The example referenced here has the usual definitions as found in the UCSC Genome Browser.<br />
<br />
Each group is defined, for example the '''Mapping''' group:<br />
<pre><br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
</pre><br />
<br />
* The ''name'' is used in the ''trackDb.txt'' track definition ''group'', to assign a particular track to this group.<br />
* The ''label'' is displayed on the genome browser as the title of this group of track controls<br />
* The ''priority'' orders this track group with the other track groups<br />
* The ''defaultIsClosed'' determines if this track group is expanded or closed by default. Values to use are '''0''' or '''1'''<br />
<br />
==Building Tracks==<br />
<br />
Tracks are defined in the '''trackDb.txt''' where each stanza describes how tracks are displayed (shortLabel/longLabel/color/visibility) and other information such as what group the track should belong to (referencing the groups.txt) and if any additional html should display when one clicks into the track or a track item:<br />
<br />
track gap_<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl bbi/ricCom1.gap.bb<br />
type bigBed 4<br />
group map<br />
html ../trackDescriptions/gap<br />
<br />
For more information about the syntax of the '''trackDb.txt''' file, see [https://genome.ucsc.edu/goldenpath/help/trackDb/trackDbHub.html UCSC's Hub Track Database Definition page].<br />
<br />
It helps to have a compute cluster to process the genomes and construct tracks. For small genomes, this can also be done<br />
on a single computer with multiple cores. The process for each track is unique. Please see the continuing<br />
document [[Browser Track Construction]] for a discussion of constructing tracks for your assembly hub.<br />
<br />
===Cytoband Track===<br />
<br />
Assembly hubs can have a Cytoband track that can allow for quicker navigation of individual chromosomes and display banding pattern information if known. <br />
<br />
A quick version of the track can be built using the existing chrom.sizes files for your assembly<br />
(the banding options include <tt>gneg, gpos25, gpos50, gpos75, gpos100, acen, gvar, or stalk</tt>).<br />
<br />
<tt>cat araTha1.chrom.sizes | sort -k1,1 -k2,2n | awk '{print $1,0,$2,$1,"gneg"}' > cytoBandIdeo.bed</tt><br />
<br />
The resulting bed file can be turned into a big bed and given a .as file (example [http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/cytoBand.as here])<br />
to inform the browser it is not a normal bed.<br />
<br />
<tt> bedToBigBed -type=bed4 cytoBandIdeo.bed -as=cytoBand.as araTha1.chrom.sizes cytoBandIdeo.bigBed</tt><br />
<br />
In the trackDb, as long as the track is named cytoBandIdeo (<tt>track cytoBandIdeo</tt><br />
[http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/trackDb.txt example])<br />
it will load in the assembly hub.<br />
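For readers less familiar with awk, the one-liner that builds cytoBandIdeo.bed can be sketched in Python. This is an illustration rather than an exact equivalent: it writes tab-separated fields where the awk command writes spaces, it sorts names by plain string order where sort follows your locale, and the function name is made up.

```python
def cytoband_ideo_lines(chrom_sizes_text):
    """Turn chrom.sizes text into one full-length 'gneg' band per sequence."""
    rows = []
    for line in chrom_sizes_text.splitlines():
        if not line.strip():
            continue
        name, size = line.split()[:2]
        rows.append((name, int(size)))
    # Order by sequence name, then by size, like sort -k1,1 -k2,2n.
    rows.sort()
    # Fields: chrom, chromStart, chromEnd, name, gieStain.
    return ["\t".join([name, "0", str(size), name, "gneg"]) for name, size in rows]


# Made-up chrom.sizes content with two sequences.
example_sizes = "chr2\t200\nchr1\t100\n"
for bed_line in cytoband_ideo_lines(example_sizes):
    print(bed_line)
```

Each sequence becomes a single band covering its full length, which is enough for the ideogram to support navigation even when no real banding information exists.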
<br />
==Assembly Hub Resources==<br />
There are resources for automatically building assembly hubs available from [http://gonramp.wustl.edu/ G-OnRamp] and [https://github.com/Gaius-Augustus/MakeHub MakeHub]. <br />
<br />
There is also a collection of Example NCBI assembly hubs that are already working and can either be used or copied as<br />
a template to build further hubs. <br />
<br />
===G-OnRamp===<br />
G-OnRamp is a Galaxy workflow that turns a genome assembly and RNA-Seq data into a Genome Browser with multiple <br />
evidence tracks. Because G-OnRamp is based on the Galaxy platform, developing some familiarity with the key concepts<br />
and functionalities of Galaxy would be beneficial prior to using G-OnRamp. Here is a link to their <br />
[http://gonramp.wustl.edu/?page_id=32 instruction page] that gives an overview of their process.<br />
<br />
===MakeHub===<br />
MakeHub is a command line tool for the fully automatic generation of track data hubs for visualizing genomes with the UCSC genome browser.<br />
https://github.com/Gaius-Augustus/MakeHub<br />
<br />
===Example NCBI assembly hubs===<br />
<br />
There is a collection of assembly hubs built by an automatic script that can be viewed on our development server (links default to the '''genome-test site'''). Alternatively, if the link to the hub.txt is copied and pasted, it can be manually changed to load on the public site. <br />
<br />
The following table links to pages that launch various assembly hubs, grouped by species subset. If you scroll down on each page, you will find a row for each assembly hub (or groups of further assembly hubs on the bacteria page); individual assemblies can be loaded by clicking the "common name" hyperlink, such as "African bush elephant" on the Vertebrate Mammalian page. Please note that the statistics in the table below may change as more hubs are added in the future. <br />
<br />
<table class="sortable" border="1"><tr><br />
<th>species<br>subset</th><br />
<th>number&nbsp;of<br>species</th><br />
<th>number&nbsp;of<br>assemblies</th><br />
<th>total&nbsp;contig<br>count</th><br />
<th>total&nbsp;nucleotide<br>count</th><br />
<th>average<br>contig&nbsp;size</th><br />
<th>average<br>assembly&nbsp;size</th><br />
</tr><tr><td align=right>[http://genome-test.gi.ucsc.edu/gbdb/hubs/genbank/vertebrate_other/vertebrate_other.ncbi.html non-Mammalian other Vertebrate assembly hub]</td><br />
<td align=right>156</td><br />
<td align=right>172</td><br />
<td align=right>18,548,615</td><br />
<td align=right>193,684,015,605</td><br />
<td align=right>10,441</td><br />
<td align=right>1,126,069,858</td><br />
</tr><tr><td align=right> [http://genome-test.gi.ucsc.edu/~hiram/hubs/genbank/vertebrate_mammalian/vertebrate_mammalian.ncbi.html Vertebrate Mammalian assembly hub]</td><br />
<td align=right>118</td><br />
<td align=right>204</td><br />
<td align=right>30,643,657</td><br />
<td align=right>498,264,459,566</td><br />
<td align=right>16,259</td><br />
<td align=right>2,442,472,841</td><br />
</tr><tr><td align=right>[http://genome-test.gi.ucsc.edu/gbdb/hubs/genbank/plant/plant.ncbi.html Plant assembly hub]</td><br />
<td align=right>190</td><br />
<td align=right>269</td><br />
<td align=right>34,577,423</td><br />
<td align=right>145,341,422,954</td><br />
<td align=right>4,203</td><br />
<td align=right>540,302,687</td><br />
</tr><tr><td align=right>[http://genome-test.gi.ucsc.edu/gbdb/hubs/genbank/protozoa/protozoa.ncbi.html Protozoa assembly hub]</td><br />
<td align=right>282</td><br />
<td align=right>338</td><br />
<td align=right>3,939,128</td><br />
<td align=right>16,816,724,183</td><br />
<td align=right>4,269</td><br />
<td align=right>49,753,621</td><br />
</tr><tr><td align=right>[http://genome-test.gi.ucsc.edu/gbdb/hubs/genbank/invertebrate/invertebrate.ncbi.html Invertebrates assembly hub]</td><br />
<td align=right>392</td><br />
<td align=right>492</td><br />
<td align=right>32,264,511</td><br />
<td align=right>170,439,035,382</td><br />
<td align=right>5,282</td><br />
<td align=right>346,420,803</td><br />
</tr><tr><td align=right>[http://genome-test.gi.ucsc.edu/gbdb/hubs/genbank/fungi/fungi.ncbi.html Fungi assembly hub]</td><br />
<td align=right>1,106</td><br />
<td align=right>1,215</td><br />
<td align=right>4,143,097</td><br />
<td align=right>38,677,096,556</td><br />
<td align=right>9,335</td><br />
<td align=right>31,833,001</td><br />
</tr><tr><td align=right>[http://genome-test.gi.ucsc.edu/gbdb/hubs/genbank/archaea/archaea.ncbi.html Archaea assembly hub]</td><br />
<td align=right>688</td><br />
<td align=right>742</td><br />
<td align=right>57,569</td><br />
<td align=right>2,010,246,046</td><br />
<td align=right>34,918</td><br />
<td align=right>2,709,226</td><br />
</tr><tr><td align=right>[http://genome-test.gi.ucsc.edu/gbdb/hubs/genbank/bacteria/bacteria.ncbi.html Bacteria assembly hub]</td><br />
<td align=right>34,005</td><br />
<td align=right>58,658</td><br />
<td align=right>8,397,216</td><br />
<td align=right>234,147,691,500</td><br />
<td align=right>27,883</td><br />
<td align=right>3,991,743</td><br />
</tr><br />
</table><br />
<br />
<br />
These assemblies use the '''NCBI accession naming patterns''' on chromosomes. '''Please note this is a prototype work in progress.''' Not all assemblies are represented here yet. Prototype gene tracks from the NCBI gene predictions delivered with the assembly are available on a few assemblies. There are no blat servers on these assemblies. Users can copy the hub skeleton structure of a specific assembly to a local system and run a blat server there with their own assembly hub of that genome; brief instructions are on each assembly gateway page under the "Download files for this assembly hub:" section.<br />
<br />
====Example loading African bush elephant assembly hub and looking at the related genomes.txt and trackDb.txt====<br />
Here are some quick steps to load an example hub from this collection, and an attempt to explain how to look at the files behind the hub.<br />
# Click the above [http://genome-test.gi.ucsc.edu/~hiram/hubs/genbank/vertebrate_mammalian/vertebrate_mammalian.ncbi.html Vertebrate Mammalian assembly hub] link.<br />
# Scroll down and find the "common name" column and click the hyperlink for '''"African bush elephant"''' after looking at the other information on that row.<br />
# Note that you have arrived at a gateway page that has ''"African bush elephant Genome Browser - GCA_000001905.1_Loxafr3.0 assembly"'' displayed, where you can see a '''"Download files for this assembly hub:"''' section if you want to access these specific files, notably a http://genome-test.gi.ucsc.edu/gbdb/hubs/genbank/vertebrate_mammalian/GCA_000001905.1_Loxafr3.0/ link. <br />
# Click "Go" or the top "Genome Browser" blue bar menu to arrive at viewing this assembly hub (note this is on our '''genome-test site''').<br />
# To load this hub on our public site, at the earlier step you can copy the hyperlink for '''"African bush elephant"''' and paste it in a browser and change the very first "http://genome-test.gi.ucsc.edu/cgi-bin/..." to "http://genome.ucsc.edu/cgi-bin/..." instead.<br />
<br />
Now to investigate the files behind the hub to understand the process involved.<br />
# Click the [http://genome-test.gi.ucsc.edu/gbdb/hubs/genbank/vertebrate_mammalian/GCA_000001905.1_Loxafr3.0/ http://genome-test.gi.ucsc.edu/gbdb/hubs/genbank/vertebrate_mammalian/GCA_000001905.1_Loxafr3.0/] link found in the "Download files for this assembly hub:" section on a loaded assembly hub's gateway page.<br />
# Note the "GCA_000001905.1_Loxafr3.0.ncbi.2bit" file; this is the binary indexed remote file that allows the Browser to display this genome.<br />
# Find the "GCA_000001905.1_Loxafr3.0.genomes.ncbi.txt" file and click the [http://genome-test.gi.ucsc.edu/gbdb/hubs/genbank/vertebrate_mammalian/GCA_000001905.1_Loxafr3.0/GCA_000001905.1_Loxafr3.0.genomes.ncbi.txt link] to look at it.<br />
# Review this genomes.txt file, which can be used if copied into a new hub. Its "twoBitPath" line shows where to find the above 2bit file, and its "trackDb" line defines where to find the track database that displays data on this genome (the [http://genome-test.gi.ucsc.edu/gbdb/hubs/genbank/vertebrate_mammalian/genomes.ncbi.txt real genomes.txt] for this massive hub is up one directory, as this hub has 204 assemblies, and there you will find this stanza included). Note how the genomes.txt has "organism" and "scientificName" lines that help annotate how to display this assembly hub, and a "groups" line that points to a further file that helps define the grouping of tracks that are fully described in the trackDb.txt.<br />
# From the earlier link to all the files, click [http://genome-test.gi.ucsc.edu/gbdb/hubs/genbank/vertebrate_mammalian/GCA_000001905.1_Loxafr3.0/GCA_000001905.1_Loxafr3.0.trackDb.ncbi.txt GCA_000001905.1_Loxafr3.0.trackDb.ncbi.txt].<br />
# Review this trackDb.txt file, which defines the tracks to display on this hub. It has "bigDataUrl" lines that tell the Browser where to find the data for each track. Some tracks also have "searchIndex" and "searchTrix" lines to support finding data in the hub, "url" and "urlLabel" lines to create links from items in the hub to external resources, and "html" lines that point to a file with information to display for users who click into a track.<br />
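<br />
The genomes.txt fields described above can also be read on the command line. The following is a minimal sketch, assuming a local copy of a single-stanza genomes.txt with one "key value" pair per line. The stanza written to the temporary file is a placeholder for illustration, not the contents of the real elephant hub file, and <code>genomes_field</code> is a hypothetical helper name.<br />

```shell
# Print the value of one key from a genomes.txt-style file
# (one "key value" pair per line, separated by whitespace).
genomes_field() {
    # $1 = key (e.g. twoBitPath), $2 = path to genomes.txt
    awk -v key="$1" '$1 == key { $1 = ""; sub(/^[ \t]+/, ""); print; exit }' "$2"
}

# Placeholder stanza for illustration only; substitute your own file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
genome exampleAsm
trackDb exampleAsm/trackDb.txt
groups exampleAsm/groups.txt
twoBitPath exampleAsm/exampleAsm.2bit
organism Example organism
scientificName Examplus organismus
EOF

genomes_field twoBitPath "$sample"
genomes_field trackDb "$sample"
genomes_field organism "$sample"
```

For a multi-assembly genomes.txt this sketch only reports the first matching line, so it is best pointed at a per-assembly file such as the one linked above.<br />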
<br />
==Adding BLAT servers==<br />
<br />
By running your own blat server with [http://genomewiki.gi.ucsc.edu/index.php/Running_your_own_gfServer gfServer] you can add lines to the [http://genomewiki.ucsc.edu/index.php?title=Assembly_Hubs#genomes.txt genomes.txt] file of your assembly hub to enable the browser to access the server and activate blat searches.<br><br />
* First run two instances of gfServer on http://yourLab.yourInstitution.edu in the directory that holds your yourAssembly.2bit file, giving each a port on which it will be accessible: one for amino acid searches (<code>-trans</code> option) and one for DNA searches. Please note that the <code>-mask</code> option ignores all lower-case assembly sequence, which is the convention the UCSC Browser uses for masked sequence, so you may want to omit it from the example below. <br />
<br />
<br />
:Selecting a port:<br />
::When picking a port number, stick with numbers between 1024 and 49151. Anything less than 1024 is considered a system port and you'll need to be root in order to open it. Anything above 49151 is considered dynamic and randomly assigned. If you're starting a server that you will use a web browser to connect to, it is suggested to choose something with 8's in it, since that's the tradition. 8080, 8000, and 8888 are all popular, but other open ports will work just fine.<br />
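<br />
The port range above can be checked before starting a server. This is a small sketch only; <code>is_registered_port</code> is a hypothetical helper name, and the check says nothing about whether the port is already in use.<br />

```shell
# Return success (0) if $1 is an integer in the 1024-49151 range
# recommended above, failure (1) otherwise.
is_registered_port() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or not purely digits
    esac
    [ "$1" -ge 1024 ] && [ "$1" -le 49151 ]
}

for port in 80 17777 17779 60000; do
    if is_registered_port "$port"; then
        echo "$port: ok"
    else
        echo "$port: outside 1024-49151"
    fi
done
```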
<br />
<br />
For example, these two lines will specify port 17777 for amino acid searches and 17779 for DNA searches and are run from the publicly accessible directory location of the yourAssembly.2bit file: <br />
<br />
<pre><br />
gfServer start localhost 17777 -trans -mask yourAssembly.2bit &<br />
gfServer start localhost 17779 -stepSize=5 yourAssembly.2bit &<br />
</pre><br />
<br />
* Next edit your genomes.txt stanza that references yourAssembly to have two lines to inform the browser of where the blat servers are located and what ports to use. See an example of commented out lines [http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/genomes.txt here]. Please note the capital "B" in transBlat.<br />
<pre><br />
transBlat yourLab.yourInstitution.edu 17777<br />
blat yourLab.yourInstitution.edu 17779<br />
</pre><br />
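<br />
Because the keys are case-sensitive, a quick check of the edited file can catch a misspelled "transBlat". The following is a sketch under the assumption that each server line has the form "key host port" as in the example above; <code>check_blat_lines</code> is a hypothetical helper, not part of the UCSC tools.<br />

```shell
# Report whether a genomes.txt file has both BLAT server lines.
# Keys are matched case-sensitively, so "transblat" counts as missing.
check_blat_lines() {
    # $1 = path to genomes.txt
    rc=0
    for key in transBlat blat; do
        if grep -Eq "^${key}[[:space:]]+[^[:space:]]+[[:space:]]+[0-9]+[[:space:]]*$" "$1"; then
            echo "$key: found"
        else
            echo "$key: missing or malformed"
            rc=1
        fi
    done
    return "$rc"
}

# Demonstration with the two example lines from above.
demo=$(mktemp)
printf 'transBlat yourLab.yourInstitution.edu 17777\nblat yourLab.yourInstitution.edu 17779\n' > "$demo"
check_blat_lines "$demo"
```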
<br />
* You should now be able to load and perform blat operations on your assembly. For example a URL such as the following would bring up the blat CGI and have your assembly listed at the bottom of the "Genome:" drop-down menu: http://genome.ucsc.edu/cgi-bin/hgBlat?hubUrl=http://yourLab.yourInstitution.edu/myHub/hub.txt<br />
<br />
* Some institutions have firewalls that will prevent the browser from sending multiple inquiries to your blat servers, in which case you may need to ask your admins to add this IP range as an exception that is not limited: <code>128.114.119.*</code> That will cover the U.S. [http://genome.ucsc.edu genome.ucsc.edu] site. If you also want requests to work from our European Mirror [http://genome-euro.ucsc.edu genome-euro.ucsc.edu] site, add <code>129.70.40.120</code> to the exception list as well.<br />
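<br />
When reading firewall or connection logs, it can help to test whether a client address falls within the addresses listed above. This is an illustrative sketch; <code>is_ucsc_blat_source</code> is a hypothetical helper, and the addresses are simply the ones quoted in this section, which may change over time.<br />

```shell
# Succeed if $1 is in 128.114.119.* or is exactly 129.70.40.120.
is_ucsc_blat_source() {
    case "$1" in
        129.70.40.120) return 0 ;;
        128.114.119.*)
            rest=${1#128.114.119.}          # what follows the fixed prefix
            case "$rest" in
                ''|*[!0-9]*) return 1 ;;    # must be a single run of digits
            esac
            [ "$rest" -le 255 ] ;;
        *) return 1 ;;
    esac
}

for ip in 128.114.119.42 129.70.40.120 10.0.0.5; do
    if is_ucsc_blat_source "$ip"; then
        echo "$ip: listed UCSC address"
    else
        echo "$ip: not in the list"
    fi
done
```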
<br />
Please see more about [https://genome.ucsc.edu/FAQ/FAQblat.html#blat5 configuring your blat gfServer here] to replicate the UCSC Browser's settings. The [http://hgdownload.gi.ucsc.edu/downloads.html#source_downloads Source Downloads] page offers access to utilities with pre-compiled binaries, such as gfServer, found in a blat/ directory for your machine type [http://hgdownload.gi.ucsc.edu/admin/exe/ here]. Further blat documentation is [http://genome.ucsc.edu/goldenPath/help/blatSpec.html here]; see also the gfServer usage statement for further options.<br />
<br />
Note that you can also set up gfServers on a [http://genome.ucsc.edu/goldenpath/help/gbib.html GBiB] and run them locally. Please see this [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib GBiB assembly blat step-by-step set up] page for details.<br />
<br />
Note: You can stop your instance of gfServer with a command. For example:<br />
<pre><br />
gfServer stop localhost 17860 <br />
</pre><br />
<br />
===Troubleshooting BLAT servers for your hub===<br />
<br />
You can see this error if you have the translatedBlat / nucleotideBlat port numbers the wrong way around:<br />
<br />
Expecting 6 words from server got 2<br />
<br />
The following is an example of an error message when attempting to run a DNA sequence query via the web-based BLAT tool after loading a hub and starting a gfServer instance (from the same dir as the .2bit file).<br />
For example, a command to start an instance of gfServer:<br />
<pre><br />
$ gfServer start localhost 17779 -stepSize=5 contigsRenamed.2bit &<br />
</pre><br />
<br />
Example of a possible error message after attempting a web-based BLAT query:<br />
<br />
<pre><br />
Error in TCP non-blocking connect() 111 - Connection refused<br />
Operation now in progress<br />
Sorry, the BLAT/iPCR server seems to be down. Please try again later.<br />
</pre><br />
<br />
Check the following:<br />
<br />
====Process check====<br />
First, make sure your gfServer instance is running. <br><br />
Type the following command to check for your running gfServer process:<br />
<br />
<pre><br />
$ ps aux | grep gfServer<br />
</pre><br />
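<br />
One caveat with the command above is that the grep process itself can appear in the listing. A common workaround, sketched here with a hypothetical helper name, is to put one character of the pattern in brackets so that grep's own command line no longer matches.<br />

```shell
# Filter "ps aux"-style output (read from stdin) down to gfServer processes.
# The bracketed pattern still matches "gfServer" but does not match the
# literal text "[g]fServer" in grep's own command line.
list_gfservers() {
    grep '[g]fServer'
}

ps aux | list_gfservers || echo "no gfServer process found"
```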
<br />
====Check for correct path/filename====<br />
In your genomes.txt file, does your twoBitPath/filename match what you specified in your command to start gfServer?<br />
<br />
:In your genomes.txt file, is the location of your gfServer instance correct?<br />
::To check this, you can cd into the directory where you started your gfServer, then type the command:<br />
<br />
<pre><br />
$ hostname -i<br />
</pre><br />
<br />
Your result should be an IP address, for example, "132.249.245.79".<br />
<br />
Now you can test the connection to your port that you specified, with a simple telnet command.<br />
:Type in the following command: ''telnet yourIP yourPort''. For example:<br />
<pre><br />
$ telnet 132.249.245.79 17777<br />
</pre><br />
::The results should read, "Connected to 132.249.245.79".<br />
::Otherwise, if gfServer isn't running or if you typed the wrong location in your telnet command, telnet will say, "Connection refused."<br />
::In this example, check your genomes.txt file, and make sure your blat line reads, "blat 132.249.245.79 17777".<br />
::You may need to change your genomes.txt file from, for example, "blat ''localhost'' 17777" to "blat ''132.249.245.79'' 17777" (use your specific IP/host name where gfServer is running).<br />
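<br />
If telnet is not installed, the same connection test can be sketched with bash's built-in /dev/tcp redirection. This assumes that <code>bash</code> and the coreutils <code>timeout</code> command are available; <code>port_open</code> is a hypothetical helper name, and the host and port below are the placeholders used in this section.<br />

```shell
# Succeed if a TCP connection to host $1 on port $2 can be opened
# within 3 seconds.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

if port_open localhost 17777; then
    echo "connected"
else
    echo "connection refused or timed out"
fi
```

As with telnet, a successful connection only shows that something is listening on the port, not that it is the gfServer you expect.<br />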
<br />
====Check "gfServer status"====<br />
In the directory of your .2bit file (should be the same dir where you started gfServer), type: ''gfServer status yourLocation yourPort'' . <br><br />
For example: <br><br />
<br />
<pre><br />
$ gfServer status 132.249.245.79 17777<br />
</pre><br />
You should see output like this:<br />
<pre><br />
version 36x2<br />
type nucleotide<br />
host localhost<br />
port 17777<br />
tileSize 11<br />
stepSize 5<br />
minMatch 2<br />
pcr requests 0<br />
blat requests 0<br />
bases 0<br />
misses 0<br />
noSig 1<br />
trimmed 0<br />
warnings 0<br />
</pre><br />
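<br />
Individual fields can be pulled out of this output, which is useful when checking for swapped port numbers. The following is a sketch that only handles single-word keys such as "type" or "port"; <code>status_field</code> is a hypothetical helper. It is an assumption here, not shown in the output above, that a server started with <code>-trans</code> reports a type other than "nucleotide".<br />

```shell
# Print the value of a single-word key from "gfServer status" output on stdin.
status_field() {
    awk -v key="$1" '$1 == key { print $2; exit }'
}

# Live use (requires a running server):
#   gfServer status 132.249.245.79 17777 | status_field type
# Offline demonstration with part of the sample output shown above:
sample_status='version 36x2
type nucleotide
host localhost
port 17777
tileSize 11
stepSize 5
minMatch 2'

printf '%s\n' "$sample_status" | status_field type
printf '%s\n' "$sample_status" | status_field port
```

The port on your "blat" line in genomes.txt should be the one whose status reports "type nucleotide".<br />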
<br />
====Testing with gfClient====<br />
The best troubleshooting test is to take the webpage out of the equation, and use the command line utility, [http://hgdownload.soe.ucsc.edu/admin/exe/ gfClient], to run the query on your instance of gfServer. If you can successfully connect gfClient to gfServer, you will know that your location and port specification are correct. <br />
<br />
:From the directory that holds your hub's .2bit file (should be the same directory where your instance of gfServer was launched), perform a query using gfClient:<br />
::You can type "gfClient" on your command line to see the usage statement.<br />
::Use the following command: ''gfClient yourLocation yourPort pathOf2bitFile yourFastaQuery.fa nameOfOutputFile.psl'' <br><br />
::FYI: For testing with gfClient, you only need the gfServer binary on your server, not blat.<br />
<br />
'''For example:'''<br />
<pre><br />
$ gfClient localhost 17777 . query.fa gfOutput.psl<br />
</pre><br />
::Note the " . " after the port, to specify that the query will use the .2bit file in the current directory.<br />
::After running this command, take a look at the gfOutput.psl file. If successful, you will see BLAT results.<br />
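<br />
To check the result without opening the file, the alignment rows can be counted. This is a sketch that assumes the standard 21-column, tab-separated PSL layout and treats every other line as header; <code>count_psl_rows</code> is a hypothetical helper, and the row in the demonstration is made up for illustration, not real BLAT output.<br />

```shell
# Count alignment rows in PSL text on stdin: lines with 21 tab-separated
# fields whose first field (the match count) is a plain integer.
count_psl_rows() {
    awk -F'\t' 'NF == 21 && $1 ~ /^[0-9]+$/ { n++ } END { print n + 0 }'
}

# Real use:
#   count_psl_rows < gfOutput.psl
# Demonstration with a header line and one made-up row:
{
    printf 'psLayout version 3\n\n'
    printf '100\t0\t0\t0\t0\t0\t0\t0\t+\tquery1\t100\t0\t100\tchr1\t1000\t200\t300\t1\t100,\t0,\t200,\n'
} | count_psl_rows
```

A count of zero with no error message usually means the server was reached but found no match for the query.<br />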
<br />
'''Another example:'''<br />
* Note: In the example below, "yourLab.yourInstitution.edu" is the name of the machine where you run the gfServer command.<br />
<br />
From the test machine: ''Test the DNA alignment'', where test.fa is some sequence to find:<br />
<pre><br />
gfClient yourLab.yourInstitution.edu 17779 `pwd` test.fa dnaTestOut.psl<br />
</pre><br />
<br />
From the test machine: ''Test the protein alignment'', where proteinSequence.fa is the sequence to find:<br />
<pre><br />
gfClient -t=dnax -q=prot yourLab.yourInstitution.edu 17777 `pwd` proteinSequence.fa proteinOutput.psl<br />
</pre><br />
* NOTE: the yourAssembly.2bit file needs to be on this test machine also.<br />
* The `pwd` says to find the yourAssembly.2bit file in this directory.<br />
<br />
===RAM requirements for BLAT servers===<br />
<br />
The gfServers that provide responses for blat queries can require a substantial amount of memory.<br />
Here is some information that might help in approximating the required amount for genomes of different sizes.<br />
<br />
::The human hg19 genome requires ~2.2GB for the translated amino acid gfServer queries<br />
::and ~2.2GB for the untranslated DNA gfServer queries representing ~3,137,161,000 bp. <br />
<br />
::The zebrafish danRer7 genome requires ~1.2GB for the translated amino acid gfServer queries<br />
::and ~1.1GB for the untranslated DNA gfServer queries representing ~1,412,465,000 bp. <br />
<br />
::The D. melanogaster dm6 genome requires ~300MB for the translated amino acid gfServer queries<br />
::and ~250MB for the untranslated DNA gfServer queries representing ~143,726,000 bp.<br />
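<br />
As a rough way to compare these figures, the RAM used per base can be computed for the untranslated DNA servers. This sketch assumes 1 MB = 1,000,000 bytes and uses the approximate numbers quoted above; <code>bytes_per_base</code> is a hypothetical helper name.<br />

```shell
# Bytes of gfServer RAM per base, given RAM in megabytes and genome size in bp.
bytes_per_base() {
    awk -v mb="$1" -v bp="$2" 'BEGIN { printf "%.2f\n", mb * 1000000 / bp }'
}

bytes_per_base 2200 3137161000   # hg19
bytes_per_base 1100 1412465000   # danRer7
bytes_per_base 250  143726000    # dm6
```

This gives roughly 0.70, 0.78 and 1.74 bytes per base, so the relationship is not strictly linear and small genomes appear to carry proportionally more overhead. Treat any estimate for your own genome as approximate.<br />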
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
[[Category:Technical FAQ]]<br />
[[Category:Assembly/Track Hubs]]</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Cloud-storage_providers_and_byte-range_requests_of_UCSC_big*_files&diff=25213Cloud-storage providers and byte-range requests of UCSC big* files2019-02-28T09:48:02Z<p>Max: </p>
<hr />
<div>This page explains why Cloud-backup providers are not good for storing UCSC track hubs or bigBed/bigWig/bam custom tracks. I'm not 100% sure of everything below, as I'm not an expert on storage either, feel free to check or correct.<br />
<br />
Commercial data centers are pretty big: https://www.youtube.com/watch?v=XZmGGAbHqa0 They use thousands of servers.<br />
<br />
Cloud storage providers store data over these thousands of servers using distributed blob storage. Most of these systems are proprietary, like Amazon S3, Azure Storage or Google Cloud Storage. A famous open source version of such a system is Ceph, developed by Sage Weil for his PhD at UCSC in 2007, I believe in Eng2 on floor 4. https://en.wikipedia.org/wiki/Ceph_(software) Sage is now rich and works for Red Hat. Ceph is one of the storage options, I believe, in the iRODS system, which is used by the academic project CyVerse, which does not use a single data center but spreads its data servers across various US campuses.<br />
<br />
You can see that distributed storage servers use thousands of cheap servers in a data center which is quite different from a webserver like our RR or hgwdev. Each server is connected to many cheap-ish spinning disks. Each incoming file is split into 64MB chunks (this number can vary, e.g. 8MB or 1MB). Each server stores a certain number of chunks. Each file is stored multiple times, typically three times, to make up for breaking hard disks. Humans on roller skates (?) replace broken hard disks. After a replacement, the software then restores the chunks from the other two copies. I believe that many storage servers have two network connections, one for data balancing/replacement and one for serving data to the outside world. Storage servers are "dumb", in the way that they only store pieces of data and they do nothing else. Other servers send them the identifier of the piece and they send back the piece.<br />
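<br />
The chunking arithmetic described above can be sketched as follows. The file size is a made-up example, and the chunk size and replica count are simply the typical values mentioned in the text, not the settings of any particular provider.<br />

```shell
# Number of chunks needed for a file, rounding up to whole chunks.
chunk_count() {
    # $1 = file size in MB, $2 = chunk size in MB
    echo $(( ($1 + $2 - 1) / $2 ))
}

file_mb=1000      # hypothetical 1000 MB bigWig file
chunk_mb=64       # chunk size mentioned above
replicas=3        # typical number of copies mentioned above

chunks=$(chunk_count "$file_mb" "$chunk_mb")
echo "chunks per copy: $chunks"
echo "chunks stored in total: $(( chunks * replicas ))"
```

With these numbers a single file becomes 16 chunks per copy and 48 stored chunks in total, spread over many servers, which is why even a small read first needs a lookup of where the relevant chunk lives.<br />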
<br />
A second type of server, meta data servers, store the file names and on which servers the chunks are stored. The meta data servers are the weak spot of the system, so there are much more than three copies and they probably use SSDs or RAM, not spinning disks. The meta data servers store the file names, the access rights, where the files are stored, the owner, file size and a lot of other meta data. <br />
<br />
There also are internet-facing servers, they act as the gateway between the internal servers and the outside internet.<br />
<br />
When a request for a file comes in from a client on the internet, the gateway asks the meta data servers where the files are stored. The meta data servers try to pick storage servers that are close to the client, not used too much and that have a working hard disk. The meta data servers also count usage and block the request if a file has been requested too often (it may be an illegal video, and it's a backup provider, not a webspace provider). Transfers also cost money, as the cloud provider has to pay per TB transferred. <br />
<br />
As an additional layer of security, backup cloud storage providers make sure that the client is not an internet browser trying to stream a cat video or MP3, so instead of sending the file as it is, they reply with an HTTP redirect to the client or show an HTML page where the user must do something to get the file. All links to the file include a secure token that is only valid for a few minutes and they include the location of the storage servers, so this is typically a very long link.<br />
<br />
Backup cloud providers have no interest in supporting video or audio streaming, so most do not support byte range requests. Those that do support them do so only for paid accounts, fulfill them relatively slowly, or have added the feature only recently and only for some customers (e.g. box.com).<br />
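<br />
Whether a given URL honors byte range requests can be checked from its response headers. The following is a sketch, not how the browser's UDC code tests this: it assumes curl is available for the live case, the URL in the comment is a placeholder, and <code>supports_byte_ranges</code> is a hypothetical helper that looks for a 206 status line or an "Accept-Ranges: bytes" header.<br />

```shell
# Read HTTP response headers on stdin; succeed if they indicate that
# byte range requests are supported.
supports_byte_ranges() {
    tr -d '\r' | awk '
        $1 ~ /^HTTP\// && $2 == "206"               { ok = 1 }
        tolower($0) ~ /^accept-ranges:[ \t]*bytes/  { ok = 1 }
        END { exit (ok ? 0 : 1) }'
}

# Live use: ask for the first 100 bytes and inspect the headers.
#   curl -s -D - -o /dev/null -r 0-99 "https://example.org/myTrack.bb" | supports_byte_ranges
# Offline demonstration with a typical partial-content response:
if printf 'HTTP/1.1 206 Partial Content\r\nContent-Range: bytes 0-99/1000\r\n\r\n' | supports_byte_ranges; then
    echo "byte ranges supported"
else
    echo "byte ranges not supported"
fi
```

A provider that answers with a redirect or an HTML page instead will show a different status line, which matches the behavior described above.<br />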
<br />
Overall, one can see why backup storage providers cannot be as fast as a normal webserver. Every request has to go through at least one or two redirection layers to find the right storage server, do the authentication, look up if the data is cached somewhere, etc. Chunks have to be found across servers and put together. The system is intentionally not built for speed but for low cost of storage and uses redirects and tokens to protect against abuse. <br />
<br />
We could make our UDC work with most of these systems, if we tolerate a single redirect, which we currently don't do. Making our UDC work would mean, however, that it impacts the performance of all of hgTracks, as currently the slowest track holds up the whole display. Also, potentially, we may get blocked by some of these providers if we issue many requests to them.</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Cloud-storage_providers_and_byte-range_requests_of_UCSC_big*_files&diff=25211Cloud-storage providers and byte-range requests of UCSC big* files2019-02-26T14:32:23Z<p>Max: </p>
<hr />
<div>This page explains why Cloud-backup providers are not good for storing UCSC track hubs or bigBed/bigWig/bam custom tracks. I'm not 100% sure of everything below, as I'm not an expert on storage either, feel free to check or correct.<br />
<br />
Commercial data centers are pretty big: https://www.youtube.com/watch?v=XZmGGAbHqa0 They use thousands of servers.<br />
<br />
Cloud storage providers store data over these thousands of servers using distributed blob storage. Most of these systems are proprietary, like Amazon S3, Azure Storage or Google Cloud Storage. A famous open source version of such a system is Ceph, developed by Sage Weil for his PhD at UCSC in 2007, I believe in Eng2 on floor 4. https://en.wikipedia.org/wiki/Ceph_(software) Sage is now rich and works for Red Hat. Ceph is one of the storage options, I believe, in the iRODS system, which is used by the academic project CyVerse, which does not use a single data center but spreads its data servers across various US campuses.<br />
<br />
You can see that distributed storage servers use thousands of cheap servers in a data center which is quite different from a webserver like our RR or hgwdev. Each server is connected to many cheap-ish spinning disks. Each incoming file is split into 64MB chunks (this number can vary, e.g. 8MB or 1MB). Each server stores a certain number of chunks. Each file is stored multiple times, typically three times, to make up for breaking hard disks. Humans on roller skates (?) replace broken hard disks. After a replacement, the software then restores the chunks from the other two copies. I believe that many storage servers have two network connections, one for data balancing/replacement and one for serving data to the outside world.<br />
<br />
A second type of server, meta data servers, store the file names and on which servers the chunks are stored. The meta data servers are the weak spot of the system, so there are much more than three copies and they probably use SSDs or RAM, not spinning disks. The meta data servers store the file names, the access rights, where the files are stored, the owner, file size and a lot of other meta data. <br />
<br />
There also are internet-facing servers, they act as the gateway between the internal servers and the outside internet.<br />
<br />
When a request for a file comes in from a client on the internet, the gateway asks the meta data servers where the files are stored. The meta data servers try to pick storage servers that are close to the client, not used too much and that have a working hard disk. The meta data servers also count usage and block the request if a file has been requested too often (it may be an illegal video, and it's a backup provider, not a webspace provider). Transfers also cost money, as the cloud provider has to pay per TB transferred. <br />
<br />
As an additional layer of security, Backup cloud storage providers make sure that the client is not an internet browser trying to stream a cat video or MP3, so instead of sending the file as it is, it replies with an HTTP redirect to the client or shows an HTML page where the user must do something to get the file. All links to the file include a secure token that is only valid for a few minutes and they include the location of the storage servers, so this is typically a very long link.<br />
<br />
Backup cloud providers have no interest to fulfill video or audio streaming, so most do not support byte range requests. Those that do do it only for paid accounts or fulfill them relatively slowly, or have added the feature only recently and only for some customers (e.g. box.com).<br />
<br />
Overall, one can see why backup storage providers cannot be as fast as a normal webserver. Every request has to go through at least one or two redirection layers to find the right storage server, do the authentication, look up if the data is cached somewhere, etc. Chunks have to be found across servers and put together. The system is intentionally not built for speed but for low cost of storage and uses redirects and tokens to protect against abuse. <br />
<br />
We could make our UDC work with most of these systems, if we tolerate a single redirect, which we currently don't do. Making our UDC work would mean, however, that it impacts the performance of all of hgTracks, as currently the slowest track holds up the whole display. Also, potentially, we may get blocked by some of these providers if we issue many requests to them.</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Cloud-storage_providers_and_byte-range_requests_of_UCSC_big*_files&diff=25210Cloud-storage providers and byte-range requests of UCSC big* files2019-02-26T14:23:10Z<p>Max: Created page with "This page explains why Cloud-backup providers are not good for storing UCSC track hubs or bigBed/bigWig/bam custom tracks. I'm not 100% sure of everything below, as I'm not an..."</p>
<hr />
<div>This page explains why Cloud-backup providers are not good for storing UCSC track hubs or bigBed/bigWig/bam custom tracks. I'm not 100% sure of everything below, as I'm not an expert on storage either, feel free to check or correct.<br />
<br />
Cloud storage providers use distributed blob storage on normal servers. Most of these systems are proprietary, like Amazon S3, Azure Storage or Google Cloud Storage. A famous open source version of such a system is Ceph, developed by Sage Weil for his PhD at UCSC in 2007, I believe in Eng2 on floor 4. https://en.wikipedia.org/wiki/Ceph_(software) Sage is now rich and works for Red Hat. Ceph is one of the storage options, I believe, in the iRODS system, which is used by CyVerse.<br />
<br />
Distributed storage servers have thousands of cheap servers in a data center. Each server is connected to many cheap-ish spinning disks. Each incoming file is split into 64MB chunks (this number can vary, e.g. 8MB or 1MB). Each server stores a certain number of chunks. Each file is stored multiple times, typically three times, to make up for breaking hard disks. Humans on roller skates (?) replace broken hard disks. After a replacement, the software then restores the chunks from the other two copies. <br />
<br />
A second type of server, meta data servers, store the file names and on which servers the chunks are stored. The meta data servers are the weak spot of the system, so there are much more than three copies and they probably use SSDs or RAM, not spinning disks. <br />
<br />
There also are internet-facing servers, they act as the gateway between the internal servers and the outside internet.<br />
<br />
When a request for a file comes in from a client on the internet, the gateway asks the meta data servers where the files are stored. The meta data servers try to pick storage servers that are close to the client, not used too much and that have a working hard disk. The meta data servers also count usage and block the request if a file has been requested too often (it may be an illegal video, and it's a backup provider, not a webspace provider). Transfers also cost money, as the cloud provider has to pay per TB transferred. <br />
<br />
As an additional layer of security, Backup cloud storage providers make sure that the client is not an internet browser trying to stream a cat video or MP3, so instead of sending the file as it is, it replies with an HTTP redirect to the client or shows an HTML page where the user must do something to get the file. All links to the file include a secure token that is only valid for a few minutes and they include the location of the storage servers, so this is typically a very long link.<br />
<br />
Backup cloud providers have no interest to fulfill video or audio streaming, so most do not support byte range requests. Those that do do it only for paid accounts or fulfill them relatively slowly, or have added the feature only recently and only for some customers (e.g. box.com).<br />
<br />
Overall, one can see why backup storage providers cannot be as fast as a normal webserver. Every request has to go through at least one or two redirection layers to find the right storage server, do the authentication, look up if the data is cached somewhere, etc. Chunks have to be found across servers and put together. The system is intentionally not built for speed but for low cost of storage and uses redirects and tokens to protect against abuse. <br />
<br />
We could make our UDC work with most of these systems, if we tolerate a single redirect, which we currently don't do. Making our UDC work would mean, however, that it impacts the performance of all of hgTracks, as currently the slowest track holds up the whole display. Also, potentially, we may get blocked by some of these providers if we issue many requests to them.</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Custom_Track_Storage&diff=25127Custom Track Storage2018-12-07T16:23:35Z<p>Max: </p>
<hr />
<div>Users can add their own tracks to the genome browser. The preferred system for doing this is called "track hubs", which are stored on the user's own web server. The older system for adding tracks is called "custom tracks", and its data is stored at UCSC, in the directory "trash/ct".<br />
<br />
* sessions with custom tracks have an entry "ctfile_<db>" for every database with at least one custom track in the cart, e.g. "ctfile_hg19". An example file name is "ctfile_hg19 ../trash/ct/ct_hgwdev_1e21a_910820.bed" (these values are actually urlencoded in the cart - is this true? It's actually &#2F; for slashes ... )<br />
* the ctfile has one line per custom track. Each line starts with "track". It contains a list of key-value entries. <br />
* the entries contain at least these keys:<br />
** dbTrackType<br />
** dbTableName<br />
** db<br />
* special case: for named sessions, the ctfile files are moved to /userdata/ct/rr at some point<br />
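<br />
The keys listed above can be extracted from a ctfile line with a short filter. This is a sketch under several assumptions: the values are unquoted and contain no spaces, the sample line is made up for illustration rather than copied from a real ctfile, and <code>ct_value</code> is a hypothetical helper name.<br />

```shell
# Print the value of key $1 from each "track ..." line on stdin,
# assuming space-separated key=value entries without quoting.
ct_value() {
    awk -v key="$1" '$1 == "track" {
        for (i = 2; i <= NF; i++) {
            n = index($i, "=")
            if (n > 0 && substr($i, 1, n - 1) == key)
                print substr($i, n + 1)
        }
    }'
}

# Real use:
#   ct_value dbTableName < ../trash/ct/ct_hgwdev_1e21a_910820.bed
# Demonstration with a made-up line:
line='track name=example db=hg19 dbTableName=exampleTable dbTrackType=bed'
printf '%s\n' "$line" | ct_value db
printf '%s\n' "$line" | ct_value dbTableName
printf '%s\n' "$line" | ct_value dbTrackType
```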
<br />
How to find the ctfiles:<br />
<br />
* remember that you can use /cgi-bin/cartDump to show the cart = the genome browser session information. <br />
* each cart (=hgsid) has a list of key=value entries<br />
* the session carts are stored in the "session" table in hgcentral</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Custom_Track_Storage&diff=25126Custom Track Storage2018-12-07T16:22:44Z<p>Max: </p>
<hr />
<div>Users can add their own tracks to the genome browser. The preferred system for doing this is called "track hubs", stored on user's own webserver. The older system for adding tracks is called "custom tracks" and the data is stored at UCSC, in the directory "trash/ct".<br />
<br />
* sessions with custom tracks have an entry "ctfile_<db>" for every database with at least one custom track in the cart, e.g. "ctfile_hg19". An example file name is "ctfile_hg19 ../trash/ct/ct_hgwdev_1e21a_910820.bed" (these values are actually urlencoded in the cart)<br />
* the ctfile has one line per custom track. Each line starts with "track". It contains a list of key-value entries. <br />
* the entries contain at least these keys:<br />
** dbTrackType<br />
** dbTableName<br />
** db<br />
* special case: for named sessions, the ctfile files are moved to /userdata/ct/rr at some point<br />
<br />
How to find the ctfiles:<br />
<br />
* remember that you can use /cgi-bin/cartDump to show the cart = the genome browser session information. <br />
* each cart (=hgsid) has a list of key=value entries<br />
* the session carts are stored in the "session" table in hgcentral</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Custom_Track_Storage&diff=25125Custom Track Storage2018-12-07T11:46:01Z<p>Max: </p>
<hr />
<div>Users can add their own tracks to the genome browser. The new system for doing this is called "track hubs", these are stored on any webserver. The older system for adding tracks is called "custom tracks" where the data is stored at UCSC, in the directory "trash/ct".<br />
<br />
* sessions with custom tracks have an entry "ctfile_<db>" for every database with at least one custom track in the cart, e.g. "ctfile_hg19". An example file name is "ctfile_hg19 ../trash/ct/ct_hgwdev_1e21a_910820.bed" (these values are actually urlencoded in the cart)<br />
* the ctfile has one line per custom track. Each line starts with "track". It contains a list of key-value entries. <br />
* the entries contain at least these keys:<br />
** dbTrackType<br />
** dbTableName<br />
** db<br />
* special case: for named sessions, the ctfile files are moved to /userdata/ct/rr at some point<br />
<br />
How to find the ctfiles:<br />
<br />
* remember that you can use /cgi-bin/cartDump to show the cart = the genome browser session information. <br />
* each cart (=hgsid) has a list of key=value entries<br />
* the session carts are stored in the "session" table in hgcentral</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Custom_Track_Storage&diff=25124Custom Track Storage2018-12-07T11:34:17Z<p>Max: </p>
<hr />
<div>Users can add their own tracks to the genome browser. The new system for doing this is called "track hubs", these are stored on any webserver. The older system for adding tracks is called "custom tracks" where the data is stored at UCSC, in the directory "trash/ct".<br />
<br />
* sessions with custom tracks have an entry "ctfile_<db>" for every database with at least one custom track in the cart, e.g. "ctfile_hg19". An example file name is "ctfile_hg19 ../trash/ct/ct_hgwdev_1e21a_910820.bed"<br />
* the ctfile has one line per custom track. Each line starts with "track". It contains a list of key-value entries. <br />
* the entries contain at least these keys:<br />
** dbTrackType<br />
** dbTableName<br />
** db<br />
* special case: for named sessions, the ctfile files are moved to /userdata/ct/rr at some point<br />
<br />
How to find the ctfiles:<br />
<br />
* remember that you can use /cgi-bin/cartDump to show the cart = the genome browser session information. <br />
* each cart (=hgsid) has a list of key=value entries<br />
* the session carts are stored in the "session" table in hgcentral</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Custom_Track_Storage&diff=25123Custom Track Storage2018-12-07T11:33:43Z<p>Max: </p>
<hr />
<div>Users can add their own tracks to the genome browser. The new system for doing this is called "track hubs", these are stored on any webserver. The older system for adding tracks is called "custom tracks" where the data is stored at UCSC, in the directory "trash/ct".<br />
<br />
* sessions with custom tracks have an entry "ctfile_<db>" for every database with at least one custom track in the cart, e.g. "ctfile_hg19". An example file name is "ctfile_hg19 ../trash/ct/ct_hgwdev_1e21a_910820.bed"<br />
* the ctfile has one line per custom track. Each line starts with "track". It contains a list of key-value entries. <br />
* the entries contain at least these keys:<br />
** dbTrackType<br />
** dbTableName<br />
** db<br />
* special case: for named sessions, the ctfile files are moved to /userdata/ct/rr at some point<br />
<br />
How to find the ctfiles:<br />
<br />
** remember that you can use /cgi-bin/cartDump to show the cart<br />
* each session (=hgsid) has a list of key=value entries that are called the "cart"<br />
* the session carts are stored in the "session" table in hgcentral</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Custom_Track_Storage&diff=25122Custom Track Storage2018-12-07T11:33:08Z<p>Max: </p>
<hr />
<div>Users can add their own tracks to the genome browser. The new system for doing this is called "track hubs", these are stored on any webserver. The older system for adding tracks is called "custom tracks" where the data is stored at UCSC, in the directory "trash/ct".<br />
<br />
* sessions with custom tracks have an entry "ctfile_<db>" for every database with at least one custom track in the cart, e.g. "ctfile_hg19". An example file name is "ctfile_hg19 ../trash/ct/ct_hgwdev_1e21a_910820.bed"<br />
* the ctfile has one line per custom track. Each line starts with "track". It contains a list of key-value entries. <br />
* the entries contain at least these keys:<br />
** dbTrackType<br />
** dbTableName<br />
** db<br />
* special case: for named sessions, the ctfile files are moved to /userdata/ct/rr at some point<br />
<br />
** remember that you can use /cgi-bin/cartDump to show the cart<br />
* each session (=hgsid) has a list of key=value entries that are called the "cart"<br />
* the session carts are stored in the "session" table in hgcentral</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Custom_Track_Storage&diff=25121Custom Track Storage2018-12-07T11:32:52Z<p>Max: </p>
<hr />
<div>Users can add their own tracks to the genome browser. The new system for doing this is called "track hubs", these are stored on any webserver. The older system for adding tracks is called "custom tracks" where the data is stored at UCSC, in the directory "trash/ct".<br />
<br />
* sessions with custom tracks have an entry "ctfile_<db>" for every database with at least one custom track in the cart, e.g. "ctfile_hg19". An example file name is "ctfile_hg19 ../trash/ct/ct_hgwdev_1e21a_910820.bed"<br />
* the ctfile has one line per custom track. Each line starts with "track". It contains a list of key-value entries. <br />
* the entries contain at least these keys:<br />
** dbTrackType<br />
** dbTableName<br />
** db<br />
* special case: for named sessions, the ctfile files are moved to /userdata/ct/rr at some point<br />
<br />
** remember that you can use /cgi-bin/cartDump to show the cart<br />
* each session (=hgsid) has a list of key=value entries that are called the "cart"<br />
* the session carts are stored in the "session" table in hgcentral</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Public_Hub_Guidelines&diff=24985Public Hub Guidelines2018-09-04T09:17:42Z<p>Max: updating hgwdev IP address</p>
<hr />
<div>This page is intended to lay out guidelines for those who are trying to create [https://genome.ucsc.edu/cgi-bin/hgHubConnect Public Hubs]. If you’ve created a hub that you feel meets these requirements and is of general interest to the research community, please contact us at [mailto:genome-www@soe.ucsc.edu genome-www@soe.ucsc.edu] to have it added to the list. <br />
<br />
(As a reference for interpreting trackDb.txt settings use the ''Hub Track Database Definition'' [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#toc glossary])<br />
<br />
= Required Guidelines =<br />
The following guidelines must be met before your hub will be added to our public list:<br />
* Required for both track and assembly hubs:<br />
** You must have a description page for every configuration page (composite, superTrack or stand alone track). Note that multiple tracks and/or composites can use the same description page with the “html” setting. You can find more information on creating track description pages in the [[#Track description page recommendations | Track description page recommendations]] section below.<br />
** All of your description pages MUST have a contact email address prominently displayed.<br />
** Try to have no more than 10 tracks with [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#visibility visibility] set to display (in full, pack, dense, or squish) as default upon first connecting your hub.<br />
** You must specify a descriptionUrl HTML page in your hub.txt. This should be a URL to a description page for your entire hub and should include key search terms for the hub contents or a full-text paper that describes the data in your hub. The descriptionUrl webpage is indexed to enable users to find your hub with our hub search function, so the more terms on your descriptionUrl page, the more likely it is that interested users will find your hub.<br />
* Required for only assembly hubs:<br />
** Add a gateway page for each assembly by having a ''htmlPath'' line for each genome not already hosted by UCSC in genomes.txt. [http://genomewiki.ucsc.edu/index.php/Assembly_Hubs Assembly Hubs Wiki]<br />
** The following settings should be properly set in your genomes.txt (the last three settings make it easier to find assembly hub species through the search box in hgGateway):<br />
*** defaultPos<br />
*** scientificName<br />
*** organism<br />
*** description<br />
<br />
= Recommended Guidelines =<br />
<br />
These guidelines in the following sections are recommended to improve user experience, but are not required to be implemented before the hub is added to our list of Public Hubs.<br />
<br />
== Track organization recommendations ==<br />
Related tracks can be grouped in a few different ways, namely [https://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#superTrack superTracks], [https://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#aggregate multiWigs], and [https://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#compositeTrack composites]. If your hub includes a large number of tracks, the grouping of tracks may be necessary. This will prevent your hub’s track group from being an overwhelming mess of individual tracks and can make user configuration of your tracks easier.<br />
<br />
=== Composite tracks ===<br />
Related tracks of the same data type (e.g. a set of related bigBed tracks) should be combined into [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#compositeTrack composites] where appropriate. <br />
<br />
* Have [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#view multi-view] only when there is more than one view. Views ideally give alternate access to the same data (e.g. signals and called peaks). Keep in mind that the value of views is that they allow for more than one data/configuration type (e.g. bigBed and bigWig) in a single composite. All subtracks of a view must have the same data type. Likewise, all subtracks of a non-multi-view composite must be the same type.<br />
* Recommendations for using dimensions with your composite tracks:<br />
** There should be no [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#dimensions dimensions] with a single entry (do not have only one cell line represented in dimX=cell), unless data growth is expected to fill in additional entries.<br />
** Using only one dimension: preferably use dimX (e.g. dimensions dimX=cell). This saves vertical User Interface space, but is not always the best choice.<br />
** Using two dimensions: use dimX and dimY (e.g. dimensions dimX=cell dimY=mark)<br />
** Using more than two: use dimX, dimY on the most important dimensions. Then use dimA,B,C as needed on lesser dimensions. (e.g. dimensions dimX=cell dimY=mark dimA=donor_id)<br />
** The A,B,C.. dimensions should probably use [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#filterComposite filterComposite] (e.g. filterComposite dimA)<br />
** Each dimension and view should be represented in [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#sortOrder sortOrder], ideally in order of dimX, dimY, dimA,B,C, view (e.g. sortOrder cell_type=+ mark=+ donor_id=+ view=+). But the hub user may wish for a different sortOrder, which is fine.<br />
** Tags of subGroup/dimension should be short and sweet with no special chars. Also labels can have HTML codes embedded (e.g. NOT CPG_methylation_%=CPG_methylation_% RATHER mpct=CPG_methylation_&_#37)<br />
** Never represent the same [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#subGroups subgroup] in both view and as a dimension (e.g. NOT dimensions dimX=view). For that matter a subgroup should never be in two dimensions (e.g. NOT dimensions dimX=cell dimY=mark dimA=cell). The composite will appear to function but multiple ways of selecting the same thing will create a confusing and inconsistent User Interface.<br />
<br />
=== Super tracks ===<br />
<br />
Extremely large hubs may use [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#superTrack superTracks] as well to achieve a meaningful hierarchy. Super tracks can be used to group together any type of related tracks; for example, you could combine a multiWig, a composite and a bigBed track together into a single superTrack.<br />
<br />
== Track display recommendations ==<br />
* Avoid setting a composite track and all of its subtracks to the same visibility. When you have composite tracks that are hidden by default, it is best to still designate some subtracks to display when the composite track is turned on (visibility dense, versus the default of hide). This provides an example of your track data to users who turn on your composite track. If no subtracks are turned on by default, a user who changes your composite track visibility to "show" won't see anything.<br />
* The [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#shortLabel shortLabel] text should be under 17 characters, or meaningful information may be cut off from display when tracks are set to "dense" visibility.<br />
* The length for a [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#longLabel longLabel] should be limited to around 75 characters.<br />
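<br />
The two label recommendations above can be checked automatically. The following Python sketch is only an illustration: it assumes that each trackDb.txt setting sits on its own line as "setting value", it treats "around 75 characters" as a limit of exactly 75, and the sample stanza is made up.<br />

```python
SHORT_LABEL_MAX = 16  # "under 17 characters"
LONG_LABEL_MAX = 75   # "around 75 characters", treated here as a soft limit


def check_label_lengths(trackdb_text):
    """Return (setting, label, length) tuples for labels over the limits."""
    limits = {"shortLabel": SHORT_LABEL_MAX, "longLabel": LONG_LABEL_MAX}
    problems = []
    for line in trackdb_text.splitlines():
        parts = line.strip().split(None, 1)  # setting name, rest of the line
        if len(parts) == 2 and parts[0] in limits:
            setting, label = parts
            if len(label) > limits[setting]:
                problems.append((setting, label, len(label)))
    return problems


# Made-up stanza, for illustration only
sample = """\
track exampleTrack
shortLabel A label that is far too long
longLabel A concise description of the example track
"""
problems = check_label_lengths(sample)
for setting, label, length in problems:
    print(f"{setting} is {length} characters long: {label}")
```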
<br />
== Track description page recommendations ==<br />
* The description page should preferably contain UCSC's standard ''Description'', ''Methods'', ''Contacts''... sections as defined [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#commonSettings here under "html"] and here is [http://genome.ucsc.edu/goldenpath/help/examples/hubExamples/templatePage.html an example template].<br />
** Here are some examples of well done track description pages from various public hubs:<br />
*** [https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=susScr3&hubUrl=http://public.hpcagrogenomics.wur.nl/ABGC/Track_Hubs/Chankyu/hub.txt&g=hub_141191_Liver_RRBS Liver DNA Methylation] track in the Porcine DNA methylation hub - provides a nice example of how you can use colors on your description pages<br />
*** [https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&hubUrl=http://apprisws.bioinfo.cnio.es/trackHub/hub.txt&g=hub_67585_3DStructureAPPRIS APPRIS - protein structural information] track in the Principal Splice Isoforms APPRIS hub - provides an example of how you might integrate images into your track description pages and use the same description page for multiple tracks<br />
* Your track description pages should provide meaningful documentation for your tracks<br />
** If you are creating a hub based on a paper, use the paper's abstract as a starting point for your track's ''Description'' section<br />
** The ''Methods'' section should expand upon the overview given in the ''Description'' section and provide more details about how the data for the track was produced<br />
** You should assume a broad audience of students and researchers will use your hubs. Spell out common acronyms for those who may be new to genomics. For example, you might write out a term and its acronym as “Fluorescent in situ hybridization (FISH)”, which spells out the term and then provides the acronym that you can use throughout the rest of your description page.<br />
* It might be a good idea to include a “Data Access” section on your track description page which describes how to access the data in your hub and where to download the raw data for the tracks in your hub. You can see some examples of “Data Access” sections on the [https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&g=refSeqComposite NCBI RefSeq] and [https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&g=spUniprot UniProt] track description pages.<br />
<br />
== Miscellaneous recommendations ==<br />
* Please note that hosting hub files on HTTP tends to work even better than FTP because of the difference in the number of open tcp connections needed. <br />
* The use of ''metadata lines'' is supported, but be aware that '''support may be replaced''' by another system in the future.<br />
* Create a [https://genome.ucsc.edu/cgi-bin/hgPublicSessions Public Session] that highlights the different data available in your hub in a biologically interesting area of the genome. Be sure to include a "Description" for your session. More about sessions can be found [https://genome.ucsc.edu/goldenPath/help/hgSessionHelp.html here].<br />
<br />
= Connection issues? =<br />
<br />
Sometimes the servers hosting public hubs undergo administrative changes and no longer serve hub files successfully. In most cases, new firewalls at the institution are limiting access and causing these connection problems. Please ask your institution's administrators to exempt the IP addresses listed below from these limits.<br />
<br />
These IP addresses are currently used by official genome browser mirrors: <br />
<br />
* 128.114.119.* = genome.ucsc.edu<br />
* 129.70.40.99 = european mirror, genome-euro.ucsc.edu<br />
* 134.160.84.67 = asian mirror, genome-asia.ucsc.edu<br />
* 128.114.198.32 = genome-test.soe.ucsc.edu, used by developers and for debugging<br />
<br />
Although our site is creating many requests to an institution, each is small and quickly satisfied by the server, so the total load on your webserver should be limited and system administrators will likely not have an issue with adding this exception.<br />
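<br />
An administrator who wants to confirm that a request seen in the web server log comes from one of the addresses above could use a check along these lines. This is a sketch only: it reads "128.114.119.*" as the network 128.114.119.0/24, the function name is hypothetical, and the addresses are copied from the list above and should be re-checked before being used in a real firewall rule.<br />

```python
import ipaddress

# Addresses copied from the list above; 128.114.119.* is read as a /24 network
BROWSER_NETWORKS = [
    ipaddress.ip_network("128.114.119.0/24"),   # genome.ucsc.edu
    ipaddress.ip_network("129.70.40.99/32"),    # genome-euro.ucsc.edu
    ipaddress.ip_network("134.160.84.67/32"),   # genome-asia.ucsc.edu
    ipaddress.ip_network("128.114.198.32/32"),  # genome-test.soe.ucsc.edu
]


def is_browser_address(ip):
    """Return True if ip falls inside one of the listed networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in BROWSER_NETWORKS)


print(is_browser_address("128.114.119.131"))
print(is_browser_address("10.0.0.1"))
```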
<br />
= Public Hub Examples =<br />
<br />
Many of the [http://genome.ucsc.edu/cgi-bin/hgHubConnect public hubs] in the Genome Browser provide excellent examples or templates for creating your own hub! As a reference for interpreting trackDb.txt lines used in these example hubs, please refer to the ''Hub Track Database Definition'' [http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#toc glossary].<br />
<br />
== Example Track Hubs ==<br />
<br />
=== Example 1 ===<br />
<br />
The [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&hubUrl=http://apprisws.bioinfo.cnio.es/trackHub/hub.txt Principal Splice Isoforms APPRIS hub] provides a good example of a basic hub that includes a few different annotation tracks. Each track includes its own description page and is colored in a way that distinguishes it from the other tracks in the hub and from the native tracks in the UCSC Genome Browser.<br />
<br />
Here are some links to their configuration files and some description pages:<br />
* [http://apprisws.bioinfo.cnio.es/trackHub/hub.txt hub.txt]<br />
* [http://apprisws.bioinfo.cnio.es/trackHub/genomes.txt genomes.txt]<br />
* [http://apprisws.bioinfo.cnio.es/trackHub/trackDb.hg38.txt trackDb.txt] for the default hub assembly, hg38<br />
* Description page for [http://apprisws.bioinfo.cnio.es/trackHub/docs/APPRIS.html APPRIS - Principal Isoforms] track (see track for hg38 in the Genome Browser [http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&hubUrl=http://apprisws.bioinfo.cnio.es/trackHub/hub.txt&g=hub_67585_PrincipalIsoformsAPPRIS here])<br />
<br />
=== Example 2 ===<br />
<br />
The [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&hubUrl=http://vizhub.wustl.edu/VizHub/RoadmapIntegrative.txt Roadmap Epigenomics Integrative Analysis Hub] provides a great example of how you might organize your tracks if you have thousands of them. The hub uses composites with dimensions to organize thousands of different tracks across a number of cell lines and then uses superTracks to group these tracks even further.<br />
<br />
Here are some links to their configuration files and some description pages:<br />
* [http://vizhub.wustl.edu/VizHub/RoadmapIntegrative.txt hub.txt] named “RoadmapIntegrative.txt”<br />
* [http://vizhub.wustl.edu/VizHub/roadmapintegrativeall.txt genomes.txt] named “roadmapintegrativeall.txt”<br />
* [http://vizhub.wustl.edu/VizHub/hg19/roadmap_both_02182015_trackDb.txt trackDb.txt] named “roadmap_both_02182015_trackDb.txt” for hg19<br />
<br />
== Example Assembly Hub ==<br />
<br />
The [http://genome.ucsc.edu/cgi-bin/hgTracks?genome=CB4856Princeton_JR-contig&hubUrl=http://waterston.gs.washington.edu/trackhubs/isolates/hub.txt C elegans isolates hub] provides an excellent example of what your assembly hub could look like. The hub creators provide a detailed description page for each assembly, many different annotation tracks, each with its own description page, and clearly defined track groups that keep related tracks together.<br />
<br />
Here are some links to their configuration files and some description pages:<br />
* [http://waterston.gs.washington.edu/trackhubs/isolates/hub.txt hub.txt]<br />
* [http://waterston.gs.washington.edu/trackhubs/isolates/genomes.txt genomes.txt]<br />
* [http://waterston.gs.washington.edu/trackhubs/isolates/CB4856Princeton_JR-contig/trackDb.txt trackDb.txt] for the primary genome in the hub, CB4856Princeton_JR-contig<br />
* [http://waterston.gs.washington.edu/trackhubs/isolates/CB4856Princeton_JR-contig/groups.txt groups.txt] that defines track groups for CB4856Princeton_JR-contig<br />
* Description pages for CB4856Princeton_JR-contig<br />
** [http://waterston.gs.washington.edu/trackhubs/isolates/CB4856Princeton_JR-contig/description.html Assembly gateway] (see gateway page [https://genome.ucsc.edu/cgi-bin/hgGateway?genome=CB4856Princeton_JR-contig&hubUrl=http://waterston.gs.washington.edu/trackhubs/isolates/hub.txt here])<br />
** [http://waterston.gs.washington.edu/trackhubs/isolates/CB4856Princeton_JR-contig/Rajewsky.description.html Rajewsky Mixed Stage RNAseq] (see track [http://genome.ucsc.edu/cgi-bin/hgTrackUi?genome=CB4856Princeton_JR-contig&hubUrl=http://waterston.gs.washington.edu/trackhubs/isolates/hub.txt&g=hub_17367_Rajewsky here])<br />
** [http://waterston.gs.washington.edu/trackhubs/isolates/CB4856Princeton_JR-contig/blat_N2_cDNA_models.description.html WS230 cDNA blat Annotations] (see track [http://genome.ucsc.edu/cgi-bin/hgTrackUi?genome=CB4856Princeton_JR-contig&hubUrl=http://waterston.gs.washington.edu/trackhubs/isolates/hub.txt&g=hub_17367_blat_N2_cDNA_models here])</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Assembly_Hubs&diff=24921Assembly Hubs2018-08-27T10:19:55Z<p>Max: /* Web Server */</p>
<hr />
<div>==Overview==<br />
<br />
The Assembly Hub function allows you to display your novel genome sequence using the UCSC Genome Browser.<br />
<br />
==Web Server==<br />
<br />
To display your novel genome sequence, use a web server at your institution to supply your files to the UCSC Genome Browser. Free hosting services such as [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html#Hosting Cyverse] are an alternative, and for usage behind a firewall you can also load the files locally through [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib GBiB]. Note that hosting hub files on HTTP is highly recommended and much more efficient than FTP. You then establish a hierarchy of directories and files to host your novel genome sequence. For example:<br />
<pre><br />
myHub/ - directory to organize your files on this hub<br />
hub.txt – primary reference text file to define the hub, refers to:<br />
genomes.txt – definitions for each genome assembly on this hub<br />
newOrg1/ - directory of files for this specific genome assembly<br />
newOrg1.2bit – ‘2bit’ file constructed from your fasta sequence<br />
description.html – information about this assembly for users<br />
trackDb.txt – definitions for tracks on this genome assembly<br />
groups.txt – definitions for track groups on this assembly<br />
bigWig and bigBed files – data for tracks on this assembly<br />
external track hub data tracks can be displayed on this assembly<br />
</pre><br />
<br />
The URL to reference this hub would be: <nowiki>http://yourLab.yourInstitution.edu/myHub/hub.txt</nowiki><br />
<br />
You can view a working example hierarchy of files at:<br />
[http://genome-test.cse.ucsc.edu/~hiram/hubs/Plants/ Plants]<br><br />
A smaller slice of this hub is represented in a [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html Quick Start Guide to Assembly Hubs].<br />
<br />
== Linking to your assembly hub ==<br />
<br />
You can build direct links to the genome(s) in your assembly hub:<br />
<br />
* The hub connect page<br />
** http://genome.ucsc.edu/cgi-bin/hgHubConnect?hgHub_do_redirect=on&hgHubConnect.remakeTrackHub=on&hgHub_do_firstDb=1&hubUrl=http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/hub.txt<br />
* The genome gateway page<br />
** http://genome.ucsc.edu/cgi-bin/hgGateway?genome=araTha1&hubUrl=http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/hub.txt<br />
* Directly to the genome browser<br />
** http://genome.ucsc.edu/cgi-bin/hgTracks?genome=araTha1&hubUrl=http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/hub.txt<br />
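<br />
Links of this kind can also be assembled programmatically. The Python sketch below builds the gateway and browser links from a hub URL and a genome name; the helper name hub_link is hypothetical. Note that urlencode percent-encodes the hub URL, so the result looks different from the hand-written links above; the sketch assumes the CGIs decode the parameter in the usual way, which should make both forms equivalent.<br />

```python
from urllib.parse import urlencode

BROWSER = "http://genome.ucsc.edu/cgi-bin"


def hub_link(cgi, hub_url, genome=None):
    """Build a link to a Genome Browser CGI that attaches the given hub."""
    params = {}
    if genome is not None:
        params["genome"] = genome
    params["hubUrl"] = hub_url
    # urlencode percent-encodes ':' and '/' inside the hub URL
    return f"{BROWSER}/{cgi}?{urlencode(params)}"


hub = ("http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/"
       "hubAssembly/plantAraTha1/hub.txt")
gateway_link = hub_link("hgGateway", hub, genome="araTha1")
tracks_link = hub_link("hgTracks", hub, genome="araTha1")
print(gateway_link)
print(tracks_link)
```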
<br />
==hub.txt==<br />
<br />
The initial file [http://genome-test.cse.ucsc.edu/~hiram/hubs/Plants/hub.txt hub.txt] is the primary URL reference for your assembly hub. The format of the file:<br />
<pre><br />
hub hubName<br />
shortLabel genome<br />
longLabel Comment describing this hub contents<br />
genomesFile genomes.txt<br />
email contactEmail@institution.edu<br />
descriptionUrl aboutHub.html<br />
</pre><br />
<br />
The ''shortLabel'' is the name that will appear in the ''genome'' pull-down menu at the UCSC gateway page. Example: '''Plants'''<br />
<br />
The ''genomesFile'' is a reference to the next definition file in this chain that will describe the assemblies and tracks available<br />
at this hub. Typically '''genomes.txt''' is at the same directory level as this '''hub.txt''', however it can also be a relative path<br />
reference to a different directory level.<br />
<br />
The ''email'' address provides users a contact point for queries related to this assembly hub.<br />
<br />
The ''descriptionUrl'' provides a relative path or URL link to a webpage describing the overall hub.<br />
<br />
==genomes.txt==<br />
The [http://genome-test.cse.ucsc.edu/~hiram/hubs/Plants/genomes.txt genomes.txt] file provides the references to the<br />
genome assemblies and tracks available at this assembly hub. The example file indicates the typical contents:<br />
<pre><br />
genome ricCom1<br />
trackDb ricCom1/trackDb.txt<br />
groups ricCom1/groups.txt<br />
description July 2011 Castor bean<br />
twoBitPath ricCom1/ricCom1.2bit<br />
organism Ricinus communis<br />
defaultPos EQ973772:1000000-2000000<br />
orderKey 4800<br />
scientificName Ricinus communis<br />
htmlPath ricCom1/description.html<br />
</pre><br />
<br />
There can be multiple assembly definitions in this single file. Separate these stanzas with blank lines. The references to other files are relative path references. In this example there is a sub-directory here called '''ricCom1''' which contains the files for this specific assembly.<br />
<br />
* The ''genome'' name is the equivalent to the UCSC database name. The genome browser displays this database name in title pages in the genome browser.<br />
* The ''trackDb'' refers to a file which defines the tracks to place on this genome assembly. The format of this file is described in the [http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html Track Hub] help reference documentation.<br />
* The ''groups'' refers to a file which defines the track groups on this genome browser. Track groups are the sections of related tracks grouped together under the primary genome browser graphics display image.<br />
* The ''description'' will be displayed for user information on the gateway page and most title pages of this genome assembly browser. It is the name displayed in the ''assembly'' pull-down menu on the browser gateway page.<br />
* The ''twoBitPath'' refers to the '''.2bit''' file containing the sequence for this assembly. Typically this file is constructed from the original fasta files for the sequence using the kent program '''faToTwoBit'''. This line can also point to a URL, for example, if you are duplicating an existing Assembly Hub, you can use the original hub's 2bit file's URL location here.<br />
* The ''organism'' string is displayed along with the ''description'' on most title pages in the genome browser. Adjust your names in ''organism'' and ''description'' until they are appropriate. This example is very close to what the genome browser normally displays. This ''organism'' name is the name that appears in the ''genome'' pull-down menu on the browser gateway page.<br />
* The ''defaultPos'' specifies the default position the genome browser will open when a user first views this assembly. This is usually selected to highlight a popular gene or region of interest in the genome assembly.<br />
* The ''orderKey'' is used together with the other genome definitions at this hub to order the entries in the ''genome'' pull-down menu.<br />
* The ''htmlPath'' refers to an html file that is used on the gateway page to display information about the assembly.<br />
<br />
Note that you are strongly encouraged to give each of your genome stanzas a line for ''defaultPos, scientificName, organism, description'' (along with the other settings above), so that when your hub is attached it loads a specified default location and has text that can be more easily searched from the Gateway page.<br />
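<br />
A quick way to spot stanzas that lack these settings is sketched below in Python. It assumes the layout described above (one "setting value" pair per line, stanzas separated by blank lines); the function names are hypothetical and the second stanza in the sample, newOrg1, is made up to show a failing case.<br />

```python
RECOMMENDED = ("defaultPos", "scientificName", "organism", "description")


def parse_stanzas(text):
    """Split genomes.txt-style text into a list of {setting: value} dicts."""
    stanzas, current = [], {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if not parts:  # a blank line closes the current stanza
            if current:
                stanzas.append(current)
                current = {}
            continue
        current[parts[0]] = parts[1] if len(parts) > 1 else ""
    if current:
        stanzas.append(current)
    return stanzas


def missing_settings(stanza):
    """List the recommended settings that a stanza does not define."""
    return [name for name in RECOMMENDED if name not in stanza]


sample = """\
genome ricCom1
trackDb ricCom1/trackDb.txt
description July 2011 Castor bean
twoBitPath ricCom1/ricCom1.2bit
organism Ricinus communis
defaultPos EQ973772:1000000-2000000
scientificName Ricinus communis

genome newOrg1
trackDb newOrg1/trackDb.txt
twoBitPath newOrg1/newOrg1.2bit
defaultPos chr1:1-1000
"""
stanzas = parse_stanzas(sample)
for stanza in stanzas:
    print(stanza["genome"], "is missing:", missing_settings(stanza))
```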
<br />
==2bit file==<br />
The '''.2bit''' file is constructed from the fasta sequence for the assembly. The ''kent'' source program ''faToTwoBit'' is used to construct this file. Download the program from the [http://hgdownload.soe.ucsc.edu/admin/exe/ downloads] section of the Browser. <br />
For example:<br />
faToTwoBit ricCom1.fa ricCom1.2bit<br />
<br />
Use ''twoBitInfo'' to verify the sequences in this assembly and to create a ''chrom.sizes'' file, which is not used in the hub but is useful in later processing to
construct the ''big*'' files:<br />
twoBitInfo ricCom1.2bit stdout | sort -k2rn > ricCom1.chrom.sizes<br />
<br />
The ''.2bit'' commands can function with the ''.2bit'' file at a URL:<br />
twoBitInfo -udcDir=. <nowiki>http://genome-test.cse.ucsc.edu/~hiram/hubs/Plants/ricCom1/ricCom1.2bit</nowiki> stdout | sort -k2nr > ricCom1.chrom.sizes<br />
<br />
Sequence can be extracted from the ''.2bit'' file with the ''twoBitToFa'' command, for example:<br />
twoBitToFa -seq=chrCp -udcDir=. <nowiki>http://genome-test.cse.ucsc.edu/~hiram/hubs/Plants/ricCom1/ricCom1.2bit</nowiki> stdout > ricCom1.chrCp.fa<br />
<br />
==groups.txt==<br />
<br />
The [http://genome-test.cse.ucsc.edu/~hiram/hubs/Plants/ricCom1/groups.txt groups.txt] file defines the grouping of track controls under the primary genome browser image display. The example referenced here has the usual definitions as found in the UCSC Genome Browser.<br />
<br />
Each group is defined, for example the '''Mapping''' group:<br />
<pre><br />
name map<br />
label Mapping<br />
priority 2<br />
defaultIsClosed 0<br />
</pre><br />
<br />
* The ''name'' is used in the ''trackDb.txt'' track definition ''group'', to assign a particular track to this group.<br />
* The ''label'' is displayed on the genome browser as the title of this group of track controls<br />
* The ''priority'' orders this track group with the other track groups<br />
* The ''defaultIsClosed'' determines if this track group is expanded or closed by default. Values to use are '''0''' or '''1'''<br />
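<br />
A groups.txt with more than one group separates the stanzas with a blank line. The sketch below writes a two-group file and lists the names that trackDb.txt stanzas may reference; the file name and the second ("genes") stanza are illustrative assumptions.<br />

```shell
# Write a two-group groups.txt; the "genes" stanza is an illustrative assumption.
cat > example.groups.txt <<'EOF'
name map
label Mapping
priority 2
defaultIsClosed 0

name genes
label Genes
priority 3
defaultIsClosed 1
EOF

# List the group names that trackDb.txt stanzas may reference.
awk '$1 == "name" { print $2 }' example.groups.txt > example.group.names
cat example.group.names
```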
<br />
==Building Tracks==<br />
<br />
Tracks are defined in the '''trackDb.txt''' where each stanza describes how tracks are displayed (shortLabel/longLabel/color/visibility) and other information such as what group the track should belong to (referencing the groups.txt) and if any additional html should display when one clicks into the track or a track item:<br />
<br />
track gap_<br />
longLabel Gap<br />
shortLabel Gap<br />
priority 11<br />
visibility dense<br />
color 0,0,0<br />
bigDataUrl bbi/ricCom1.gap.bb<br />
type bigBed 4<br />
group map<br />
html ../trackDescriptions/gap<br />
<br />
For more information about the syntax of the '''trackDb.txt''' file, see [https://genome.ucsc.edu/goldenpath/help/trackDb/trackDbHub.html UCSC's Hub Track Database Definition page]<br />
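<br />
Because each track's ''group'' setting refers to a ''name'' in groups.txt, a simple cross-check can catch typos before the hub is loaded. This is a sketch under assumptions: both file names and the second track stanza are made up for illustration.<br />

```shell
# Minimal stand-ins for the two hub files (contents are illustrative).
printf 'name map\nlabel Mapping\npriority 2\ndefaultIsClosed 0\n' > check.groups.txt
printf 'track gap_\nshortLabel Gap\ngroup map\n\ntrack genes_\nshortLabel Genes\ngroup genes\n' > check.trackDb.txt

# Report every trackDb "group" value that has no matching "name" in groups.txt.
awk 'NR == FNR { if ($1 == "name") known[$2] = 1; next }
     $1 == "group" && !($2 in known) { print "unknown group: " $2 }' \
    check.groups.txt check.trackDb.txt > check.groups.report
cat check.groups.report
```

An empty report means every track refers to a group that is defined.<br />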
<br />
It helps to have a cluster super computer to process the genomes to construct tracks. It can be done for small genomes<br />
on single computers that have multiple cores. The process for each track is unique. Please note the continuing<br />
document: [[Browser Track Construction]] for a discussion of constructing tracks for your assembly hub.<br />
<br />
===Cytoband Track===<br />
<br />
Assembly hubs can have a Cytoband track that can allow for quicker navigation of individual chromosomes and display banding pattern information if known. <br />
<br />
A quick version of the track can be built using the existing chrom.sizes files for your assembly<br />
(the banding options include <tt>gneg, gpos25, gpos50, gpos75, gpos100, acen, gvar, or stalk</tt>).<br />
<br />
<tt>cat araTha1.chrom.sizes | sort -k1,1 -k2,2n | awk '{print $1,0,$2,$1,"gneg"}' > cytoBandIdeo.bed</tt><br />
<br />
The resulting bed file can be turned into a big bed and given a .as file (example [http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/cytoBand.as here])<br />
to inform the browser it is not a normal bed.<br />
<br />
<tt> bedToBigBed -type=bed4 cytoBandIdeo.bed -as=cytoBand.as araTha1.chrom.sizes cytoBandIdeo.bigBed</tt><br />
<br />
In the trackDb, as long as the track is named cytoBandIdeo (<tt>track cytoBandIdeo</tt><br />
[http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/trackDb.txt example])<br />
it will load in the assembly hub.<br />
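<br />
To see what the awk step produces, the same pipeline can be run on a tiny made-up chrom.sizes file. In this sketch the file names are assumptions, and OFS is set so that the bed columns come out tab-separated rather than space-separated.<br />

```shell
# Made-up chrom.sizes standing in for araTha1.chrom.sizes.
printf 'chr2\t500\nchr1\t1000\n' > demo.chrom.sizes

# Same pipeline as above, with OFS set so the bed columns are tab-separated.
sort -k1,1 -k2,2n demo.chrom.sizes \
  | awk 'BEGIN { OFS = "\t" } { print $1, 0, $2, $1, "gneg" }' > demo.cytoBandIdeo.bed
cat demo.cytoBandIdeo.bed
```

Each chromosome becomes a single ''gneg'' band spanning its full length, which is enough for navigation when no banding information is known.<br />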
<br />
==Assembly Hub Resources==<br />
There are resources for automatically building assembly hubs available from [http://gonramp.wustl.edu/ G-OnRamp]. <br />
<br />
There is also a collection of Example NCBI assembly hubs that are already working and can either be used or copied as<br />
a template to build further hubs. <br />
<br />
===G-OnRamp===<br />
G-OnRamp is a Galaxy workflow that turns a genome assembly and RNA-Seq data into a Genome Browser with multiple <br />
evidence tracks. Because G-OnRamp is based on the Galaxy platform, developing some familiarity with the key concepts<br />
and functionalities of Galaxy would be beneficial prior to using G-OnRamp. Here is a link to their <br />
[http://gonramp.wustl.edu/?page_id=32 instruction page] that gives an overview of their process.<br />
<br />
===Example NCBI assembly hubs===<br />
<br />
There is a collection of assembly hubs built by an automatic script that can be viewed on our development server (links default to the '''genome-test site'''); if the link to the hub.txt is copied and pasted, it can be manually changed to load on the public site.<br />
<br />
The following table provides links to pages that launch various assembly hubs grouped by species subset. If you scroll down on a page you will find rows for each assembly hub (or groups of further assembly hubs for the bacteria page), allowing you to load individual assemblies by clicking the "common name" hyperlink, such as "African bush elephant" on the Vertebrate Mammalian page (please note the statistics in the table below may change as more hubs are added in the future).<br />
<br />
<table class="sortable" border="1"><tr><br />
<th>species<br>subset</th><br />
<th>number&nbsp;of<br>species</th><br />
<th>number&nbsp;of<br>assemblies</th><br />
<th>total&nbsp;contig<br>count</th><br />
<th>total&nbsp;nucleotide<br>count</th><br />
<th>average<br>contig&nbsp;size</th><br />
<th>average<br>assembly&nbsp;size</th><br />
</tr><tr><td align=right>[http://genome-test.cse.ucsc.edu/gbdb/hubs/genbank/vertebrate_other/vertebrate_other.ncbi.html non-Mammalian other Vertebrate assembly hub]</td><br />
<td align=right>156</td><br />
<td align=right>172</td><br />
<td align=right>18,548,615</td><br />
<td align=right>193,684,015,605</td><br />
<td align=right>10,441</td><br />
<td align=right>1,126,069,858</td><br />
</tr><tr><td align=right> [http://genome-test.cse.ucsc.edu/~hiram/hubs/genbank/vertebrate_mammalian/vertebrate_mammalian.ncbi.html Vertebrate Mammalian assembly hub]</td><br />
<td align=right>118</td><br />
<td align=right>204</td><br />
<td align=right>30,643,657</td><br />
<td align=right>498,264,459,566</td><br />
<td align=right>16,259</td><br />
<td align=right>2,442,472,841</td><br />
</tr><tr><td align=right>[http://genome-test.cse.ucsc.edu/gbdb/hubs/genbank/plant/plant.ncbi.html Plant assembly hub]</td><br />
<td align=right>190</td><br />
<td align=right>269</td><br />
<td align=right>34,577,423</td><br />
<td align=right>145,341,422,954</td><br />
<td align=right>4,203</td><br />
<td align=right>540,302,687</td><br />
</tr><tr><td align=right>[http://genome-test.cse.ucsc.edu/gbdb/hubs/genbank/protozoa/protozoa.ncbi.html Protozoa assembly hub]</td><br />
<td align=right>282</td><br />
<td align=right>338</td><br />
<td align=right>3,939,128</td><br />
<td align=right>16,816,724,183</td><br />
<td align=right>4,269</td><br />
<td align=right>49,753,621</td><br />
</tr><tr><td align=right>[http://genome-test.cse.ucsc.edu/gbdb/hubs/genbank/invertebrate/invertebrate.ncbi.html Invertebrates assembly hub]</td><br />
<td align=right>392</td><br />
<td align=right>492</td><br />
<td align=right>32,264,511</td><br />
<td align=right>170,439,035,382</td><br />
<td align=right>5,282</td><br />
<td align=right>346,420,803</td><br />
</tr><tr><td align=right>[http://genome-test.cse.ucsc.edu/gbdb/hubs/genbank/fungi/fungi.ncbi.html Fungi assembly hub]</td><br />
<td align=right>1,106</td><br />
<td align=right>1,215</td><br />
<td align=right>4,143,097</td><br />
<td align=right>38,677,096,556</td><br />
<td align=right>9,335</td><br />
<td align=right>31,833,001</td><br />
</tr><tr><td align=right>[http://genome-test.cse.ucsc.edu/gbdb/hubs/genbank/archaea/archaea.ncbi.html Archaea assembly hub]</td><br />
<td align=right>688</td><br />
<td align=right>742</td><br />
<td align=right>57,569</td><br />
<td align=right>2,010,246,046</td><br />
<td align=right>34,918</td><br />
<td align=right>2,709,226</td><br />
</tr><tr><td align=right>[http://genome-test.cse.ucsc.edu/gbdb/hubs/genbank/bacteria/bacteria.ncbi.html Bacteria assembly hub]</td><br />
<td align=right>34,005</td><br />
<td align=right>58,658</td><br />
<td align=right>8,397,216</td><br />
<td align=right>234,147,691,500</td><br />
<td align=right>27,883</td><br />
<td align=right>3,991,743</td><br />
</tr><br />
</table><br />
<br />
<br />
These assemblies use the '''NCBI accession naming patterns''' on chromosomes. '''Please note this is a prototype work in progress.''' Not all assemblies are represented here yet. Prototype gene tracks from the NCBI gene predictions delivered with the assembly are available on a few assemblies. There are no blat servers on these assemblies. Users could copy the hub skeleton structure of a specific assembly to local systems and run a blat server at their location with their own assembly hub of that specific genome; brief instructions exist on each assembly gateway page under the "Download files for this assembly hub:" section.<br />
<br />
====Example loading African bush elephant assembly hub and looking at the related genomes.txt and trackDb.txt====<br />
Here are some quick steps to load an example hub from this collection, and an attempt to explain how to look at the files behind the hub.<br />
# Click the above [http://genome-test.cse.ucsc.edu/~hiram/hubs/genbank/vertebrate_mammalian/vertebrate_mammalian.ncbi.html Vertebrate Mammalian assembly hub] link.<br />
# Scroll down and find the "common name" column and click the hyperlink for '''"African bush elephant"''' after looking at the other information on that row.<br />
# Note that you have arrived at a gateway page that has ''"African bush elephant Genome Browser - GCA_000001905.1_Loxafr3.0 assembly"'' displayed, where you can see a '''"Download files for this assembly hub:"''' section if you want to access these specific files, notably a http://genome-test.cse.ucsc.edu/gbdb/hubs/genbank/vertebrate_mammalian/GCA_000001905.1_Loxafr3.0/ link.<br />
# Click "Go" or the top "Genome Browser" blue bar menu to arrive at viewing this assembly hub (note this is on our '''genome-test site''').<br />
# To load this hub on our public site, at the earlier step you can copy the hyperlink for '''"African bush elephant"''' and paste it in a browser and change the very first "http://genome-test.cse.ucsc.edu/cgi-bin/..." to "http://genome.ucsc.edu/cgi-bin/..." instead.<br />
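<br />
The host swap in the last step can also be done with sed. This is a minimal sketch: the copied link, including the ''hgGateway'' CGI name and the hubUrl value, is a hypothetical placeholder rather than a real link from the table.<br />

```shell
# Hypothetical link copied from the "common name" column; the hubUrl value is a placeholder.
devUrl='http://genome-test.cse.ucsc.edu/cgi-bin/hgGateway?hubUrl=http://genome-test.cse.ucsc.edu/path/to/hub.txt'

# Rewrite only the leading host; the hubUrl parameter keeps pointing at genome-test.
printf '%s\n' "$devUrl" \
  | sed 's#^http://genome-test\.cse\.ucsc\.edu/cgi-bin/#http://genome.ucsc.edu/cgi-bin/#' > public.url.txt
cat public.url.txt
```

The pattern is anchored at the start of the line so that the hub.txt location, which still lives on genome-test, is left untouched.<br />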
<br />
Now to investigate the files behind the hub to understand the process involved.<br />
# Click the [http://genome-test.cse.ucsc.edu/gbdb/hubs/genbank/vertebrate_mammalian/GCA_000001905.1_Loxafr3.0/ http://genome-test.cse.ucsc.edu/gbdb/hubs/genbank/vertebrate_mammalian/GCA_000001905.1_Loxafr3.0/] link found in the "Download files for this assembly hub:" section on a loaded assembly hub's gateway page.<br />
# Note the "GCA_000001905.1_Loxafr3.0.ncbi.2bit" file, this is the binary indexed remote file that is allowing the Browser to display this genome.<br />
# Find the "GCA_000001905.1_Loxafr3.0.genomes.ncbi.txt" file and click the [http://genome-test.cse.ucsc.edu/gbdb/hubs/genbank/vertebrate_mammalian/GCA_000001905.1_Loxafr3.0/GCA_000001905.1_Loxafr3.0.genomes.ncbi.txt link] to look at it.<br />
# Review this genomes.txt file, which could be copied into a new hub: the "twoBitPath" line shows where to find the above 2bit file, and the "trackDb" line defines where to find the track database used to display data on this genome (the [http://genome-test.cse.ucsc.edu/gbdb/hubs/genbank/vertebrate_mammalian/genomes.ncbi.txt real genomes.txt] for this massive hub is up one directory, as this hub has 204 assemblies, and there you will find this stanza included). Note how the genomes.txt has the "organism" and "scientificName" lines that help annotate how to display this assembly hub, and a "groups" line that points to a further file that helps define the grouping of tracks that will be fully described in the trackDb.txt.<br />
# From the earlier link to all the files, click the [http://genome-test.cse.ucsc.edu/gbdb/hubs/genbank/vertebrate_mammalian/GCA_000001905.1_Loxafr3.0/GCA_000001905.1_Loxafr3.0.trackDb.ncbi.txt GCA_000001905.1_Loxafr3.0.trackDb.ncbi.txt]<br />
# Review this trackDb.txt file, which defines the tracks to display on this hub. It has "bigDataUrl" lines to tell the Browser where to find the data to display for each track, as well as other features on some tracks, such as "searchIndex" and "searchTrix" lines to help support finding data in the hub, "url" and "urlLabel" lines to help create links out from items in the hub to external resources, and "html" lines pointing to a file with information to display about the data for users who click into tracks.<br />
<br />
==Adding BLAT servers==<br />
<br />
By running your own blat server with [http://genomewiki.cse.ucsc.edu/index.php/Running_your_own_gfServer gfServer] you can add lines to the [http://genomewiki.ucsc.edu/index.php?title=Assembly_Hubs#genomes.txt genomes.txt] file of your assembly hub to enable the browser to access the server and activate blat searches.<br><br />
* First run two instances of gfServer from http://yourLab.yourInstitution.edu at the location of yourAssembly.2bit file, specifying a port that the gfServer will be accessible from for amino acid (<code>-trans</code> option) and DNA searches. Please note the <code>-mask</code> option will ignore all lower-case assembly sequence, which is the convention the UCSC Browser uses for masked sequence, so you may want to omit it from the example below.<br />
<br />
<br />
:Selecting a port:<br />
::When picking a port number, stick with numbers between 1024 and 49151. Anything less than 1024 is considered a system port and you'll need to be root in order to open it. Anything above 49151 is considered dynamic and randomly assigned. If you're starting a server that you will use a web browser to connect to, it is suggested to choose something with 8's in it, since that's the tradition. 8080, 8000, and 8888 are all popular, but other open ports will work just fine.<br />
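<br />
The range rule above can be expressed as a small shell check. This is only a sketch: the function name ''port_ok'' and the output file are made up for illustration, and the check says nothing about whether a port is already in use.<br />

```shell
# Succeed only if the argument is a whole number in the 1024-49151 range.
port_ok() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
  esac
  [ "$1" -ge 1024 ] && [ "$1" -le 49151 ]
}

for port in 80 17777 17779 60000; do
  if port_ok "$port"; then echo "$port ok"; else echo "$port outside 1024-49151"; fi
done > port.check.txt
cat port.check.txt
```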
<br />
<br />
For example, these two lines will specify port 17777 for amino acid searches and 17779 for DNA searches and are run from the publicly accessible directory location of yourAssembly.2bit file:<br />
<br />
<pre><br />
gfServer start localhost 17777 -trans -mask yourAssembly.2bit &<br />
gfServer start localhost 17779 -stepSize=5 yourAssembly.2bit &<br />
</pre><br />
<br />
* Next edit your genomes.txt stanza that references yourAssembly to have two lines to inform the browser of where the blat servers are located and what ports to use. See an example of commented out lines [http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/genomes.txt here]. Please note the capital "B" in transBlat.<br />
<pre><br />
transBlat yourLab.yourInstitution.edu 17777<br />
blat yourLab.yourInstitution.edu 17779<br />
</pre><br />
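<br />
Since the setting names are case-sensitive, it can help to list the blat settings that a genomes.txt actually defines. A minimal sketch, assuming a made-up file ''blat.genomes.txt'' that reuses the host and ports from the example above:<br />

```shell
# Illustrative genomes.txt fragment; host and ports match the example above.
printf 'genome yourAssembly\ntransBlat yourLab.yourInstitution.edu 17777\nblat yourLab.yourInstitution.edu 17779\n' > blat.genomes.txt

# Print host and port per server; the match is case-sensitive, so "transblat" would be skipped.
awk '$1 == "blat" || $1 == "transBlat" { print $1 ": " $2 ":" $3 }' blat.genomes.txt > blat.servers.txt
cat blat.servers.txt
```

If only one line is printed, the other setting is either missing or misspelled.<br />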
<br />
* You should now be able to load and perform blat operations on your assembly. For example a URL such as the following would bring up the blat CGI and have your assembly listed at the bottom of the "Genome:" drop-down menu: http://genome.ucsc.edu/cgi-bin/hgBlat?hubUrl=http://yourLab.yourInstitution.edu/myHub/hub.txt<br />
<br />
* Some institutions have firewalls that will prevent the browser from sending multiple inquiries to your blat servers, in which case you may need to request your admins add this IP range as exceptions that are not limited: <code>128.114.119.*</code> That will cover the U.S. [http://genome.ucsc.edu genome.ucsc.edu] site. In case you may wish the requests to work from our European Mirror [http://genome-euro.ucsc.edu genome-euro.ucsc.edu] site, you would want to include <code>129.70.40.120</code> also to the exception list.<br />
<br />
Please see more about [https://genome.ucsc.edu/FAQ/FAQblat.html#blat5 configuring your blat gfServer here] to replicate the UCSC Browser's settings. The [http://hgdownload.cse.ucsc.edu/downloads.html#source_downloads Source Downloads] page offers access to utilities with pre-compiled binaries such as gfServer, found in a blat/ directory for your machine type [http://hgdownload.cse.ucsc.edu/admin/exe/ here]; further blat documentation is available [http://genome.ucsc.edu/goldenPath/help/blatSpec.html here] and in the gfServer usage statement.<br />
<br />
Please also note that you can set up gfServers on a [http://genome.ucsc.edu/goldenpath/help/gbib.html GBiB] and run them locally. Please see this [http://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html#blatGbib GBiB assembly blat step-by-step set up] page for details.<br />
<br />
Note: You can stop your instance of gfServer with a command. For example:<br />
<pre><br />
gfServer stop localhost 17860 <br />
</pre><br />
<br />
===Troubleshooting BLAT servers for your hub===<br />
<br />
The following is an example of an error message seen when running a DNA sequence query via the web-based BLAT tool after loading a hub and starting a gfServer instance (from the same directory as the .2bit file).<br />
For example, a command to start an instance of gfServer:<br />
<pre><br />
$ gfServer start localhost 17779 -stepSize=5 contigsRenamed.2bit &<br />
</pre><br />
<br />
Example of a possible error message after attempting a web-based BLAT query:<br />
<br />
<pre><br />
Error in TCP non-blocking connect() 111 - Connection refused<br />
Operation now in progress<br />
Sorry, the BLAT/iPCR server seems to be down. Please try again later.<br />
</pre><br />
<br />
Check the following:<br />
<br />
====Process check====<br />
First, make sure your gfServer instance is running. <br><br />
Type the following command to check for your running gfServer process:<br />
<br />
<pre><br />
$ ps aux | grep gfServer<br />
</pre><br />
<br />
====Check for correct path/filename====<br />
In your genomes.txt file, does your twoBitPath/filename match what you specified in your command to start gfServer?<br />
<br />
:In your genomes.txt file, is the location of your gfServer instance correct?<br />
::To check this, you can cd into the directory where you started your gfServer, then type the command:<br />
<br />
<pre><br />
$ hostname -i<br />
</pre><br />
<br />
Your result should be an IP address, for example, "132.249.245.79".<br />
<br />
Now you can test the connection to your port that you specified, with a simple telnet command.<br />
:Type in the following command: ''telnet yourIP yourPort''. For example:<br />
<pre><br />
$ telnet 132.249.245.79 17777<br />
</pre><br />
::The results should read, "Connected to 132.249.245.79".<br />
::Otherwise, if gfServer isn't running or if you typed the wrong location in your telnet command, telnet will say, "Connection refused."<br />
::In this example, check your genomes.txt file, and make sure your blat line reads, "blat 132.249.245.79 17777".<br />
::You may need to change your genomes.txt file from, for example, "blat ''localhost'' 17777" to "blat ''132.249.245.79'' 17777" (use your specific IP/host name where gfServer is running).<br />
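<br />
The change from localhost to an IP address can be scripted with sed. This is a minimal sketch: the file names are assumptions and 132.249.245.79 is simply the example IP used above, so substitute the address of your own machine.<br />

```shell
# Illustrative genomes.txt lines that still point at localhost.
printf 'transBlat localhost 17777\nblat localhost 17779\n' > fix.genomes.txt

# Replace the host on blat/transBlat lines only; 132.249.245.79 is the example IP from above.
sed -E 's/^(blat|transBlat) localhost /\1 132.249.245.79 /' fix.genomes.txt > fix.genomes.new.txt
cat fix.genomes.new.txt
```

Writing to a new file instead of editing in place keeps the original available for comparison.<br />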
<br />
====gfServer status check====<br />
In the directory of your .2bit file (should be the same dir where you started gfServer), type: ''gfServer status yourLocation yourPort'' . <br><br />
For example: <br><br />
<br />
<pre><br />
$ gfServer status 132.249.245.79 17777<br />
</pre><br />
You should see output like this:<br />
<pre><br />
version 36x2<br />
type nucleotide<br />
host localhost<br />
port 17777<br />
tileSize 11<br />
stepSize 5<br />
minMatch 2<br />
pcr requests 0<br />
blat requests 0<br />
bases 0<br />
misses 0<br />
noSig 1<br />
trimmed 0<br />
warnings 0<br />
</pre><br />
<br />
====Testing with gfClient====<br />
The best troubleshooting test is to take the webpage out of the equation, and use the command line utility, [http://hgdownload.soe.ucsc.edu/admin/exe/ gfClient], to run the query on your instance of gfServer. If you can successfully connect gfClient to gfServer, you will know that your location and port specification are correct. <br />
<br />
:From the directory that holds your hub's .2bit file (should be the same directory where your instance of gfServer was launched), perform a query using gfClient:<br />
::You can type "gfClient" on your command line to see the usage statement.<br />
::Use the following command: ''gfClient yourLocation yourPort pathOf2bitFile yourFastaQuery.fa nameOfOutputFile.psl'' <br><br />
::FYI: For testing with gfClient, you only need the gfServer binary on your server, not blat.<br />
<br />
'''For example:'''<br />
<pre><br />
$ gfClient localhost 17777 . query.fa gfOutput.psl<br />
</pre><br />
::Note the " . " after the port, to specify that the query will use the .2bit file in the current directory.<br />
::After running this command, take a look at the gfOutput.psl file. If successful, you will see BLAT results.<br />
<br />
'''Another example:'''<br />
* Note: In the example below, "yourLab.yourInstitution.edu" is the name of the machine where you run the gfServer command.<br />
<br />
From the test machine: ''Test the DNA alignment'', where test.fa is some sequence to find:<br />
<pre><br />
gfClient yourLab.yourInstitution.edu 17779 `pwd` test.fa dnaTestOut.psl<br />
</pre><br />
<br />
From the test machine: ''Test the protein alignment'', where proteinSequence.fa is the sequence to find:<br />
<pre><br />
gfClient -t=dnax -q=prot yourLab.yourInstitution.edu 17777 `pwd` proteinSequence.fa proteinOutput.psl<br />
</pre><br />
* NOTE: the yourAssembly.2bit file needs to be on this test machine also.<br />
* The `pwd` says to find the yourAssembly.2bit file in this directory.<br />
<br />
===RAM requirements for BLAT servers===<br />
<br />
The gfServers that provide responses for blat queries can take some amount of memory.<br />
Here is some information that might help in approximating the required amount for genomes of different sizes.<br />
<br />
::The human hg19 genome requires ~2.2GB for the translated amino acid gfServer queries<br />
::and ~2.2GB for the untranslated DNA gfServer queries representing ~3,137,161,000 bp. <br />
<br />
::The zebrafish danRer7 genome requires ~1.2GB for the translated amino acid gfServer queries<br />
::and ~1.1GB for the untranslated DNA gfServer queries representing ~1,412,465,000 bp. <br />
<br />
::The D. melanogaster dm6 genome requires ~300MB for the translated amino acid gfServer queries<br />
::and ~250MB for the untranslated DNA gfServer queries representing ~143,726,000 bp.<br />
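<br />
To approximate the requirement for a genome of a different size, the figures above can be turned into bytes of memory per base. This is a rough sketch: the file names are made up, and GB and MB are taken as 10^9 and 10^6 bytes, which is an assumption.<br />

```shell
# Figures from the list above: assembly, bases, translated bytes, untranslated bytes.
# GB and MB are taken as 10^9 and 10^6 bytes, which is an assumption.
cat > ram.figures.txt <<'EOF'
hg19 3137161000 2200000000 2200000000
danRer7 1412465000 1200000000 1100000000
dm6 143726000 300000000 250000000
EOF

# Bytes of gfServer memory per base for each server type.
awk '{ printf "%s %.2f %.2f\n", $1, $3 / $2, $4 / $2 }' ram.figures.txt > ram.per.base.txt
cat ram.per.base.txt
```

The ratio is not constant across these three genomes (smaller genomes come out higher per base), so multiplying a genome size by a single factor gives only a loose estimate.<br />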
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
[[Category:Technical FAQ]]<br />
[[Category:Assembly/Track Hubs]]</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Learn_about_the_Browser&diff=24914Learn about the Browser2018-08-21T11:55:21Z<p>Max: </p>
<hr />
<div>This page links to various pages that will help you to learn more about the UCSC Genome Browser. It is sorted by increasing technical complexity: the first steps require only a webbrowser, the last ones administrator access to a Linux webserver.<br />
<br />
== Use the browser website ==<br />
* Bob Kuhn is giving workshops throughout the year at several US conferences, in Europe, Australia and Asia<br />
** Read about how to [http://genome.ucsc.edu/training/ host]<br />
* Video tutorials and slides by [http://www.openhelix.com/downloads/ucsc/ucsc_home.shtml Openhelix]<br />
* An introduction on [http://www.nature.com/scitable/ebooks/guide-to-the-ucsc-genome-browser-16569863/contents Nature Education]<br />
* Basic materials written by the gurus: The [http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html User's guide] and the [http://genome.ucsc.edu/FAQ/ FAQs]. <br />
* View a [http://genome.ucsc.edu/cgi-bin/hgPublicSessions gallery] of browser sessions that highlight interesting data sets.<br />
* Type in (manually) a couple of [http://genome.ucsc.edu/FAQ/FAQcustom custom tracks] in [http://genome.ucsc.edu/FAQ/FAQformat.html different formats] (just try bed for a start, you mostly won't need the others)<br />
** Try typing "chr1 1 1000 Hello_World!" as a custom track<br />
** If you need to graph data, there are [[Selecting_a_graphing_track_data_format|different graphing formats]]<br />
* Subscribe to the [http://www.soe.ucsc.edu/mailman/listinfo/genome mailing list] or [http://genome.ucsc.edu/contacts.html search through it]. Most everyday questions have already been asked by someone else so searching gives you an answer usually faster than asking on the mailing list.<br />
<br />
== Download the data of the genome browser (sequences and annotations) ==<br />
* Be aware that internal coordinates (not those shown on the website) are [[Coordinate_Transforms|0-based]]! (only exception: wiggle)<br />
* Unlike Gbrowse and Ensembl, UCSC stores the data partially in SQL (coordinates, outline of x-y-plots) and partially in flat text files (sequences, alignments, details of x-y-plots)<br />
* Table Browser: The easiest way to access data (you don't have to care whether data is stored in MySQL or in textfiles):<br />
** [http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html Table Browser], [[Image:ISMB2008_UCSC.ppt|Example session]]<br />
* SQL-stored data ([http://genome.ucsc.edu/FAQ/FAQdownloads FAQ]):<br />
** [http://genome.ucsc.edu/FAQ/FAQdownloads#download29 Public mysql access]<br />
** [[Image:Kent_allJoiner.ppt|The all.joiner file]], describes relations between all database tables<br />
** For all genes-related tables, there is a [http://genome-test.cse.ucsc.edu/images/knownGeneSchema.gif graphical map] (sub-optimal layout :-)<br />
* Flat-file data: Download from the [http://genome.ucsc.edu/FAQ/FAQdownloads#download1 ftp server] (stored in /gbdb on browser servers)<br />
** [http://genome.ucsc.edu/FAQ/FAQformat.html Text file formats]<br />
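<br />
As an illustration of the 0-based convention mentioned at the top of this list, a position as shown on the website (1-based start) can be converted to bed-style columns (0-based start, end unchanged) with awk. This is a minimal sketch: the position and the output file name are made up, and sequence names containing ":" or "-" would need different parsing.<br />

```shell
# Convert a 1-based browser position (chr:start-end) to 0-based, half-open bed columns.
printf 'chr1:1001-2000\n' \
  | awk -F'[-:]' 'BEGIN { OFS = "\t" } { print $1, $2 - 1, $3 }' > pos.bed
cat pos.bed
```

Only the start changes; the end coordinate is the same in both conventions.<br />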
<br />
== Install a copy of the browser on your own machine (Unix or Mac) ==<br />
* Create a mirror of the UCSC site:<br />
** [http://genome.ucsc.edu/admin/mirror.html Official FAQ: mirror a complete browser], the main reference for browser installation<br />
** [[Browser_Installation|Unofficial FAQ: Browser Installation]], on the wiki, a lot more info<br />
** [[Minimal Browser Installation|Mirror only selected genomes]]<br />
** Adapt your cgi-bin/[[Hg.conf]] file for your mirror<br />
** Be prepared to create the hg18 database and download at least parts of it; at least the mouse genomes still depend on it.<br />
** Make sure that you keep your own tracks separate, read this before loading any local data: [[Local_tracks_at_mirror_sites|Local TrackDB table]]<br />
** [[Browser_Mirrors|Updating the data automatically from UCSC]]<br />
** [[Using_custom_track_database|Pro and cons of storing custom tracks in MySQL or as flat files]]<br />
** [[Cookie_Session|How are cookies handled by the browser?]]<br />
** [[:Category:Mirror_Site_FAQ|All other documents in this category]]<br />
** The [http://genome-test.cse.ucsc.edu/eng/ old documentation website with developer documentation]<br />
**[[Running your own gfServer]] (=BLAT server), needed in cases where you don't want or cannot use the UCSC BLAT servers.<br />
<br />
== Compile the UCSC source tree and analyze genomes yourself ==<br />
* The '''source tree''' (aka Kent source) is the collection of all tools used at the UCSC browser group<br />
* You could also call this an '''API''', as you can use these tools from your own programs<br />
* General introduction to the [[the source tree]] <br />
* What is documented to be available in the [[Kent source utilities]] <br />
** ... and what is [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/sources really available] in the source tree?<br />
* Some important tools are already compiled (even for windows!) [http://hgdownload.cse.ucsc.edu/admin/exe/ from here]<br />
* [http://genome.ucsc.edu/admin/jk-install.html Compile all tools (includes the browser webserver)] yourself<br />
** The best starting point is the main zipfile: http://hgdownload.cse.ucsc.edu/admin/jksrc.zip<br />
** You need to set some [[Build_Environment_Variables|environment variables]] before you start the compilation<br />
** Walkthroughs for...<br />
*** [http://bergmanlab.smith.man.ac.uk/?p=32 MacOS X users]<br />
*** [[Source tree compilation on Debian/Ubuntu|Debian and Ubuntu]]<br />
*** [[CentOS Notes]]<br />
*** Windows: see [https://lists.soe.ucsc.edu/pipermail/genome-mirror/2008-November/001059.html] but your mileage may vary to get everything to compile.<br />
<br />
** The most common problem on the mailing list is harmless warnings that trigger errors: "cc1: warnings being treated as errors". To ignore them, which is usually safe, [https://lists.soe.ucsc.edu/pipermail/genome-mirror/2006-November/000251.html remove the -Wall option from the makefile]<br />
** See also the text file [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/product/README.building.source README.building.source]<br />
** [http://genome.ucsc.edu/contacts.html Search through the archives] or the [[:Category:Technical FAQ| Technical-FAQ-Category of the wiki]] when you have problems<br />
** Send an email to the mailing list if you cannot find the answer yourself<br />
** Some more sophisticated genome pipelines<br />
*** Create your own [[Whole_genome_alignment_howto|whole-genome alignment]]<br />
*** Create your own [[LiftOver_Howto|liftOver file]]<br />
<br />
== Modify your own copy of the browser ==<br />
* You load track data into your mirror <br />
** with the hgLoadxxx utils (e.g. hgLoadBed)<br />
** and show meta data (name, type, possible controls, etc) on the browser by editing a textfile called trackDb.ra<br />
* TrackDB documentation:<br />
** [http://www.soe.ucsc.edu/~sugnet/doc/trackHowto/browserTalk.pdf Charles Sugnet's presentation about TrackDB] <br />
** [[How_to_add_a_track_to_a_mirror]]<br />
** [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/product/README.trackDb The structure of the trackDB]<br />
** [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/makeDb/trackDb/ Full TrackDB-Documentation: shows all possible trackDB statements]<br />
*** Examples: [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/makeDb/trackDb/human/hg19/trackDb.ra UCSC trackDb for hg19]<br />
*** Examples for the composite statements in trackDb, from the [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/makeDb/trackDb/human/hg19/trackDb.wgEncode.ra Encode trackDb]<br />
* When you run into problems, search through the [http://genome.ucsc.edu/mirror.html mailing list] and read the documentation in the directory [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/product/ kent/src/product]<br />
** [http://genome-test.cse.ucsc.edu/~hiram/rgbItemExamples.html#RGB color codes]<br />
** [http://genome-test.cse.ucsc.edu/admin/ a similar place with hgSearchSpec docs and statistics (the system to search for ids)] can be found completely off-track<br />
* Create a browser for a [[Building a new genome database|completely new genome]]<br />
** Slides of a class on [https://banana-slug.soe.ucsc.edu/_media/lecture_notes:genomebrowsersetup.pdf Creating a browser for new bacterial genome] <br />
* The most difficult part: [[Writing_a_new_track_type|Adding a completely new track type and visualisation code to the browser]]<br />
<br />
== Making Of: How the UCSC genome annotations are created ==<br />
* How the UCSC folks created their tracks:<br />
** The [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/makeDb/doc/ UCSC's makeDb-files] are a log of all commands that are necessary to re-create all annotations for a particular genome. <br />
** Some explanations how to read the makeDb files: [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/doc/bashVsCsh.txt bashVsCsh] <br />
** The [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/makeDb/trackDb/ trackDb-files] are the "track database" of the browser. They include all instructions on how annotations are displayed (e.g. the type of display, colors, settings, options, filenames, etc.)<br />
** [[Implementation_Notes|Notes on the history of the internal tools: Autosql. Blastz. Chains and nets]]<br />
* Whole-genome multiple alignments ([[Chains_Nets]])<br />
** [[Mm9_multiple_alignment]]<br />
** Whole-genome alignment pipeline: [[Chains_Nets|Angie's mental model]] and [[Whole genome alignment howto|Max's howto]]<br />
* [http://genome.ucsc.edu/FAQ/FAQlicense#license4 How to create a browser for a new genome from scratch]<br />
* [http://users.soe.ucsc.edu/~markd/genbank-update/ Genbank Updates]<br />
* Cross-links to other databases<br />
** Outlinks for genes are copied from uniProt. The whole uniProt update process is rather complicated, see [[UCSC_Genes_Staging_Process]]<br />
<br />
== Developing with the UCSC API ==<br />
* [https://lists.soe.ucsc.edu/pipermail/genome-mirror/2010-March/001677.html Debug the cgi-scripts with GDB]<br />
* Understand the [http://biostar.stackexchange.com/questions/3669/is-there-such-a-thing-as-a-ucsc-api/3723#3723 binning scheme]<br />
<br />
== Statistics, overviews ==<br />
*[[Gene_Set_Summary_Statistics]]<br />
* [http://genome-test.cse.ucsc.edu/~hiram/WEBStats/ Web hit statistics of the UCSC browser] <br />
<br />
[[Category:Technical FAQ]]</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Learn_about_the_Browser&diff=24913Learn about the Browser2018-08-21T11:55:08Z<p>Max: </p>
<hr />
<div>This page links to various pages that will help you to learn more about the UCSC Genome Browser. It is sorted by increasing technical complexity: the first steps require only a web browser, the last ones administrator access to a Linux web server.<br />
<br />
== Use the browser website ==<br />
* Bob Kuhn is giving workshops throughout the year at several US conferences, in Europe, Australia and Asia<br />
** Read about how to [http://genome.ucsc.edu/training/ host a workshop]<br />
* Video tutorials and slides by [http://www.openhelix.com/downloads/ucsc/ucsc_home.shtml Openhelix]<br />
* An introduction on [http://www.nature.com/scitable/ebooks/guide-to-the-ucsc-genome-browser-16569863/contents Nature Education]<br />
* Basic materials written by the gurus: The [http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html User's guide] and the [http://genome.ucsc.edu/FAQ/ FAQs]. <br />
* View a [http://genome.ucsc.edu/cgi-bin/hgPublicSessions gallery] of browser sessions that highlight interesting data sets.<br />
* Type in (manually) a couple of [http://genome.ucsc.edu/FAQ/FAQcustom custom tracks] in [http://genome.ucsc.edu/FAQ/FAQformat.html different formats] (just try bed for a start, you mostly won't need the others)<br />
** Try typing "chr1 1 1000 Hello_World!" as a custom track<br />
** If you need to graph data, there are [[Selecting_a_graphing_track_data_format|different graphing formats]]<br />
* Subscribe to the [http://www.soe.ucsc.edu/mailman/listinfo/genome mailing list] or [http://genome.ucsc.edu/contacts.html search through it]. Most everyday questions have already been asked by someone else, so searching usually gives you an answer faster than asking on the mailing list.<br />
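The "Hello_World!" custom track suggested above is a single BED line with four whitespace-separated fields: chromosome, start, end and name. As a minimal sketch (the helper name bed_line is hypothetical and not part of any UCSC tool), such a line could be produced in Python like this:

```python
def bed_line(chrom, start, end, name):
    """Return one four-field BED line: chrom, chromStart, chromEnd, name."""
    if start < 0 or end < start:
        raise ValueError("BED needs 0 <= start <= end")
    if any(ch.isspace() for ch in name):
        # Fields are whitespace-separated, so a name cannot contain spaces.
        raise ValueError("BED names must not contain whitespace")
    return "\t".join([chrom, str(start), str(end), name])


# The example from the list above, ready to paste into the custom track box.
print(bed_line("chr1", 1, 1000, "Hello_World!"))
```

The underscore in "Hello_World!" is there for the same reason the helper rejects whitespace: a space would start a new field.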
<br />
== Download the data of the genome browser (sequences and annotations) ==<br />
* Be aware that internal coordinates (not those shown on the website) are [[Coordinate_Transforms|0-based]]! (only exception: wiggle)<br />
* Unlike Gbrowse and Ensembl, UCSC stores the data partially in SQL (coordinates, outline of x-y-plots) and partially in flat text files (sequences, alignments, details of x-y-plots)<br />
* Table Browser: The easiest way to access data (you don't have to care whether data is stored in MySQL or in textfiles):<br />
** [http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html Table Browser], [[Image:ISMB2008_UCSC.ppt|Example session]]<br />
* SQL-stored data ([http://genome.ucsc.edu/FAQ/FAQdownloads FAQ]):<br />
** [http://genome.ucsc.edu/FAQ/FAQdownloads#download29 Public mysql access]<br />
** [[Image:Kent_allJoiner.ppt|The all.joiner file]], describes relations between all database tables<br />
** For all genes-related tables, there is a [http://genome-test.cse.ucsc.edu/images/knownGeneSchema.gif graphical map] (sub-optimal layout :-)<br />
* Flat-file data: Download from the [http://genome.ucsc.edu/FAQ/FAQdownloads#download1 ftp server] (stored in /gbdb on browser servers)<br />
** [http://genome.ucsc.edu/FAQ/FAQformat.html Text file formats]<br />
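The 0-based note at the top of this list is easiest to see with numbers. Internally (tables, BED files) the start is 0-based and the end is exclusive, while positions shown on the website are 1-based and inclusive, so only the start differs by one. A small sketch, with hypothetical function names, assuming the data is not in wiggle format (the exception mentioned above):

```python
def display_to_internal(start, end):
    """Website position (1-based, inclusive) -> internal (0-based, end exclusive)."""
    return start - 1, end


def internal_to_display(start, end):
    """Internal (0-based, end exclusive) -> website position (1-based, inclusive)."""
    return start + 1, end


# chr1:1-1000 on the website is stored as start=0, end=1000.
print(display_to_internal(1, 1000))
# The length of a feature is simply end - start in internal coordinates.
start, end = display_to_internal(1, 1000)
print(end - start)
```

One consequence is that a BED line "chr1 1 1000" shows up on the website as chr1:2-1000.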
<br />
== Install a copy of the browser on your own machine (Unix or Mac) ==<br />
* Create a mirror of the UCSC site:<br />
** [http://genome.ucsc.edu/admin/mirror.html Official FAQ: mirror a complete browser], the main reference for browser installation<br />
** [[Browser_Installation|Unofficial FAQ: Browser Installation]], on the wiki, a lot more info<br />
** [[Minimal Browser Installation|Mirror only selected genomes]]<br />
** Adapt your cgi-bin/[[Hg.conf]] file for your mirror<br />
** Be prepared to create the hg18 database and to download at least parts of it: at least the mouse genomes still depend on it.<br />
** Make sure that you keep your own tracks separate, read this before loading any local data: [[Local_tracks_at_mirror_sites|Local TrackDB table]]<br />
** [[Browser_Mirrors|Updating the data automatically from UCSC]]<br />
** [[Using_custom_track_database|Pro and cons of storing custom tracks in MySQL or as flat files]]<br />
** [[Cookie_Session|How are cookies handled by the browser?]]<br />
** [[:Category:Mirror_Site_FAQ|All other documents in this category]]<br />
** The [http://genome-test.cse.ucsc.edu/eng/ old documentation website with developer documentation]<br />
**[[Running your own gfServer]] (=BLAT server), needed in cases where you don't want to or cannot use the UCSC BLAT servers.<br />
<br />
== Compile the UCSC source tree and analyze genomes yourself ==<br />
* The '''source tree''' (aka Kent source) is the collection of all tools used at the UCSC browser group<br />
* You could also call this an '''API''', as you can use these tools from your own programs<br />
* General introduction to [[the source tree]] <br />
* What is documented to be available in the [[Kent source utilities]] <br />
** ... and what is [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/sources really available] in the source tree?<br />
* Some important tools are already compiled (even for windows!) [http://hgdownload.cse.ucsc.edu/admin/exe/ from here]<br />
* [http://genome.ucsc.edu/admin/jk-install.html Compile all tools (includes the browser webserver)] yourself<br />
** The best starting point is the main zipfile: http://hgdownload.cse.ucsc.edu/admin/jksrc.zip<br />
** You need to set some [[Build_Environment_Variables|environment variables]] before you start the compilation<br />
** Walkthroughs for...<br />
*** [http://bergmanlab.smith.man.ac.uk/?p=32 MacOS X users]<br />
*** [[Source tree compilation on Debian/Ubuntu|Debian and Ubuntu]]<br />
*** [[CentOS Notes]]<br />
*** Windows: see [https://lists.soe.ucsc.edu/pipermail/genome-mirror/2008-November/001059.html], but your mileage may vary in getting everything to compile.<br />
<br />
** The most common problem on the mailing list is harmless warnings that trigger errors ("cc1: warnings being treated as errors"). To ignore them, which is usually safe, [https://lists.soe.ucsc.edu/pipermail/genome-mirror/2006-November/000251.html remove the -Wall option from the makefile]<br />
** See also the text file [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/product/README.building.source README.building.source]<br />
** [http://genome.ucsc.edu/contacts.html Search through the archives] or the [[:Category:Technical FAQ| Technical-FAQ-Category of the wiki]] when you have problems<br />
** Send an email to the mailing list if you cannot find the answer yourself<br />
** Some more sophisticated genome pipelines<br />
*** Create your own [[Whole_genome_alignment_howto|whole-genome alignment]]<br />
*** Create your own [[LiftOver_Howto|liftOver file]]<br />
<br />
== Modify your own copy of the browser ==<br />
* You load track data into your mirror <br />
** with the hgLoadxxx utils (e.g. hgLoadBed)<br />
** and show meta data (name, type, possible controls, etc) on the browser by editing a textfile called trackDb.ra<br />
* TrackDB documentation:<br />
** [http://www.soe.ucsc.edu/~sugnet/doc/trackHowto/browserTalk.pdf Charles Sugnet's presentation about TrackDB] <br />
** [[How_to_add_a_track_to_a_mirror]]<br />
** [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/product/README.trackDb The structure of the trackDB]<br />
** [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/makeDb/trackDb/ Full TrackDB-Documentation: shows all possible trackDB statements]<br />
*** Examples: [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/makeDb/trackDb/human/hg19/trackDb.ra UCSC trackDb for hg19]<br />
*** Examples for the composite statements in trackDb, from the [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/makeDb/trackDb/human/hg19/trackDb.wgEncode.ra Encode trackDb]<br />
* When you run into problems, search through the [http://genome.ucsc.edu/mirror.html mailing list] and read the documentation in the directory [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/product/ kent/src/product]<br />
** [http://genome-test.cse.ucsc.edu/~hiram/rgbItemExamples.html#RGB color codes]<br />
** [http://genome-test.cse.ucsc.edu/admin/ a similar place with hgSearchSpec docs and statistics (the system to search for ids)] can be found completely off-track<br />
* Create a browser for a [[Building a new genome database|completely new genome]]<br />
** Slides of a class on [https://banana-slug.soe.ucsc.edu/_media/lecture_notes:genomebrowsersetup.pdf Creating a browser for new bacterial genome] <br />
* The most difficult part: [[Writing_a_new_track_type|Adding a completely new track type and visualisation code to the browser]]<br />
<br />
== Making Of: How the UCSC genome annotations are created ==<br />
* How the UCSC folks created their tracks:<br />
** The [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/makeDb/doc/ UCSC's makeDb-files] are a log of all commands that are necessary to re-create all annotations for a particular genome. <br />
** Some explanations of how to read the makeDb files: [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/doc/bashVsCsh.txt bashVsCsh] <br />
** The [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/makeDb/trackDb/ trackDb-files] are the "track database" of the browser. They include all instructions on how annotations are displayed (e.g. the type of display, colors, settings, options, filenames, etc.)<br />
** [[Implementation_Notes|Notes on the history of the internal tools: Autosql. Blastz. Chains and nets]]<br />
* Whole-genome multiple alignments ([[Chains_Nets]])<br />
** [[Mm9_multiple_alignment]]<br />
** Whole-genome alignment pipeline: [[Chains_Nets|Angie's mental model]] and [[Whole genome alignment howto|Max's howto]]<br />
* [http://genome.ucsc.edu/FAQ/FAQlicense#license4 How to create a browser for a new genome from scratch]<br />
* [http://users.soe.ucsc.edu/~markd/genbank-update/ Genbank Updates]<br />
* Cross-links to other databases<br />
** Outlinks for genes are copied from uniProt. The whole uniProt update process is rather complicated, see [[UCSC_Genes_Staging_Process]]<br />
<br />
== Developing with the UCSC API ==<br />
* [https://lists.soe.ucsc.edu/pipermail/genome-mirror/2010-March/001677.html Debug the cgi-scripts with GDB]<br />
* Understand the [http://biostar.stackexchange.com/questions/3669/is-there-such-a-thing-as-a-ucsc-api/3723#3723 binning scheme]<br />
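The binning scheme linked above assigns every feature to the smallest bin that fully contains it, so that range queries only need to look at a handful of bins. Below is a minimal Python sketch of the standard (non-extended) scheme as it is commonly described: five levels, the smallest bins are 128 kb (2^17 bases), each level is 8 times larger, and the single top bin covers 512 Mb. The function name is hypothetical, and the constants should be checked against the binning code in the kent source tree before relying on them.

```python
# Number of the first bin at each level, from the smallest bins (128 kb) upward.
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # 2**17 = 128 kb, size of the smallest bins
BIN_NEXT_SHIFT = 3    # each level up is 2**3 = 8 times larger


def bin_from_range(start, end):
    """Return the bin for a 0-based, half-open range [start, end)."""
    if start < 0 or end <= start:
        raise ValueError("need 0 <= start < end")
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            # First and last base fall into the same bin at this level.
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("range is beyond the 512 Mb covered by this scheme")


print(bin_from_range(0, 1000))        # a small feature at the start of a chromosome
print(bin_from_range(0, 2**17 + 1))   # crosses a 128 kb boundary, so one level up
```

A feature that straddles a bin boundary moves up one level, which is why a query for a region has to check the matching bins on every level and not only the smallest ones.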
<br />
== Statistics, overviews ==<br />
*[[Gene_Set_Summary_Statistics]]<br />
* [http://genome-test.cse.ucsc.edu/~hiram/WEBStats/ Web hit statistics of the UCSC browser] <br />
<br />
[[Category:Technical FAQ]]</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Gbib_updates&diff=24912Gbib updates2018-08-21T11:45:10Z<p>Max: </p>
<hr />
<div>This page is no longer maintained.<br />
<br />
See http://genomewiki.ucsc.edu/genecats/index.php/Gbib_updates</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Gbib_updates&diff=24911Gbib updates2018-08-21T11:45:02Z<p>Max: </p>
<hr />
<div>This page is no longer maintained.<br />
<br />
http://genomewiki.ucsc.edu/genecats/index.php/Gbib_updates</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Gbib_release&diff=24910Gbib release2018-08-21T11:44:47Z<p>Max: </p>
<hr />
<div>This page is no longer maintained.<br />
<br />
http://genomewiki.ucsc.edu/genecats/index.php/Gbib_release</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Gbib_auto_updates&diff=24909Gbib auto updates2018-08-21T11:44:21Z<p>Max: </p>
<hr />
<div>This page is no longer maintained.<br />
<br />
http://genomewiki.ucsc.edu/genecats/index.php/Gbib_auto_updates</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Gbib_development&diff=24908Gbib development2018-08-21T11:44:04Z<p>Max: </p>
<hr />
<div>This page is no longer maintained.<br />
<br />
See http://genomewiki.ucsc.edu/genecats/index.php/Gbib_development</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Starting_in_David%27s_group&diff=24894Starting in David's group2018-07-30T10:17:40Z<p>Max: /* Travel */</p>
<hr />
<div>* This page is more for post-docs and grad students, but maybe also for staff<br />
* An older, non-wiki version of the new staff page is at [http://genecats.cse.ucsc.edu/eng/index.html], the current page is at [http://genomewiki.ucsc.edu/genecats/index.php/Welcome_to_Browser_Staff]<br />
* You might want to print this page<br />
== If you don't read anything here, read at least this ==<br />
* Until the next crash of the IT economy (expected for 2018 or 2019), given our salaries, housing is the single most annoying problem while working for UCSC<br />
* Get your name now onto a waiting list! It takes five minutes and does not require a UCSC email address or appointment letter, they're very flexible. You'll arrive here and just have to pick up a key. They even provide furniture, if you need it.<br />
* Read [[#Housing]] now<br />
* Arriving at the right time makes all the difference. Avoid August/September, thousands of students are in town and snatch any apartment for insane prices. June-Sep is the busy time for single family homes, with most offers.<br />
* But it's easiest to find a temporary place during June-Sept, as the students are away. You may want to start in June, if you can.<br />
* You cannot rent a place from abroad here. Book an Airbnb for the first weeks and work up from there.<br />
* Bring rental references, even from abroad and a letter from your UCSC boss, it's a landlord market here and you will have to apply to a few places to get one.<br />
<br />
== If you are an alien: Before you leave ==<br />
* Visa at home: J-1 visas have high priority, you get them often one day after the embassy appointment. But don't underestimate the time it takes to get an appointment and all the paperwork. You need to pay a fee for the SEVIS website (=the database of foreign students/scholars), a fee for the embassy and a fee for the appointment scheduling system (=built by a federal subcontractor). Keep the SEVIS receipt PDF in a safe place, you'll need it every year afterwards during your postdoc. The appointment itself takes only a few minutes, plus some waiting time. The embassies are actually very well organized, so while they make a big fuss of every little detail in the application forms, in Munich and Paris at least you can show up without a picture (there is a machine), without the stamps (there is another machine for that), with a mobile phone (they do keep it) and even without the application printout (they have PCs and printers). In Paris it's a bit different, you absolutely need the visa fee receipt, as you cannot get into the building otherwise.<br />
* H1b visas are even quicker than J1s. Once you have the appointment, it takes only a few days.<br />
* Spouses: J1s come with a spouse work permit, but that can take 6 months to get, so apply very early for the J2 work permit. UCSC gives out J1s only for a year, which is horrible, as it means your spouse can hardly get a job. Lobby the University to give you a longer contract, try to go through your PI, do anything you can, there is no rational reason for this limit. A longer contract does not mean that they cannot lay you off and it does not mean that they cannot revoke the visa. UCSD and Berkeley give out longer contracts, no problem, just UCSC is strange in that respect. Ask the visa office for the exception, it may have changed by now.<br />
* Don't bring too much stuff. Everything is cheap here on craigslist. Americans love to throw things away. Bikes, etc are easy to find especially at the end of the University term. Moving companies are very expensive and it will take them 2-3 months to get your stuff across any ocean, even if they say otherwise (once they have your boxes, they don't care anymore). By the time your stuff arrives, you will already have replaced the important things, and most of the fragile things will be broken.<br />
<br />
== Housing ==<br />
* Grad students qualify for "Family and Student housing". Get your name onto the waiting list, now! [http://housing.ucsc.edu/family/]<br />
* Postdocs sometimes qualify for Family and Student housing and for some parts of Employee housing. <br />
** Contact [http://housing.ucsc.edu/family/] and ask if they can add you to the family housing list. Do it now!<br />
** Employee housing is a totally different system. Contact them and ask for your options, most likely it will be Laureate Court. [http://employeehousing.ucsc.edu]. Do it now! The waiting time is 3-9 months. No official appointment letter is needed AFAIK, just a contact on campus who can confirm that you'll be hired.<br />
** "UCSC community rentals" is a Craigslist-like site with very few options. Don't count on finding something but it may be worth the five minutes to check.<br />
** Employee housing also has houses for sale. This sounds crazy for a 2-year contract, but get your name added to this list as well! [https://employeehousing.ucsc.edu/pdf/for-sale-housing-questionnaire.pdf] The reason is that these are not real sales: there is only one seller (UCSC) and one buyer (UCSC), there is always demand (there was even in 2008) and you have an almost guaranteed sales price, because it goes up x% per year. You are not allowed to modify your house so it's basically like renting, except that you can get your money back. You should add your name to this list as soon as you can (once you're really here). You can do it online.<br />
* For temporary housing, try Craigslist or this facebook group [https://www.facebook.com/groups/342322422616630/forsaleposts/]<br />
<br />
== For aliens in Santa Cruz ==<br />
** For general advice on how to start in the US and in Santa Cruz in particular, see [http://users.soe.ucsc.edu/~jill/postdoc.html Jill Bejerano's old page], tons of interesting stuff, unfortunately, not all of this is valid anymore (e.g. credit history is becoming less important, after the crash, not having credit might be an advantage actually)<br />
** Count on running around for 2-3 weeks to get all the paperwork done<br />
** There is no I94 stamp in a passport anymore. Get it from here https://i94.cbp.dhs.gov/I94/<br />
** Validate your J1 Visa: No rush, you have at least 10 days. You need [http://oie.ucsc.edu/is3/scholar/j1/prearrival.html ISSS] to get your [http://www.ice.gov/sevis/students/ SEVIS] entry OKed before you can apply for a social security number. After this, you have to wait until a time that is at least 10 days after your entry into the US, and 48 hours after ISSS has made the entry. You need to bring your DS2019 to ISSS and your passport with the I94 inside. At the social security office, you can shorten the wait by filling out one of the "new SSN" forms next to the entrance, on the wall. If you show up at 9:15, then the waiting time is usually not longer than 10 minutes. Show up at 11:00 and you might wait for an hour.<br />
** You can get the SSN two days later by going to US Soc Sec Adm again<br />
** You can have the social security card sent to the lab, just make sure that you have the right address, with the "Mailstop: CBSE-ITI" mentioned on it. <br />
** Next step: Open a bank account <br />
*** For foreigners: Rather avoid credit unions and local banks: They are difficult to wire money to as often they don't have SWIFT BICs<br />
*** Bank of the West insisted on a proof of address, Bank of America, Chase, Wells Fargo, etc seem to be easier<br />
*** My recommendation: Go to BoA, Wells Fargo or Chase, so you can transfer money from abroad <br />
*** Don't believe what the banks tell you: You can open an account without a social security number. Sometimes they will restrict the account (no debit, no checks). In this case, you can go to a different bank<br />
*** Wells Fargo didn't have any restriction on non-SSN accounts<br />
*** to send money from/to overseas USForex.com has amazing service and branches in Europe, Australia, Canada and the US, with real people on the phone who route the transfer through their network. It takes around one week and is a lot easier to setup than via your bank, though not a lot cheaper. <br />
** Send the SSN to HR (Jolinda) and the bank account details to leeann@ucsc.edu<br />
** As an alien (not Canada), you don't need to pay tax for two years: http://www.ucop.edu/ucophome/cao/paycoord/taxstate.html<br />
** Depending on the Tax Treaty, you might have to reimburse taxes if you stay more than 2 years and 1 day (England) (not France, Germany, Spain)<br />
** Be very careful with the health insurance. You have 30 days to sign up for the insurance, it's close to impossible afterwards. The only time you can make changes is in late November, every year. If you're not used to the US, ask your colleagues. Don't go to the hospital unless you must and you know you have insurance. Calling ambulances is usually not a good idea, unless you're having a heart attack. Read up on Wikipedia about the difference between HMOs (like England/Italy/Spain) and PPOs (more like in Switzerland/Germany/Netherlands).<br />
** Technically, you don't need a California drivers license anymore if you have one from home. So, no rush. However, it is still very useful. I would get one. Get an appointment at the DMV. https://www.dmv.ca.gov/. Do not go without an appointment, it's like in a third world country there. Once you get to talk to someone, they're actually nice. Practice the drivers license questions using samples on the internet. Spending 1-2 hours on it should be enough, the drivers licenses here are a joke compared to most other rich countries. Make sure you bring a printout of your I94, they will not do it for you. Go to this website, https://i94.cbp.dhs.gov/I94/ download your I-94 and bring it with you, otherwise they cannot give you a driver's license. Make sure to practice the hand signs for the driving practice test. Once you're through it, you will get a temporary permit and a few weeks later the real license. <br />
** If you don't want to renew your license every year, you can get your driver's license under AB-60, as an illegal, undocumented immigrant. To become an undocumented immigrant officially, you need to submit lots of documents. Bring lots of papers that prove who you are (bank account, rental agreement, 3 paystubs, etc) and you will get a five-year driver's license, which is pretty convenient. It has a little mark though that it is not a federal piece of ID, but apparently it's accepted at airports by TSA. (Either way, never fly without your passport, even domestically.) It's too new to tell if there are any real inconveniences of this route.<br />
<br />
== Santa Cruz ==<br />
* Map with places we like, eating places, supermarkets, bars, etc [http://maps.google.com/maps/ms?ie=UTF&msa=0&msid=216293239069191276808.0004a4a863d034357e95f] , created by Thomas Juettemann and Max<br />
<br />
== Administrative ==<br />
* Postdoc office phone number: +1 831 459 5232<br />
* Address<br />
** Mail Stop: CBSE<br />
** Mailing Address: 501 Engineering 2 Building, Mailstop CBSE/ITI, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064<br />
** FAX Number: 831-459-1809<br />
* You will have an appointment at Human Resources on the ground floor of the Engineering 2 building (=our building) where you will have to fill out and sign tons of forms. They also give you a campus map<br />
* Healthcare:<br />
** Medical: Do some research on the difference between PPO and HMO on Wikipedia. PPO means that you pay per service, but you can select and change doctors at your will, you can see specialists. With HMO you have to go to an assigned family doctor who might not be what you want and you cannot see specialists without his approval. The details are buried in PDF files [http://www.garnett-powers.com/postdoc/medical.htm here] and you can also look up the available doctors in the Santa Cruz area for each plan.<br />
** Number one rule of health care in the US: don't pay the bills!! I have double-checked my last four bills and three of them contained errors that meant I had to pay too much. They systematically overcharge you. The difference for me was 1200$ in just three bills. Always assume that any bill you receive from a hospital is wrong and too high. The bill itself does not allow you to check it, because they do not include the essential information (lab test or not, preventive or not, date of service, in or out of network provider) on the bill. Only the insurance knows the details. The bill also uses unreadable CPT codes which you have to google. There is zero legislation on how health care bills have to be written, and as a result, there is no way for you to figure out if the amount is correct. The only way to find out is to pick up the phone and call the health insurance and/or the hospital, which is difficult for foreigners at first, but you will have to learn it, if you want to save. One step is to not pay these bills for at least a few months, so someone in the hospital has to look at them at least quickly. <br />
** Some explanations of PPO versus HMO http://www.pamf.org/forpatients/billing/healthplans_coverage.html<br />
** Do *not* take the Dental HMO plan! The PPO plan costs you the same and you will be able to choose your dentist! The HMO dentists are at Watsonville, San Jose and Los Gatos, impossible to reach without a car, and you cannot choose one yourself, if you don't like your assigned dentist.<br />
* Use the map to get to the University ID office, they will print your ID card (it is located at the intersection of Steinhart / Hager roads) see also campus map http://interactivemap.ucsc.edu/<br />
* Go to see Al McGuire in the Baskin Engineering building (the old building just in front) in room 399 amcguire@soe.ucsc.edu, open from 1pm to 3pm, to get your card activated. Take the bridge, go upstairs one floor, it's in the big floor that crosses the whole building, on the right hand side when looking from Engineering 2<br />
* If you plan to cycle to work, ask him for a key to the showers (Eng, ground floor, next to bathrooms)<br />
* One day after HR has entered your SSN into their system, you will be able to use the UCOP website "atyourservice" [https://atyourserviceonline.ucop.edu/ayso/] <br />
** You CAN apply for a direct salary transfer here, even though HR won't believe it, do this immediately, otherwise you will get a check with your salary.<br />
*** You can cash them in for free at Wells-Fargo, they will very very aggressively try to sign you up for a bank account. Don't believe them that you need to sign up for them to cash in the check, they supposedly have an agreement with the University to do it for free (and try to get new bank customers via that route). On the other hand, Wells Fargo is one of the better-organized banks and has decent service. <br />
*** Bank accounts: Be aware that transfers from abroad onto Union (Bay Area or Community Credit Unions) are quite difficult as they don't have Swift/BIC codes. <br />
** You need to enter your SSN without hyphens and your birthdate as mmddyy when you create your login<br />
** Sign up immediately for the health plan, you have only 30 days to do this and you can do it via ucop<br />
** Make sure that your address in ucop is correct: The health company will send your insurance card to it<br />
* You should have an SOE account / SOE email address, this should have been done automatically via the CBSE admin office<br />
* You will receive cruzID registration emails <br />
** define your passwords<br />
** redirect your cruzId emails to your SOE account<br />
* Order office supplies from gopalace.com, copy shopping cart into email and send to Danielle<br />
* Note the [https://cbse.soe.ucsc.edu/cbseinternal intranet webpage of CBSE], it allows you to book rooms and download reimbursement forms.<br />
<br />
== Travel ==<br />
* Get travel insurance for free from http://www.ucop.edu/riskmgt/uctrips/<br />
* Get travel reimbursement forms from https://cbse.soe.ucsc.edu/travel<br />
* If done early enough, there is usually no need to pay out of pocket for the University. Just fill out the travel advance request form.<br />
* The UCSC pretrip guide is actually not that bad: https://financial.ucsc.edu/Pages/Travel_PreTripGuide.aspx#before<br />
* you have only 45 days after your trip to get all paperwork to the UCSC office. Keep in mind that handing it in to CBSE is not enough, it has to be at the University's office. There are no exceptions, if you miss the deadline, you'll have to pay for the conference yourself. Double-check with the CBSE office and get a confirmation that the forms were forwarded, especially for bigger expenses.<br />
** UCSC will not pay for flexible ticket surcharges, e.g. when they allow you to rebook at a cheaper rate for a small fee<br />
** You can get driven to the airport by a friend and give them the reimbursement (around 75$ for SFO), as this is usually cheaper for the University than paying for your parking or paying your shuttle <br />
* federal grants have a requirement to use US airlines (Fly America Act), but this is almost a non-issue anymore when you fly to Japan or to Europe, as all Japanese and European carriers are accepted under the "open skies" agreement. Also, if any leg of the flight is code-shared with a US airline, this is also fine.<br />
* the recommended way to book travel is through [https://login.ucsc.edu/idp/profile/Shibboleth/SSO?shire=https://ucsso.travelprefs.com/Shibboleth.sso/SAML/POST&target=https://ucsso.travelprefs.com/profiler&providerId=https://ucsso.travelprefs.com UC Connexxus]. It's more expensive than the cheapest flight/hotel/car on Kayak, but the paperwork will be a looot easier and the booking includes some niceties, like automatic travel insurance signup, maximum rental car coverage and a few upgrades here and there.<br />
** Especially useful for rental cars<br />
** Booking through Connexxus may include the "seat charges" some airlines now make you pay to get an aisle or window seat<br />
<br />
== Meetings and groups ==<br />
* Subscribe to any of these mailing lists (genecats and staff is recommended, other depending on your preferences)<br />
** genecats: https://lists.soe.ucsc.edu/mailman/listinfo/genecats<br />
** Staff: https://lists.soe.ucsc.edu/mailman/listinfo/staff<br />
** Genome reconstruction aka "David group meeting": http://lists.bx.psu.edu/options/rec<br />
** Haussler Wetlab aka "Salama group meeting" https://lists.soe.ucsc.edu/mailman/listinfo/hausslerwetlab<br />
** Cancer group: https://lists.ucsc.edu/mailman/listinfo/cancercats<br />
** CompBio (PSB compbio news+Kevin's son's theatre performances) https://lists.soe.ucsc.edu/mailman/listinfo/compbio<br />
* Other email lists here are aliases on the SOE mail system, /var/mail/aliases on the moondance machine, email bob kuhn to subscribe<br />
**browser-qa<br />
**browser-staff aka "Kent group meeting"<br />
**cluster-staff<br />
**encode<br />
**genome-www<br />
**push-request<br />
** these two are managed by the CBSE admin office and you should be subscribed automatically:<br />
**hausslergrads<br />
**hausslerwetlab<br />
* If you work on the browser source code, sign up for a redmine account, email Ann Zweig for details. Redmine is the bug tracker used here for internal browser communication<br />
* If you want to modify the browser source code, you need to pass the "git test", be added to the git group and get a pushqueue account. Search for git on this wiki and learn how to modify a source file, then talk to Galt to pass your test<br />
* There are the following meetings during the week and their MCs:<br />
** Monday: Browser Staff Group 11 am (Ann Zweig)<br />
** Tuesday: Immuno Journal Club 11 am (Ngan and Hyunsung)<br />
** Wednesday: Wetlab 12:30 (Salama), very very rarely: Genecats 2 pm (Donna), Cancer Group 3:30 pm (Jing)<br />
** ~Wed/Thursday: CGL Group (Benedict), changing schedule<br />
** Friday: Nothing?<br />
<br />
== Technical ==<br />
<br />
* Closest scanner: Front office, second door on the right, just enter your email and press scan<br />
* Out-of-hours Scanner: Get copy card in postdoc room (Daniel's cube?), 2nd floor, media room, enter card, press "scan", enter email address of recipient, press copy button<br />
* Printer: Add the "oops" printer (yes this is the DNS name), select the IPP protocol if asked<br />
** Color printer is in the front office, HP Color LaserJet CP4520 Series, hostname "lollipop"<br />
** Other printers: http://support.soe.ucsc.edu/printing<br />
* Fax: Ask at front desk or see scanner<br />
* Reserve a room: https://cbse.soe.ucsc.edu/conf/reserve/form<br />
<br />
* Clusters<br />
** see [[Where_is_everything]]<br />
<br />
* Account config<br />
** use /cluster/install/utilities/chsh to change your shell to bash<br />
** copy ~hiram/.bashrc.hiram to your homedir and add the line "source .bashrc.hiram" to your own .bashrc file<br />
** Never put anything into your homedir when working on the cluster; create a new data store dir in /hive/users/<yourname><br />
** If you have any problems with your account, email cluster-admin@soe.ucsc.edu<br />
** Read [[Cluster Jobs]] and [http://genecats.cse.ucsc.edu/eng/KiloKluster.html "Where is everything"] and [http://genecats.cse.ucsc.edu/eng/parasol.html "Parasol Manual"]<br />
** make sure that you are a member of the "genecats" and "protein" groups<br />
** You might want to add this statement as well to your bashrc, will add tab completion to the para command:<br />
complete -o default -W "create push try shove make check stop chill finished hung slow crashed failed status problems running hippos time recover priority maxJob resetCounts freeBatch showSickNodes clearSickNodes" para<br />
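<br />
The bashrc additions above can be made repeatable with a small helper that appends a line only when it is missing. This is a minimal sketch; add_line_once is a hypothetical name, not something installed on the cluster.<br />

```shell
#!/bin/sh
# Sketch: append a line to a file only if it is not already present.
# add_line_once is a hypothetical helper name, not an existing cluster utility.
add_line_once() {
    line="$1"   # the exact line to add, e.g. "source .bashrc.hiram"
    file="$2"   # the file to modify, e.g. ~/.bashrc
    # -x matches the whole line, -F treats the pattern as a fixed string, -q is quiet.
    # If the file does not exist yet, grep fails and printf creates it.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

Usage would be something like: add_line_once "source .bashrc.hiram" ~/.bashrc. Running it twice leaves a single copy of the line.<br />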
<br />
* Cancer stuff:<br />
** Install http://openvpn.net/index.php/open-source/downloads.html if you're on Windows or Mac<br />
** Get the .key and .cert files from Erich and place them into the openvpn/config directory.<br />
** place the .ovpn file into the same directory, adapt the paths<br />
** open the ovpn file with openvpn<br />
<br />
== Genome Browser ==<br />
** Email cluster-admin and ask them to setup your own browser<br />
** check out the kent source tree via git into /hive/users/<yourname>/kent and run make in lib, hg/lib and then "make cgi"<br />
** create a hg.conf in /usr/local/apache/cgi-bin-<yourname>/ copy an existing one from e.g. /usr/local/apache/cgi-bin-pauline<br />
** modify the line with db.trackDb to read db.trackDb=trackDb,trackDb_<yourname><br />
** modify your ~/.hg.conf: add username and password (ask someone else, anyone who has set up their own browser, to give you the write-access password for hgwdev's mysql), modify the line db.trackDb to point to trackDb_<yourname>, like this: db.trackDb=trackDb_<yourname><br />
** create a new trackDb directory structure in your homedir, something like ~haeussle/usr/trackDb<br />
** add a new track to hg19 trackdb, run "make human" to show the track and check on [http://hgwdev-<yourname>.cse.ucsc.edu http://hgwdev-<yourname>.cse.ucsc.edu] that everything works.<br />
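<br />
The db.trackDb edit for the cgi-bin hg.conf described above can be sketched with sed. The helper name set_trackdb is hypothetical, and the path and username you pass in are placeholders; check the resulting file by hand before relying on it.<br />

```shell
#!/bin/sh
# Sketch: point the db.trackDb line of an hg.conf at trackDb plus a personal table.
# set_trackdb is a hypothetical helper, not part of the kent tools.
set_trackdb() {
    conf="$1"   # path to the hg.conf to edit
    user="$2"   # your username, i.e. the <yourname> part of trackDb_<yourname>
    # Rewrite only the db.trackDb line; every other setting is copied unchanged.
    sed "s/^db\.trackDb=.*/db.trackDb=trackDb,trackDb_${user}/" "$conf" > "$conf.new" &&
        mv "$conf.new" "$conf"
}
```

Usage would be something like: set_trackdb /usr/local/apache/cgi-bin-<yourname>/hg.conf <yourname>. For ~/.hg.conf the value is db.trackDb=trackDb_<yourname> instead, so that file still needs a manual edit.<br />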
<br />
== Restaurants ==<br />
<br />
Our favorites:<br />
<br />
* Mongolian: Oyunaa's (Seabright), excellent food<br />
* Breakfast: Silver Spoon (Soquel). Walnut Cafe (downtown), Linda's on Seabright.<br />
* Japanese: Kaito (41st), really good Sushi and Ramen. Alternative: Kauboi in Aptos.<br />
* Italian: Tramonti (Seabright), by far the best Pizza. Pasta OK.<br />
* Chinese: Nothing really. Maybe the one next to Kaito.<br />
* Indian: Nothing really. Sitar (Pacific) is OK and cheap.<br />
* Thai: Lots. Not sure. Pacific Thai?<br />
* Downtown: Malabar (veg), India Joe's (pseudo-asian), ...<br />
* Bar: Poet and Patriot? Unsure.<br />
* Mexican: Los Pinos ("upscale") (Pacific) or Los Perriquos (cheaper)<br />
* German: Tyrolian Inn (Ben Lomond) (pretty bad, if you're from Europe, but an experience)</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Jetlag&diff=24859Jetlag2018-06-26T15:49:33Z<p>Max: Created page with "Jetlag in general: * the body's timer is regulated by UV/sunshine and activity * in theory, going to bed early before you leave works, but no one is really able to do this in..."</p>
<hr />
<div>Jetlag in general:<br />
<br />
* the body's timer is regulated by UV/sunshine and activity<br />
* in theory, going to bed early before you leave works, but no one is really able to do this in practice<br />
* going to places with sunshine and/or in summer is easier than to grey places (like England in winter)<br />
* overall it takes around 1 day to get over 1 hour of jetlag, but you can influence this<br />
* the kidneys, body heating etc. are on the timer, so don't be surprised to feel hot or cold at strange hours or to have to go to the bathroom at unusual times. Always having a sweater, deo and shorts with you can be helpful for these reasons.<br />
* it's easier to go late to bed than to get up early - flying West is much easier than flying East<br />
* the following tips are mostly for going East. <br />
* When going West, you'll go to bed early and wake up early, which is actually quite nice, depending on your usual habits <br />
<br />
Jetlag alone:<br />
* do not sleep on the plane, try to go as long as you reasonably can on the first day, e.g. until 7-8pm, then sleep as long as you can<br />
* do not get up right away when you wake up, just relax and stay in the dark. Computer work on the first night is not very efficient anyways<br />
* I have a belief that books are better than the computer screens, in the first night, but YMMV<br />
* as soon as you can reasonably get up, around 5am-6am, go jogging, swimming, any activity to signal to your body that you're awake<br />
* the first afternoon is tough. You can have a quick nap, coffee or just take it easy. Plan work breaks during the first afternoon, if possible.<br />
* at the end of the first day, do a little exercise again and/or get some sun, if possible.<br />
* the worst days are day 2 and 3, as on day 1 you'll be still tired enough from the plane so you'll sleep longer <br />
* with enough exercise and sun, you can reduce the jetlag to 3-4 days<br />
* some people swear by sleeping pills on the plane and melatonin afterwards:<br />
** Sleeping pills do not induce a normal sleep. In addition, using them on a plane means that your body does not move for hours in a sitting position. Not good for your veins and the thrombosis risk.<br />
** Melatonin had no effect in a placebo controlled trial. Just because it follows the circadian rhythm does not mean that it can change the rhythm.<br />
<br />
Jetlag with kids:<br />
* this is the opposite of what you do when traveling alone: you try to NOT get over the jetlag quickly, you just follow the 1-hour-per-day rule and go with the flow<br />
* whenever the kids sleep, at least one parent must also sleep<br />
* the reason is that while you can get over the jetlag quickly yourself, the kids won't, so chances are that you have not slept during the whole day, but the kids will be running around at 3am and you will suffer a lot if you only get 3-4 hours sleep for a few days<br />
* while kids cannot be forced to sleep, they can be convinced to lie in bed and listen to stories, read books, etc. Anything dark without UV at night time will help<br />
* the first day is somewhat easier, depending on how well you managed to keep them awake on the plane<br />
* on the second day, they may not sleep at all during the night. This is OK, as long as you napped together with them during the day.<br />
* plan to need warm food during the night. The kids will be hungry. Many places do not have late-night restaurants and your AirBnB kitchen may be empty.<br />
* try to find a hotel where noise at night is not a huge issue. <br />
* I have heard of parents doing stopovers in Barcelona, London and Berlin, because these cities have a nightlife, some of which can be explored with kids. I have never done this but it sounds better than getting kicked out of playgrounds by guards at 3am.
<hr />
<div>== For CIRM groups ==<br />
<br />
* Ask your UCSC wrangler contact person to set you up with a username and password for the cirmdcm.soe.ucsc.edu sftp upload<br />
* Use an sftp command line client like lftp or simply sftp and connect to sftp://cirmdcm.soe.ucsc.edu, port 6789, using this username<br />
** the "sftp" command is fine for basic transfers of a few files and is installed on most servers and all Mac computers. lftp uses concurrent connections and can be up to 10-20x faster. lftp is also better at restarting or updating old uploads.<br />
** see below for examples<br />
* If you're unsure which program to use and you want a graphical user interface, not a command line program, a good choice is CyberDuck on MacOS, see image below on how to connect using CyberDuck<br />
* Once you're done with the upload, let your wrangler know<br />
<br />
Example command line tool "sftp", usually installed on most linuxes and osx, to upload the current directory and all subdirectories:<br />
<br />
$ sftp -P 6789 maxSftp@cirmdcm.soe.ucsc.edu<br />
sftp> put -r *<br />
<br />
If this shows "No such file or directory", then your sftp is a bit outdated and it doesn't have the -P option yet, try this instead:<br />
<br />
sftp -oPort=6789 maxSftp@cirmdcm.soe.ucsc.edu<br />
sftp> put -r *<br />
<br />
But sftp is too slow for anything bigger than 100GB. Use lftp for bigger transfers. Here is an example that uploads the current directory and all subdirectories with 10 concurrent connections ("threads"):<br />
<br />
$ lftp sftp://myUsername@cirmdcm.soe.ucsc.edu:6789<br />
> mirror -R --parallel=10<br />
<br />
-R stands for "reverse", a reverse mirror is lftp-speak for "upload".<br />
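<br />
For scripted uploads, the same lftp session can be run non-interactively with lftp's -e option, which executes the given commands after connecting. The sketch below only prints the command so you can check it first; upload_cmd is a hypothetical helper name and myUsername is a placeholder.<br />

```shell
#!/bin/sh
# Sketch: build a non-interactive lftp upload command for the CIRM sftp server.
# upload_cmd is a hypothetical helper; it prints the command instead of running it.
upload_cmd() {
    user="$1"      # the sftp username you got from your wrangler
    threads="$2"   # number of parallel transfers for "mirror"
    # "quit" is needed because lftp stays in interactive mode after -e commands.
    printf 'lftp -e "mirror -R --parallel=%s; quit" sftp://%s@cirmdcm.soe.ucsc.edu:6789\n' "$threads" "$user"
}

upload_cmd myUsername 10
```

Run the printed command from the directory you want to upload; lftp will still ask for the password.<br />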
<br />
Example graphical user interface, Cyberduck on Mac OSX:<br />
<br />
[[File:CyberDuck_CIRM-01.png|400px|Cyberduck screenshot]]<br />
<br />
== For UCSC Wranglers ==<br />
<br />
On cirm01, create the user:<br />
<br />
sudo /data/create-user username<br />
<br />
To change password of user:<br />
<br />
sudo /data/change-password username<br />
<br />
The new user is created with their homedir as:<br />
<br />
/data/sftp/user/incoming<br />
<br />
You have read access to those directories.</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Upload_onto_CIRM-01&diff=24853Upload onto CIRM-012018-06-21T21:23:37Z<p>Max: /* For CIRM groups */</p>
<hr />
<div>== For CIRM groups ==<br />
<br />
* Ask your UCSC wrangler contact person to set you up with a username and password for the cirmdcm.soe.ucsc.edu sftp upload<br />
* Use an sftp command line client like lftp or simply sftp and connect to sftp://cirmdcm.soe.ucsc.edu, port 6789, using this username<br />
** the "sftp" command is fine for basic transfers of a few files and is installed on most servers and all Mac computers. lftp uses concurrent connections and can be up to 10-20x faster. lftp is also better at restarting or updating old uploads.<br />
** see below for examples<br />
* If you're unsure which program to use and you want a graphical user interface, not a command line program, a good choice is CyberDuck on MacOS, see image below on how to connect using CyberDuck<br />
* Once you're done with the upload, let your wrangler know<br />
<br />
Example command line tool "sftp", usually installed on most linuxes and osx, to upload the current directory and all subdirectories:<br />
<br />
$ sftp -P 6789 maxSftp@cirmdcm.soe.ucsc.edu<br />
sftp> put -r *<br />
<br />
If this shows "No such file or directory", then your sftp is a bit outdated and it doesn't have the -P option yet, try this instead:<br />
<br />
sftp -oPort=6789 maxSftp@cirmdcm.soe.ucsc.edu<br />
sftp> put -r *<br />
<br />
Example command line tool "lftp", upload current directory and all subdirectories with 10 concurrent connections ("threads"):<br />
<br />
$ lftp sftp://myUsername@cirmdcm.soe.ucsc.edu:6789<br />
> mirror -R --parallel=10<br />
<br />
-R stands for "reverse", a reverse mirror is lftp-speak for "upload".<br />
<br />
Example graphical user interface, Cyberduck on Mac OSX:<br />
<br />
[[File:CyberDuck_CIRM-01.png|400px|Cyberduck screenshot]]<br />
<br />
== For UCSC Wranglers ==<br />
<br />
On cirm01, create the user:<br />
<br />
sudo /data/create-user username<br />
<br />
To change password of user:<br />
<br />
sudo /data/change-password username<br />
<br />
The new user is created with their homedir as:<br />
<br />
/data/sftp/user/incoming<br />
<br />
You have read access to those directories.</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Upload_onto_CIRM-01&diff=24850Upload onto CIRM-012018-06-21T14:06:52Z<p>Max: /* For CIRM groups */</p>
<hr />
<div>== For CIRM groups ==<br />
<br />
* Ask your UCSC wrangler contact person to set you up with a username and password for the cirmdcm.soe.ucsc.edu sftp upload<br />
* Use an sftp command line client like lftp or simply sftp and connect to sftp://cirmdcm.soe.ucsc.edu, port 6789, using this username<br />
** the "sftp" command is fine for basic transfers of a few files and is installed on most servers and all Mac computers. lftp uses concurrent connections and can be up to 10-20x faster. lftp is also better at restarting or updating old uploads.<br />
** see below for examples<br />
* If you're unsure which program to use and you want a graphical user interface, not a command line program, a good choice is CyberDuck on MacOS, see image below on how to connect using CyberDuck<br />
* Once you're done with the upload, let your wrangler know<br />
<br />
Example command line tool "sftp", usually installed on most linuxes and osx, to upload the current directory and all subdirectories:<br />
<br />
$ sftp -P 6789 maxSftp@cirmdcm.soe.ucsc.edu<br />
sftp> put -r *<br />
<br />
If this shows "No such file or directory", then your sftp is a bit outdated and it doesn't have the -P option yet, try this instead:<br />
<br />
 sftp -oPort=6789 maxSftp@cirmdcm.soe.ucsc.edu<br />
sftp> put -r *<br />
<br />
Example command line tool "lftp", upload current directory and all subdirectories with 10 concurrent connections ("threads"):<br />
<br />
$ lftp sftp://myUsername@cirmdcm.soe.ucsc.edu:6789<br />
> mirror -R --parallel=10<br />
<br />
-R stands for "reverse", a reverse mirror is lftp-speak for "upload".<br />
<br />
Example graphical user interface, Cyberduck on Mac OSX:<br />
<br />
[[File:CyberDuck_CIRM-01.png|400px|Cyberduck screenshot]]<br />
<br />
== For UCSC Wranglers ==<br />
<br />
On cirm01, create the user:<br />
<br />
sudo /data/create-user username<br />
<br />
To change password of user:<br />
<br />
sudo /data/change-password username<br />
<br />
The new user is created with their homedir as:<br />
<br />
/data/sftp/user/incoming<br />
<br />
You have read access to those directories.</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Upload_onto_CIRM-01&diff=24716Upload onto CIRM-012018-04-06T20:02:06Z<p>Max: </p>
<hr />
<div>== For CIRM groups ==<br />
<br />
* Ask your UCSC wrangler contact person to set you up with a username and password for the cirmdcm.soe.ucsc.edu sftp upload<br />
* Use an sftp command line client like lftp or simply sftp and connect to sftp://cirmdcm.soe.ucsc.edu, port 6789, using this username<br />
** the "sftp" command is fine for basic transfers of a few files and is installed on most servers and all Mac computers. lftp uses concurrent connections and can be up to 10-20x faster. lftp is also better at restarting or updating old uploads.<br />
** see below for examples<br />
* If you're unsure which program to use and you want a graphical user interface, not a command line program, a good choice is CyberDuck on MacOS, see image below on how to connect using CyberDuck<br />
* Once you're done with the upload, let your wrangler know<br />
<br />
Example command line tool "sftp", usually installed on most linuxes and osx, to upload the current directory and all subdirectories:<br />
<br />
$ sftp -P 6789 maxSftp@cirmdcm.soe.ucsc.edu<br />
sftp> put -r *<br />
<br />
Example command line tool "lftp", upload current directory and all subdirectories with 10 concurrent connections ("threads"):<br />
<br />
$ lftp sftp://myUsername@cirmdcm.soe.ucsc.edu:6789<br />
> mirror -R --parallel=10<br />
<br />
-R stands for "reverse", a reverse mirror is lftp-speak for "upload".<br />
<br />
Example graphical user interface, Cyberduck on Mac OSX:<br />
<br />
[[File:CyberDuck_CIRM-01.png|400px|Cyberduck screenshot]]<br />
<br />
== For UCSC Wranglers ==<br />
<br />
On cirm01, create the user:<br />
<br />
sudo /data/create-user username<br />
<br />
To change password of user:<br />
<br />
sudo /data/change-password username<br />
<br />
The new user is created with their homedir as:<br />
<br />
/data/sftp/user/incoming<br />
<br />
You have read access to those directories.</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Upload_onto_CIRM-01&diff=24715Upload onto CIRM-012018-04-06T20:01:46Z<p>Max: </p>
<hr />
<div>== For CIRM groups ==<br />
<br />
* Ask your UCSC wrangler contact person to set you up with a username and password for the cirmdcm.soe.ucsc.edu sftp upload<br />
* Use an sftp command line client like lftp or simply sftp and connect to sftp://cirmdcm.soe.ucsc.edu, port 6789, using this username<br />
** the "sftp" command is fine for basic transfers of a few files and is installed on most servers and all Mac computers. lftp uses concurrent connections and can be up to 10-20x faster. lftp is also better at restarting or updating old uploads.<br />
** see below for examples<br />
* If you're unsure which program to use, a good choice is CyberDuck on MacOS, see image below on how to connect using CyberDuck<br />
* Once you're done with the upload, let your wrangler know<br />
<br />
Example command line tool "sftp", usually installed on most linuxes and osx, to upload the current directory and all subdirectories:<br />
<br />
$ sftp -P 6789 maxSftp@cirmdcm.soe.ucsc.edu<br />
sftp> put -r *<br />
<br />
Example command line tool "lftp", upload current directory and all subdirectories with 10 concurrent connections ("threads"):<br />
<br />
$ lftp sftp://myUsername@cirmdcm.soe.ucsc.edu:6789<br />
> mirror -R --parallel=10<br />
<br />
-R stands for "reverse", a reverse mirror is lftp-speak for "upload".<br />
<br />
Example graphical user interface, Cyberduck on Mac OSX:<br />
<br />
[[File:CyberDuck_CIRM-01.png|400px|Cyberduck screenshot]]<br />
<br />
== For UCSC Wranglers ==<br />
<br />
On cirm01, create the user:<br />
<br />
sudo /data/create-user username<br />
<br />
To change password of user:<br />
<br />
sudo /data/change-password username<br />
<br />
The new user is created with their homedir as:<br />
<br />
/data/sftp/user/incoming<br />
<br />
You have read access to those directories.</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Upload_onto_CIRM-01&diff=24636Upload onto CIRM-012018-03-20T20:28:37Z<p>Max: </p>
<hr />
<div>== For CIRM groups ==<br />
<br />
* Ask your UCSC wrangler contact person to set you up with a username and password for the cirmdcm.soe.ucsc.edu sftp upload<br />
* Use an sftp command line client like lftp or simply sftp and connect to sftp://cirmdcm.soe.ucsc.edu, port 6789, using this username<br />
** sftp is good for basic transfer of a few files and is installed on most servers and all Mac computers. lftp is using concurrent connections and can be up to 10-20x faster. Lftp is also better in restarting or updating old uploads.<br />
** see below for examples<br />
* If you're unsure which program to use, a good choice is CyberDuck on MacOS, see image below on how to connect using CyberDuck<br />
* Once you're done with the upload, let your wrangler know<br />
<br />
Example command line tool "sftp", usually installed on most linuxes and osx, to upload the current directory and all subdirectories:<br />
<br />
$ sftp -P 6789 maxSftp@cirmdcm.soe.ucsc.edu<br />
sftp> put -r *<br />
<br />
Example command line tool "lftp", upload current directory and all subdirectories with 10 concurrent connections ("threads"):<br />
<br />
$ lftp sftp://myUsername@cirmdcm.soe.ucsc.edu:6789<br />
> mirror -R --parallel=10<br />
<br />
-R stands for "reverse", a reverse mirror is lftp-speak for "upload".<br />
<br />
Example graphical user interface, Cyberduck on Mac OSX:<br />
<br />
[[File:CyberDuck_CIRM-01.png|400px|Cyberduck screenshot]]<br />
<br />
== For UCSC Wranglers ==<br />
<br />
On cirm01, create the user:<br />
<br />
sudo /data/create-user username<br />
<br />
To change password of user:<br />
<br />
sudo /data/change-password username<br />
<br />
The new user is created with their homedir as:<br />
<br />
/data/sftp/user/incoming<br />
<br />
You have read access to those directories.</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Upload_onto_CIRM-01&diff=24635Upload onto CIRM-012018-03-20T17:43:23Z<p>Max: /* For CIRM groups */</p>
<hr />
<div>== For CIRM groups ==<br />
<br />
* Ask your UCSC wrangler contact person to set you up with a username and password for the cirmdcm.soe.ucsc.edu sftp upload<br />
* Use an sftp command line client like lftp or simply sftp and connect to sftp://cirmdcm.soe.ucsc.edu, port 6789, using this username<br />
** see below for an example<br />
* If you're unsure which program to use, a good choice is CyberDuck on MacOS, see image below on how to connect using CyberDuck<br />
* Once you're done with the upload, let your wrangler know<br />
<br />
Example command line tool "lftp", upload current directory and all subdirectories with 10 concurrent connections ("threads"):<br />
<br />
$ lftp sftp://myUsername@cirmdcm.soe.ucsc.edu:6789<br />
> mirror -R --parallel=10<br />
<br />
-R stands for "reverse", a reverse mirror is lftp-speak for "upload".<br />
<br />
Example graphical user interface, Cyberduck on Mac OSX:<br />
<br />
[[File:CyberDuck_CIRM-01.png|400px|Cyberduck screenshot]]<br />
<br />
== For UCSC Wranglers ==<br />
<br />
On cirm01, create the user:<br />
<br />
sudo /data/create-user username<br />
<br />
To change password of user:<br />
<br />
sudo /data/change-password username<br />
<br />
The new user is created with their homedir as:<br />
<br />
/data/sftp/user/incoming<br />
<br />
You have read access to those directories.</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Upload_onto_CIRM-01&diff=24634Upload onto CIRM-012018-03-20T17:42:55Z<p>Max: /* For CIRM groups */</p>
<hr />
<div>== For CIRM groups ==<br />
<br />
* Ask your UCSC wrangler contact person to set you up with a username and password for the cirmdcm.soe.ucsc.edu sftp upload<br />
* Use an sftp command line client like lftp or simply sftp and connect to sftp://cirmdcm.soe.ucsc.edu, port 6789, using this username<br />
** see below for an example<br />
* If you're unsure which program to use, a good choice is CyberDuck on MacOS, see image below on how to connect using CyberDuck<br />
* Once you're done with the upload, let your wrangler know<br />
<br />
Example command line tool "lftp", upload current directory and all subdirectories with 10 concurrent connections ("threads"):<br />
<br />
$ lftp sftp://myUsername@cirmdcm.soe.ucsc.edu:6789<br />
> mirror -R --parallel=10<br />
<br />
-R stands for "reverse", a reverse mirror is lftp-speak for "upload".<br />
<br />
<br />
[[File:CyberDuck_CIRM-01.png|400px|Cyberduck screenshot]]<br />
<br />
== For UCSC Wranglers ==<br />
<br />
On cirm01, create the user:<br />
<br />
sudo /data/create-user username<br />
<br />
To change password of user:<br />
<br />
sudo /data/change-password username<br />
<br />
The new user is created with their homedir as:<br />
<br />
/data/sftp/user/incoming<br />
<br />
You have read access to those directories.</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Upload_onto_CIRM-01&diff=24633Upload onto CIRM-012018-03-20T17:42:30Z<p>Max: /* For CIRM groups */</p>
<hr />
<div>== For CIRM groups ==<br />
<br />
* Ask your UCSC wrangler contact person to set you up with a username and password for the cirmdcm.soe.ucsc.edu sftp upload<br />
* Use an sftp command line client like lftp or simply sftp and connect to sftp://cirmdcm.soe.ucsc.edu, port 6789, using this username<br />
** see below for an example<br />
* If you're unsure which program to use, a good choice is CyberDuck on MacOS, see image below on how to connect using CyberDuck<br />
* Once you're done with the upload, let your wrangler know<br />
<br />
Example command line tool "lftp":<br />
<br />
$ lftp sftp://myUsername@cirmdcm.soe.ucsc.edu:6789<br />
> mirror -R --parallel=10<br />
<br />
-R stands for "reverse", a reverse mirror is lftp-speak for "upload".<br />
<br />
<br />
[[File:CyberDuck_CIRM-01.png|400px|Cyberduck screenshot]]<br />
<br />
== For UCSC Wranglers ==<br />
<br />
On cirm01, create the user:<br />
<br />
sudo /data/create-user username<br />
<br />
To change password of user:<br />
<br />
sudo /data/change-password username<br />
<br />
The new user is created with their homedir as:<br />
<br />
/data/sftp/user/incoming<br />
<br />
You have read access to those directories.</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Upload_onto_CIRM-01&diff=24632Upload onto CIRM-012018-03-20T17:42:15Z<p>Max: /* For CIRM groups */</p>
<hr />
<div>== For CIRM groups ==<br />
<br />
* Ask your UCSC wrangler contact person to set you up with a username and password for the cirmdcm.soe.ucsc.edu sftp upload<br />
* Use an sftp command line client like lftp or simply sftp and connect to sftp://cirmdcm.soe.ucsc.edu, port 6789, using this username<br />
** see below for an example<br />
* If you're unsure which program to use, a good choice is CyberDuck on MacOS, see image below on how to connect using CyberDuck<br />
* Once you're done with the upload, let your wrangler know<br />
<br />
Example command line tool "lftp":<br />
<br />
$ lftp sftp://maxSftp@cirmdcm.soe.ucsc.edu:6789<br />
> mirror -R --parallel=10<br />
<br />
-R stands for "reverse", a reverse mirror is lftp-speak for "upload".<br />
<br />
<br />
[[File:CyberDuck_CIRM-01.png|400px|Cyberduck screenshot]]<br />
<br />
== For UCSC Wranglers ==<br />
<br />
On cirm01, create the user:<br />
<br />
sudo /data/create-user username<br />
<br />
To change password of user:<br />
<br />
sudo /data/change-password username<br />
<br />
The new user is created with their homedir as:<br />
<br />
/data/sftp/user/incoming<br />
<br />
You have read access to those directories.</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Upload_onto_CIRM-01&diff=24631Upload onto CIRM-012018-03-20T17:40:04Z<p>Max: /* For CIRM groups */</p>
<hr />
<div>== For CIRM groups ==<br />
<br />
* Ask your UCSC wrangler contact person to set you up with a username and password for the cirmdcm.soe.ucsc.edu sftp upload<br />
* Use an sftp command line client like lftp or simply sftp and connect to sftp://cirmdcm.soe.ucsc.edu, port 6789, using this username<br />
* If you're unsure which program to use, a good choice is CyberDuck on MacOS, see image below on how to connect using CyberDuck<br />
* Once you're done with the upload, let your wrangler know<br />
<br />
[[File:CyberDuck_CIRM-01.png|400px|Cyberduck screenshot]]<br />
<br />
== For UCSC Wranglers ==<br />
<br />
On cirm01, create the user:<br />
<br />
sudo /data/create-user username<br />
<br />
To change password of user:<br />
<br />
sudo /data/change-password username<br />
<br />
The new user is created with their homedir as:<br />
<br />
/data/sftp/user/incoming<br />
<br />
You have read access to those directories.</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Upload_onto_CIRM-01&diff=24630Upload onto CIRM-012018-03-20T17:39:44Z<p>Max: /* For CIRM groups */</p>
<hr />
<div>== For CIRM groups ==<br />
<br />
* Ask your UCSC wrangler contact person to set you up with a username and password for the cirmdcm.soe.ucsc.edu sftp upload<br />
* Use an sftp command line client like lftp or simply sftp and connect to sftp://cirmdcm.soe.ucsc.edu, port 6789, using this username<br />
* If you're unsure which program to use, a good choice is CyberDuck on MacOS, see image below on how to connect using CyberDuck<br />
<br />
[[File:CyberDuck_CIRM-01.png|400px|Cyberduck screenshot]]<br />
<br />
== For UCSC Wranglers ==<br />
<br />
On cirm01, create the user:<br />
<br />
sudo /data/create-user username<br />
<br />
To change password of user:<br />
<br />
sudo /data/change-password username<br />
<br />
The new user is created with their homedir as:<br />
<br />
/data/sftp/user/incoming<br />
<br />
You have read access to those directories.</div>Maxhttps://genomewiki.ucsc.edu/index.php?title=File:CyberDuck_CIRM-01.png&diff=24629File:CyberDuck CIRM-01.png2018-03-20T17:39:00Z<p>Max: </p>
<hr />
<div></div>Maxhttps://genomewiki.ucsc.edu/index.php?title=Upload_onto_CIRM-01&diff=24628Upload onto CIRM-012018-03-20T17:38:29Z<p>Max: /* For CIRM groups */</p>
<hr />
<div>== For CIRM groups ==<br />
<br />
* Ask your UCSC wrangler contact person to set you up with a username and password for the cirmdcm.soe.ucsc.edu sftp upload<br />
* Use an sftp command line client like lftp or simply sftp and connect to sftp://cirmdcm.soe.ucsc.edu, port 6789, using this username<br />
* If you're unsure which program to use, a good choice is CyberDuck on MacOS, see image below on how to connect using CyberDuck<br />
<br />
[[File:CyberDuck_CIRM-01.png|400px|Cyberduck screenshot]]<br />
<br />
== For UCSC Wranglers ==<br />
<br />
On cirm01, create the user:<br />
<br />
sudo /data/create-user username<br />
<br />
To change password of user:<br />
<br />
sudo /data/change-password username<br />
<br />
The new user is created with their homedir as:<br />
<br />
/data/sftp/user/incoming<br />
<br />
You have read access to those directories.</div>Max