Training new Browser Engineers

From Genecats

== parasol ==
Trainer: Galt Barber

parasol is software for running batches of jobs
on the cluster of computing machines.
parasol links:
There is a full page of parasol information and examples here: [[Parasol_how_to]]
 
This is the up-to-date parasol documentation
 
http://genecats.gi.ucsc.edu/eng/parasol.html
 
Please read it! I am not going to cover all that
information here.
 
Google returns this OLD link which is now out of date
and should NOT be used:
DO NOT USE! http://users.soe.ucsc.edu/~donnak/eng/parasol.htm DO NOT USE!
 
A nice page written by Hiram on Cluster Jobs:
http://genomewiki.ucsc.edu/index.php/Cluster_Jobs
 
parasol is used in many established pipelines and genome-browser build
procedures.
<br>
<br>
parasol is the cluster job-running software written by Jim Kent around 2002
when Condor and other compute cluster software proved to be too slow.
<br>
<br>
Described as "embarrassingly parallelizable", genomics data
is often the perfect target for parallelization since
you can just chop up the input data into pieces and
then run a cluster job against each piece (or pair of pieces),
and then combine the final results.
There is software developed here for dividing genomes, FASTA files, and other
kinds of inputs.
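The kent tree's splitting tools (faSplit, etc.) handle this in practice; purely as an illustration of the chopping idea, a record-per-chunk FASTA split can be sketched with awk (the file names and sequences below are made up):

```shell
# Illustration only: split a FASTA file into one chunk file per record,
# the simplest form of chopping cluster input. Real runs use tools such
# as faSplit, which can also split by size and with overlap.
mkdir -p chunks
printf '>seqA\nACGT\n>seqB\nGGCC\n' > input.fa
awk '/^>/ {n++; f = sprintf("chunks/chunk%d.fa", n)} {print > f}' input.fa
ls chunks
```

Each chunk then becomes one line of a jobList.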
 
Sometimes there are also routines
for lifting the input or output from one coordinate system (e.g. single-chunk)
to another (e.g. its place on the full chromosome).
Data broken into blocks may simply be divided into separate pieces,
but sometimes the pieces are designed to overlap, and then
the results are chained back together at the end.
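As a sketch of what lifting amounts to: a feature found at chunk-local coordinates is shifted by the chunk's offset on the full chromosome. The numbers and the bare awk below are hypothetical; real pipelines use liftUp with a lift file.

```shell
# Hypothetical lift: chunk2 starts at base 1000 of chr1, so a feature at
# chunk-local positions 15-40 lands at chr1:1015-1040.
offset=1000
echo "15 40 featureX" | awk -v off="$offset" '{print "chr1", $1 + off, $2 + off, $3}'
```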
 
Because of variability in input data, e.g. the amount of repeats
or some other feature, some jobs have chunks of data that will end up running
quite a bit longer than others.  Jim Kent recommends
dividing the chunks up smaller to reduce the amount of variation
in job runtime.  Ideally the average runtime for a job would be
just a few minutes. parasol can handle quite a large input queue,
with a large number of jobs in a batch,
so do not be afraid to divide your work into smaller pieces if needed.
It also helps the cluster run more efficiently, and reach equilibrium more quickly,
with shorter-running jobs.
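A back-of-the-envelope way to see the batch's scale: wall time is roughly total work divided by the job slots the planner grants. The numbers below are hypothetical, and this ignores queueing overhead and node variability:

```shell
# Rough wall-time estimate for a batch: jobs * avg_seconds / slots.
jobs=13320    # jobs in the batch
avg=5         # average seconds per job
slots=341     # job slots granted at equilibrium
echo "$(( jobs * avg / slots )) seconds"
```

Shorter jobs also narrow the spread around that average, so the slow tail of the batch finishes sooner.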
 
CRITICAL for performance and good cluster citizenship:
If you have jobs that do much input/output, copy the data needed
for them to the cluster nodes in /scratch/ so that they will be doing
mostly local IO.  Your job can copy the final result files back to your
batch directory when done.  Not doing this can result in slowness
or crashing of the cluster, the storage system, or even users home directories.
 
Another gotcha to avoid with parasol and cluster jobs, do not run your
batch and data from your home directory, as this may cause the
NFS server that supports the homedirs to crash or become unusably slow.
Instead, your parasol batches and their results should be located
on the /hive file system.  Typically for personal work or research,
use /hive/users/$USER.  For production browser work such as creating
data tracks, use /hive/data/genomes/$assembly/.
The makedocs in kent/src/hg/makeDb/doc/ have many examples.
 
The current cluster as of 2015-09-18 is "ku" and it has 32 machines or nodes,
each with 32 CPUs and 256GB ram which is 1024 CPUs total.
 
The original KiloKluster had 1024 single CPU machines.
There have been many clusters since then, and a general trend
to having fewer machines with more CPUs and RAM on a machine with
each generation. 
Now dead and gone heroic clusters of the past:
<pre>
kk
kk2
kk9
kki
encodek
pk
swarm
</pre>
 
Before starting as a new engineer, make sure that your umask is 2,
and that you can automatically login via SSH with no passphrase to
ku:
 
ssh ku
 
If the server's host key has changed, you will get a warning that is trying
to protect you from man-in-the-middle attacks. Just delete the old
entry from $HOME/.ssh/known_hosts if needed (ssh-keygen -R ku removes it for you).
 
Usually the parasol-system related files are installed in /parasol
on both the hub and nodes. However only the admins will have access
to changing the cluster, adding or removing machines, or other config changes.
 
"parasol list batches" is used so often that
you may wish to make an alias in your shell config:
alias plb "/parasol/bin/parasol list batches"    (csh)
alias plb="/parasol/bin/parasol list batches"    (bash)
 
It may be handy to have /parasol/bin/ on your path.
 
To use the cluster you must also be a member of the unix group "kluster_access".
Approval from Jim is generally required for this, followed by a cluster_admin email request.
 
Because the cluster hub, e.g. "ku", and the machines or "nodes" in the cluster
mount both your home dirs and the /hive file system, it is easy to switch
back and forth between hgwdev and the various machines as you work.
 
parasol nodes run your jobs as you, if the parasol cluster has been correctly
started as root.  Each job in the batch is run from the shell, but the shell
may have different paths than your home shell.
If your job's script cannot find a program, you might need to specify the full path.
 
Temporary files created on the nodes will be deleted after they reach a certain age;
however, if you have your own jobs clean up unneeded files, that is helpful.
 
Here are some simple hands-on examples:
 
----
 
Example 1: Simple test jobs, everything runs OK.
 
<pre>
ssh ku
cd /hive/users/$USER
mkdir testParasol1
cd testParasol1
</pre>
 
See what is already running:
parasol list batches
 
paraTestJob is a simple test program that comes with parasol.
It uses up CPU for 1/10 of a second for each count given.
So 99 will increment an integer for 9.9 seconds and then exit(0) normally.
We will make a jobList that repeats that simple test command many times.
 
echo "paraTestJob 99" | awk '{for(i=0;i<13320;i++)print}' > jobList
 
See the resulting jobList
head jobList
<pre>
paraTestJob 99
paraTestJob 99
paraTestJob 99
paraTestJob 99
paraTestJob 99
paraTestJob 99
paraTestJob 99
paraTestJob 99
paraTestJob 99
paraTestJob 99
</pre>
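A quick sanity check before creating the batch is to confirm the jobList has the expected number of lines:

```shell
# Recreate the jobList from above and verify the line count.
echo "paraTestJob 99" | awk '{for (i = 0; i < 13320; i++) print}' > jobList
wc -l < jobList
```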
 
parasol is batch-oriented, at least the para client
which is used for most work here.
(jobTree written by Benedict Paten in python is unable
to use the usual "para" client because his dynamic job requests
are not known ahead of time, and are created dynamically
based on the output of previous jobs.  jobTree submits jobs to the hub directly.)
 
Now create your parasol batch FROM the jobList.
parasol and the "para" client will take care
of submitting all of the jobs in your batch
until they have all been run successfully.
But jobs are not added or removed from the batch.
 
para create jobList
 
This will create a file called "batch" which contains
one line for each line in jobList, plus it stores other
meta information like jobIds, results of run, results of
checking output, etc.
 
<pre>
[ku.sdsc.edu:testParasol1> ll
total 448
-rw-rw-r-- 1 galt genecats 652680 Sep 18 11:49 batch
-rw-rw-r-- 1 galt genecats 652680 Sep 18 11:49 batch.bak
drwxrwxr-x 2 galt genecats    512 Sep 18 11:49 err
-rw-rw-r-- 1 galt genecats 199800 Sep 18 11:48 jobList
-rw-rw-r-- 1 galt genecats      2 Sep 18 11:49 para.bookmark
</pre>

The batch.bak file is a backup copy of batch.
err/ is a directory where "para problems" gathers
the stderr of failed jobs.
jobList is the original list of jobs to put in the batch.
para.bookmark is a file that remembers how much of the results
file we have seen so far, to speed up processing a little.
 
Push the first 10 jobs to make sure the jobs are working
and do not have a bug.
 
para try
 
<pre>
plb
#user    run  wait  done crash pri max cpu  ram  plan min batch
galt        4      6      0    0  10  -1  1  8.0g  341  0 /hive/users/galt/testParasol1/
</pre>

This shows that galt has run 4 of 10 jobs, 6 are still queued,
0 have crashed so far, and the priority is the default 10.
(1 is the "highest" priority and should not be used without permission.)
max = -1 means that there is no maximum number of nodes to use for this batch.
cpu = 1 means that the batch is using the default number of CPUs per job, which is 1.
ram = 8.0g means that it is using the default amount of RAM per job, 8GB, which is 256GB/32 CPUs.
plan = 341 means that the parasol planner will give me 341 default-size job slots
if it runs long enough to reach equilibrium.
min = 0 shows the average number of minutes jobs in the batch are taking.
The batch path is given so you can find your directory.
 
<pre>
para check
[ku.sdsc.edu:testParasol1> para check
13320 jobs in batch
44182 jobs (including everybody's) in Parasol queue or running.
Checking finished jobs
unsubmitted jobs: 13310
ranOk: 10
total jobs in batch: 13320
</pre>
 
This shows that my jobs are not crashing, the 10 from para try ran ok.
 
At this point, we can just push the rest of the jobs manually.
Or, we can use para shove, which will automatically re-push failed
jobs up to the default of 4 retries.
 
<pre>
para push
plb
[ku.sdsc.edu:testParasol1> plb
#user    run  wait  done crash pri max cpu  ram  plan min batch
galt      316  2716  10288    0  10  -1  1  8.0g  341  0 /hive/users/galt/testParasol1/
</pre>
Over a few minutes, my running count has increased and has almost
reached the plan count of 341, at which point the batch will reach "equilibrium",
where everybody has their fair share and the cluster is being used efficiently,
with nothing waiting on other jobs/batches to finish to free up resources on nodes.
 
All jobs finished now:
 
<pre>
[ku.sdsc.edu:testParasol1> para check
13320 jobs in batch
40812 jobs (including everybody's) in Parasol queue or running.
Checking finished jobs
ranOk: 13320
total jobs in batch: 13320
 
[ku.sdsc.edu:testParasol1> para time
13320 jobs in batch
40860 jobs (including everybody's) in Parasol queue or running.
Checking finished jobs
Completed: 13320 of 13320 jobs
CPU time in finished jobs:      23817s    396.96m    6.62h    0.28d  0.001 y
IO & Wait Time:                33602s    560.03m    9.33h    0.39d  0.001 y
Average job time:                  4s      0.07m    0.00h    0.00d
Longest finished job:              9s      0.15m    0.00h    0.00d
Submission to last job:          806s      13.43m    0.22h    0.01d
Estimated complete:                0s      0.00m    0.00h    0.00d
</pre>
You can run para time both to get an estimate of when your batch will be finished,
and to record in the makeDb/doc/ documents how much time the run took,
which is useful for comparing against previous or future runs.
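The para time numbers above can be cross-checked by hand: CPU time plus IO & wait time, divided by the job count, should land near the reported 4s average job time:

```shell
# (23817s CPU + 33602s IO/wait) / 13320 jobs ~= 4.3s per job on average.
awk 'BEGIN { printf "%.1f\n", (23817 + 33602) / 13320 }'
```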
 
parasol can detect a problem or bug that makes all of your
jobs fail in a row, and will automatically chill your batch,
flushing your batch's queue on the hub.  This means that you
will not waste cluster resources on a "sick" batch that is
doomed to fail. Why run 100,000 failing jobs?
 
parasol can also detect sick nodes in which all jobs from multiple batches
are failing.
 
 
----
 
Example 2: more testing, with crashes

<pre>
cd /hive/users/$USER
mkdir testParasol2
cd testParasol2
</pre>

Edit to create a new job list like this:
<pre>
vi jobList
</pre>
<pre>
paraTestJob 00
paraTestJob 01
cat /this/file/name/probably/does/not/exist
paraTestJob 02
paraTestJob 03
paraTestJob 55 -input=spec
paraTestJob 04
paraTestJob 66 -crash
paraTestJob 77 -output=77.out
</pre>
 
This time we see that when we run paraTestJob with the -crash parameter,
it will cause the job to crash. Also, the "cat" statement will
fail because the file does not exist.  So we can test what happens
when things do NOT finish normally.
 
para create jobList
para try
plb
para check
 
<pre>
[ku.sdsc.edu:testParasol2> para check
9 jobs in batch
33118 jobs (including everybody's) in Parasol queue or running.
Checking finished jobs
crashed: 3
ranOk: 6
total jobs in batch: 9
</pre>
 
para push
 
para problems
 
<pre>
[ku.sdsc.edu:testParasol2> para problems
9 jobs in batch
32915 jobs (including everybody's) in Parasol queue or running.
Checking finished jobs
job: cat /this/file/name/probably/does/not/exist
id: 377753997
failure type: crash
host: ku-1-04.local
start time: Fri Sep 18 12:33:02 2015
return: 1
stderr:
cat: /this/file/name/probably/does/not/exist: No such file or directory
 
job: paraTestJob 55 -input=spec
id: 377754000
failure type: crash
host: ku-1-04.local
start time: Fri Sep 18 12:33:03 2015
return: 255
stderr:
Couldn't open spec , No such file or directory
 
job: paraTestJob 66 -crash
id: 377754002
failure type: crash
host: ku-1-11.local
start time: Fri Sep 18 12:33:03 2015
return: signal 11
stderr:
 
3 problems total
</pre>
 
para.results is created by the hub running as root.
It contains one line for each job that ran,
and contains result of execution or error, the node it ran on,
the runtime, etc.  para.bookmark keeps track of which
results have already been updated into your batch file
and do not need checking again. If you delete para.bookmark,
it will simply re-process the entire para.results again
and create a new bookmark.
<pre>
[ku.sdsc.edu:testParasol2> head para.results
256 ku-1-04.local 377753997 cat 0 0 1442604781 1442604782 1442604783 galt /tmp/para377753997.err 'cat /this/file/name/probably/does/not/exist'
0 ku-1-19.local 377753999 paraTestJob 5 0 1442604781 1442604782 1442604783 galt /tmp/para377753999.err 'paraTestJob 03'
0 ku-1-16.local 377753998 paraTestJob 3 0 1442604781 1442604782 1442604784 galt /tmp/para377753998.err 'paraTestJob 02'
0 ku-1-11.local 377753996 paraTestJob 1 0 1442604781 1442604782 1442604784 galt /tmp/para377753996.err 'paraTestJob 01'
0 ku-1-25.local 377754001 paraTestJob 7 0 1442604781 1442604783 1442604784 galt /tmp/para377754001.err 'paraTestJob 04'
0 ku-1-02.local 377753995 paraTestJob 0 0 1442604781 1442604782 1442604787 galt /tmp/para377753995.err 'paraTestJob 00'
11 ku-1-11.local 377754002 paraTestJob 117 0 1442604781 1442604783 1442604787 galt /tmp/para377754002.err 'paraTestJob 66 -crash'
65280 ku-1-04.local 377754000 paraTestJob 0 0 1442604781 1442604783 1442604787 galt /tmp/para377754000.err 'paraTestJob 55 -input=spec'
0 ku-1-14.local 377754003 paraTestJob 140 0 1442604781 1442604783 1442604788 galt /tmp/para377754003.err 'paraTestJob 77 -output=77.out'
</pre>
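Judging from the sample above, the first field of each para.results line is the job's wait()-style exit status, with 0 meaning success, so failures can be counted directly (the sample lines below are abbreviated from the real format):

```shell
# Count failing jobs straight from para.results: field 1 is the exit
# status; non-zero means the job failed. Abbreviated sample lines.
cat > para.results.sample <<'EOF'
256 ku-1-04.local 377753997 cat 'cat /this/file/name/probably/does/not/exist'
0 ku-1-19.local 377753999 paraTestJob 'paraTestJob 03'
11 ku-1-11.local 377754002 paraTestJob 'paraTestJob 66 -crash'
65280 ku-1-04.local 377754000 paraTestJob 'paraTestJob 55 -input=spec'
EOF
awk '$1 != 0 { n++ } END { print n + 0 }' para.results.sample
```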
 
paraTestJob 77 -output=77.out
 
This causes it to create a small output file called 77.out
 
paraTestJob 55 -input=spec
 
This causes it to read file "spec" which did not actually exist here,
so it caused an error.
 
 
----
 
Example 3: more testing, with gensub2
 
gensub2 is a handy utility for creating a parasol jobList with
many lines by applying a template to one or two lists.
Refer to the parasol document listed at the top of this page for details.
 
<pre>
cd /hive/users/$USER
mkdir testParasol3
cd testParasol3
</pre>

Make an output dir out/:

<pre>
mkdir out
</pre>
 
Make a template
<pre>
vi gsub-template
</pre>
<pre>
#LOOP
paraTestJob $(path1) -output={check out line+ out/job$(num1).out}
#ENDLOOP
</pre>
 
Clearly $(path1) refers to each line in input.list.
$(num1) refers to the line number, starting from 0, of the input.list line being read.
 
vi input.list
<pre>
99
24
65
76
34
23
11
1
2
3
77
34
56
9
</pre>
 
Usually this is a list of file-paths, not numbers.
 
Now that we have a template and a data list,
we can create the jobList
 
gensub2 input.list single gsub-template jobList
 
This takes each line of input.list and applies
it to the template, substituting whatever special substitution words
you have used.
 
<pre>
[ku.sdsc.edu:testParasol3> head jobList
paraTestJob 99 -output={check out line+ out/job0.out}
paraTestJob 24 -output={check out line+ out/job1.out}
paraTestJob 65 -output={check out line+ out/job2.out}
paraTestJob 76 -output={check out line+ out/job3.out}
paraTestJob 34 -output={check out line+ out/job4.out}
paraTestJob 23 -output={check out line+ out/job5.out}
paraTestJob 11 -output={check out line+ out/job6.out}
paraTestJob 1 -output={check out line+ out/job7.out}
paraTestJob 2 -output={check out line+ out/job8.out}
paraTestJob 3 -output={check out line+ out/job9.out}
</pre>
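This is not gensub2 itself, but a tiny awk imitation of what the template expansion does, to make the $(path1)/$(num1) substitution concrete ($(num1) is the zero-based line number, $(path1) the line's content):

```shell
# Imitate the gsub-template expansion for the first few input lines.
printf '99\n24\n65\n' > input.list
awk '{ printf "paraTestJob %s -output={check out line+ out/job%d.out}\n", $1, NR - 1 }' \
    input.list > jobList
head -3 jobList
```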
 
If you want to use a second input list file, simply
place it where the word "single" appears above.
 
Note that we are showing off a parasol "para" feature,
that it can automatically check that the output exists and contains
at least one line.
 
para create jobList
para try
para check
para shove
 
Wait for it to finish, or hit Ctrl-C to return to the command line for
further para client commands; automatic shoving will then stop,
and you can shove or push again as needed.
 
para check
para time
 
Note that if we had stuck a line with "0" in input.list above,
paraTestJob 0 -output=out/N.out
would create an empty output file,
and then the para client would complain and show it as a failure,
because of the check-output clause in the template, jobList, and batch.
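Roughly, the test that a "{check out line+ file}" clause performs is: the file exists and contains at least one complete line. A sketch of that check (not parasol's actual implementation):

```shell
# check_line_plus mimics "{check out line+ f}": a non-empty file with at
# least one line passes; an empty file fails.
check_line_plus() { [ -s "$1" ] && [ "$(wc -l < "$1")" -ge 1 ]; }
echo data > good.out
: > empty.out
check_line_plus good.out  && echo "good.out passes"
check_line_plus empty.out || echo "empty.out fails the check"
```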
 
<pre>
[ku.sdsc.edu:testParasol3> para check
14 jobs in batch
23005 jobs (including everybody's) in Parasol queue or running.
Checking finished jobs
ranOk: 14
total jobs in batch: 14
</pre>


== software engineering best practices ==

Below are several slide sets that provide a good introduction to Jim's programming philosophy and other local conventions:

* Jim's Software Sermon (2002): [http://genomewiki.ucsc.edu/images/7/7c/SoftwareSermon2002_Jim_Kent.ppt PPT] [http://www.cse.ucsc.edu/~donnak/eng/softsermon2002.htm HTML]
* Jim's brief CGI programming intro (2007) - see esp. pages 6 & 7 about the cart [https://hgwdev.gi.edu/~kent/cgiProgramming.ppt PPT]
* Jim's Software Engineering & testing presentation (2012): [http://genomewiki.ucsc.edu/images/c/c7/SoftwareEngTesting.pptx PPTX]
* Jim's Locality & Modularity presentation (2012) [https://hgwdev.gi.edu/~kent/locality.pptx PPTX]

Others' overviews of kent/src:

* Hiram's BME 230 presentation on GB, TB & kent/src (2011) - see esp. pages 5-20 PPT
* Robert Baertsch's BME 230 presentation (2008): PPT
* Angie's presentation to Bejerano Lab about working with and extending CGIs & utils (2008): [http://genomewiki.ucsc.edu/images/b/b5/Bejerano_Lab_2008_03_31.ppt PPT]

== GBiB, release process, development tools ==
Trainer: Max Haeussler
* "make search" in kent/src
* xcode anyone?
* [http://genomewiki.ucsc.edu/genecats/index.php/Genome_Browser_in_a_Box_config gbib page]
* [http://genomewiki.ucsc.edu/index.php/It%27s_a_long_way_to_the_RR release process]
* [http://genomewiki.ucsc.edu/index.php/Debugging_cgi-scripts debugging CGIs]

== debugging tools ==
Trainers: Angie Hinrichs & Max Haeussler

Read the genomewiki page [http://genomewiki.ucsc.edu/index.php/Debugging_cgi-scripts Debugging cgi-scripts]

== how not to break the build! ==
Trainer: Brian Raney

== background reading material ==

Latest revision as of 19:47, 24 September 2018

== software engineering best practices ==
Trainer: Kate Rosenbloom

Recommended reading:

* The Art of UNIX Programming, second part of Chapter 1 (Basics of the Unix Philosophy, Eric Raymond). pp. 11-27
* Beautiful Code, Chapter 13 (Design of the Gene Sorter, Jim Kent). pp. 217-228


Some useful acronyms:

* DRY programming (also OAOO) - Don't Repeat Yourself, Once & Only Once
* YAGNI - You Ain't Gonna Need It

expressions:

* Software rot, bit decay
* Technical debt

and aphorisms:

* Do the simplest thing that can possibly work (Ward Cunningham, Extreme Programming)
* Any fool can write code that a computer can understand. Good programmers write code that humans can understand. (Martin Fowler, Refactoring)

Some philosophers of good practice:

* Brian Kernighan, PJ Plauger, Rob Pike, Yourdon (early UNIX)
* Martin Fowler, Ward Cunningham, Kent Beck (XP crowd)
* Christopher Alexander, Erich Gamma, Gang of Four (Design Patterns)
* 'Uncle Bob' Martin

== machine layout, clusters, data flow, etc ==
Trainer: Hiram Clawson

== loading a track, making an assembly ==
Trainer: Hiram Clawson

== C libraries and GB code gotchas ==
Trainer: Jim Kent

== kent src coding standards & libraries overview ==
Trainer: Angie Hinrichs

When we write library code or CGI code, that code will (hopefully) be in use for many years, and readability is extremely important because somebody else may need to debug or extend your code some day. Code is easier to read if it looks like it was written by one person as opposed to a jumble of different styles and indentation rules. Try to make your code look like Jim wrote it, and readers will thank you (or at least refrain from cursing you ;).

Basics of kent/src C formatting conventions:

* multi-word variable or function names: camelCase, camel123Case (numbers and acronyms treated as words)
* tab = 8 spaces
* indentation = 4 spaces
* opening brace { is indented on the next line (not at end of line)
* function declaration is followed by a brief comment describing inputs, outputs, and gotchas like memory allocation details

If you use emacs, get Chuck Sugnet's jkent-c.emacs and add this to your ~/.emacs:

(load-file "~/jkent-c.emacs")

Higher-level conventions:

* try to keep functions short enough to view in one editor screenful
* empty lines between function declarations (but rarely within functions)
* define functions before their first use in the file (to avoid duplicate declarations)
* comment sparingly -- don't repeat what the code says, but say why you're doing something non-obvious
* errAbort when there's an error condition -- much easier to find bugs that way

Get to know src/inc/common.h well (and common.c), and use its utilities and error-checking wrappers around C lib functions:

* strcpy --> safecpy, sprintf --> safef, strncpy --> safencpy, strcat --> safecat
* malloc --> needMem (and variants)
* free --> freez, freeMem
* memcpy --> cloneString, cloneStringZ, cloneMem
* read --> mustRead, similar for write and close
* verbose() for debugging comments
* string utilities: sameString, startsWith*, stringIn, wildMatch, count*, chop*
* sl* functions for list operations
* sl* variants: slName, slRef, slPair

Other absolutely fundamental src/{inc,lib} modules include:

* dyString: dynamically allocated, expandable strings
* errAbort: context-specific handling of warnings and errors
* hash: associate strings with any type of data
* linefile: read in a file (or URL! and automatically decompress compressed files!) line by line
* obscure: despite the name, this companion to common.h holds a lot of useful util functions
* options: command-line option parsing




== background reading material ==

* hgFindSpec: http://genomewiki.ucsc.edu/index.php/HgFindSpec
* Our csh - bash equivalence document: $HOME/kent/src/hg/doc/bashVsCsh.txt
* VI: http://genomewiki.ucsc.edu/genecats/index.php/VI_quick_start
* Cluster Jobs: http://genomewiki.ucsc.edu/index.php/Cluster_Jobs