DoBlastzChainNet.pl: Difference between revisions
Revision as of 02:50, 6 April 2018
Prerequisites
This discussion assumes you are familiar with Unix shell command line programming and scripting. You will encounter and interact with the csh/tcsh, bash, perl, and python scripting languages. You will need at least one computer with several CPU cores, preferably a multi-node compute cluster or an equivalent cloud computing environment.
Parasol Job Control System
The scripts and programs used here expect to find the Parasol_job_control_system in place and operational.
Install scripts and kent command line utilities
This is a bit of a kludge at this time (April 2018); we are working on a cleaner distribution of these scripts. As mentioned in the Parasol_job_control_system setup, the kent command line binaries and these scripts reside in /data/bin/ and /data/scripts/ respectively. Keeping scripts separate from binaries is merely a style custom; it is not strictly necessary.
 mkdir -p /data/scripts /data/bin
 chmod 755 /data/scripts /data/bin
 rsync -a rsync://hgdownload.soe.ucsc.edu/genome/admin/exe/linux.x86_64/ /data/bin/
 git archive --remote=git://genome-source.soe.ucsc.edu/kent.git \
   --prefix=kent/ HEAD src/hg/utils/automation \
   | tar vxf - -C /data/scripts --strip-components=5 \
   --exclude='kent/src/hg/utils/automation/incidentDb' \
   --exclude='kent/src/hg/utils/automation/configFiles' \
   --exclude='kent/src/hg/utils/automation/ensGene' \
   --exclude='kent/src/hg/utils/automation/genbank' \
   --exclude='kent/src/hg/utils/automation/lastz_D' \
   --exclude='kent/src/hg/utils/automation/openStack'
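After the install commands finish, it can be useful to confirm that each directory actually received executable files. The following is a minimal sketch of such a check, not part of the kent distribution; the function name count_executables is hypothetical and was chosen for this illustration only.

```shell
#!/usr/bin/env bash
# Sketch (assumption: not part of the kent tools): count the regular,
# executable files directly inside a directory, to sanity check an install.

# count_executables DIR: print the number of executable regular files in DIR.
# A missing or empty directory prints 0.
count_executables() {
  local dir="$1" f n=0
  for f in "$dir"/*; do
    if [ -f "$f" ] && [ -x "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Example use (uncomment to run against the install directories):
# for d in /data/bin /data/scripts; do
#   echo "$d: $(count_executables "$d") executable files"
# done
```

A count of 0 for either directory suggests that the rsync or git archive step did not complete.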
PATH setup
Verify that the two directories /data/bin and /data/scripts are on the shell PATH, or add them. The simplest way to add them is to append an export line to the .bashrc file in your home directory:
echo 'export PATH=/data/bin:/data/scripts:$PATH' >> $HOME/.bashrc
Then source that file to apply the change to the current shell:
. $HOME/.bashrc
Verify you see those pathnames on the PATH variable:
 echo $PATH
 /data/bin:/data/scripts:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/centos/.local/bin:/home/centos/bin
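Running the echo command that appends to .bashrc more than once adds the same export line repeatedly. The sketch below shows one way to make that step repeatable and to test for a PATH entry without reading the output by eye. The function names on_path and add_path_line are hypothetical helpers introduced here for illustration.

```shell
#!/usr/bin/env bash
# Sketch (assumption: helper names are illustrative, not from the kent tools).

# on_path DIR: succeed when DIR is one of the colon-separated PATH entries.
on_path() {
  case ":${PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# add_path_line FILE: append the export line to FILE only if it is not
# already present, so the function is safe to run repeatedly.
add_path_line() {
  local rc_file="$1"
  local line='export PATH=/data/bin:/data/scripts:$PATH'
  touch "$rc_file"
  grep -qxF -- "$line" "$rc_file" || printf '%s\n' "$line" >> "$rc_file"
}

# Example use (uncomment to run against your own .bashrc):
# add_path_line "$HOME/.bashrc"
# . "$HOME/.bashrc"
# for d in /data/bin /data/scripts; do
#   if on_path "$d"; then echo "$d is on PATH"; else echo "$d is missing"; fi
# done
```

Wrapping PATH in colons before matching avoids false positives from entries that merely contain the directory name, such as /data/bin2.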
This entire discussion assumes the bash shell is the user's unix shell.
Working directory hierarchy
It is best to organize your work in a directory hierarchy. For example, maintain all your genome sequences in:
 /data/genomes/
 /data/genomes/hg38/
 /data/genomes/mm10/
 /data/genomes/dm6/
 /data/genomes/ce11/
 ... etc ...
Each of those database directories can hold the 2bit file, the chrom.sizes file, and track construction directories, for example:
 /data/genomes/dm6/dm6.2bit
 /data/genomes/dm6/dm6.chrom.sizes
 /data/genomes/dm6/trackData/
This organization is a matter of personal preference. However you arrange it, keep it consistent so that scripts are easier to run across multiple sequences.
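A consistent layout lets one small function serve every database. The following sketch creates the hierarchy described above and prints the conventional file locations for one database. The function names make_genome_dirs and genome_paths are hypothetical, and the layout is only the example convention from this section, not a requirement of the scripts.

```shell
#!/usr/bin/env bash
# Sketch (assumption: helper names are illustrative; layout follows the
# /data/genomes/<db>/ example convention described in this section).

# make_genome_dirs BASE DB...: create BASE/DB/trackData for each database name.
make_genome_dirs() {
  local base="$1"
  shift
  local db
  for db in "$@"; do
    mkdir -p "$base/$db/trackData"
  done
}

# genome_paths BASE DB: print the expected 2bit, chrom.sizes, and trackData
# locations for one database, one per line.
genome_paths() {
  local base="$1" db="$2"
  printf '%s\n' \
    "$base/$db/$db.2bit" \
    "$base/$db/$db.chrom.sizes" \
    "$base/$db/trackData/"
}

# Example use (uncomment to run):
# make_genome_dirs /data/genomes hg38 mm10 dm6 ce11
# genome_paths /data/genomes dm6
```

Because every path is derived from the base directory and the database name, a loop over database names can locate the same files for each genome without special cases.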