Browser Installation: Difference between revisions

From genomewiki
m (Adding link to the minimal.hg.conf page)
 
(125 intermediate revisions by 10 users not shown)
Line 1: Line 1:
= Installing a Browser Mirror on Red Hat and its Derivatives =
Please read the big-picture overview of Genome Browser installation first. It introduces the various options you have as well as our installation script, which automates the process described below: http://genome.ucsc.edu/admin/mirror.html
The following How-to provides a step-by-step procedure for installing the Genome Browser on a Mandriva 2006/2007 system. It should also work on any other Red Hat derived Linux OS (and probably most distros for that matter).  


These instructions are based on the procedures outlined in the general Browser Mirror Procedures as well as various HOWTOs found in jksrc. I have tried to combine the various steps into an easy-to-follow procedure that does not require a lot of specialized knowledge to complete the installation successfully.
'''NOTE''': This wiki page is not necessarily maintained with the most up-to-date information.  The information in the README files in the source tree is guaranteed to be current: [http://genome-source.soe.ucsc.edu/gitlist/kent.git/tree/master/src/product src/product]


The procedure described below will create a full mirror of the Genome Browser, but will comment on the steps that change if you only want to mirror some selected assemblies.
= Installing a Browser Mirror on Linux =


----
Please read the big-picture overview of UCSC Genome Browser installation: http://genome.ucsc.edu/admin/mirror.html


'''Prerequisites''':
There are also instructions and scripts in the source tree [http://genome-source.soe.ucsc.edu/gitlist/kent.git/tree/master/src/product/scripts;h=5691470cb59d2e89bb62640b465c4b8519cfbd43;hb=HEAD src/product/scripts] to aid in the mirroring process and to perform many of these steps
periodically by setting up cron jobs (see [http://en.wikipedia.org/wiki/Crontab Crontab]). These instructions and scripts are the most up-to-date
information on installing a mirror. Instructions you find elsewhere on the internet may be out of date.


*It is assumed that you have both an Apache2 web server and MySql 5.x installed.  
These instructions are based on [http://genome.ucsc.edu/admin/mirror.html the official mirror instructions] as well as various HOWTOs found in [http://genome-source.soe.ucsc.edu/gitlist/kent.git/tree/master/src/product src/product] of the source tree. A very similar description can be found on the wiki of UC Davis [http://wiki.bioinformatics.ucdavis.edu/index.php/Genome_Browser].
*A full mirror will require a lot of disk space. Currently (Dec 2006) a clean install of the Browser consumed close to 2T of storage. Additional space will also be required for future database expansion. A partial mirror will need less. Plan accordingly.  
*Note the rsync commands listed below must be executed with sufficient write permissions to create their assigned directories and files.
----


== Get Executables ==


More details for these methods can be found in the official mirror docs and README.building.source.
== Outline ==


*Option 1: Use rsync to get a copy of the compiled binaries.
* Download
** Copy all html files to /var/www/html (this path can be adapted to your system, but things are easier if you stick with this default)
** Copy all cgi-bin programs to /var/www/cgi-bin (this path can be adapted to your system, but see above)
** Choose the assemblies you want to mirror, and for these...
*** Copy the corresponding mysql databases to your mysql directory (e.g. /var/lib/mysql) and change their owner to the mysql user
*** Copy the corresponding /gbdb/<assembly> directories to the /gbdb directory (cannot be changed, create a link /gbdb if you have to store them somewhere else)
* Configuration
** Tweak your apache configuration to make it execute the cgi-bin scripts, activate XBitHack. Then restart apache.
** Create a trash directory (users can upload their custom tracks into this) and make it owned by the apache user
** Setup mysql permissions for the mysql-databases you downloaded and
** Add the mysql users you created to /var/www/cgi-bin/hg.conf
** Set a defaultGenome name in your cgi-bin/hg.conf file (see below)
** Tweak your hgcentral database (=user interface) to some reasonable defaults (e.g. default genome, blat servers, ...)


  rsync -avzP rsync://hgdownload.cse.ucsc.edu/cgi-bin/ /var/www/cgi-bin
== Prerequisites: software and diskspace ==
 
This command will grab the x86_64 binaries. These binaries work with any 64bit processor. If the binaries work for you they represent the easiest way forward.
 
*Option 2: Get the jksrc files and compile the executables yourself. Compiling the source can be a challenge if things don't work out of the box. However by compiling the jksrc tree you will get many useful tools and scripts which will be of value for bioinformatic and admin tasks.
 
*Suggestion: Try to Download jksrc. Attempt to compile the source files. If this initially proves to be problematic, use rsync to get the pre-compiled executables and compile the source later. You will find several useful scripts and some of the browser documentation in ''/src/product'' (in the jksrc archive). The scripts and docs in the source tree are available whether or not you successfully compile the entire tree.


* It is assumed that you have both an Apache2 web server and MySql 5.x installed.
* A full mirror will require a lot of disk space (Mar 2010: more than 4.5 terabytes). Additional space will also be required for future database expansion. A partial mirror will need less. Most people run partial mirrors for this reason.
* If you run a partial mirror, you select some genomes and serve only these. You will need to run rsync to find out how much space each genome takes.
* Note the rsync commands listed below must be executed with sufficient write permissions to create their assigned directories and files.
* You can check the space requirements of a genome by issuing these two commands (in this example, for the assembly hg18). Add up the two reported "Total file size" lines to get the overall requirement:
<pre>
rsync -nah --stats rsync://hgdownload.soe.ucsc.edu/mysql/hg18
rsync -nah --stats rsync://hgdownload.soe.ucsc.edu/gbdb/hg18
</pre>
* If you want to mirror only a ''part'' of an assembly, see [[Minimal Browser Installation]]
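
The two totals can also be added up automatically. The following is a minimal sketch and not part of the UCSC scripts: the helper name sum_total_file_size is hypothetical, and it assumes that rsync reports lines of the form "Total file size: 1,234,567 bytes", which is why -h is left out in the usage comment.

```shell
# Hypothetical helper (not part of the UCSC source tree).
# Reads one or more `rsync --stats` reports on stdin and prints the sum
# of all "Total file size" values in bytes.
# Assumes lines look like: "Total file size: 1,234,567 bytes".
sum_total_file_size() {
    awk '/^Total file size:/ {
             gsub(/,/, "", $4)      # drop thousands separators
             total += $4
         }
         END { printf "%.0f\n", total }'
}

# Possible usage (contacts the download server, so it is left commented out):
# { rsync -na --stats rsync://hgdownload.soe.ucsc.edu/mysql/hg18
#   rsync -na --stats rsync://hgdownload.soe.ucsc.edu/gbdb/hg18
# } | sum_total_file_size
```

The result is a plain byte count, which is easy to compare against the free space reported by df.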


== Configure Apache server ==
== Configure Apache server ==
Line 40: Line 53:
</pre>
</pre>


If you're already running a webserver, you have to run the genome browser as a virtual host, with its own domain name (ask your system administrators what that means). You have to adapt ScriptAlias and most other paths of your apache config to non-default directories, to avoid conflicts with your main apache files. Here is an example extract httpd.conf, with paths moved to subdirectories of /var/www:
If you're already running a webserver, you have to run the genome browser as a virtual host, with its own domain name (ask your system administrators what that means). You have to adapt ScriptAlias and most other paths of your apache config to non-default directories, to avoid conflicts with your main apache files. Here is an example extract httpd.conf, with paths moved to subdirectories of /var/www/genome running on the virtual host genome.myuniversity.com, which means that you can continue to run your existing webserver in /var/www under a different name:
<pre>
<pre>
<VirtualHost *:80>
<VirtualHost *:80>
     ServerAdmin mymail@somewhere.com
     ServerAdmin mymail@somewhere.com
     DocumentRoot /var/www/genome/html
     DocumentRoot /var/www/genome/html
     ServerName genome.myUniversity.com
     ServerName genome.myuniversity.com
     ErrorLog logs/genome-error_log
     ErrorLog logs/genome-error_log
     CustomLog logs/genome-access_log common
     CustomLog logs/genome-access_log common
Line 62: Line 75:
</pre>
</pre>


Find the location of your web pages. This should be ''/var/www/html'' by default (if you don't use virtual hosts). Set the environment variable if desired.
== Set environment variables ==


       export WEBROOT="/var/www/html"
Using the source tree scripts, there is a single file with all variables to set: [http://genome-source.soe.ucsc.edu/gitlist/kent.git/raw/master/src/product/scripts/browserEnvironment.txt src/product/scripts/browserEnvironment.txt].  See also: [http://genome-source.soe.ucsc.edu/gitlist/kent.git/raw/master/src/product/scripts/README src/product/scripts/README]
 
The UCSC source tree assumes the Apache system is located in <em>/usr/local/apache</em>  A very simple way to avoid complications is to create symlinks on your system:
/usr/local/apache/htdocs -> /var/www/html
/usr/local/apache/cgi-bin -> /var/www/cgi-bin
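
These two links could be created as sketched below. The function name link_apache_dirs and its optional prefix argument are made up for this example; the prefix only exists so that the commands can be tried in a scratch directory first. Run it as root without an argument to create the real links.

```shell
# Hypothetical helper: create the two symlinks the UCSC scripts expect.
# $1 is an optional prefix directory, useful for a trial run without root.
link_apache_dirs() {
    prefix="${1:-}"
    mkdir -p "$prefix/usr/local/apache"
    # -n replaces an existing link instead of descending into it
    ln -sfn /var/www/html    "$prefix/usr/local/apache/htdocs"
    ln -sfn /var/www/cgi-bin "$prefix/usr/local/apache/cgi-bin"
}

# link_apache_dirs            # real system, needs root
# link_apache_dirs /tmp/demo  # trial run in a scratch directory
```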
 
The information on the pages uses the following three variables:
 
All of the following instructions depend on your Apache directories. We therefore define some variables now, so that you can copy and paste the commands below without having to change them.
 
Remember your DocumentRoot directory from the Apache configuration? This should be ''/var/www/html'' by default (if you don't use virtual hosts). Set the following environment variable.
 
       export WEBROOT=/var/www/html


Find the location of your cgi-bin directory. This should be /var/www/cgi-bin (if you don't use virtual hosts). Set the environment variable if desired.
Find the location of your cgi-bin directory. This should be /var/www/cgi-bin (if you don't use virtual hosts). Set the environment variable if desired.


       export CGI_BIN="/var/www/cgi-bin"
       export CGI_BIN=/var/www/cgi-bin


Next, find the location of your MySQL data. This should be located in ''/var/lib/mysql''. Set the environment variable if desired.
Next, find the location of your MySQL data. This should be located in ''/var/lib/mysql''. Set the environment variable if desired.


       export MYSQLDATA="/var/lib/mysql"
       export MYSQLDATA=/var/lib/mysql
 
Note: These variables can be set in ''/etc/profile'' so they will be available globally to all users.
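
Collected in one place, the settings might look like the sketch below, for example in a file sourced from ''/etc/profile''. One assumption: the cgi-bin variable is spelled CGI_BIN here, because shell variable names may only contain letters, digits and underscores, so a name with a hyphen is rejected by export.

```shell
# Default locations when no virtual host is used; adapt to your system.
export WEBROOT=/var/www/html
export CGI_BIN=/var/www/cgi-bin   # underscore, not hyphen: "CGI-BIN" is not a valid name
export MYSQLDATA=/var/lib/mysql
```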
 
== Get Executables ==
 
The source tree script: [http://genome-source.soe.ucsc.edu/gitlist/kent.git/blob/master/src/product/scripts/kentSrcUpdate.sh;h=65d1058f310867d715d16688ea0aab8800393aff;hb=HEAD src/product/scripts/kentSrcUpdate.sh] can be used to fetch and build the source tree.
 
You have two options to get the executables: Download pre-compiled 64bit-linux binaries or compile the whole UCSC source tree yourself.
 
* Option 1: Use rsync to get a copy of the pre-compiled binaries.
 
  rsync -avzP rsync://hgdownload.soe.ucsc.edu/cgi-bin/ "${CGI_BIN:-/var/www/cgi-bin}/"
 
This command will grab the x86_64 binaries. These binaries work with any 64bit processor. If the binaries work for you they represent the easiest way forward.
 
* To make the binaries work in Debian lenny, execute these two commands in /usr/lib as root:
    ln -s libcrypto.so.0.9.8 libcrypto.so.6
    ln -s libssl.so.0.9.8 libssl.so.6
 
* Option 2: Get the jksrc files from [[the source tree]] and compile the executables yourself. Compiling the source can be a challenge if things don't work out of the box. However, by compiling the jksrc tree you will get many useful tools and scripts which will be of value for bioinformatic and admin tasks.


Note: These variables can be set in ''/etc/profile'' so they will be available globally to all users. Also they can be skipped entirely if absolute paths are used instead.
*Suggestion: Try to Download jksrc. Attempt to compile the source files. If this initially proves to be problematic, use rsync to get the pre-compiled executables and compile the source later. You will find several useful scripts and some of the browser documentation in ''/src/product'' (in the jksrc archive). The scripts and docs in the source tree are available whether or not you successfully compile the entire tree.


== Get all the html files ==
== Get all the html files ==
The source tree script: [http://genome-source.soe.ucsc.edu/gitweb/?p=kent.git;a=blob;f=src/product/scripts/updateHtml.sh;h=5e851780bc538565c48ac79b5b6b16bf2c142bc3;hb=HEAD src/product/scripts/updateHtml.sh] can be used to fetch the HTML hierarchy.


Test the rsync connection:
Test the rsync connection:


     rsync -navz --progress rsync://hgdownload.cse.ucsc.edu
     rsync -navz --progress rsync://hgdownload.soe.ucsc.edu


Determine the destination of the copy ($WEBROOT) and fire off the production copy. The trailing slash is important!  
Determine the destination of the copy ($WEBROOT) and fire off the production copy. The trailing slash is important!  


     rsync -avzP rsync://hgdownload.cse.ucsc.edu/htdocs/ /var/www/html/
     rsync -avzP rsync://hgdownload.soe.ucsc.edu/htdocs/ $WEBROOT/


== Obtain the /gbdb data file area ==
== Obtain the /gbdb data file area ==


You will need the portions of /gbdb used by the browser. (This is a large download):
The source tree scripts: [http://genome-source.soe.ucsc.edu/gitlist/kent.git/blob/master/src/product/scripts/fetchFullGbdb.sh src/product/scripts/fetchFullGbdb.sh] and [http://genome-source.soe.ucsc.edu/gitlist/kent.git/blob/master/src/product/scripts/fetchMinimalGbdb.sh src/product/scripts/fetchMinimalGbdb.sh] can be used to manage your gbdb downloads.


      rsync -avzP --delete --max-delete=20 rsync://hgdownload.cse.ucsc.edu/gbdb/ /gbdb/
You will need the portions of /gbdb used by the browser. Replace XXX with the assemblies you want to mirror. The trailing slash is important.


You can restrict this to only individual assemblies, like hg18 or mm8. In any case, you should always download the /gbdb/genbank directory, as many assemblies link into it.
      rsync -avzP --delete --max-delete=20 rsync://hgdownload.soe.ucsc.edu/gbdb/XXX/ /gbdb/XXX/
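
For several assemblies, the command can be generated in a loop. This is only a sketch: the function name gbdb_sync_commands is hypothetical, and it prints the rsync commands instead of running them so that the list can be checked first.

```shell
# Hypothetical helper: print one gbdb rsync command per assembly name.
# Nothing is downloaded; the commands are only written to stdout.
gbdb_sync_commands() {
    for db in "$@"; do
        printf 'rsync -avzP --delete --max-delete=20 rsync://hgdownload.soe.ucsc.edu/gbdb/%s/ /gbdb/%s/\n' "$db" "$db"
    done
}

# Print the commands for two example assemblies:
gbdb_sync_commands hg18 mm8

# To actually run them, pipe the output into a shell:
# gbdb_sync_commands hg18 mm8 | sh
```

Printing first and piping into sh afterwards keeps a typo in an assembly name from triggering a large download.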


== Download database tables ==
== Download assembly database tables ==


These instructions should be followed in conjunction with ''/src/product/README.mysql.setup''. There are two ways to install the tables. The first involves building the tables from the assembly dumps (optionally downloaded above). The second and preferable method involves rsyncing the binary tables themselves. This second method is preferable since the mysql table build process using the assemblies is quite computationally intensive.
There are two ways to install the tables.
# '''Build from textfiles:''' The first involves building the tables from the assembly dumps (optionally downloaded above). This method is not covered here, please see the official mirror docs
# '''Direct syncing:''' The second and preferable method involves rsyncing the binary tables themselves. This is faster and a lot more convenient. Use this method if possible. You also do not need to create any databases in this case, they are automatically created by downloading the mysql files.


'''Caveats for direct syncing:'''
* Your MySql version must be compatible with the table version (currently 5.0.x)
*The hgcentral (and others?) table which is found in /var/lib/mysql/ must receive special handling (covered later).
*For your own locally created tracks loaded into the database, use the [http://hgwdev.gi.ucsc.edu/~kent/src/unzipped/product/README.trackDb trackDb_localTracks] table to avoid the UCSC updated trackDb tables.
*The actual download size of the tables is more than simply downloading the text dumps of the assemblies. This is because of the extensive use of indexes in the tables.


'''Caveats for direct syncing:'''
To proceed with syncing the tables directly issue the following command:
*Your MySql version must be compatible with the table version (currently 4.0.x) (comment max: I'm using mysql5 and that works fine with direct syncing)
*The hgcentral (and others?) table which is found in /var/lib/mysql/ must receive special handling (covered later).
*The actual download size of the tables is more than simply downloading the assemblies. This is because of the extensive use of indexes in the tables.
*The method for installing the tables using the assemblies is covered in the official mirror docs and is not covered here


          rsync -avzP --delete --max-delete=20 rsync://hgdownload.soe.ucsc.edu/mysql/XXX/ $MYSQLDATA/XXX/


To proceed with syncing the tables directly issue the following command:
where XXX is the name of each database to be mirrored. You will need to generate a list of tables to be mirrored. Note you can NOT simply sync with hgdownload.soe.ucsc.edu/mysql since the mysql directory contains a number of files and sub directories which are specific to each instance of the mysql database.


           rsync -avzP --delete --max-delete=20 rsync://hgdownload.cse.ucsc.edu/mysql/XXX/ /var/lib/mysql/XXX/
An unedited list of potential tables to be mirrored can be found by issuing the command:
           rsync -v --dry-run rsync://hgdownload.soe.ucsc.edu/mysql


where XXX is the name of each database to be mirrored. You will need to generate a list of tables to be mirrored. Note you can NOT simply sync with hgdownload.cse.ucsc.edu/mysql since the mysql directory contains a number of files and sub directories which are specific to each instance of the mysql database.
This list will then have to be edited so that only the correct tables are mirrored.
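
Editing the list can be partly automated. The sketch below keeps only the directory entries of such a listing, since each database is one directory. It assumes that rsync prints one entry per line with the permission string first and the name last (for example "drwxr-xr-x 4,096 2019/09/04 22:46:00 hg18"); the function name list_databases is hypothetical.

```shell
# Hypothetical helper: read an rsync listing on stdin and print the names
# of all directories in it, skipping the "." entry and any non-listing lines.
list_databases() {
    awk '$1 ~ /^d[-rwxsStT]+$/ && $NF != "." { print $NF }'
}

# Possible usage (contacts the download server, so it is left commented out):
# rsync -v --dry-run rsync://hgdownload.soe.ucsc.edu/mysql | list_databases
```

The resulting names still have to be reduced by hand to the databases you actually want to mirror.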


You will need other databases in addition to just the genome assembly databases: hgFixed, proteome, genbank, proteins040315, proteins060115 and uniProt. You should download all of them to avoid error messages later on.
== Download other database tables ==


An unedited list of potential tables to be mirrored can be found by issuing the command:
You will usually need other databases in addition to just the genome assembly databases:
* hgcentral: primary database the browser uses to find everything else, also contains dynamic user/session "cart" data
** Essential
* hgTemp: empty database for table browser ID uploads
** just do a "create database hgTemp" on the mysql command line
* sp090821, etc ... - "Swiss-Prot" aka UniProt database obtained from files at ftp.expasy.org/databases/uniprot/
** used in UCSC genes track on various databases
* uniprot: the newest version of the Swiss-Prot databases, can simply be a symlink to the newest sp* database directory
** Used in UCSC genes track on various databases
* go - The Gene Ontology database, obtained from: http://www.godatabase.org/dev/database/
** used in the UCSC genes track
* proteins090821, etc. - a combination of the UniProt data mentioned above and data from HGNC http://www.genenames.org/
** Used in the human UCSC genes track and proteome browser
* visiGene: virtual microscope for mice sections
** Usually not needed
* proteome: should merely be a symlink to the most recent proteins090821 database.
Download them, like the assembly database tables above.


          rsync -v --dry-run rsync://hgdownload.cse.ucsc.edu/mysql
Depending upon the other genomes hosted and their interrelationship with other genome assemblies, you may need to create empty databases to at least satisfy the reference.  For example some tracks on the Mouse mm9 browser need to have the hg18 database existing.  To fix this, simply create an empty database: hgsql -e "create database hg18;" hg19.  You could populate these stub databases with a minimal set of tables.  Note the fetchMinimal* scripts in the source tree directory [http://genome-test.gi.ucsc.edu/~kent/src/unzipped/product/scripts/ src/product/scripts/].


This list will then have to be edited so that only the correct tables are mirrored. The script (TODO:sync_tables) can be used to download a complete list of eligible tables and to automatically sync with each.
You will usually need hgcentral, hgFixed, proteome (symlink), genbank and uniProt. It is best to download all of them now, to avoid error messages later on.
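
The empty databases mentioned above can be created in one go. The sketch below only prints the SQL statements; the function name stub_database_sql is made up, and hgTemp and hg18 are just the examples from this section.

```shell
# Hypothetical helper: print a CREATE DATABASE statement for each name given.
# IF NOT EXISTS makes the statements safe to repeat.
stub_database_sql() {
    for db in "$@"; do
        printf 'CREATE DATABASE IF NOT EXISTS %s;\n' "$db"
    done
}

# Show the statements for the two examples:
stub_database_sql hgTemp hg18

# To apply them, pipe the output into the mysql client:
# stub_database_sql hgTemp hg18 | mysql -u root -p
```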


== Grant Mysql rights ==
== Grant Mysql rights ==
These instructions should be followed in conjunction with ''[http://hgwdev.gi.ucsc.edu/~kent/src/unzipped/product/README.mysql.setup /src/product/README.mysql.setup]''.
After the tables have been created it is necessary to add the required users along with their associated permissions. The entire process of MySQL configuration is described in /src/product/README.mysql.setup as found in jksrc. In brief 3 users are required. These users are readonly, readwrite, browser. These users are configured as follows:
After the tables have been created it is necessary to add the required users along with their associated permissions. The entire process of MySQL configuration is described in /src/product/README.mysql.setup as found in jksrc. In brief 3 users are required. These users are readonly, readwrite, browser. These users are configured as follows:


Line 138: Line 208:
|}
|}


Do not forget the grant rights for the databases genbank, proteome, uniProt and proteinsxxxx as well (See also [[Example Mysql Grants]]).
Do not forget the grant rights for the databases hgFixed, genbank, proteome, uniProt and proteinsxxxx as well (See also [[Example Mysql Grants]]).


Each database must have these 3 users added with the associated permissions. The easiest way to accomplish this is to use the
Each database must have these 3 users added with the associated permissions. The easiest way to accomplish this is to use the
Line 152: Line 222:
After adding the MySql Users it is necessary to add '''hg.conf''' to the cgi-bin directory. '''hg.conf''' contains username/password information and is required by various cgi-bin programs.
After adding the MySql Users it is necessary to add '''hg.conf''' to the cgi-bin directory. '''hg.conf''' contains username/password information and is required by various cgi-bin programs.


A sample '''hg.conf''' can be found [http://genome-test.cse.ucsc.edu/~kent/src/unzipped/product/ex.hg.conf here]. More discussion of this file can be found in README.mysql.install, which is located in ''/src/products in jksrc''. The default user/password combinations and permissions can be changed, however doing so will require editing of other scripts which have the user/passwords hardcoded in them (notably ex.MySQLUserPerms.sh). It is probably best to keep the defaults at least until one knows what one is doing.
A sample '''hg.conf''' can be found [http://genome-test.gi.ucsc.edu/~kent/src/unzipped/product/ex.hg.conf here]. Go to the cgi-bin directory and execute this command:
    sudo wget http://genome-test.gi.ucsc.edu/~kent/src/unzipped/product/ex.hg.conf -O hg.conf
 
The default user/password combinations and permissions can be changed, however doing so will require editing of other scripts which have the user/passwords hardcoded in them (notably ex.MySQLUserPerms.sh). It is probably best to keep the defaults at least until one knows what one is doing. To learn more about .hg.conf file setup and specifics, visit our [https://genome-source.gi.ucsc.edu/gitlist/kent.git/blob/master/src/product/minimal.hg.conf minimal.hg.conf help page]


In '''hg.conf''' you will need to set the document root:
In '''hg.conf''' you will need to set the document root:
Line 162: Line 235:
If you're not on the UCSC campus: Comment out any bottleneck statements from hg.conf, as they will break the MAF alignment view.
If you're not on the UCSC campus: Comment out any bottleneck statements from hg.conf, as they will break the MAF alignment view.


Make sure you activate the custom track database statements: [[Using_custom_track_database]]
Make sure you activate the custom track database statements: [[Using custom track database]]


== Set up the "hgcentral" tables ==
== Set up the "hgcentral" tables ==


Download the schema for the hgcentral database (hgcentral.sql) http://genome-test.cse.ucsc.edu/~kent/src/unzipped/product/ex.hg.conf
Download the schema for the hgcentral database (hgcentral.sql) http://genome-test.gi.ucsc.edu/~kent/src/unzipped/product/ex.hg.conf


Create a hgcentral database
Create a hgcentral database
Line 182: Line 255:
     mysql> USE hgcentral;
     mysql> USE hgcentral;


     mysql> UPDATE blatServers SET host=CONCAT(host,".cse.ucsc.edu");
     mysql> UPDATE blatServers SET host=CONCAT(host,".soe.ucsc.edu");


'''Please get permission before using Blat Servers!'''
'''Please get permission before using Blat Servers!'''


If you're mirroring only some assemblies, your hgcentral.dbDb should reflect this. E.g. if you have downloaded only dm3 and mm8, execute the following command:
If you are not mirroring the primary human database and would like to have a different default genome displayed on your gateway page, enter the <em>defaultGenome</em> specification in the <em>cgi-bin/hg.conf</em> file. For example if you have only dm3:
<pre>
defaultGenome=D. melanogaster
update dbDb set active=0 where name not in ('mm8', 'dm3');
</pre>


If you're not mirroring any human assembly, the hgGateway will still try to access it. To fix this and force hgGateway to always show mm8 by default:
Find the name to use from defaultDb in hgcentral:
<pre>
hgsql -e "select genome from defaultDb;" hgcentral
update defaultDb set name="mm8" where genome="Human";
</pre>


== Create a "trash" directory ==
== Create a "trash" directory ==
Line 200: Line 269:
The cgi programs use a temporary area to create and store images used by the browser. This directory is by default looked for in ''/var/www/trash''. You should make this directory and allow the user that runs the web server write access to it. As a point of maintenance this directory will need to be cleaned out from time to time.
The cgi programs use a temporary area to create and store images used by the browser. This directory is by default looked for in ''/var/www/trash''. You should make this directory and allow the user that runs the web server write access to it. As a point of maintenance this directory will need to be cleaned out from time to time.


<pre>
mkdir /var/www/trash
chown apache:apache /var/www/trash
</pre>
If you have adapted your /var/www/ directory to something else (e.g. /var/www/genome/html) you should create your trash directory there.
You will also need a symlink from your document root html hierarchy to this trash directory.
The trash directory can actually be on any filesystem and /var/www/trash can be
a symlink to that location.
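
The periodic clean-up can be handled by a cron job. What follows is a minimal sketch rather than the UCSC procedure: the function name clean_trash and the seven-day default are assumptions, and the right retention time depends on how long your users expect their custom tracks and images to survive.

```shell
# Hypothetical helper: delete regular files in a trash directory that have
# not been modified for more than a given number of days (default: 7).
clean_trash() {
    dir="$1"
    days="${2:-7}"
    # -mtime +N matches files whose age, counted in whole 24-hour periods,
    # is greater than N
    find "$dir" -type f -mtime +"$days" -delete
}

# Possible call, e.g. from a script run nightly by cron:
# clean_trash /var/www/trash 7
```

Only regular files are removed, so the directory layout below the trash directory stays in place.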


== Compiling static version of RNAplot ==


RNAplot is an auxiliary executable that generates theoretical fold diagrams of RNA sequences.  It is not supplied with the Genome Browser, but is available from the Institute for Theoretical Chemistry and Structural Biology, Theoretical Biochemistry Group, at the University of Vienna.  For minimal browsers, it is convenient to compile statically, without any need for external shared libraries.  This script [[File:INSTALL_RNAplot.sh]] is a guide for creating the static executable.  Once compiled, the executable can be made available to the browser using the hg.conf directive rnaPlotPath.


== Removed: Get the data for each individual genome assembly ==
== If it doesn't work... ==
* "'''It won't compile although I can see just a warning'''"
** Remove the -Wall statement from the main makefile
* "'''It won't compile and complains about missing MYSQL libraries'''"
** Make sure that you have installed the MySQL libraries (ubuntu/debian: "apt-get install libmysqlclient15-dev", RedHat/Centos/Fedora: "yum install mysql-devel")
* "'''Nothing shows up, not even the blue side menu'''"
** If nothing shows up, then either the DocumentRoot statement in the Apache configuration file is wrong or you have forgotten to restart your Apache daemon ("/etc/init.d/httpd restart" or "/etc/init.d/apache restart")
* "'''There is something, but I cannot see the menu where I can select the right genome'''"
** Try to run the script [http://hgwdev.gi.ucsc.edu/~kent/src/unzipped/product/scripts/printEnv.pl printEnv.pl] from your browser, by typing a URL like http://genome.myUniversity.edu/cgi-bin/printEnv.pl into the address bar to see if scripts are executed at all
** Verify that Apache is really executing the cgi-bin programs and not just downloading them. When you access cgi-bin/hgGateway with a browser and they are not executed, check this:
*** Does your Apache Configuration file contain a ScriptAlias directive? This directive has to reference the cgi-bin directory.
*** There is no use in trying Apache's "AddHandler cgi-script .cgi" directive as often suggested by the Apache documentation. It doesn't work because the cgi scripts do not end in .cgi
*** An alternative is to create a file .htaccess in the cgi-bin directory and add the line "DefaultType application/x-httpd-cgi" (make sure that .htaccess interpretation is activated in apache for this directory, see the apache documentation)
** Have you restarted your web server after you updated your apache configuration file?
** Try to use a different internet browser or shift-click the reload button of firefox: If you have just fixed something, firefox will have cached the error although everything is fine
** Check if the tables genomes/clades in your hgcentral table make sense (todo: extend this section)
** If you're using CentOS or another linux distribution with SELinux: Try "setenforce 0" and test it again, to rule out any SELinux issues
* "'''I am seeing error messages relating to gbdb...''' "
** Make sure that the apache user can access the /gbdb directory
*** Check the file permissions of the /gbdb directory in respect to the user under which apache is running (debian/ubuntu: www-data)
*** If you are not sure how to interpret file permissions, login as the apache user ("sudo su apache") and try to read any file in /gbdb. Also check if you can write to your trash directory (echo test > test)
** If you're using CentOS or another linux distribution with SELinux: Try "setenforce 0" and test it again, to rule out any SELinux issues
* "'''Couldn't set connection database to hg18'''"
** Create an empty hg18 database: 'hgsql -e "create database hg18;" hg19'
* "'''Couldn't connect to database (null) on localhost as readwrite. Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock''''"
** try mysql --socket /var/lib/mysql/mysql.sock, if this doesn't work then you have to move your mysql socket (change /etc/mysql/my.cnf)
* "'''I see mysql-access errors everywhere'''"
** Verify that the Mysql user you added to hg.conf can read the mysql databases and can write to the customTracks database.
** To test this, log into mysql (mysql -u <your user from hg.conf> -p) and try to access the assembly that does not work ("select * from hg19.chromInfo;")
** For other Mysql error messages, set the environment variable "JKSQL_TRACE=on" and run the CGI from the command line (see below) to see what's going on
* "'''CGI-programs are crashing'''"
** Use the arguments from the URL and replace the & with spaces
** Call the CGI from a command line and copy the arguments just after the CGI, like this "/var/www/cgi-bin/hgc hgsid=337 o=14394787 t=14395353"
** You should get a coredump which you can analyse with gdb
** You can combine this with the JKSQL_TRACE=on (see above)
** If you're using CentOS or another linux distribution with SELinux: Try "setenforce 0" and test it again, to rule out any SELinux issues
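
Turning a browser URL into command-line arguments for a crashing CGI is mechanical and can be scripted. A small sketch follows; the function name url_to_cgi_args is made up, the host name is the example virtual host used on this page, and the arguments are the ones from the hgc example above.

```shell
# Hypothetical helper: print the query part of a URL with "&" replaced by
# spaces, ready to be passed as arguments to a CGI on the command line.
url_to_cgi_args() {
    query="${1#*\?}"               # keep only the part after the first "?"
    printf '%s\n' "$query" | tr '&' ' '
}

url_to_cgi_args 'http://genome.myuniversity.com/cgi-bin/hgc?hgsid=337&o=14394787&t=14395353'
# prints: hgsid=337 o=14394787 t=14395353
```

The output can be pasted after the CGI name, optionally with JKSQL_TRACE=on set in the environment to see the MySQL queries as well.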


Removed this section to the discussion page as it is not needed.
= see also =
* [[CentOS notes]]
* [[Source tree compilation on Debian/Ubuntu]]
* [[SUSE Linux notes]]
* [[The source tree]]
* [[Learn about the Browser]]


[[Category:Mirror Site FAQ]]
[[Category:Mirror Site FAQ]]
[[Category:Installation]]

Latest revision as of 22:46, 4 September 2019

Please read the big-picture overview of Genome Browser installation first. It introduces the various options you have as well as our installation script, which automates the process described below: http://genome.ucsc.edu/admin/mirror.html

NOTE: This wiki page is not necessarily maintained with the most up-to-date information. The information in the README files in the source tree is guaranteed to be current: src/product

Installing a Browser Mirror on Linux

Please read the big-picture overview of UCSC Genome Browser installation: http://genome.ucsc.edu/admin/mirror.html

There are also instructions and scripts in the source tree src/product/scripts to aid in the mirroring process and perform many of these steps periodically by setting up cron jobs (see Crontab). These instructions and scripts are the most up-to-date information on installing a mirror. Instructions you find on the internet may be out of date.

These instructions are based on the official mirror instructions as well as various HOWTOs found in src/product of the source tree. A very similar description can be found on the wiki of UC Davis [1].


Outline

  • Download
    • Copy all html files to /var/www/html (this path can be adapted to your system, but things are easier if you stick with this default)
    • Copy all cgi-bin programs to /var/www/cgi-bin (this path can be adapted to your system, but see above)
    • Choose the assemblies you want to mirror, and for these...
      • Copy the corresponding mysql databases to your mysql directory (e.g. /var/lib/mysql) and change their owner to the mysql user
      • Copy the corresponding /gbdb/<assembly> directories to the /gbdb directory (cannot be changed, create a link /gbdb if you have to store them somewhere else)
  • Configuration
    • Tweak your apache configuration to make it execute the cgi-bin scripts, activate XBitHack. Then restart apache.
    • Create a trash directory (users can upload their custom tracks into this) and make it owned by the apache user
    • Set up mysql permissions for the mysql databases you downloaded
    • Add the mysql users you created to /var/www/cgi-bin/hg.conf
    • Set a defaultGenome name in your cgi-bin/hg.conf file (see below)
    • Tweak your hgcentral database (=user interface) to some reasonable defaults (e.g. default genome, blat servers, ...)

Prerequisites: software and diskspace

  • It is assumed that you have both an Apache2 web server and MySql 5.x installed.
  • A full mirror will require a lot of disk space (Mar 2010: >4.5 terabytes). Additional space will also be required for future database expansion. A partial mirror will need less. Most people run partial mirrors for this reason.
  • If you run a partial mirror, you select some genomes and serve only these. You will need to run rsync to find out how much space each genome takes.
  • Note the rsync commands listed below must be executed with sufficient write permissions to create their assigned directories and files.
  • You can check the space requirements of a genome by issuing these two commands (in this example, for the assembly hg18). You need to add up the two reported "Total file size" lines:
 rsync -nah --stats rsync://hgdownload.soe.ucsc.edu/mysql/hg18
 rsync -nah --stats rsync://hgdownload.soe.ucsc.edu/gbdb/hg18
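
Adding the two figures by hand gets tedious for several assemblies. The following is a minimal sketch, not part of the UCSC scripts, that does the sum in the shell. It assumes that rsync is run without -h, so that the "Total file size" line reports plain bytes. The function names total_file_size and genome_size_bytes are made up for this example.

```shell
# Read `rsync --stats` output on stdin and print the byte count from the
# "Total file size" line, with any thousands separators removed.
total_file_size() {
    awk -F': ' '/^Total file size/ { gsub(/[^0-9]/, "", $2); print $2 }'
}

# Print the combined size in bytes of the mysql and gbdb data of one assembly.
# This contacts hgdownload, but -n (dry run) means nothing is downloaded.
genome_size_bytes() {
    db="$1"
    mysql_bytes=$(rsync -na --stats "rsync://hgdownload.soe.ucsc.edu/mysql/$db" | total_file_size)
    gbdb_bytes=$(rsync -na --stats "rsync://hgdownload.soe.ucsc.edu/gbdb/$db" | total_file_size)
    echo $((mysql_bytes + gbdb_bytes))
}
```

With these defined, "genome_size_bytes hg18" should print one number of bytes for the assembly.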

Configure Apache server

In order to support SSI it is necessary to set the XBitHack. Add the following somewhere in /etc/httpd/conf/httpd.conf (redhat) or /etc/apache2/apache2.conf (debian/ubuntu)

      XBitHack on
      <Directory /var/www/html>
      Options +Includes
      </Directory>

If you're already running a webserver, you have to run the genome browser as a virtual host, with its own domain name (ask your system administrators what that means). You have to adapt ScriptAlias and most other paths of your apache config to non-default directories, to avoid conflicts with your main apache files. Here is an example extract from httpd.conf, with paths moved to subdirectories of /var/www/genome and running on the virtual host genome.myuniversity.com, which means that you can continue to run your existing webserver in /var/www under a different name:

<VirtualHost *:80>
    ServerAdmin mymail@somewhere.com
    DocumentRoot /var/www/genome/html
    ServerName genome.myuniversity.com
    ErrorLog logs/genome-error_log
    CustomLog logs/genome-access_log common
    ScriptAlias /cgi-bin/ /var/www/genome/cgi-bin/
    XBitHack on
    <Directory "/var/www/genome/cgi-bin">
       AllowOverride None
       Order allow,deny
       Allow from all
       Options +ExecCGI
    </Directory>
    <Directory "/var/www/genome/html">
       Options +Includes
    </Directory>
</VirtualHost>

Set environment variables

Using the source tree scripts, there is a single file with all variables to set: src/product/scripts/browserEnvironment.txt. See also: src/product/scripts/README

The UCSC source tree assumes the Apache system is located in /usr/local/apache. A very simple way to avoid complications is to create symlinks on your system:

/usr/local/apache/htdocs -> /var/www/html
/usr/local/apache/cgi-bin -> /var/www/cgi-bin
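
The two lines above read as "link -> target". As a rough sketch, the links could be created with a small shell function like the one below. The function name link_apache_dirs is hypothetical, the defaults are the paths named above, and the sketch assumes that htdocs and cgi-bin do not already exist as real directories under /usr/local/apache.

```shell
# Create the htdocs and cgi-bin symlinks under the Apache root that the
# source tree expects. Arguments (all optional): apache root, html dir, cgi-bin dir.
link_apache_dirs() {
    apache_root="${1:-/usr/local/apache}"
    webroot="${2:-/var/www/html}"
    cgibin="${3:-/var/www/cgi-bin}"
    mkdir -p "$apache_root" &&
    ln -sfn "$webroot" "$apache_root/htdocs" &&
    ln -sfn "$cgibin" "$apache_root/cgi-bin"
}
```

Run as root, "link_apache_dirs" without arguments uses the default paths.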

The commands on this page use the following three variables. The location of your apache directories affects most of the following instructions, so we define these variables now. You can then copy and paste the commands below without changing them.

Remember your DocumentRoot directory from apache? This should be /var/www/html by default (if you don't use virtual hosts). Set the following environment variable.

     export WEBROOT=/var/www/html

Find the location of your cgi-bin directory. This should be /var/www/cgi-bin (if you don't use virtual hosts). Set the environment variable if desired.

     export CGI_BIN=/var/www/cgi-bin

Next, find the location of your MySQL data. This should be located in /var/lib/mysql. Set the environment variable if desired.

     export MYSQLDATA=/var/lib/mysql

Note: These variables can be set in /etc/profile so they will be available globally to all users.

Get Executables

The source tree script: src/product/scripts/kentSrcUpdate.sh can be used to fetch and build the source tree.

You have two options to get the executables: Download pre-compiled 64bit-linux binaries or compile the whole UCSC source tree yourself.

  • Option 1: Use rsync to get a copy of the pre-compiled binaries.
  rsync -avzP rsync://hgdownload.soe.ucsc.edu/cgi-bin/ $CGI_BIN

This command will grab the x86_64 binaries, which run on 64-bit x86 Linux systems. If the binaries work for you they represent the easiest way forward.

  • To make the binaries work in Debian lenny, execute these two commands in /usr/lib as root:
   ln -s libcrypto.so.0.9.8 libcrypto.so.6
   ln -s libssl.so.0.9.8 libssl.so.6
  • Option 2: Get the jksrc files from the source tree and compile the executables yourself. Compiling the source can be a challenge if things don't work out of the box. However, by compiling the jksrc tree you will get many useful tools and scripts which will be of value for bioinformatic and admin tasks.
  • Suggestion: Download jksrc and attempt to compile the source files. If this initially proves to be problematic, use rsync to get the pre-compiled executables and compile the source later. You will find several useful scripts and some of the browser documentation in /src/product (in the jksrc archive). The scripts and docs in the source tree are available whether or not you successfully compile the entire tree.

Get all the html files

The source tree script: src/product/scripts/updateHtml.sh can be used to fetch the HTML hierarchy.

Test the rsync connection:

   rsync -navz --progress rsync://hgdownload.soe.ucsc.edu

Determine the destination of the copy ($WEBROOT) and fire off the production copy. The trailing slash is important!

   rsync -avzP rsync://hgdownload.soe.ucsc.edu/htdocs/ $WEBROOT/

Obtain the /gbdb data file area

The source tree scripts: src/product/scripts/fetchFullGbdb.sh and src/product/scripts/fetchMinimalGbdb.sh can be used to manage your gbdb downloads.

You will need the portions of /gbdb used by the browser. Replace XXX with the assemblies you want to mirror. The trailing slash is important.

     rsync -avzP --delete --max-delete=20 rsync://hgdownload.soe.ucsc.edu/gbdb/XXX/ /gbdb/XXX/
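
When mirroring several assemblies, the same command has to be repeated once per assembly. The sketch below only prints the commands, so they can be reviewed before anything is downloaded. The function name gbdb_sync_commands is made up for this example. The rsync options are copied unchanged from the command above.

```shell
# Print one gbdb rsync command per assembly name given as an argument.
# Nothing is executed; the output is meant to be reviewed first.
gbdb_sync_commands() {
    for db in "$@"; do
        printf 'rsync -avzP --delete --max-delete=20 rsync://hgdownload.soe.ucsc.edu/gbdb/%s/ /gbdb/%s/\n' "$db" "$db"
    done
}
```

For example, "gbdb_sync_commands hg19 mm9" prints two commands. Once they look right, the output can be piped to sh to run them.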

Download assembly database tables

There are two ways to install the tables.

  1. Build from textfiles: The first involves building the tables from the assembly dumps (optionally downloaded above). This method is not covered here; please see the official mirror docs.
  2. Direct syncing: The second and preferable method involves rsyncing the binary tables themselves. This is faster and a lot more convenient. Use this method if possible. You also do not need to create any databases in this case, they are automatically created by downloading the mysql files.

Caveats for direct syncing:

  • Your MySql version must be compatible with the table version (currently 5.0.x)
  • The hgcentral database (and possibly others), which is found in /var/lib/mysql/, must receive special handling (covered later).
  • For your own locally created tracks loaded into the database, use the trackDb_localTracks table to avoid the UCSC updated trackDb tables.
  • The download size of the binary tables is larger than that of the text dumps of the assemblies, because of the extensive use of indexes in the tables.

To proceed with syncing the tables directly issue the following command:

         rsync -avzP --delete --max-delete=20 rsync://hgdownload.soe.ucsc.edu/mysql/XXX/ $MYSQLDATA/XXX/

where XXX is the name of each database to be mirrored. You will need to generate a list of tables to be mirrored. Note you can NOT simply sync with hgdownload.soe.ucsc.edu/mysql since the mysql directory contains a number of files and sub directories which are specific to each instance of the mysql database.

An unedited list of potential tables to be mirrored can be found by issuing the command:

         rsync -v --dry-run rsync://hgdownload.soe.ucsc.edu/mysql

This list will then have to be edited so that only the correct tables are mirrored.
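
As a starting point for that editing, the listing can be reduced to bare names. This is a sketch under two assumptions: that the listing uses the usual rsync layout (permissions, size, date, time, name) and that each database shows up as a directory entry. The function name list_databases is hypothetical.

```shell
# Read an rsync listing on stdin and print the name of every directory
# entry (lines starting with "d"), skipping the top-level "." entry.
list_databases() {
    awk '/^d/ && $NF != "." { print $NF }'
}
```

The result of "rsync -v --dry-run rsync://hgdownload.soe.ucsc.edu/mysql | list_databases" could be saved to a file and trimmed by hand to the databases you want to mirror.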

Download other database tables

You will usually need other databases in addition to just the genome assembly databases:

  • hgcentral: primary database the browser uses to find everything else, also contains dynamic user/session "cart" data
    • Essential
  • hgTemp: empty database for table browser ID uploads
    • just do a "create database hgTemp" on the mysql command line
  • sp090821, etc ... - "Swiss-Prot" aka UniProt database obtained from files at ftp.expasy.org/databases/uniprot/
    • used in UCSC genes track on various databases
  • uniprot: the newest version of the Swiss-Prot databases, can simply be a symlink to the newest sp* database directory
    • Used in UCSC genes track on various databases
  • go - The Gene Ontology database, obtained from: http://www.godatabase.org/dev/database/
    • used in the UCSC genes track
  • proteins090821, etc. - a combination of the UniProt data mentioned above and data from HGNC http://www.genenames.org/
    • Used in the human UCSC genes track and proteome browser
  • visiGene: virtual microscope for mice sections
    • Usually not needed
  • proteome: should merely be a symlink to the most recent proteins090821 database.

Download them, like the assembly database tables above.

Depending upon the other genomes hosted and their interrelationship with other genome assemblies, you may need to create empty databases to at least satisfy the reference. For example some tracks on the Mouse mm9 browser need to have the hg18 database existing. To fix this, simply create an empty database: hgsql -e "create database hg18;" hg19. You could populate these stub databases with a minimal set of tables. Note the fetchMinimal* scripts in the source tree directory src/product/scripts/.
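
If several stub databases are needed, the statements can be generated in one go. The following sketch prints plain SQL and does not touch the server itself. The function name stub_database_sql is made up, and IF NOT EXISTS is added so that rerunning it is harmless.

```shell
# Print a CREATE DATABASE statement for each database name given as an
# argument. IF NOT EXISTS leaves databases that are already there untouched.
stub_database_sql() {
    for db in "$@"; do
        printf 'CREATE DATABASE IF NOT EXISTS %s;\n' "$db"
    done
}
```

The output can be reviewed and then piped to the mysql command-line client, using an account that is allowed to create databases.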

You will usually need hgcentral, hgFixed, proteome (symlink), genbank, uniProt. It is best to download all of them now, to avoid any error messages later on.

Grant Mysql rights

These instructions should be followed in conjunction with /src/product/README.mysql.setup.

After the tables have been created it is necessary to add the required users along with their associated permissions. The entire process of MySQL configuration is described in /src/product/README.mysql.setup as found in jksrc. In brief 3 users are required. These users are readonly, readwrite, browser. These users are configured as follows:


User       MySql Permission                                      Databases              Used by
browser    SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER   All except hgcentral   developers
readonly   SELECT                                                All except hgcentral   CGI scripts
readwrite  SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER   hgcentral              browser(?)

Do not forget the grant rights for the databases hgFixed, genbank, proteome, uniProt and proteinsxxxx as well (See also Example Mysql Grants).

Each database must have these 3 users added with the associated permissions. The easiest way to accomplish this is to use the script ex.MySQLUserPerms.sh, which can be found in src/product in jksrc. The script sets the permissions on each database explicitly by name. NOTE: This script must be edited before use! It is likely that the script does not contain the latest set of database names, so a current list of database names must be generated and any that are missing added. Future updates to the databases may also require additional changes to the script. As an alternative, it is possible, at the cost of a small amount of security, to set the permissions globally using *.* edits. An example of the required edit to the script so that permissions are added globally is:

   ${MYSQL} -e "GRANT SELECT, UPDATE on *.* to browser@localhost IDENTIFIED BY 'password';" mysql

After the edits are made the script will add these 3 users to all of the databases used by the browser. These permissions are limited to localhost for security reasons. ex.MySQLUserPerms.sh is heavily documented and should be read to make sure that the changes discussed above are understood.
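
To illustrate what the per-database grants look like, here is a sketch that prints GRANT statements following the user table above. It is not the content of ex.MySQLUserPerms.sh, which may differ in detail. The function name grant_sql is hypothetical and 'password' is a placeholder. The IDENTIFIED BY clause follows the MySQL 5.x example above, and newer MySQL versions no longer accept it inside GRANT.

```shell
# Print the GRANT statements for one database, following the user table:
# readwrite gets full rights on hgcentral; on every other database browser
# gets full rights and readonly gets SELECT only.
grant_sql() {
    db="$1"
    full="SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER"
    if [ "$db" = "hgcentral" ]; then
        printf "GRANT %s ON %s.* TO readwrite@localhost IDENTIFIED BY 'password';\n" "$full" "$db"
    else
        printf "GRANT %s ON %s.* TO browser@localhost IDENTIFIED BY 'password';\n" "$full" "$db"
        printf "GRANT SELECT ON %s.* TO readonly@localhost IDENTIFIED BY 'password';\n" "$db"
    fi
}
```

Calling it once per database name, for example in a loop over your list of mirrored databases, gives a file of statements that can be checked before it is fed to mysql.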

Setup hg.conf

After adding the MySql Users it is necessary to add hg.conf to the cgi-bin directory. hg.conf contains username/password information and is required by various cgi-bin programs.

A sample hg.conf can be found here. Go to the cgi-bin directory and execute this command:

   sudo wget http://genome-test.gi.ucsc.edu/~kent/src/unzipped/product/ex.hg.conf -O hg.conf

The default user/password combinations and permissions can be changed, however doing so will require editing of other scripts which have the user/passwords hardcoded in them (notably ex.MySQLUserPerms.sh). It is probably best to keep the defaults at least until one knows what one is doing. To learn more about hg.conf file setup and specifics, visit our minimal.hg.conf help page.

In hg.conf you will need to set the document root:

   browser.documentRoot=/var/www/html

The actual path could be different depending on your actual root directory. After an appropriate hg.conf has been created, it should be installed in /var/www/cgi-bin and the permissions set to 600 (my setup has the file owner/group of apache/apache).
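
Settings in hg.conf are plain key=value lines, so they can also be changed from a script. The sketch below replaces the value of a key, or appends the key if it is missing, and then restricts the file to mode 600. The function name set_hgconf is made up for this example, and it assumes values without backslashes.

```shell
# Set key=value in an hg.conf style file: replace the line whose key matches
# exactly, or append a new line if the key is absent, then chmod the file to 600.
# Usage: set_hgconf KEY VALUE FILE
set_hgconf() {
    key="$1"; value="$2"; file="$3"
    tmp=$(mktemp) || return 1
    awk -F= -v k="$key" -v v="$value" '
        $1 == k { print k "=" v; found = 1; next }
        { print }
        END { if (!found) print k "=" v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file" && chmod 600 "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

For example, "set_hgconf browser.documentRoot /var/www/html /var/www/cgi-bin/hg.conf" sets the document root. Commented-out lines are left alone because their key starts with #.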

If you're not on the UCSC campus: Comment out any bottleneck statements from hg.conf, as they will break the MAF alignment view.

Make sure you activate the custom track database statements: Using custom track database

Set up the "hgcentral" tables

Download the schema for the hgcentral database (hgcentral.sql) http://genome-test.gi.ucsc.edu/~kent/src/unzipped/product/ex.hg.conf

Create a hgcentral database

     mysql> create database hgcentral;

Add the hgcentral tables

      mysql -youraccountoptions hgcentral < hgcentral.sql

Create a user/password with the ability to update and insert. This user is currently "readwrite". The script ex.MySQLUserPerms.sh will add this user to hgcentral.

(optional) Set correct location of Blat Servers. By design the location of the Blat Servers is incomplete in hgcentral. This prevents their overuse or abuse. In order to implement Blat you will either need to connect to the UCSC servers or set up your own. You will need permission to connect to the UCSC Blat Servers. Please see a discussion of the requirements and restrictions in the official docs (TODO: find correct link). The following sql command will update the table to point at the UCSC servers:

    mysql> USE hgcentral;
    mysql> UPDATE blatServers SET host=CONCAT(host,".soe.ucsc.edu");

Please get permission before using Blat Servers!

If you are not mirroring the primary human database and would like to have a different default genome displayed on your gateway page, enter the defaultGenome specification in the cgi-bin/hg.conf file. For example if you have only dm3:

defaultGenome=D. melanogaster

Find the name to use from defaultDb in hgcentral:

hgsql -e "select genome from defaultDb;" hgcentral

Create a "trash" directory

The cgi programs use a temporary area to create and store images used by the browser. This directory is by default looked for in /var/www/trash. You should make this directory and allow the user that runs the web server write access to it. As a point of maintenance this directory will need to be cleaned out from time to time.

mkdir /var/www/trash
chown apache:apache /var/www/trash

If you have adapted your /var/www/ directory to something else (e.g. /var/www/genome/html) you should create your trash directory there.

You will also need a symlink from your document root html hierarchy to this trash directory. The trash directory can actually be on any filesystem and /var/www/trash can be a symlink to that location.
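
For the periodic cleaning mentioned above, a find command is usually enough. This is only a sketch: the function name clean_trash is hypothetical and the default age of 3 days is an assumption, so pick a threshold that fits how long your users expect custom tracks to survive.

```shell
# Delete regular files in the trash directory that were last modified more
# than the given number of days ago. Directories are left in place.
# Usage: clean_trash [DIR] [DAYS]   (defaults: /var/www/trash, 3)
clean_trash() {
    trash_dir="${1:-/var/www/trash}"
    max_age_days="${2:-3}"
    find "$trash_dir" -type f -mtime +"$max_age_days" -exec rm -f {} +
}
```

A call such as "clean_trash /var/www/trash 3" could be run from a daily cron job.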

Compiling static version of RNAplot

RNAplot is an auxiliary executable that generates theoretical fold diagrams of RNA sequences. It is not supplied with the Genome Browser, but is available from the Institute for Theoretical Chemistry and Structural Biology, Theoretical Biochemistry Group, at the University of Vienna. For minimal browsers, it is convenient to compile statically, without any need for external shared libraries. This script File:INSTALL RNAplot.sh is a guide for creating the static executable. Once compiled, the executable can be made available to the browser using the hg.conf directive rnaPlotPath.

If it doesn't work...

  • "It won't compile although I can see just a warning"
    • Remove the -Wall statement from the main makefile
  • "It won't compile and complains about missing MYSQL libraries"
    • Make sure that you have installed the MySQL libraries (ubuntu/debian: "apt-get install libmysqlclient15-dev", RedHat/Centos/Fedora: "yum install mysql-devel")
  • "Nothing shows up, not even the blue side menu"
    • Check that the DocumentRoot statement in the Apache configuration file is correct and that you have restarted your Apache daemon ("/etc/init.d/httpd restart" or "/etc/init.d/apache restart")
  • "There is something, but I cannot see the menu where I can select the right genome"
    • Try to run the script printEnv.pl from your browser, by typing a URL like http://genome.myUniversity.edu/cgi-bin/printEnv.pl into the address bar to see if scripts are executed at all
    • Verify that Apache is really executing the cgi-bin programs and not just downloading them. When you access cgi-bin/hgGateway with a browser and they are not executed, check this:
      • Does your Apache Configuration file contain a ScriptAlias directive? This directive has to reference the cgi-bin directory.
      • There is no use in trying Apache's "AddHandler cgi-script .cgi" directive as often suggested by the Apache documentation. It doesn't work because the cgi scripts do not end in .cgi
      • An alternative is to create a file .htaccess in the cgi-bin directory and add the line "DefaultType application/x-httpd-cgi" (make sure that .htaccess interpretation is activated in apache for this directory, see the apache documentation)
    • Have you restarted your web server after you updated your apache configuration file?
    • Try to use a different internet browser or shift-click the reload button of firefox: If you have just fixed something, firefox will have cached the error although everything is fine
    • Check if the tables genomes/clades in your hgcentral table make sense (todo: extend this section)
    • If you're using CentOS or another linux distribution with SELinux: Try "setenforce 0" and test it again, to rule out any SELinux issues
  • "I am seeing error messages relating to gbdb... "
    • Make sure that the apache user can access the /gbdb directory
      • Check the file permissions of the /gbdb directory in respect to the user under which apache is running (debian/ubuntu: www-data)
      • If you are not sure how to interpret file permissions, login as the apache user ("sudo su apache") and try to read any file in /gbdb. Also check if you can write to your trash directory (echo test > test)
    • If you're using CentOS or another linux distribution with SELinux: Try "setenforce 0" and test it again, to rule out any SELinux issues
  • "Couldn't set connection database to hg18"
    • Create an empty hg18 database: 'hgsql -e "create database hg18;" hg19'
  • "Couldn't connect to database (null) on localhost as readwrite. Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock'"
    • try mysql --socket /var/lib/mysql/mysql.sock, if this doesn't work then you have to move your mysql socket (change /etc/mysql/my.cnf)
  • "I see mysql-access errors everywhere"
    • Verify that the Mysql user you added to hg.conf can read the mysql databases and can write to the customTracks database.
    • To test this, log into mysql (mysql -u <your user from hg.conf> -p) and try to access the assembly that does not work ("select * from hg19.chromInfo;")
    • For other Mysql error messages, set the environment variable "JKSQL_TRACE=on" and run the CGI from the command line (see below) to see what's going on
  • "CGI-programs are crashing"
    • Use the arguments from the URL and replace the & with spaces
    • Call the CGI from a command line and copy the arguments just after the CGI, like this "/var/www/cgi-bin/hgc hgsid=337 o=14394787 t=14395353"
    • You should get a coredump which you can analyse with gdb
    • You can combine this with the JKSQL_TRACE=on (see above)
    • If you're using CentOS or another linux distribution with SELinux: Try "setenforce 0" and test it again, to rule out any SELinux issues
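
The step of turning URL arguments into command-line arguments can be sketched as a one-line helper. The function name cgi_args is made up for this example. It only replaces each & with a space and does not decode %-escaped characters.

```shell
# Convert the query string of a browser URL (the part after the ?) into
# space-separated command-line arguments for a CGI program.
cgi_args() {
    printf '%s\n' "$1" | tr '&' ' '
}
```

For example, running JKSQL_TRACE=on /var/www/cgi-bin/hgc $(cgi_args 'hgsid=337&o=14394787&t=14395353') calls the CGI with the three arguments from the example above and with SQL tracing switched on.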

see also