Browser Installation: Difference between revisions


Revision as of 02:34, 9 December 2006

Installing a Browser Mirror on Red Hat and its Derivatives

The following How-to provides a step-by-step procedure for installing the Genome Browser on a Mandriva 2006/2007 system. It should also work on any other Red Hat derived Linux OS.

These instructions are based on the procedures outlined in the general Browser Mirror Procedures as well as various HOWTOs found in jksrc. I have tried to combine the various steps into an easy-to-follow procedure which does not require a lot of specialized knowledge to complete the installation successfully.

The procedure described below will create a full mirror of the Genome Browser. Most users are probably only interested in creating a partial mirror. See the general Browser Mirror Procedures and the Partial Mirror document for more information on how to create and maintain a partial mirror.


Prerequisites:

It is assumed that you have both the Apache 2 web server and MySql 4.1.x installed. The Genome Browser currently runs under MySql 4, but a move to MySql 5 is being investigated. A full mirror requires a lot of disk space: as of December 2006, a clean install of the Browser consumed close to 2 TB of storage. Additional space will also be required for future database expansion. A partial mirror will need less (perhaps far less). Plan accordingly. (TODO: link for discussion of minimum configuration).
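
Given the storage figure above, it can be worth checking free space before any download is started. The following is a minimal sketch, assuming a POSIX `df`; the helper name `free_kb` is hypothetical and not part of jksrc, and the two paths are simply the locations used later in this document.

```shell
#!/bin/sh
# Print the free space, in kilobytes, on the filesystem holding a path.
# free_kb is a hypothetical helper name, not part of jksrc.
free_kb() {
    # -P forces the portable one-line-per-filesystem format, -k reports in KB;
    # the fourth column of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Report the free space where the MySQL tables and /gbdb will live.
for dir in /var/lib/mysql /gbdb; do
    if [ -d "$dir" ]; then
        echo "$dir: $(free_kb "$dir") KB free"
    else
        echo "$dir: directory does not exist yet"
    fi
done
```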


1. Get Executables

More details for these methods can be found in the official mirror docs and README.building.source. Download the released zipped version of the source files from here. Follow the instructions for how to compile the files.

  • Option 1: Use rsync to get a copy of the compiled binaries.
  rsync -avzP rsync://hgdownload.cse.ucsc.edu/cgi-bin/ /var/www/cgi-bin

This command will grab the AMD Opteron x86_64 binaries. These binaries do not seem to be Opteron specific and apparently work with any of the AMD Athlon processors as well. If the binaries work for you they represent the easiest way forward.

  • Option 2: Get the jksrc files and compile the executables yourself. Compiling the source can be a challenge if things don't work out of the box. However by compiling the jksrc tree you will get many useful tools and scripts which will be of value for bioinformatic and admin tasks.
  • Option 3: (Suggested method) Download jksrc. Attempt to compile the source files. If this initially proves to be problematic, use rsync to get the pre-compiled executables and compile the source later. You will find several useful scripts and some of the browser documentation in /src/product (in the jksrc archive). The scripts and docs in the source tree are available whether or not you successfully compile the entire tree. (TODO: link to compile howto)


2. Configure Apache server

In order to support SSI it is necessary to set the XBitHack. Add the following somewhere in /etc/httpd/conf/httpd.conf

     XBitHack on
     <Directory /var/www/html>
     Options +Includes
     </Directory>
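
With XBitHack on, Apache parses an HTML file for server-side includes only when the file's owner-execute bit is set. The sketch below lists files that contain an SSI directive but lack that bit. It assumes directives can be recognised by the string `<!--#`, and the helper name `ssi_missing_xbit` is hypothetical.

```shell
#!/bin/sh
# List *.html files that contain an SSI directive but lack the owner-execute
# bit, and so would not be parsed when only XBitHack is relied upon.
# ssi_missing_xbit is a hypothetical helper name.
ssi_missing_xbit() {
    # "! -perm -100" selects files whose owner-execute bit is not set;
    # grep -l prints the name of each such file that contains "<!--#".
    find "$1" -type f -name '*.html' ! -perm -100 \
        -exec grep -l '<!--#' {} +
}

# Example (web root path as assumed in this document):
#   ssi_missing_xbit /var/www/html
# A reported file can then be fixed with: chmod u+x FILE
```

This is only a check; files copied with rsync -a keep the permissions they had on the server, so the list may well be empty.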

Find the location of your web pages. This should be /var/www/html by default. Set the environment variable if desired.

     export WEBROOT="/var/www/html"

Find the location of your cgi-bin directory. This should be /var/www/cgi-bin. Set the environment variable if desired. (The variable is named CGI_BIN because shell variable names cannot contain a hyphen.)

     export CGI_BIN="/var/www/cgi-bin"

Next, find the location of your MySQL data. This should be located in /var/lib/mysql. Set the environment variable if desired.

     export MYSQLDATA="/var/lib/mysql"

Note: These variables can be set in /etc/profile so they will be available globally to all users. Also they can be skipped entirely if absolute paths are used instead.
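
The three locations can be set and verified in one place. This is a minimal sketch, assuming the default paths given above; `missing_dirs` is a hypothetical helper name, and the cgi-bin variable is spelled CGI_BIN because a hyphen is not valid in a shell variable name.

```shell
#!/bin/sh
# Default locations taken from the text; override by exporting the variables
# beforehand. CGI_BIN uses an underscore because shell variable names cannot
# contain a hyphen.
WEBROOT="${WEBROOT:-/var/www/html}"
CGI_BIN="${CGI_BIN:-/var/www/cgi-bin}"
MYSQLDATA="${MYSQLDATA:-/var/lib/mysql}"
export WEBROOT CGI_BIN MYSQLDATA

# Print each path that is not an existing directory; return non-zero if any
# is missing. missing_dirs is a hypothetical helper name.
missing_dirs() {
    status=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"
            status=1
        fi
    done
    return "$status"
}

missing_dirs "$WEBROOT" "$CGI_BIN" "$MYSQLDATA" || echo "fix the paths above before continuing"
```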


3. Get all the html files

Test the rsync connection:

   rsync -navz --progress rsync://hgdownload.cse.ucsc.edu

Determine the destination of the copy ($WEBROOT) and fire off the production copy. The trailing slash is important!

   rsync -avzP rsync://hgdownload.cse.ucsc.edu/htdocs/ /var/www/html/


4. Optional: Get the data for each individual genome assembly (TODO: verify this entire section!!! Is it even necessary!!)

The individual assemblies are only required if the mysql tables are going to be built from scratch or if direct access to the assemblies is needed for research purposes (apparently unusual). It is generally preferable to download the mysql tables directly (covered later in this document), so this step can probably be skipped.

Here is an example of how to rsync a single database assembly (Human)

Get Human March 2006 full data set (hg18: 13.0 Gb) by doing:

     mkdir -p $WEBROOT/goldenPath/hg18/database/
     rsync -avzP --delete --max-delete=20 \
     rsync://hgdownload.cse.ucsc.edu/genome/goldenPath/hg18/database/ $WEBROOT/goldenPath/hg18/database/

This process must be repeated for each of the desired databases.


A list of all possible assemblies found on browser can be created by issuing the command

     rsync -v rsync://hgdownload.cse.ucsc.edu/genome/goldenPath > goldenPath.dat


A list of all mysql data tables can be created by issuing the command

     rsync -v rsync://hgdownload.cse.ucsc.edu/mysql/ > gb_tables.dat


The lists retrieved by these two commands will be quite similar. In both cases it will be necessary to edit the directory names returned to be certain only data tables are being included.
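
Part of that editing can be automated by keeping only the directory entries of a listing. The sketch below assumes the usual rsync listing layout (permissions, size, date, time, name) and uses a hypothetical helper name, `list_dirs`; the output file name in the example is also only illustrative.

```shell
#!/bin/sh
# Read an rsync directory listing on stdin and print only the names of
# subdirectories, one per line. list_dirs is a hypothetical helper name.
# Assumes the usual listing layout: permissions, size, date, time, name.
list_dirs() {
    # Keep lines whose first field is a directory permission string and
    # drop the "." entry for the listed directory itself.
    awk '$1 ~ /^d[-rwxsStT]+$/ && NF >= 5 && $NF != "." { print $NF }'
}

# Example, using the file created by the command above:
#   list_dirs < gb_tables.dat > gb_databases.txt
```

The result still needs a manual review, since not every directory on the server is necessarily a data table directory that should be mirrored.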


5. Obtain the /gbdb data file area

You will need the portions of /gbdb used by the browser. (This is a large download):

     rsync -avzP --delete --max-delete=20 rsync://hgdownload.cse.ucsc.edu/gbdb/ /gbdb/


6. Set up database tables:

These instructions should be followed in conjunction with /src/product/README.mysql.setup. There are two ways to install the tables. The first involves building the tables from the assembly dumps (optionally downloaded above). The second, preferable method involves rsyncing the binary tables themselves. It is preferable because building the mysql tables from the assemblies is quite computationally intensive.


Caveats for direct syncing:

  • Your MySql version must be compatible with the table version (currently 4.0.x)
  • The hgcentral (and others?) table which is found in /var/lib/mysql/ must receive special handling (covered later).
  • The actual download size of the tables is more than simply downloading the assemblies. This is because of the extensive use of indexes in the tables.
  • The method for installing the tables using the assemblies is covered in the official mirror docs and is not covered here


To proceed with syncing the tables directly issue the following command:

         rsync -avzP --delete --max-delete=20 rsync://hgdownload.cse.ucsc.edu/mysql/XXX/ /var/lib/mysql/XXX/

where XXX is the name of each table to be mirrored. You will need to generate a list of tables to be mirrored. Note you can NOT simply sync with hgdownload.cse.ucsc.edu/mysql since the mysql directory contains a number of files and sub directories which are specific to each instance of the mysql database.

An unedited list of potential tables to be mirrored can be found by issuing the command:

         rsync -v --dry-run rsync://hgdownload.cse.ucsc.edu/mysql

This list will then have to be edited so that only the correct tables are mirrored. The script (TODO:sync_tables) can be used to download a complete list of eligible tables and to automatically sync with each.
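
Once an edited list exists, the per-table rsync command can be driven by a simple loop. The following is a minimal sketch and not the sync_tables script referred to above: `sync_databases`, the RSYNC override, and the list file format (one name per line) are all assumptions made for illustration.

```shell
#!/bin/sh
# Mirror each entry named in a list file, one rsync call per entry.
# sync_databases and the list file are hypothetical; this is not the
# sync_tables script mentioned in the text.
RSYNC="${RSYNC:-rsync}"                      # set RSYNC=echo to only print the commands
SRC="rsync://hgdownload.cse.ucsc.edu/mysql"  # source module from the text
DEST="${MYSQLDATA:-/var/lib/mysql}"          # local MySQL data directory

sync_databases() {
    # $1 is a file with one name per line; blank lines and lines
    # starting with # are skipped.
    while read -r db || [ -n "$db" ]; do
        case "$db" in
            ''|'#'*) continue ;;
        esac
        # stdin is redirected so the command cannot consume the list file
        $RSYNC -avzP --delete --max-delete=20 "$SRC/$db/" "$DEST/$db/" < /dev/null || return 1
    done < "$1"
}

# Example:
#   RSYNC=echo sync_databases databases.txt   # print the commands only
#   sync_databases databases.txt              # perform the transfer
```

Stopping at the first failed transfer is a deliberate choice here, so that a network problem is noticed before the remaining entries are attempted.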

After the tables have been created it is necessary to add the required users along with their associated permissions. The entire process of MySQL configuration is described in /src/product/README.mysql.setup as found in jksrc. In brief, three users are required: readonly, readwrite, and browser. These users are configured as follows:


User      | MySql Permission                                    | Databases            | Used by
browser   | SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER | All except hgcentral | developers
readonly  | SELECT                                              | All except hgcentral | CGI scripts
readwrite | SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER | hgcentral            | browser(?)


Each database must have these 3 users added with the associated permissions. The easiest way to accomplish this is to use the script ex.MYSQLUserPerms.sh, which can be found in src/product in jksrc. The script sets the permissions on each database listed by name. NOTE: This script must be edited before use! Because the script handles each database explicitly by name, it is likely that it does not contain the latest set of database names. A current list of database names must be generated and any that are missing must be added. Future updates to the database may also require additional changes to the script. As an alternative, it is possible, at the cost of a small amount of security, to set the permissions globally using *.* edits. An example of the required edit to the script so that permissions are added globally is:

   ${MYSQL} -e "GRANT SELECT, UPDATE on *.* to browser@localhost IDENTIFIED BY 'password';" mysql

After the edits are made, the script will add these 3 users to all of the databases used by the browser. These permissions are limited to localhost for security reasons. ex.MYSQLUserPerms.sh is heavily documented and should be read to make sure that the changes discussed above are understood.
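
For the per-database approach, the statements can be generated from a list of names rather than typed by hand. This is a minimal sketch that follows the permissions table above, not a replacement for ex.MYSQLUserPerms.sh: the helper name `grant_sql` and the 'password' placeholders are hypothetical, and the GRANT ... IDENTIFIED BY form is assumed to be accepted by the MySql 4.x server discussed here.

```shell
#!/bin/sh
# Print the GRANT statements for one database, following the permissions
# table above. grant_sql and the 'password' placeholders are hypothetical;
# the shipped ex.MYSQLUserPerms.sh remains the reference.
FULL="SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER"

grant_sql() {
    db="$1"
    if [ "$db" = "hgcentral" ]; then
        # hgcentral: full rights for the readwrite user only
        echo "GRANT $FULL ON $db.* TO readwrite@localhost IDENTIFIED BY 'password';"
    else
        # every other database: full rights for browser, SELECT for readonly
        echo "GRANT $FULL ON $db.* TO browser@localhost IDENTIFIED BY 'password';"
        echo "GRANT SELECT ON $db.* TO readonly@localhost IDENTIFIED BY 'password';"
    fi
}

# Example: review the statements first, then pipe them to the mysql client.
#   for db in hg18 hgcentral; do grant_sql "$db"; done
#   for db in hg18 hgcentral; do grant_sql "$db"; done | mysql -u root -p mysql
```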


After adding the MySql users it is necessary to add hg.conf to the cgi-bin directory. Optionally add hg.conf to any developers' home directories. hg.conf contains username/password information and is required by various scripts which access the databases, in particular the cgi-bin scripts. A sample hg.conf can be found (TODO:here). More discussion of this file can be found in README.mysql.install, which is located in /src/product in jksrc. The default user/password combinations and permissions can be changed; however, doing so will require editing other scripts which have the users/passwords hardcoded in them (notably ex.MYSQLUserPerms.sh). It is probably best to keep the defaults, at least until one knows what one is doing.


In hg.conf you will need to set the document root:

   browser.documentRoot=/var/www/html

The path may differ depending on your web server's document root. After an appropriate hg.conf has been created, it should be installed in /var/www/cgi-bin and the permissions set to 600 (my setup has the file owner/group of apache/apache).
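
The installation step can be sketched as follows. It assumes the `install` utility found on Linux systems and the cgi-bin location identified in step 2; `install_hgconf` is a hypothetical helper name.

```shell
#!/bin/sh
# Copy hg.conf into the cgi-bin directory with mode 600 so that only the
# owning account can read the stored passwords.
# install_hgconf is a hypothetical helper name.
install_hgconf() {
    src="$1"       # path to the prepared hg.conf
    destdir="$2"   # cgi-bin directory, e.g. /var/www/cgi-bin
    # install copies the file and sets the mode in a single step
    install -m 600 "$src" "$destdir/hg.conf"
}

# Example (run as root; apache/apache is the owner/group used in the text):
#   install_hgconf ./hg.conf /var/www/cgi-bin
#   chown apache:apache /var/www/cgi-bin/hg.conf
```

The file must be owned by the account the web server runs as, otherwise mode 600 prevents the cgi-bin scripts from reading it.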