CentOS notes

From genomewiki

Some random notes on installing a local mirror on an x86_64 CentOS linux box

Introduction

I hope these notes are useful for anyone trying to download and set up a local mirror.

I just finished a limited install on an x86_64 CentOS 4.3 machine. There are some general ideas and some SELinux-specific gotchas worth knowing about. I only grabbed hg18 because that's all I needed for my local mirror.

Useful links

The instructions on this wiki at Browser_Installation were very helpful. http://genome.ucsc.edu/admin/jk-install.html was also very helpful, though not always consistent. When things got weird, the various README files in JK's source were useful, but sometimes you just need to look at the source.

Before you start, understand that there is no single recipe that I could find (or write) to make this painless. Setting up a local mirror is a fairly complex task, and most of the available instructions are slightly out of date or written for a different environment than yours. The good folks at UCSC are doing a great job and will help if you ask sensible questions after doing your own due diligence. Still, this is not a commodity software installation, and it should probably not be undertaken unless you really know your way around as root with Linux, Apache and MySQL. This will likely not be a pleasant experience if it's your first undertaking as a system administrator. However, with some judicious googling I was able to fake it, so you probably can too :)

Downloading

Allow plenty of time and bandwidth. Downloading the necessary datasets, especially gbdb/hg18, takes a very, very long time. There's a lot there!

Alternate site roots

If you want your browser to appear at yoursite/goldenpathmirror/, don't bother. I gave up after a day of trying to get everything working anywhere but at the root of the website: there are all sorts of hardcoded references to ../trash and the like. Once I gave up and reverted to /var/www/html as the root, everything seemed to work better. If anyone figures out how to do that, please let me know.

max: actually not that difficult. I had to adapt the Apache config and change everything in hg.conf. trash has to be ../trash relative to cgi-bin, and html and cgi-bin have to be separate. So you need a structure like this:

  • /var/www/genome/trash
  • /var/www/genome/cgi-bin
  • /var/www/genome/html
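
A minimal sketch of creating that layout, assuming a POSIX shell. The function name make_mirror_layout is hypothetical, and the sketch only creates the directories; the Apache config and hg.conf changes mentioned above are not covered.

```shell
# Create the three sibling directories under a chosen root.
# make_mirror_layout is a hypothetical helper name, not part of the browser source.
make_mirror_layout() {
    root=$1
    for sub in trash cgi-bin html; do
        mkdir -p "$root/$sub" || return 1
    done
    # The CGIs look for ../trash, so check it resolves from cgi-bin.
    [ -d "$root/cgi-bin/../trash" ]
}

# Usage (as root):
#   make_mirror_layout /var/www/genome
```

Keeping trash as a sibling of cgi-bin is what makes the hardcoded ../trash references resolve.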

Rsync of MySQL tables

Rsyncing the MySQL directories directly works a treat if you have MySQL 4.x, as CentOS does. I did not undergo the pain of downloading the raw SQL and rebuilding all those tables and indices. Be aware that the files become unavailable while they're being updated in the early morning, so some of your rsync jobs may end with 'file unavailable' errors. Don't panic: restarting the same rsync command will work fine, and files that were already downloaded successfully won't be downloaded again. Don't forget to restart mysql

/etc/init.d/mysqld restart

as root after all the MySQL database directories have been downloaded. There's a fair bit of MySQL wrangling to do to get the various tables and permissions set up. Google can be very helpful, but you may need someone with MySQL experience to help out. I also fiddled with my.cnf since the default doesn't allocate much RAM for keys. That's a whole other story, but after your mirror has been running for a couple of days, you might find tuning-primer.sh handy for tips on adjusting my.cnf.
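
Since re-running the same rsync command is safe, the retry can be automated. The following is only a sketch: retry is a hypothetical helper, and SRC and DEST stand in for whatever rsync source and local MySQL data directory you are using.

```shell
# retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Re-run COMMAND until it succeeds or MAX_TRIES attempts have been made.
retry() {
    max_tries=$1
    delay=$2
    shift 2
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_tries" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Usage (SRC and DEST are placeholders, not real paths):
#   retry 10 600 rsync -avz "$SRC" "$DEST"
```

A long delay between attempts gives the early-morning update window time to finish before the next try.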

CentOS preparation

You must have all the development libraries for MySQL installed. (Note: UCSC uses the package MySQL-devel-community-5.0.67-0.rhel5.) Use yum list and yum install as needed until you see something like this:

[root@meme rerla]# yum list mysql*
Setting up repositories
update                    100% |=========================|  951 B    00:00     
base                      100% |=========================| 1.1 kB    00:00     
addons                    100% |=========================|  951 B    00:00     
extras                    100% |=========================| 1.1 kB    00:00     
Reading repository metadata in from local files
Installed Packages
mysql.x86_64                             4.1.20-1.RHEL4.1       installed         
mysql-devel.x86_64                       4.1.20-1.RHEL4.1       installed       
mysql-server.x86_64                      4.1.20-1.RHEL4.1       installed       
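
To check such a listing without reading it by eye, a small filter can report which of the three packages are absent. This is a sketch only: missing_packages is a hypothetical helper, and it assumes the "name.arch" column format shown in the listing above.

```shell
# Print each required package name that does not appear in the
# yum-style listing read from standard input.
missing_packages() {
    listing=$(cat)
    for pkg in "$@"; do
        # Match "name." at the start of a line, e.g. "mysql-devel.x86_64",
        # so that "mysql" does not also match "mysql-devel".
        if ! printf '%s\n' "$listing" | grep -q "^${pkg}\."; then
            echo "$pkg"
        fi
    done
}

# Usage:
#   yum list installed 'mysql*' | missing_packages mysql mysql-devel mysql-server
```

Anything it prints still needs a yum install.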


The following environment settings (bash) should work for a standard, fully patched CentOS 4.4 system; they worked for me.

export MYSQLINC='/usr/include/mysql'
export MYSQLLIBS='/usr/lib64/mysql/libmysqlclient.a -lz -lcrypto -lssl -lm -lnsl'
export GLOBAL_CONFIG_FILE=/var/www/cgi-bin/hg.conf
export HGCGI=/var/www/cgi-bin
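
Before starting a long compile, it may be worth confirming that these settings point at real files. The sketch below uses a hypothetical helper, check_build_env, and assumes that the first word of MYSQLLIBS is the path to the static client library, as in the settings above.

```shell
# Verify that the MySQL build settings point at existing paths.
check_build_env() {
    status=0
    if [ ! -d "$MYSQLINC" ]; then
        echo "MYSQLINC is not a directory: $MYSQLINC" >&2
        status=1
    fi
    # Strip everything after the first space to get the library path.
    client_lib=${MYSQLLIBS%% *}
    if [ ! -f "$client_lib" ]; then
        echo "client library not found: $client_lib" >&2
        status=1
    fi
    return $status
}

# Usage, after the export lines above:
#   check_build_env && echo "build environment looks sane"
```

A failure here usually means mysql-devel is missing or lives under a different lib directory.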

If you have SELinux in enforcing mode, you need to do something like:

chcon -R  -u system_u -r object_r -t httpd_sys_content_t /var/www

so Apache can run CGIs and generally do useful things.
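
The relabel is only needed when SELinux is actually enforcing, which can be checked first. A sketch, assuming the getenforce utility is installed; selinux_enforcing and WEB_ROOT are hypothetical names.

```shell
# Return success only when SELinux is present and in enforcing mode.
selinux_enforcing() {
    command -v getenforce >/dev/null 2>&1 || return 1
    [ "$(getenforce)" = "Enforcing" ]
}

# Usage (as root); /var/www is the path used in these notes:
#   WEB_ROOT=/var/www
#   if selinux_enforcing; then
#       chcon -R -u system_u -r object_r -t httpd_sys_content_t "$WEB_ROOT"
#       ls -Zd "$WEB_ROOT"    # show the resulting security context
#   fi
```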

While I recommend rsyncing the MySQL tables directly, rsync of the executables did not work for me. There was a problem with the authentication scheme that the MySQL client in the downloaded executables was trying to use, and turning on old passwords in /etc/my.cnf did not fix it. So I'd recommend doing your own compile from a fresh checkout of the source. I strongly recommend a CVS checkout of the source, which fixed a couple of misfeatures I found using the jksrc.zip file. It turns out the download of that file is refreshed every 2 weeks or so. For new features such as genome graphs, which I was particularly interested in, a fresh CVS checkout and recompile solved a lot of issues.

The build instructions described on the CVS checkout page are different from the ones that worked for me. You'll probably want to do the checkout, then follow the build instructions at Browser_Installation. I think there are UCSC-specific issues mixed up in some of the documentation.

Getting Sessions working

The trick is discussed in http://www.soe.ucsc.edu/pipermail/genome-mirror/2007-February/000344.html

Add the following lines to hg.conf and bounce apache.

# wiki connection data
wiki.host=genomewiki.ucsc.edu
wiki.userNameCookie=wikidb_mw1_UserName
wiki.loggedInCookie=wikidb_mw1_UserID
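
If you script your setup, the lines can be appended so that re-running the script does not duplicate them. A sketch only: add_wiki_settings is a hypothetical helper, and the restart command in the usage comment assumes the stock CentOS init script.

```shell
# Append the wiki settings to the given hg.conf unless already present.
add_wiki_settings() {
    conf=$1
    if grep -q '^wiki\.host=' "$conf"; then
        echo "wiki settings already present in $conf" >&2
        return 0
    fi
    cat >> "$conf" <<'EOF'

# wiki connection data
wiki.host=genomewiki.ucsc.edu
wiki.userNameCookie=wikidb_mw1_UserName
wiki.loggedInCookie=wikidb_mw1_UserID
EOF
}

# Usage (as root), then bounce apache:
#   add_wiki_settings /var/www/cgi-bin/hg.conf
#   /etc/init.d/httpd restart
```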


Gratuitous Praise

I have to add that the new genome graphs view is just what the doctor ordered for those of us struggling with whole-genome SNP (e.g. 500k) data sets. I am busy nagging the UCSC folks to add enhancements to make genome graphs available to all users and to make the wig importer smart enough to find the position of SNP (or microsatellite) identifiers. Meanwhile, I'm writing some scripts to take pbat and plink output and turn it into wig tracks. Wish me luck. I'll make them available once they're at least working, if anyone wants them, and will edit this page when they're ready. Watch this page if you're interested!