CentOS notes
Latest revision as of 19:44, 15 May 2012
Some random notes on installing a local mirror of the UCSC Genome Browser on an x86_64 CentOS Linux box
Introduction
I hope these notes will be useful to anyone trying to download and set up a local mirror.
I just finished a limited install on an x86_64 CentOS 4.3 machine. There are some general ideas and some SELinux-specific gotchas worth knowing about. I only grabbed hg18 because that's all I needed for my local mirror.
Useful links
The instructions on this wiki at Browser_Installation were very helpful. Also very helpful, though not always consistent with them, was http://genome.ucsc.edu/admin/jk-install.html. When things got weird, the various README files in JK's source were useful, but sometimes you just need to read the source itself.

Before you start, understand that there is no single recipe that I could find (or write) to make this painless. Setting up a local mirror is a fairly complex task, and most of the available instructions are slightly out of date or written for a different environment than yours. The good folks at UCSC are doing a great job and will help if you ask sensible questions after doing your own due diligence, but this is not a commodity software installation. It should probably not be undertaken unless you really know your way around Linux, Apache and MySQL as root, and it will likely not be a pleasant experience if it's your first job as a system administrator. However, with some judicious googling I was able to fake it, so you probably can too :)
Downloading
Allow plenty of time and bandwidth. Downloading the necessary datasets, especially gbdb/hg18, takes a very, very long time. There's a lot there!
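Given how much there is, it may be worth checking free space before starting a multi-day download. A minimal sketch, assuming POSIX df; has_free_gb is a hypothetical helper (not part of the UCSC tools), and the threshold is yours to choose from the sizes listed on the download server, since I don't give a figure here:

```shell
#!/usr/bin/env bash
# has_free_gb DIR GB: succeed if the filesystem holding DIR has at least GB gigabytes free.
# Hypothetical helper; the threshold you pass is an assumption you must supply.
has_free_gb() {
  local dir=$1 need_gb=$2 avail_kb
  # -P forces one output line per filesystem; -k reports sizes in 1024-byte blocks.
  # Column 4 of the data line is the space available to non-root users.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Example (placeholder threshold; /gbdb is assumed to be your download target):
#   has_free_gb /gbdb 100 || echo "not enough space under /gbdb" >&2
```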
Alternate site roots
If you want your browser to appear at yoursite/goldenpathmirror/, don't bother. I gave up after a day of trying to get everything working anywhere but at the root of the website: there are all sorts of hardcoded references to ../trash and the like. Once I gave up and reverted to /var/www/html as the root, everything seemed to work better. If anyone figures out how to do that, please let me know.
max: Actually it's not that difficult. I had to adapt the Apache config and changed everything in hg.conf. trash has to be reachable as ../trash from cgi-bin, and html and cgi-bin have to be separate directories, so you need a structure like this:
- /var/www/genome/trash
- /var/www/genome/cgi-bin
- /var/www/genome/html
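A minimal sketch of that layout, assuming the /var/www/genome root from the list above and Apache's mod_alias directives. The helper names are hypothetical, and the Alias line for /trash/ is my assumption about how to make the generated images reachable from the browser, since trash sits outside html here. You will also need to make trash writable by the user Apache runs as, and update the paths in hg.conf to match.

```shell
#!/usr/bin/env bash
# make_genome_layout ROOT: create sibling html, cgi-bin and trash directories,
# so that trash is reachable as ../trash from cgi-bin. (Hypothetical helper.)
make_genome_layout() {
  local root=$1
  mkdir -p "$root/html" "$root/cgi-bin" "$root/trash"
}

# print_apache_snippet ROOT: print an httpd.conf fragment for that layout.
# DocumentRoot serves html and ScriptAlias marks cgi-bin as CGI.
# The /trash/ Alias is an assumption: it exposes the images the CGIs write to ../trash.
print_apache_snippet() {
  local root=$1
  cat <<EOF
DocumentRoot "$root/html"
ScriptAlias /cgi-bin/ "$root/cgi-bin/"
Alias /trash/ "$root/trash/"
<Directory "$root/cgi-bin">
    Options +ExecCGI
</Directory>
EOF
}
```

Usage: run make_genome_layout /var/www/genome, then merge the lines printed by print_apache_snippet /var/www/genome into httpd.conf in place of the default DocumentRoot and cgi-bin ScriptAlias.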
Rsync of MySQL tables
rsyncing the MySQL directories directly works a treat, at least if you have MySQL 4.x as CentOS does. I did not undergo the pain of downloading the raw SQL and rebuilding all those tables and indices. Be aware that the files become unavailable while they're being updated in the early morning, so some of your rsync jobs may end with 'file unavailable' errors. Don't panic: restarting the same rsync command works fine, and the files that were already downloaded successfully won't be downloaded again. Don't forget to restart MySQL
/etc/init.d/mysqld restart
as root after all the MySQL database directories have been downloaded. There's a fair bit of MySQL wrangling to do to get the various tables and permissions set up. Google can be very helpful, but you may need someone with MySQL experience to help out. I also fiddled with my.cnf, since the default doesn't allocate much RAM for keys (key_buffer_size). That's a whole other story; after your mirror has been running for a couple of days, you might find tuning-primer.sh handy for tips on adjusting my.cnf.
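Because rsync skips files it has already transferred, the "just rerun it" advice is easy to automate. A minimal sketch; retry_until_ok and RETRY_DELAY are hypothetical names, and the rsync source (hgdownload.cse.ucsc.edu::mysql/hg18/) and target (/var/lib/mysql/hg18/) in the example comment are assumptions to check against UCSC's current mirror instructions:

```shell
#!/usr/bin/env bash
# retry_until_ok MAX CMD [ARGS...]: run CMD until it exits 0, giving up after MAX attempts.
# Waits RETRY_DELAY seconds (default 600) between attempts, since files stay
# unavailable for a while during the nightly update. (Hypothetical helper.)
retry_until_ok() {
  local max=$1 attempt=1
  shift
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${RETRY_DELAY:-600}s" >&2
    sleep "${RETRY_DELAY:-600}"
    attempt=$((attempt + 1))
  done
}

# Example (source and target paths are assumptions):
#   retry_until_ok 10 rsync -avzP hgdownload.cse.ucsc.edu::mysql/hg18/ /var/lib/mysql/hg18/
#   /etc/init.d/mysqld restart
```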
CentOS preparation
You must have all the development libraries for MySQL installed - use yum list and yum install as needed until you see something like this:
Note: UCSC uses the package MySQL-devel-community-5.0.67-0.rhel5
[root@meme rerla]# yum list mysql*
Setting up repositories
update 100% |=========================| 951 B 00:00
base 100% |=========================| 1.1 kB 00:00
addons 100% |=========================| 951 B 00:00
extras 100% |=========================| 1.1 kB 00:00
Reading repository metadata in from local files
Installed Packages
mysql.x86_64 4.1.20-1.RHEL4.1 installed
mysql-devel.x86_64 4.1.20-1.RHEL4.1 installed
mysql-server.x86_64 4.1.20-1.RHEL4.1 installed
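To check for these packages without eyeballing yum output, rpm -q can be looped over the list. A minimal sketch; missing_packages is a hypothetical helper, and the package names are the ones from the listing above (other releases, such as the MySQL-devel-community package UCSC uses, name them differently):

```shell
#!/usr/bin/env bash
# missing_packages PKG...: print each package that rpm reports as not installed.
# Returns 0 only if every package is installed. (Hypothetical helper.)
missing_packages() {
  local pkg status=0
  for pkg in "$@"; do
    if ! rpm -q "$pkg" >/dev/null 2>&1; then
      echo "$pkg"
      status=1
    fi
  done
  return "$status"
}

# Example: install whatever is missing (run as root).
#   missing=$(missing_packages mysql mysql-devel mysql-server) || yum install $missing
```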
The following environment settings (bash) should work on standard, fully patched CentOS 4.4 systems; they worked for me.
export MYSQLINC='/usr/include/mysql'
export MYSQLLIBS='/usr/lib64/mysql/libmysqlclient.a -lz -lcrypto -lssl -lm -lnsl'
export GLOBAL_CONFIG_FILE=/var/www/cgi-bin/hg.conf
export HGCGI=/var/www/cgi-bin
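Before starting a long compile, it may save time to confirm that these variables point at real files. A minimal sketch; check_mysql_build_env is a hypothetical helper, and it assumes MYSQLLIBS starts with the path of the static client library, as in the settings above (a -L/-l style value would need a different check):

```shell
#!/usr/bin/env bash
# check_mysql_build_env: sanity-check MYSQLINC and MYSQLLIBS before building.
# Assumes the first word of MYSQLLIBS is the path to libmysqlclient.a. (Hypothetical helper.)
check_mysql_build_env() {
  local status=0 lib
  if [ ! -f "${MYSQLINC:-}/mysql.h" ]; then
    echo "mysql.h not found in MYSQLINC='${MYSQLINC:-}'" >&2
    status=1
  fi
  lib=${MYSQLLIBS:-}
  lib=${lib%% *}   # keep only the first word
  if [ -z "$lib" ] || [ ! -f "$lib" ]; then
    echo "client library '$lib' from MYSQLLIBS not found" >&2
    status=1
  fi
  return "$status"
}

# Example:
#   check_mysql_build_env && echo "MySQL build settings look sane"
```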
If you have SELinux in enforcing mode, you need to do something like:
chcon -R -u system_u -r object_r -t httpd_sys_content_t /var/www
so Apache can run the CGIs and generally do useful things.
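If the same setup script has to run on machines with and without SELinux, the relabel step can be made conditional on getenforce. A minimal sketch; label_web_root is a hypothetical helper, and relabelling in permissive mode as well as enforcing mode is my own choice, so the contexts are already right if the box is later switched to enforcing:

```shell
#!/usr/bin/env bash
# label_web_root [DIR]: apply the httpd_sys_content_t context from above to DIR
# (default /var/www) unless SELinux is disabled or getenforce is unavailable.
# (Hypothetical helper; also relabels in permissive mode by design.)
label_web_root() {
  local root=${1:-/var/www} mode
  mode=$(getenforce 2>/dev/null || echo Disabled)
  if [ "$mode" = "Disabled" ]; then
    echo "SELinux is disabled; leaving $root unlabelled" >&2
    return 0
  fi
  chcon -R -u system_u -r object_r -t httpd_sys_content_t "$root"
}

# Example (as root):
#   label_web_root /var/www
```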
While I recommend rsyncing the MySQL tables directly, rsyncing the executables did not work for me: there was a problem with the authentication scheme that the MySQL client built into the executables was trying to use, and turning on old_passwords in /etc/my.cnf did not fix it. So I strongly recommend compiling your own from a fresh CVS checkout of the source. That fixed a couple of misfeatures I found using the jksrc.zip file. It turns out that jksrc.zip is only refreshed every two weeks or so, and for new features such as genome graphs, which I was particularly interested in, a fresh CVS checkout and recompile solved a lot of issues.
The build instructions on that CVS page differ from the ones that worked for me; you'll probably want to do the checkout and then follow the build instructions at Browser_Installation. I think some of the documentation mixes in UCSC-specific issues.
Getting Sessions working
The trick is discussed in http://www.soe.ucsc.edu/pipermail/genome-mirror/2007-February/000344.html
Add the following lines to hg.conf and bounce Apache.
# wiki connection data
wiki.host=genomewiki.ucsc.edu
wiki.userNameCookie=wikidb_mw1_UserName
wiki.loggedInCookie=wikidb_mw1_UserID
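If you script your setup, appending these lines blindly will duplicate them on every run. A minimal sketch of an idempotent append; ensure_hgconf_setting is a hypothetical helper, and it assumes hg.conf uses plain key=value lines as above. It deliberately leaves an existing value for a key alone rather than overwriting it.

```shell
#!/usr/bin/env bash
# ensure_hgconf_setting FILE KEY VALUE: append KEY=VALUE to FILE unless KEY is already set.
# (Hypothetical helper; assumes key=value lines with no spaces around '='.)
ensure_hgconf_setting() {
  local conf=$1 key=$2 value=$3
  # Compare the text before the first '=' literally, so dots in KEY are not regex wildcards.
  if awk -F= -v k="$key" '$1 == k { found = 1 } END { exit !found }' "$conf" 2>/dev/null; then
    return 0
  fi
  printf '%s=%s\n' "$key" "$value" >> "$conf"
}

# Example: add the wiki settings above, then restart Apache.
#   conf=/var/www/cgi-bin/hg.conf
#   ensure_hgconf_setting "$conf" wiki.host genomewiki.ucsc.edu
#   ensure_hgconf_setting "$conf" wiki.userNameCookie wikidb_mw1_UserName
#   ensure_hgconf_setting "$conf" wiki.loggedInCookie wikidb_mw1_UserID
#   service httpd restart
```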
Gratuitous Praise
I have to add that the new genome graphs view is just what the doctor ordered for those of us struggling with whole-genome SNP data sets (e.g. 500k). I am busy nagging the UCSC folks for enhancements: to make genome graphs available to all users, and to make the wig importer smart enough to find the positions of SNP (or microsatellite) identifiers. Meanwhile I'm writing some scripts to take pbat and plink output and turn it into wig tracks. Wish me luck. Once they're at least working I'll make them available if anyone wants them, and I'll edit this page when they're ready, so watch this page if you're interested!
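This is not one of the scripts mentioned above, just a minimal sketch of the idea for anyone who wants to experiment. It assumes plink --assoc output (columns CHR SNP BP A1 F_A F_U A2 CHISQ P OR, one header row, rows already sorted by chromosome and position) and writes a variableStep wiggle track of -log10(P) that can be loaded as a custom track. The function name is hypothetical, the chromosome mapping (23 to chrX, 24 to chrY, 26 to chrM, dropping 0 and 25) is my assumption about plink's numeric codes, and pbat output would need its own column handling.

```shell
#!/usr/bin/env bash
# plink_assoc_to_wig FILE: convert plink .assoc output to a variableStep wiggle track
# of -log10(P). Assumes rows are sorted by chromosome, then position. (Hypothetical helper.)
plink_assoc_to_wig() {
  awk '
    BEGIN { print "track type=wiggle_0 name=\"plink -log10(P)\"" }
    NR == 1 { next }                      # skip the header row
    $9 == "NA" || $9 <= 0 { next }        # no usable p-value
    $1 == 0 || $1 == 25 { next }          # unplaced and pseudo-autosomal (assumed codes)
    {
      chrom = $1
      if (chrom == 23) chrom = "X"
      else if (chrom == 24) chrom = "Y"
      else if (chrom == 26) chrom = "M"
      chrom = "chr" chrom
      if (chrom != last) {                # start a new section for each chromosome
        print "variableStep chrom=" chrom
        last = chrom
      }
      printf "%s\t%.4f\n", $3, -log($9) / log(10)
    }
  ' "$1"
}

# Example:
#   plink_assoc_to_wig plink.assoc > plink_assoc.wig
```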