User talk:Fubar: Difference between revisions

No edit summary
 
(12 intermediate revisions by one other user not shown)
Line 1: Line 1:
'''Some random notes on installing a local mirror on an x86_64 CentOS linux box'''
'''Things worth remembering that might be useful for other UCSC mirror builders'''


==Introduction==
== Notes ==


''I hope these notes might be useful for anyone trying to download and set up a local mirror''
Galaxy notes: [[Fubar:GalaxyNotes]]


I just finished a limited install on an x86_64 CentOS 4.3 machine. There are some general ideas and some SELinux-specific gotchas worth knowing about. I only grabbed hg18 because that's all I needed for my local mirror.
Some random notes on installing a mirror under CentOS: [[CentOS_notes]]


==Useful links==
Some notes on adding a custom track: [[Custom_track]]


The instructions on this wiki at [[Browser_Installation]] were very helpful. When things got weird, the various README files in JK's source were useful, but sometimes you just need to look at the source. Before you start, understand that there is no single recipe that I could find (or write) to make this painless. You're undertaking a fairly complex task in setting up a local mirror. Most of the available instructions are slightly out of date or written for a different environment than yours. The good folks at UCSC are doing a great job and they will help if you ask sensible questions after doing your own due diligence, but this is not a commodity software installation and should probably not be undertaken unless you really know your way around as root with linux/apache/mysql. This will likely not be a pleasant experience if it's your first undertaking as a system administrator. However, with some judicious googling I was able to fake it, so you probably can too :)
== Code of potential use for mirror sites ==


==Downloading==
The ucsc wiki can't set a cookie for your local mirror, so here's an example of
LDAP authentication for local mirrors: [[Fubar:LDAP_auth]]


Allow plenty of time and bandwidth. Downloading the necessary datasets, especially gbdb/hg18, takes a very, very long time. There's a lot there!
== Gripes ==


==Alternate site roots==
Wiggle tracks need a bar graph option - points or joined points can be very misleading!
Ah, bedGraph, mentioned in ..kent/src/hg/makeDb/trackDb/README, does the trick. It's an ordinary 9 column bed file with extra options including bars, which are much more appropriate for (eg) -log10(p) values from an analysis. I am making custom tracks using (eg)
<code><pre>
track bdrabpctmincov
priority 2.100
shortLabel bdrabpctmincov
longLabel CAMP 2007 pbat additive model -log10(p) bdrabpctmincov 
visibility hide
type bedGraph
autoScale off
viewLimits 0:7
minLimit 0
maxLimit 7
maxHeightPixels 60:40:40
color 56,88,9
group phenDis
</pre></code>


If you want your browser to appear at yoursite/goldenpathmirror/ - don't bother. I gave up after a day of trying to get everything working anywhere but at the root of the website. All sorts of hardcoded references to ../trash and such like. Once I gave up and reverted to /var/www/html as the root, everything seemed to work better... If anyone figures out how to do that, please let me know?
== Solutions ==
I think you may be confusing "wiggle tracks" here with the "chromGraph" tracks in the
"Genome Graphs" function of the browser. Wiggle tracks are by default bar graphs.
The width of the bar in one of these graphs is the "span" of the data. Each specified
data point applies to "span" number of bases. See also, the discussion about
the proper usage of "span" in [[Wiggle_BED_to_variableStep_format_conversion]].


==Rsync of MySQL tables==
On the other hand, the "Genome Graphs" "chromGraph" data is completely different
than wiggle tracks.  In the chromGraph format, each specified data point is merely
a point and has no defined span to apply to a number of bases.  In this graph,
each point is simply connected to the next point with a line.  It is a line
graph, not a bar graph.  The chromGraph data does not have the ability to
define how large to make bars for a bar graph.  Certainly something to think
about for future improvements, but the purpose of chromGraph data is different than wiggle tracks.


rsyncing the mysql directories directly works a treat - if you have mysql 4.x as CentOS does. I did not undergo the pain of downloading the raw SQL and rebuilding all those tables and indices. Be aware that the files become unavailable while they're being updated in the early morning, so you may find some of your rsync jobs end with 'file unavailable' errors - don't panic - re-starting the same rsync command will work fine and the successfully downloaded files won't be downloaded again. Don't forget to restart mysql <code>
Also, your note above mentions the bedGraph wiggle format. Please note the discussion
/etc/init.d/mysqld restart
of the drawbacks of the bedGraph wiggle format in the above mentioned
</code>
link: [[Wiggle_BED_to_variableStep_format_conversion]]
as root after all the mysql database directories have been downloaded. There's a fair bit of mysql wrangling to do to get the various tables and permissions set up - google can be very helpful but basically, you may need someone with mysql experience to help out. I also fiddled with my.cnf since the default doesn't allocate much ram for keys. That's a whole other story - after your mirror has been running for a couple of days, you might find [[tuning-primer.sh]] handy for tips on adjusting my.cnf.
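The "just re-run the same rsync" advice above can be automated. The following is a minimal sketch, not a tested mirror script: the function name, the retry count, the wait time and the example URL are all assumptions made for illustration, and the only thing it relies on is that rsync exits with status 0 on success.

```python
import subprocess
import time


def rsync_with_retry(cmd, attempts=5, wait_seconds=600,
                     run=subprocess.call, sleep=time.sleep):
    """Run an rsync command until it exits with status 0.

    cmd is the full argument list, e.g. ["rsync", "-av", SRC, DEST].
    Returns the number of attempts used; raises RuntimeError if every
    attempt fails. `run` and `sleep` are injectable so the retry logic
    can be exercised without touching the network.
    """
    for attempt in range(1, attempts + 1):
        status = run(cmd)
        if status == 0:
            return attempt
        # Non-zero exit (for example while files are being updated upstream):
        # wait, then re-run. rsync skips files that already transferred.
        if attempt < attempts:
            sleep(wait_seconds)
    raise RuntimeError("rsync still failing after %d attempts" % attempts)


# Hypothetical usage; the source URL and target directory are placeholders:
# rsync_with_retry(["rsync", "-av", "rsync://example.org/mysql/hg18/",
#                   "/var/lib/mysql/hg18/"])
```

Restarting mysql afterwards is still a manual step, as described above.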


==CentOS preparation==
[[User:Hiram|Hiram]] 09:12, 2 July 2007 (PDT)
 
You must have all the development libraries for MySQL installed - use yum list and yum install as needed until you see something like this:
<tt>
[root@meme rerla]# yum list mysql*
Setting up repositories
update                    100% |=========================|  951 B    00:00   
base                      100% |=========================| 1.1 kB    00:00   
addons                    100% |=========================|  951 B    00:00   
extras                    100% |=========================| 1.1 kB    00:00   
Reading repository metadata in from local files
Installed Packages
mysql.x86_64                            4.1.20-1.RHEL4.1      installed       
mysql-devel.x86_64                      4.1.20-1.RHEL4.1      installed     
mysql-server.x86_64                      4.1.20-1.RHEL4.1      installed     
</tt>
 
 
The following environment settings (bash) should work for standard CentOS 4.4 fully patched systems - they worked for me.
<tt>
export  MYSQLINC='/usr/include/mysql'
export MYSQLLIBS='/usr/lib64/mysql/libmysqlclient.a -lz -lcrypto -lssl -lm -lnsl'
export GLOBAL_CONFIG_FILE=/var/www/cgi-bin/hg.conf
export HGCGI=/var/www/cgi-bin
</tt>
 
If you have SELinux in enforcing mode, you need to do something like:
<code>chcon -R  -u system_u -r object_r -t httpd_sys_content_t /var/www</code>
so apache can run CGIs and generally do useful things.
 
While rsync of the mysql tables directly is what I recommend, rsync of the executables did not work for me - there was a problem with the authentication scheme the downloaded mysql client was trying to use, and turning on old passwords in /etc/my.cnf did not fix it. So I'd recommend doing your own compile from a fresh checkout of the source. I strongly recommend that you do a [http://genome.ucsc.edu/admin/cvs.html cvs checkout] of the source - that fixed a couple of misfeatures I found using the jksrc.zip file. It turns out that file is only refreshed every 2 weeks or so. For new features such as genome graphs, which I was particularly interested in, a fresh CVS checkout and recompile solved a lot of issues.
 
The build instructions described on that CVS page are different from the ones that worked for me - you'll probably want to do the checkout, then follow the build instructions at [[Browser_Installation]]. I think there are UCSC-specific issues mixed up in some of the documentation.
 
==Getting Sessions working==
The trick is discussed in http://www.soe.ucsc.edu/pipermail/genome-mirror/2007-February/000344.html
 
Add the following lines to hg.conf and bounce apache.
<tt>
# wiki connection data
wiki.host=genomewiki.ucsc.edu
wiki.userNameCookie=wikidb_mw1_UserName
wiki.loggedInCookie=wikidb_mw1_UserID
</tt>
 
 
==Gratuitous Praise==
 
I have to add that the new genome graphs view is just what the doctor ordered for those of us struggling with whole genome SNP (eg 500k) data sets. I am busy nagging the UCSC folks to add enhancements to make genome graphs available to all users, and to make the wig importer smart enough to find the position of SNP (or microsatellite) identifiers. Meanwhile I'm writing some scripts to take pbat and plink output and turn it into wig tracks. Wish me luck. I'll make them available when they're at least working if anyone wants them, and will edit this page when they're ready - watch this page if you're interested!

Latest revision as of 16:12, 2 July 2007

Things worth remembering that might be useful for other UCSC mirror builders

Notes

Galaxy notes: Fubar:GalaxyNotes

Some random notes on installing a mirror under CentOS: CentOS_notes

Some notes on adding a custom track: Custom_track

Code of potential use for mirror sites

The ucsc wiki can't set a cookie for your local mirror, so here's an example of LDAP authentication for local mirrors: Fubar:LDAP_auth

Gripes

Wiggle tracks need a bar graph option - points or joined points can be very misleading! Ah, bedGraph, mentioned in ..kent/src/hg/makeDb/trackDb/README, does the trick. It's an ordinary 9 column bed file with extra options including bars, which are much more appropriate for (eg) -log10(p) values from an analysis. I am making custom tracks using (eg)

 
track bdrabpctmincov
priority 2.100
shortLabel bdrabpctmincov
longLabel CAMP 2007 pbat additive model -log10(p) bdrabpctmincov  
visibility hide
type bedGraph
autoScale off
viewLimits 0:7
minLimit 0
maxLimit 7
maxHeightPixels 60:40:40
color 56,88,9
group phenDis
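As a rough illustration of producing the data for such a track, here is a minimal sketch that turns per-SNP p-values into bedGraph custom-track lines. It assumes the four-column custom-track layout (chrom, chromStart, chromEnd, value) with zero-based, half-open coordinates; a table defined through trackDb, like the one above, may carry more columns. The function name, the input tuple layout and the sample values are hypothetical.

```python
import math


def bedgraph_lines(records, track_name="bdrabpctmincov", view_max=7):
    """Yield a bedGraph custom track: a header line, then one line per SNP.

    records: iterable of (chrom, position_1_based, p_value) tuples
    (an assumed input layout, not the pbat or plink output format).
    """
    yield ("track type=bedGraph name=%s autoScale=off viewLimits=0:%d"
           % (track_name, view_max))
    for chrom, pos, p in sorted(records, key=lambda r: (r[0], r[1])):
        if not 0 < p <= 1:
            continue  # skip missing or invalid p-values
        # p <= 1 means log10(p) <= 0, so abs() gives -log10(p) without "-0.000".
        score = abs(math.log10(p))
        # A 1-based SNP position N covers the zero-based interval [N-1, N).
        yield "%s\t%d\t%d\t%.3f" % (chrom, pos - 1, pos, score)


if __name__ == "__main__":
    sample = [("chr1", 1000, 0.001), ("chr1", 500, 0.5)]
    print("\n".join(bedgraph_lines(sample)))
```

Sorting is by chromosome name as a plain string, which is enough for a sketch but does not give natural chromosome order (chr10 sorts before chr2).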

Solutions

I think you may be confusing "wiggle tracks" here with the "chromGraph" tracks in the "Genome Graphs" function of the browser. Wiggle tracks are by default bar graphs. The width of the bar in one of these graphs is the "span" of the data. Each specified data point applies to "span" number of bases. See also, the discussion about the proper usage of "span" in Wiggle_BED_to_variableStep_format_conversion.

On the other hand, the "Genome Graphs" "chromGraph" data is completely different than wiggle tracks. In the chromGraph format, each specified data point is merely a point and has no defined span to apply to a number of bases. In this graph, each point is simply connected to the next point with a line. It is a line graph, not a bar graph. The chromGraph data does not have the ability to define how large to make bars for a bar graph. Certainly something to think about for future improvements, but the purpose of chromGraph data is different than wiggle tracks.

Also, your note above mentions the bedGraph wiggle format. Please note the discussion of the drawbacks of the bedGraph wiggle format in the above mentioned link: Wiggle_BED_to_variableStep_format_conversion

Hiram 09:12, 2 July 2007 (PDT)
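The "span" behaviour described in the reply above can be sketched as follows, assuming the wiggle variableStep layout (a declaration line with chrom and span, then 1-based "position value" pairs). The helper names and sample numbers are made up for illustration.

```python
def variable_step(chrom, points, span=1):
    """Return wiggle variableStep text; each value applies to `span` bases."""
    lines = ["variableStep chrom=%s span=%d" % (chrom, span)]
    for start, value in sorted(points):
        lines.append("%d %s" % (start, value))
    return "\n".join(lines)


def covered_bases(points, span):
    """List the 1-based (first, last) base range that each bar covers."""
    return [(start, start + span - 1) for start, _ in sorted(points)]


if __name__ == "__main__":
    pts = [(300, 2.5), (100, 1.0)]
    print(variable_step("chr1", pts, span=50))
    print(covered_bases(pts, 50))
```

With span=50, the value at position 100 is drawn as a bar over bases 100 to 149. A chromGraph point has no such width, which is why it can only be drawn as a point joined to its neighbours.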