User talk:Fubar: Difference between revisions

From genomewiki
(24 intermediate revisions by one other user not shown)
'''Some notes on installing a local mirror on an x86_64 CentOS linux box'''


''I hope these notes are useful for anyone trying to download and set up a local mirror.''


I just finished a limited install on an x86_64 CentOS 4.3 machine. There are some gotchas worth knowing about.
I only grabbed hg18 because that's all I needed for my local mirror.


The instructions on this wiki at [[Browser_Installation]] were very helpful. When things got weird, the various README files in JK's source were useful, but sometimes you just need to look at the source. Before you start, understand that there is no single recipe that I could find (or write) to make this painless. You're undertaking a fairly complex task in setting up a local mirror. Most of the available instructions are slightly out of date or written for a different environment than yours. The good folks at UCSC are doing a great job and they will help if you ask sensible questions after doing your own due diligence, but this is not a commodity software installation and should probably not be undertaken unless you really know your way around as root with linux/apache/mysql. This will likely not be a pleasant experience if it's your first undertaking as a system administrator. However, with some judicious googling I was able to fake it, so you probably can too :)


Firstly, downloading the necessary datasets, especially gbdb/hg18, takes a very, very long time. There's a lot there.
Secondly, I gave up after a day of trying to get everything working anywhere but at the root of the website. There are all sorts of hardcoded references to ../trash and suchlike. Once I gave up and reverted to /var/www/html as the root, everything seemed to work better...
Thirdly, rsyncing the mysql directories directly works a treat. The files become unavailable while they're being updated in the early morning, so you may find some of your rsync jobs end with file unavailable errors. Don't panic: restarting the same rsync command will work fine, and the successfully downloaded files won't be downloaded again. Don't forget to restart mysql (<code>/etc/init.d/mysqld restart</code> as root) after all the mysql database directories have been downloaded.
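
Since the advice is simply to re-run the same rsync command when it fails, a small retry wrapper can do that for you. This is only a sketch: the function name <code>retry</code>, the <code>RETRY_DELAY</code> variable, and the rsync URL and target directory in the comment are assumptions for illustration, so check them against the current mirror instructions before relying on them.

```shell
#!/bin/bash
# retry MAX_TRIES COMMAND [ARGS...]
# Re-run COMMAND until it succeeds, giving up after MAX_TRIES attempts.
# RETRY_DELAY (seconds between attempts, default 60) is a made-up knob for this sketch.
retry() {
    local max_tries=$1
    shift
    local attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_tries" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-60}"
    done
}

# Example use, as root (URL and paths are assumptions, not taken from these notes):
#   retry 5 rsync -avzP rsync://hgdownload.cse.ucsc.edu/mysql/hg18/ /var/lib/mysql/hg18/
#   /etc/init.d/mysqld restart
```

Because rsync skips files that already arrived intact, each retry only fetches what is still missing.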


You must have all the development libraries for MySQL installed - use <code>yum list</code> and <code>yum install</code> as needed until you see something like this:
<tt>
[root@meme rerla]# yum list mysql*
Setting up repositories
update                    100% |=========================|  951 B    00:00   
base                      100% |=========================| 1.1 kB    00:00   
addons                    100% |=========================|  951 B    00:00   
extras                    100% |=========================| 1.1 kB    00:00   
Reading repository metadata in from local files
Installed Packages
mysql.x86_64                            4.1.20-1.RHEL4.1      installed       
mysql-devel.x86_64                      4.1.20-1.RHEL4.1      installed     
mysql-server.x86_64                      4.1.20-1.RHEL4.1      installed     
</tt>
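
The same check can be scripted instead of eyeballing the yum listing. The sketch below assumes an rpm-based system; <code>missing_packages</code> and the <code>PKG_QUERY</code> override are hypothetical names introduced here, and the override exists only so the logic can be tried on a machine without rpm.

```shell
#!/bin/bash
# missing_packages PKG...
# Print the name of each package that the query command reports as not installed.
# PKG_QUERY defaults to "rpm -q"; it is a hypothetical override for this sketch.
missing_packages() {
    local pkg
    for pkg in "$@"; do
        # Intentionally unquoted so "rpm -q" splits into command and option.
        ${PKG_QUERY:-rpm -q} "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Example use, as root:
#   missing=$(missing_packages mysql mysql-devel mysql-server)
#   [ -z "$missing" ] || yum install $missing
```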




The following build environment settings should work on a standard, fully patched CentOS 4.4 system - they worked for me.
<tt>
export MYSQLINC='/usr/include/mysql'
export MYSQLLIBS='/usr/lib64/mysql/libmysqlclient.a -lz -lcrypto -lssl -lm -lnsl'
export GLOBAL_CONFIG_FILE=/var/www/cgi-bin/hg.conf
export HGCGI=/var/www/cgi-bin
</tt>
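
Before starting a long compile it can save time to confirm that those variables point at things that actually exist. This is a rough sketch, not part of the kent source: <code>check_build_env</code> is a hypothetical helper, and it assumes the first word of <code>MYSQLLIBS</code> is the path to the static client library, as in the settings above.

```shell
#!/bin/bash
# check_build_env: report obvious problems with the build variables.
# Returns 0 if everything it checks looks sane, 1 otherwise.
check_build_env() {
    local status=0

    # MYSQLINC should be a directory that contains mysql.h
    if [ ! -f "${MYSQLINC}/mysql.h" ]; then
        echo "MYSQLINC: no mysql.h under '${MYSQLINC}'" >&2
        status=1
    fi

    # Assumption: the first word of MYSQLLIBS is the static library path,
    # and everything after it is linker flags.
    local client_lib="${MYSQLLIBS%% *}"
    if [ ! -f "$client_lib" ]; then
        echo "MYSQLLIBS: library '${client_lib}' not found" >&2
        status=1
    fi

    # HGCGI is the directory the CGI programs are installed into
    if [ ! -d "$HGCGI" ]; then
        echo "HGCGI: directory '${HGCGI}' not found" >&2
        status=1
    fi

    return $status
}

# Example use, after the exports above:
#   check_build_env && make
```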


If you have SELinux in enforcing mode, you need to do something like:
<code>chcon -R -u system_u -r object_r -t httpd_sys_content_t /var/www</code>
so Apache can run CGIs and generally do useful things.


I strongly recommend that you do a [http://genome.ucsc.edu/admin/cvs.html cvs checkout] of the source - that fixed a couple of misfeatures I found using the jksrc.zip file. It turns out the download of that file is refreshed only every 2 weeks or so. For new features such as genome graphs, which I was particularly interested in, a fresh CVS checkout and recompile solved a lot of issues.


The build instructions described on that CVS page are different from the ones that worked for me - you'll probably want to do the checkout, then follow the build instructions at [[Browser_Installation]]. I think there are UCSC-specific issues mixed up in some of the documentation.

Latest revision as of 16:12, 2 July 2007

Things worth remembering that might be useful for other UCSC mirror builders

Notes

Galaxy notes: Fubar:GalaxyNotes

Some random notes on installing a mirror under CentOS: CentOS_notes

Some notes on adding a custom track: Custom_track

Code of potential use for mirror sites

The ucsc wiki can't set a cookie for your local mirror, so here's an example of LDAP authentication for local mirrors: Fubar:LDAP_auth

Gripes

Wiggle tracks need a bar graph option - points or joined points can be very misleading! Ah, bedGraph, mentioned in ..kent/src/hg/makeDb/trackDb/README, does the trick. It's an ordinary 9 column bed file with extra options, including bars, which are much more appropriate for values such as -log10(p) from an analysis. For example, I am making custom tracks using:

 
track bdrabpctmincov
priority 2.100
shortLabel bdrabpctmincov
longLabel CAMP 2007 pbat additive model -log10(p) bdrabpctmincov  
visibility hide
type bedGraph
autoScale off
viewLimits 0:7
minLimit 0
maxLimit 7
maxHeightPixels 60:40:40
color 56,88,9
group phenDis
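
For the data behind a track like this, here is a rough sketch of turning raw p-values into -log10(p) scores. It assumes a whitespace-separated input with chrom, start, end and p-value in columns 1 to 4, which is the plain four-column bedGraph layout and may not match the table layout used for the track above. <code>pvals_to_bedgraph</code> is a hypothetical name.

```shell
#!/bin/bash
# pvals_to_bedgraph [FILE...]
# Read "chrom start end p" lines and print "chrom start end -log10(p)".
# Lines whose p-value is not greater than zero are skipped, since log(0) is undefined.
pvals_to_bedgraph() {
    awk '$4 > 0 {
        printf "%s\t%d\t%d\t%.3f\n", $1, $2, $3, -log($4) / log(10)
    }' "$@"
}

# Example use:
#   pvals_to_bedgraph pbat_results.txt > bdrabpctmincov.bedGraph
```

With <code>viewLimits 0:7</code> as above, anything more significant than p = 1e-7 would be clipped at the top of the display.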

Solutions

I think you may be confusing "wiggle tracks" here with the "chromGraph" tracks in the "Genome Graphs" function of the browser. Wiggle tracks are by default bar graphs. The width of the bar in one of these graphs is the "span" of the data. Each specified data point applies to "span" number of bases. See also, the discussion about the proper usage of "span" in Wiggle_BED_to_variableStep_format_conversion.

On the other hand, the "Genome Graphs" "chromGraph" data is completely different than wiggle tracks. In the chromGraph format, each specified data point is merely a point and has no defined span to apply to a number of bases. In this graph, each point is simply connected to the next point with a line. It is a line graph, not a bar graph. The chromGraph data does not have the ability to define how large to make bars for a bar graph. Certainly something to think about for future improvements, but the purpose of chromGraph data is different than wiggle tracks.

Also, your note above mentions the bedGraph wiggle format. Please note the discussion of the drawbacks of the bedGraph wiggle format at the link mentioned above: Wiggle_BED_to_variableStep_format_conversion

Hiram 09:12, 2 July 2007 (PDT)
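
To make the "span" point above concrete, here is a minimal sketch of writing four-column bed-style data out as variableStep wiggle lines. It assumes the input is sorted, uses 0-based starts, and has chrom, start, end and value in columns 1 to 4. It is not the procedure from the linked conversion page, and <code>bed_to_variablestep</code> is a hypothetical name.

```shell
#!/bin/bash
# bed_to_variablestep [FILE...]
# Emit a "variableStep" header whenever the chromosome or the span changes,
# then one "position value" line per input row. Wiggle positions are 1-based,
# so the 0-based bed start is shifted by one.
bed_to_variablestep() {
    awk '{
        span = $3 - $2
        if ($1 != chrom || span != last_span) {
            printf "variableStep chrom=%s span=%d\n", $1, span
            chrom = $1
            last_span = span
        }
        printf "%d %s\n", $2 + 1, $4
    }' "$@"
}

# Example use:
#   bed_to_variablestep scores.bed > scores.wig
```

Each value then covers "span" bases starting at its position, which is what gives a wiggle bar its width.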