Using the Custom Track Database
As of March 2007, the genome browser can store custom tracks in a database. Until now, custom track data has been kept in files in the /trash/ct/ directory. This article describes the steps required to enable the database function.
Summary configuration
- The database loader binaries hgLoadBed, hgLoadWiggle and wigEncode are installed in /cgi-bin/loader/. The normal 'make cgi' in the source tree directory kent/src/hg/ installs them.
- An empty customTrash database exists on the MySQL host. Create it manually, once. The MySQL host name is a configuration item; the database name customTrash is not.
- A temporary read-write data directory /data/tmp exists, with read/write/delete enabled for the Apache server effective user. This directory name is a configuration item.
- The configuration items are specified in /cgi-bin/hg.conf. This is what turns the function on.
- For command line access to the database, create a special ~/.ct.hg.conf to be used with the environment variable HGDB_CONF.
- Create a cron job that runs a cleaner script to expire and remove older tables from the database. The dbTrash command is used for this purpose.
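The file-system parts of the checklist above can be verified with a small preflight script. This is a minimal sketch, not part of the kent source tree: the function name check_ct_setup is hypothetical, and the cgi-bin and temporary directory locations are passed as arguments because the real paths depend on your Apache layout. It does not check the MySQL side.

```shell
#!/bin/sh
# check_ct_setup CGI_BIN TMPDIR   (hypothetical helper, for illustration only)
# Prints one line per problem found; returns 0 only when nothing is missing.
# Run it as a user that is allowed to read hg.conf.
check_ct_setup() {
    cgi_bin="$1"
    tmpdir="$2"
    problems=0

    # The three loaders named in the checklist must be executable.
    for loader in hgLoadBed hgLoadWiggle wigEncode; do
        if [ ! -x "${cgi_bin}/loader/${loader}" ]; then
            echo "missing or not executable: ${cgi_bin}/loader/${loader}"
            problems=$((problems + 1))
        fi
    done

    # hg.conf must exist and name a custom track host.
    if [ ! -f "${cgi_bin}/hg.conf" ]; then
        echo "missing: ${cgi_bin}/hg.conf"
        problems=$((problems + 1))
    elif ! grep -q '^customTracks\.host=' "${cgi_bin}/hg.conf"; then
        echo "no customTracks.host setting in ${cgi_bin}/hg.conf"
        problems=$((problems + 1))
    fi

    # The temporary directory must exist and be writable by the current user.
    if [ ! -d "$tmpdir" ] || [ ! -w "$tmpdir" ]; then
        echo "missing or not writable by the current user: ${tmpdir}"
        problems=$((problems + 1))
    fi

    [ "$problems" -eq 0 ]
}

# Example (paths are assumptions about your installation):
#   check_ct_setup /usr/local/apache/cgi-bin /data/tmp
```

The writability test reflects the user running the script, so to check access for the Apache effective user the script would have to be run as that user.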
Host and database name
For performance and security reasons, the MySQL host for the custom track database can be a separate machine from the ordinary MySQL host that serves the assembly databases or the hgcentral database. A separate MySQL server is not required. The host machine is specified in the /cgi-bin/hg.conf file, for example a host machine called "ctdbhost":
customTracks.host=ctdbhost
The database name used on this host is fixed at customTrash, which is a define in the source tree file hg/inc/customTrack.h.
/cgi-bin/hg.conf configuration items
The following items must be specified in /cgi-bin/hg.conf to enable this function:
customTracks.host=ctdbhost
customTracks.user=ctdbuser
customTracks.password=ctdbpasswd
customTracks.useAll=yes
Establish this user account and password in MySQL with the following db and user privileges:
Select, Insert, Update, Delete, Create, Drop, Alter
For example, with your MySQL root user account:
hgsql -hctdbhost -uroot -p -e "GRANT SELECT,INSERT,UPDATE,DELETE,CREATE,DROP,ALTER ON customTrash.* TO ctdbuser@yourWebHost IDENTIFIED BY 'ctdbpasswd';" mysql
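The one-time database creation and the grant can be generated together and reviewed before they are run. The following is a sketch under stated assumptions: the function name ct_setup_sql is hypothetical, it only prints SQL and does not contact any server, and it does not escape quote characters in its arguments. Note that GRANT ... IDENTIFIED BY is accepted by the MySQL releases of this article's era but was removed in MySQL 8.0, where the account must first be created with CREATE USER.

```shell
#!/bin/sh
# ct_setup_sql USER WEBHOST PASSWORD   (hypothetical helper, for illustration only)
# Prints the SQL for a one-time custom track database setup.
# Arguments must not contain single quotes; no escaping is attempted.
ct_setup_sql() {
    user="$1"
    webhost="$2"
    password="$3"
    # The database name customTrash is fixed in the source tree.
    printf 'CREATE DATABASE IF NOT EXISTS customTrash;\n'
    # Database-level privileges need the customTrash.* form.
    printf "GRANT SELECT,INSERT,UPDATE,DELETE,CREATE,DROP,ALTER ON customTrash.* TO '%s'@'%s' IDENTIFIED BY '%s';\n" \
        "$user" "$webhost" "$password"
}

# Example: print the statements for review.
#   ct_setup_sql ctdbuser yourWebHost ctdbpasswd
```

Each printed statement can then be passed to hgsql with -e in the same way as the GRANT example in the text.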
Optionally, a temporary read-write directory used during database loading can be specified:
customTracks.tmpdir=/data/tmp
If this item is not set, the default is /data/tmp. Whichever directory is used must be created with read/write/delete access for the Apache server effective user.
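Creating the temporary directory might look like the following sketch. The function name make_ct_tmpdir is hypothetical, and the name of the Apache effective user (for example "apache" or "www-data") is an assumption that varies by system, so it is an optional argument here.

```shell
#!/bin/sh
# make_ct_tmpdir DIR [OWNER]   (hypothetical helper, for illustration only)
# Creates DIR, optionally hands it to OWNER, and checks that files can be
# created and deleted in it by the user running this function.
make_ct_tmpdir() {
    dir="$1"
    owner="${2:-}"

    mkdir -p "$dir" || return 1

    # Changing ownership normally requires root.
    if [ -n "$owner" ]; then
        chown "$owner" "$dir" || return 1
    fi

    # Owner gets read/write/execute, which covers create and delete.
    chmod 755 "$dir" || return 1

    # Probe: create and remove a scratch file.
    probe="${dir}/.ct_probe.$$"
    : > "$probe" && rm -f "$probe"
}

# Example, as root (the user name is an assumption about your system):
#   make_ct_tmpdir /data/tmp apache
```

The probe only proves access for the user running the function; when an OWNER is given and the function runs as root, it confirms that the directory itself is usable but not the Apache user's access.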
Database loaders
The database loaders used to load custom tracks are the standard loader commands found in the source tree: hgLoadBed, hgLoadWiggle and wigEncode. They are installed into /cgi-bin/loader/ with a 'make cgi' from the source tree directory kent/src/hg/. These loaders are used by the cgi binaries hgCustom, hgTracks, and hgTables to load custom tracks into the database. They are operated in an exec'd pipeline fashion; the code details can be seen in src/hg/lib/customFactory.c.
Command line access
Since the MySQL host may be different from your ordinary MySQL host, you will need to create a separate $HOME/.ct.hg.conf file for the cases where you want to manipulate this database with the kent source tree command line tools. This .ct.hg.conf is merely a copy of your normal .hg.conf file but with a different host/username/password specified:
db.host=ctdbhost
db.user=ctdbuser
db.password=ctdbpasswd
Remember to set the permissions on this file to 600:
chmod 600 $HOME/.ct.hg.conf
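The copy-and-edit step can be scripted. This is a minimal sketch: the function name make_ct_conf is hypothetical, it assumes the source file already contains db.host, db.user and db.password lines (missing lines are not added), and the replacement values must not contain the characters |, & or backslash, which sed would interpret.

```shell
#!/bin/sh
# make_ct_conf SRC DEST HOST USER PASSWORD   (hypothetical helper, for illustration only)
# Copies SRC to DEST, rewriting the three db.* connection settings.
make_ct_conf() {
    src="$1"
    dest="$2"
    host="$3"
    user="$4"
    password="$5"

    # Run under a restrictive umask in a subshell so the new file is
    # never readable by other users, even briefly.
    (
        umask 077
        sed -e "s|^db\.host=.*|db.host=${host}|" \
            -e "s|^db\.user=.*|db.user=${user}|" \
            -e "s|^db\.password=.*|db.password=${password}|" \
            "$src" > "$dest"
    ) || return 1

    chmod 600 "$dest"
}

# Example:
#   make_ct_conf "$HOME/.hg.conf" "$HOME/.ct.hg.conf" ctdbhost ctdbuser ctdbpasswd
```

All other settings in the original file are carried over unchanged.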
To enable the use of this file for subsequent command line operations, set the environment variable HGDB_CONF to point to this file, for example in the bash shell:
export HGDB_CONF=$HOME/.ct.hg.conf
With that in place, you can examine the contents of the customTrash database:
hgsql -e "show tables;" customTrash
This same .ct.hg.conf file is also used by the cleaner command dbTrash.
Cleaner script
Both the database and the temporary data directory /data/tmp need to be kept clean. This is similar to the cleaner script you already run on your /trash filesystem, except that a specific source tree utility is used to access and clean the database. The temporary data directory /data/tmp would stay clean if every custom track loaded successfully. When badly formatted or illegal data is submitted for a custom track, however, the database loaders do not remove their temporary files from /data/tmp. The directory can be kept clean with, for example, an hourly cron job that performs:
find /data/tmp -type f -cmin +10 -exec rm -f {} \;
This removes any file whose status last changed more than 10 minutes ago. Note that -cmin tests the inode status change time, not the access time.
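Before scheduling the removal, it can help to list what the same test would match. The sketch below uses the same find expression with -print in place of the rm action, so it deletes nothing. The function name list_stale_files is hypothetical. The -cmin primary is supported by GNU and BSD find but is not part of POSIX.

```shell
#!/bin/sh
# list_stale_files DIR [MINUTES]   (hypothetical helper, for illustration only)
# Dry run of the cleanup: prints the regular files under DIR whose status
# last changed more than MINUTES minutes ago (default 10). Removes nothing.
list_stale_files() {
    dir="$1"
    minutes="${2:-10}"
    find "$dir" -type f -cmin "+${minutes}" -print
}

# Example:
#   list_stale_files /data/tmp 10
```

If the intent is to expire files by last access instead, find also offers -amin, which takes the same form of argument.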
The database cleaner command dbTrash should be run as a cron job encapsulated in a shell script something like this, which maintains a record of items cleaned to enable later analysis of custom track database usage statistics:
#!/bin/sh
DS=`date "+%Y-%m-%d"`
YYYY=`date "+%Y"`
MM=`date "+%m"`
export DS YYYY MM
mkdir -p /data/trashLog/ctdbhost/${YYYY}/${MM}
RESULT="/data/trashLog/ctdbhost/${YYYY}/${MM}/${DS}.txt"
export RESULT
/cluster/bin/x86_64/dbTrash -age=48 -drop -verbose=2 > ${RESULT} 2>&1
Running this once a day will remove any tables not accessed within the past 48 hours. The dbTrash command is found in the source tree in kent/src/hg/dbTrash.
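Cron jobs do not inherit the environment of a login shell, so a cron-driven cleaner needs HGDB_CONF set inside the script for the command to find the custom track hg.conf file. The following sketch shows one way to do that while keeping the dated log layout of the script above. The function name run_cleaner is hypothetical, and the command to run is passed as arguments so the wrapper itself makes no assumptions about dbTrash options beyond those shown in the text.

```shell
#!/bin/sh
# run_cleaner LOGROOT CONF COMMAND [ARGS...]   (hypothetical helper, for illustration only)
# Runs COMMAND with HGDB_CONF pointing at CONF and writes its output to
# LOGROOT/YYYY/MM/YYYY-MM-DD.txt. Prints the log path and returns the
# exit status of COMMAND.
run_cleaner() {
    logroot="$1"
    conf="$2"
    shift 2

    # Call date once so the path parts cannot disagree around midnight.
    ds=$(date "+%Y-%m-%d")
    yyyy=${ds%%-*}
    mm=${ds#*-}
    mm=${mm%-*}

    mkdir -p "${logroot}/${yyyy}/${mm}" || return 1
    result="${logroot}/${yyyy}/${mm}/${ds}.txt"

    # The assignment prefix exports HGDB_CONF to this one command only.
    HGDB_CONF="$conf" "$@" > "$result" 2>&1
    status=$?

    echo "$result"
    return "$status"
}

# Example, using the paths and options from the text:
#   run_cleaner /data/trashLog/ctdbhost "$HOME/.ct.hg.conf" \
#       /cluster/bin/x86_64/dbTrash -age=48 -drop -verbose=2
```

Under cron, $HOME may differ from the interactive value, so an absolute path to the configuration file is the safer choice there.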