Using custom track database


A new feature of the genome browser as of March 2007 is the ability to use a database for custom tracks. Until then, custom track data had been kept in files in the /trash/ct/ directory. This article describes the steps required to enable this feature.

Summary configuration

  • database loader binaries hgLoadBed, hgLoadWiggle and wigEncode are installed in /cgi-bin/loader/ - the normal 'make cgi' in the source tree kent/src/hg/ directory installs them
  • an empty customTrash database has been created on the MySQL host - create this manually once; the MySQL host name is a configuration item, the database name customTrash is not
  • a temporary read-write data directory /data/tmp has been created with read/write/delete enabled for the Apache server effective user; this directory name is a configuration item
  • configuration items are specified in /cgi-bin/hg.conf - this turns on the function
  • for command line access to the database, create a special ~/.ct.hg.conf to be used with the environment variable HGDB_CONF
  • create a cron job to run a cleaner script that expires and removes older tables from the database - the dbTrash command is used for this purpose

Host and database name

For performance and security reasons, the MySQL host for the custom track database can be a separate machine from the ordinary MySQL host that usually serves the assembly databases or the hgcentral database. A separate MySQL server is not required. The host machine is specified in the /cgi-bin/hg.conf file, for example a host machine called "ctdbhost":

customTracks.host=ctdbhost

The database name used on this host is fixed at customTrash, which is a define in the source tree file hg/inc/customTrack.h.
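
The empty customTrash database has to exist before the browser can use it. The following is a minimal sketch of creating it, assuming the same style of hgsql root invocation used elsewhere in this article and a MySQL root account on ctdbhost. The function name create_custom_trash and the HGSQL override variable are illustrative only and are not part of the kent tools.

```shell
#!/bin/sh
# Sketch: create the empty customTrash database once, by hand.
# create_custom_trash and HGSQL are hypothetical names for illustration;
# HGSQL lets you substitute another client (or a stub) for hgsql.
create_custom_trash() {
    host="$1"
    # -p prompts for the MySQL root password; the trailing "mysql" names
    # the database to connect to while issuing the statement.
    "${HGSQL:-hgsql}" -h"$host" -uroot -p \
        -e "CREATE DATABASE IF NOT EXISTS customTrash;" mysql
}
```

A typical call would be create_custom_trash ctdbhost, run by someone who knows the MySQL root password.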

/cgi-bin/hg.conf configuration items

The following items must be specified in /cgi-bin/hg.conf to enable this function:

customTracks.host=ctdbhost
customTracks.user=ctdbuser
customTracks.password=ctdbpasswd
customTracks.useAll=yes

Establish this user account and password in MySQL with db and user privileges:

Select, Insert, Update, Delete, Create, Drop, Alter

For example, with your MySQL root user account:

hgsql -hctdbhost -uroot -p -e "GRANT SELECT,INSERT,UPDATE,DELETE,CREATE,DROP,ALTER ON customTrash.* TO ctdbuser@yourWebHost IDENTIFIED BY 'ctdbpasswd';" mysql
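
Because the function is only turned on when the required items are present, it can be useful to check hg.conf before testing in the browser. The sketch below reports which of the four required customTracks items are missing. check_ct_conf is a hypothetical helper, and the filesystem path to hg.conf depends on where your web server keeps /cgi-bin/, so it is passed as an argument.

```shell
#!/bin/sh
# Sketch: report which required customTracks.* items are absent from hg.conf.
# check_ct_conf is a hypothetical helper, not a kent source tree command.
check_ct_conf() {
    conf="$1"
    missing=0
    for key in customTracks.host customTracks.user \
               customTracks.password customTracks.useAll; do
        # Compare the text before the first "=" exactly against the key,
        # so commented-out lines such as "#customTracks.host=..." do not count.
        if ! awk -F= -v k="$key" '$1 == k { found = 1 } END { exit !found }' "$conf"; then
            echo "missing: $key"
            missing=1
        fi
    done
    return $missing
}
```

The function returns 0 when all four items are present and 1 otherwise, so it can be used directly in an if statement.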

Optionally, a temporary read-write directory used during database loading can be specified:

customTracks.tmpdir=/data/tmp

The default is /data/tmp. Whichever directory is used, it should be created with read/write/delete access for the Apache server effective user.
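
A minimal sketch of creating that directory follows. make_ct_tmpdir is a hypothetical helper, and the name of the Apache effective user varies between systems, so it is an optional argument here rather than a fixed value. Changing ownership normally requires root.

```shell
#!/bin/sh
# Sketch: create the custom track temporary directory.
# make_ct_tmpdir is a hypothetical helper for illustration.
make_ct_tmpdir() {
    dir="$1"
    owner="${2:-}"    # Apache effective user; site-specific, may be empty
    mkdir -p "$dir" || return 1
    # Owner rwx on a directory allows creating, reading and deleting files in it
    chmod u+rwx "$dir" || return 1
    if [ -n "$owner" ]; then
        chown "$owner" "$dir" || return 1
    fi
}
```

For example, make_ct_tmpdir /data/tmp followed by the name of your Apache user as the second argument.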


Database loaders

The database loaders used to load custom tracks are the standard loader commands found in the source tree: hgLoadBed, hgLoadWiggle and wigEncode. They are installed into /cgi-bin/loader/ with a 'make cgi' from the source tree directory kent/src/hg/. These loaders are used by the cgi binaries hgCustom, hgTracks, and hgTables to load custom tracks into the database. They are run as an exec'd pipeline; the code details can be seen in src/hg/lib/customFactory.c
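
After 'make cgi', a quick check that all three loaders landed in the loader directory can save debugging later. This is a sketch only: check_loaders is a hypothetical helper, and the argument is whatever filesystem path your web server maps to /cgi-bin/loader/.

```shell
#!/bin/sh
# Sketch: confirm the three loader binaries are present and executable.
# check_loaders is a hypothetical helper, not part of the kent source tree.
check_loaders() {
    loader_dir="$1"
    status=0
    for bin in hgLoadBed hgLoadWiggle wigEncode; do
        if [ ! -x "$loader_dir/$bin" ]; then
            echo "not found or not executable: $loader_dir/$bin"
            status=1
        fi
    done
    return $status
}
```

The function returns 0 only when all three binaries are found and executable.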

Command line access

Since the MySQL host may be different than your ordinary MySQL host, you will need to create a unique $HOME/.ct.hg.conf file to be used in the case where you want to manipulate this separate database with the kent source tree command line tools. This unique .ct.hg.conf is merely a copy of your normal .hg.conf file but with a different host/username/password specified:

db.host=ctdbhost
db.user=ctdbuser
db.password=ctdbpasswd

Remember to set the permissions on this hg.conf file to 600:

chmod 600 $HOME/.ct.hg.conf

To enable the use of this file for subsequent command line operations, set the environment variable HGDB_CONF to point to this file, for example in the bash shell:

export HGDB_CONF=$HOME/.ct.hg.conf

With that in place, you can examine the contents of the customTrash database:

hgsql -e "show tables;" customTrash

This unique hg.conf file will also be used by the cleaner command dbTrash.
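
The copy-and-edit step described above can be sketched as a small function. make_ct_conf is a hypothetical helper; it assumes the source file contains db.host, db.user and db.password lines to rewrite, and that the new values contain none of the characters "|", "&" or "\", which would be interpreted by sed.

```shell
#!/bin/sh
# Sketch: derive .ct.hg.conf from an existing .hg.conf.
# make_ct_conf is a hypothetical helper for illustration.
make_ct_conf() {
    src="$1"; dst="$2"; host="$3"; user="$4"; pass="$5"
    # umask 077 in a subshell so the file is never readable by others,
    # even briefly, while the password is being written.
    ( umask 077
      sed -e "s|^db\.host=.*|db.host=${host}|" \
          -e "s|^db\.user=.*|db.user=${user}|" \
          -e "s|^db\.password=.*|db.password=${pass}|" \
          "$src" > "$dst" ) || return 1
    chmod 600 "$dst"
}
```

For example: make_ct_conf $HOME/.hg.conf $HOME/.ct.hg.conf ctdbhost ctdbuser ctdbpasswd. All other lines of the original file are copied unchanged.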

Cleaner script

The database and the temporary data directory /data/tmp need to be kept clean. This is similar to the cleaner script you already have running on your /trash filesystem. For the database, there is a specific source tree utility that accesses and cleans it. The temporary data directory /data/tmp would stay clean if every custom track loaded successfully. However, when badly formatted or illegal data is submitted for a custom track, the database loaders do not remove their temporary files from /data/tmp. This directory can be kept clean with, for example, an hourly cron job that performs:

find /data/tmp -type f -cmin +10 -exec rm -f {} \;

This removes any file whose status last changed more than 10 minutes ago (-cmin tests the inode change time, not the access time).
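
Before scheduling the deletion, it can help to preview what find would remove. The sketch below wraps the same find expression with an optional dry-run mode. clean_ct_tmp and its dry-run argument are illustrative names. Note that -cmin is an extension available in GNU and BSD find rather than a POSIX option.

```shell
#!/bin/sh
# Sketch: clean (or preview cleaning of) the custom track temp directory.
# clean_ct_tmp is a hypothetical helper for illustration.
clean_ct_tmp() {
    dir="$1"
    minutes="$2"
    mode="${3:-}"
    if [ "$mode" = "dry-run" ]; then
        # List the files that would be removed, without deleting anything
        find "$dir" -type f -cmin "+$minutes" -print
    else
        find "$dir" -type f -cmin "+$minutes" -exec rm -f {} \;
    fi
}
```

Running clean_ct_tmp /data/tmp 10 dry-run lists the candidates; omitting the third argument performs the removal.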

The database cleaner command dbTrash should be run as a cron job encapsulated in a shell script something like this, which maintains a record of items cleaned to enable later analysis of custom track database usage statistics:

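#!/bin/sh
# Daily cleaner for the customTrash database; keeps one dated log per run.

# cron does not inherit your login environment, so point dbTrash at the
# custom track hg.conf described under "Command line access"
HGDB_CONF=$HOME/.ct.hg.conf
export HGDB_CONF

# Date stamp and year/month components for the log path
DS=`date "+%Y-%m-%d"`
YYYY=`date "+%Y"`
MM=`date "+%m"`
export DS YYYY MM

# One log file per day, grouped by year and month
mkdir -p /data/trashLog/ctdbhost/${YYYY}/${MM}
RESULT="/data/trashLog/ctdbhost/${YYYY}/${MM}/${DS}.txt"
export RESULT

# Drop tables older than 48 hours and record what was removed
/cluster/bin/x86_64/dbTrash -age=48 -drop -verbose=2 > "${RESULT}" 2>&1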

Running this once a day will remove any tables not accessed within the past 48 hours.