Running your own gfServer: Difference between revisions
Revision as of 14:50, 2 February 2021
BLAT servers (gfServer) are configured as either static or dynamic servers. Static BLAT servers index a genome when started and remain running in memory to respond quickly to requests. Dynamic BLAT servers use genome indexes that were pre-built and saved to files; they are run on demand to handle a single BLAT request and then exit.
A static gfServer is easier to configure and faster to respond. However, it uses memory continuously. A dynamic gfServer is more appropriate with multiple assemblies and infrequent use. Its response time is usually acceptable; however, it varies with the speed of the disk containing the index. With repeated access, the operating system will cache the indexes in memory, improving response time.
Both database-based assemblies and assembly hubs may be configured to use either type of BLAT server.
NOTE: dynamic BLAT servers are not yet available. They are expected to be released in March 2021
Configuring a static gfServer
- If you want to run your own BLAT server, you need a lot of spare memory on the machine. You may also want to review our mailing list archives for gfServer troubleshooting advice (https://groups.google.com/a/soe.ucsc.edu/forum/?hl=en&fromgroups#!searchin/genome/gfServer).
- You need two servers, one for protein queries, one for normal DNA queries.
- Add something like this to a startup file of your server, e.g. /etc/rc.d/rc.local:
gfServer start blatMachine 33333 -stepSize=5 -log=/var/log/blatServerCi1.log /gbdb/ci1/ci1.2bit
gfServer start blatMachine 33334 -trans -log=/var/log/blatServerCi1Trans.log /gbdb/ci1/ci1.2bit
- Add the server to hgCentral:
update hgcentral.blatServers set host = "localhost", port=33333 where db="ci1" and isTrans=0;
update hgcentral.blatServers set host = "localhost", port=33334 where db="ci1" and isTrans=1;
- If you're not running a protein server, remove its entry from hgCentral
delete from hgcentral.blatServers where db="ci1" and isTrans=1;
- Tell the browser where to find the 2bit file:
update dbDb set nibPath = "" where name="ci1";
- On RedHat you might need SELinux permissions:
sudo chcon --type=httpd_sys_content_t /gbdb/ci1/ci1.2bit
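Once the static servers from the list above have been started, it can be useful to confirm that each one answers. The following is a minimal sketch, assuming that `gfServer status host port` exits with a non-zero status when the server cannot be reached. The function name `check_static_gfservers` and the `GFSERVER` override variable are hypothetical and not part of the BLAT tools.

```shell
# Report whether each static gfServer port on a host responds.
# GFSERVER is a hypothetical override for the gfServer path (defaults to the one on PATH).
# Assumption: "gfServer status" exits non-zero when the server is unreachable.
check_static_gfservers() {
    local host="$1" port
    shift
    for port in "$@"; do
        if "${GFSERVER:-gfServer}" status "$host" "$port" >/dev/null 2>&1; then
            echo "ok: $host:$port"
        else
            echo "NOT RESPONDING: $host:$port"
        fi
    done
}

# Example call for the two servers above:
#   check_static_gfservers blatMachine 33333 33334
```

A large genome can take several minutes to index at startup, so a server may report as not responding for a while after it is launched.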
Configuring a dynamic gfServer
An xinetd super-server starts a dynamic BLAT server to handle a single user request. The server loads a pre-built index from disk for that request. A single xinetd service handles multiple genomes as well as nucleotide, protein-translated, and protein queries. Genome indexes must be pre-built, with all of them installed or linked under a common directory hierarchy called the gfServer root directory.
Configuring xinetd
xinetd, or the older inetd, is a standard package on UNIX/Linux systems. It is a facility that runs a program to handle an internet service request. A system administrator generally configures it. It runs each service as an unprivileged user. Please see your operating system documentation for more details.
An example configuration file is shown below. It launches gfServer with two arguments: the literal string "dynserver" and the gfServer root directory path.
service blat
{
    port = 5010
    socket_type = stream
    wait = no
    user = blatuser
    group = genecats
    server = /mnt/data/dyn-blat/bin/gfServer
    server_args = dynserver /mnt/data/dyn-blat/genomes
    type = UNLISTED
    log_on_success += USERID EXIT
    log_on_failure += USERID
    disable = no
}
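After xinetd has been reloaded with a configuration like the one above, a quick sanity check is whether the configured port accepts TCP connections at all. The sketch below relies on bash's `/dev/tcp` pseudo-device, so it assumes bash rather than a plain POSIX shell. The function name `port_open` is hypothetical. It only tests that something is listening; it does not send a BLAT request.

```shell
# Return 0 if a TCP connection to host:port succeeds, non-zero otherwise.
# Uses bash's /dev/tcp redirection; the subshell closes the socket when it exits.
port_open() {
    local host="$1" port="$2"
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Example, using the port from the configuration above:
#   port_open localhost 5010 && echo "xinetd is listening on 5010"
```

If the check fails, the xinetd log and the `disable` setting in the service file are reasonable first places to look.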
Building gfServer indexes
Three files are required by dynamic gfServers and must follow the naming convention:
- myGenome.2bit - two-bit format genomic sequence
- myGenome.untrans.gfidx - untranslated index
- myGenome.trans.gfidx - translated index
Here, myGenome is the database or hub name of the assembly. For database-based assemblies, the files are stored in a directory with the same name as the assembly database, such as rootdir/myGenome/. For assembly hubs, they may follow this convention or use more deeply nested directories such as rootdir/GCF/000/181/335/GCF_000181335.3/.
The gfServer parameters are stored with the index and are specified when the index is created. The following commands will build the indexes:
gfServer index -stepSize=5 myGenome.untrans.gfidx myGenome.2bit
gfServer index -trans myGenome.trans.gfidx myGenome.2bit
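When many assemblies need indexing, the two commands above can be wrapped so that the index file names are always derived from the 2bit file name. This is a sketch only: the function name `build_gf_indexes` and the `GFSERVER` override variable are hypothetical, and the options are simply the ones shown above.

```shell
# Build the untranslated and translated indexes next to an existing .2bit file,
# deriving the index names from the 2bit name so the naming convention is kept.
# GFSERVER is a hypothetical override for the gfServer path (defaults to the one on PATH).
build_gf_indexes() {
    local twobit="$1"
    local base="${twobit%.2bit}"   # strip the .2bit suffix, keep the directory part
    "${GFSERVER:-gfServer}" index -stepSize=5 "${base}.untrans.gfidx" "$twobit" &&
    "${GFSERVER:-gfServer}" index -trans "${base}.trans.gfidx" "$twobit"
}

# Example (path is illustrative):
#   build_gf_indexes /mnt/data/dyn-blat/genomes/myGenome/myGenome.2bit
```

Setting `GFSERVER=echo` before calling the function prints the commands instead of running them, which is a cheap way to review what would be built.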
Configuring database genomes to use a dynamic gfServer
Existing mirrors will need to add a column "dynamic" to hgcentral.blatServers with the following SQL command:
alter table hgcentral.blatServers add column dynamic tinyint not null default 0;
To change an existing genome to use the dynamic gfServer, use the SQL commands:
update hgcentral.blatServers SET host = "localhost", port=5010, dynamic=1 where db="ci1" and isTrans=0;
update hgcentral.blatServers SET host = "localhost", port=5010, dynamic=1 where db="ci1" and isTrans=1;
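To switch several databases at once, the statements above can be generated rather than typed by hand. The sketch below prints one update per database; because both the isTrans=0 and isTrans=1 rows receive the same host and port, it omits the isTrans condition so a single statement covers both rows. The function name `dynamic_blat_sql` is hypothetical, `hg38` is only an example database name, and database names are assumed to contain no quote characters.

```shell
# Print SQL that points every listed database at one dynamic gfServer.
# Arguments: host, port, then one or more database names.
# Assumption: database names contain no quotes, so no SQL escaping is done.
dynamic_blat_sql() {
    local host="$1" port="$2" db
    shift 2
    for db in "$@"; do
        printf 'update hgcentral.blatServers set host="%s", port=%s, dynamic=1 where db="%s";\n' \
            "$host" "$port" "$db"
    done
}

# Example: print the statements for two databases
dynamic_blat_sql localhost 5010 ci1 hg38
```

The output can be reviewed first and then piped to whichever MySQL client you use to administer hgcentral.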
Configuring assembly hubs to use a dynamic gfServer
A dynamic BLAT server is specified with the "dynamic" argument to the blat, transBlat, and isPcr definitions in the hub genomes.txt file, followed by the gfServer root-relative path of the directory containing the 2bit and gfidx files.
For example:
blat yourServer.yourInstitution.edu 4096 dynamic myGenome
transBlat yourServer.yourInstitution.edu 4096 dynamic myGenome
isPcr yourServer.yourInstitution.edu 4096 dynamic myGenome
The genome and gfServer indexes would be:
$rootdir/myGenome/myGenome.2bit
$rootdir/myGenome/myGenome.untrans.gfidx
$rootdir/myGenome/myGenome.trans.gfidx
For large hubs, it is possible to use more deeply nested directories, for instance following the NCBI convention:
blat yourServer.yourInstitution.edu 4096 dynamic GCF/000/181/335/GCF_000181335.3
transBlat yourServer.yourInstitution.edu 4096 dynamic GCF/000/181/335/GCF_000181335.3
isPcr yourServer.yourInstitution.edu 4096 dynamic GCF/000/181/335/GCF_000181335.3
These definitions reference the following genome files and indexes:
$rootdir/GCF/000/181/335/GCF_000181335.3/GCF_000181335.3.2bit
$rootdir/GCF/000/181/335/GCF_000181335.3/GCF_000181335.3.untrans.gfidx
$rootdir/GCF/000/181/335/GCF_000181335.3/GCF_000181335.3.trans.gfidx
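The mapping from the root-relative path in the hub definition to the three files can be sketched as a small shell helper: the file base name is the last component of the path. The function names `dyn_gf_files` and `check_dyn_gf_files` are hypothetical, and the sketch assumes the path has no trailing slash.

```shell
# Print the three files a dynamic gfServer expects for a root-relative path.
# Assumption: relpath has no trailing slash; its last component is the file base name.
dyn_gf_files() {
    local rootdir="$1" relpath="$2" ext
    local name="${relpath##*/}"    # e.g. GCF/000/181/335/GCF_000181335.3 -> GCF_000181335.3
    for ext in 2bit untrans.gfidx trans.gfidx; do
        printf '%s/%s/%s.%s\n' "$rootdir" "$relpath" "$name" "$ext"
    done
}

# Return non-zero and name each file that is missing or unreadable by the current user.
check_dyn_gf_files() {
    local f status=0
    while IFS= read -r f; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable: $f" >&2
            status=1
        fi
    done <<EOF
$(dyn_gf_files "$1" "$2")
EOF
    return "$status"
}

# Example (root directory is illustrative):
#   check_dyn_gf_files /mnt/data/dyn-blat/genomes GCF/000/181/335/GCF_000181335.3
```

The readability test applies to whoever runs the script, so running it as the account xinetd uses for the service gives the most meaningful answer.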