Upgrading gfServer

From Genecats

Revision as of 20:49, 14 August 2019

Upgrading gfServers to new executable.

THE NEW IMPROVED METHOD:

We no longer need to have some of our servers down for up to 10 hours. Now each server will be killed and restarted in about 10 minutes, so no single blat server will be down for very long.

There is no longer any need to edit the blatServers table on the RR to set things to blatx and then restore it later.

Instead, the admins can follow this NEW simple procedure:

Make sure the new startBlat.pl, which supports the new command-line option "forceRestart", is in place on all the blat servers. Hopefully Erich will have done this already.

For each blat machine including blatx, do:

  scp hgwdev:somepath-to-new-gfServer/gfServer blatMachine:/scratch/gfServerNew
  ssh blatMachine mv /scratch/gfServerNew /scratch/gfServer
  /scratch/startBlat.pl forceRestart

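This is carefully designed to slip in the new gfServer executable without disturbing the running gfServer daemon processes. The new file is copied to /scratch/gfServerNew first and then renamed with mv, so the running processes keep using the old executable until they are restarted.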

The special option "forceRestart" causes startBlat.pl to kill each currently running gfServer process, which is then restarted automatically by the other steps. This happens just one process at a time, and nothing is killed for long periods ahead of time.

The upgrade procedure for each blatMachine is independent, so you can have them all running the upgrade step simultaneously. There is no need to wait for one to finish before upgrading the next blat machine.
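The three per-machine steps and their independence can be sketched as a small shell loop. This is only an illustration, not a tested admin script: the function names and the DRY_RUN switch are made up for this example, the source path is the same placeholder used above, and it assumes startBlat.pl is run on the blat machine itself via ssh. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only. SRC is the placeholder path from the procedure above,
# not a real location. Function names and DRY_RUN are hypothetical.
SRC="hgwdev:somepath-to-new-gfServer/gfServer"

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Upgrade one blat machine: copy, rename into place, force restart.
# Assumption: startBlat.pl is invoked on the blat machine through ssh.
upgrade_one() {
    host="$1"
    run scp "$SRC" "$host:/scratch/gfServerNew" &&
    run ssh "$host" mv /scratch/gfServerNew /scratch/gfServer &&
    run ssh "$host" /scratch/startBlat.pl forceRestart
}

# Start the upgrade on every machine at once, then wait for all of them.
upgrade_all() {
    for h in "$@"; do
        upgrade_one "$h" &
    done
    wait
}
```

For example, `upgrade_all blatMachine blatx` prints the six commands for those two machines (in no guaranteed order, since they run in parallel); setting `DRY_RUN=0` would actually execute them.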

When startBlat.pl has finished running on all of the machines, you should be able to quickly test the results with:

blatServersCheck central

If you set up an rrcentral-prefixed profile in your hg.conf so that it uses the RR genome-centdb.hgcentral, you can also run this, which checks only the RR:

blatServersCheck rrcentral

Now spot check a few assemblies by blatting on them with hgPcr.



FOR HISTORY, Here is how it used to be done:

OLD PROCEDURE: In the past, the upgrade was done like this:

FIRST, upgrade gfServer for blatx using the routine below.

SECOND, use this page (at the bottom) to set the most important BLAT servers to run on blatx, the backup blat server machine:

http://genomewiki.ucsc.edu/genecats/index.php/Emergency_Backup_BLAT_Servers

Now they can test that hgBlat is working fine on the new gfServer code.

Repeat for each blat server (other than blatx)

FOR EACH BLAT MACHINE:

 The admins kill all the gfServer processes on it.
 Copy in the new gfServer executable to /scratch/gfServer.
 Run 
  /scratch/startBlat.pl
 to restart all the blat servers.

The startBlat.pl script has an array that preserves the blat servers so they remain on the same ports.

Finally, they run that page again to restore the important BLAT servers, moving them from blatx back to where they were.

This was not ideal because it could cause a fairly long interruption to some of the BLAT servers, up to 10 hours.