RR Down: Sending Alert Messages about Genome Browser Being Offline: Difference between revisions
Revision as of 19:13, 23 April 2018
Overview
This page has reminders of what to do if the RR is down for a long period. You want to verify the problem, contact cluster-admin (cc'ing the team), and then, if it isn't fixed in a reasonable amount of time, consider additional messages.
Contact cluster-admin/cc qateam
Check logs
See Checking_RR_status_through_hgTracksRandom, where you can run tail -100 /hive/users/qateam/perf/hgTracksRandom.log
to see the history of the RR over 15-minute intervals.
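If a shell is not handy, the same check can be sketched in Python. This is a minimal sketch, not an established team script: the function name last_lines is made up for illustration, and the log is only assumed to be a plain text file with one entry per line.

```python
from collections import deque
from pathlib import Path

# Path taken from the text above; adjust it if the log moves.
LOG_PATH = Path("/hive/users/qateam/perf/hgTracksRandom.log")


def last_lines(path, count=100):
    """Return the last `count` lines of a text file, similar to `tail -count`."""
    with open(path, "r", errors="replace") as handle:
        # A bounded deque keeps only the newest `count` lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


# Example use (hypothetical): print("\n".join(last_lines(LOG_PATH)))
```

The function makes no attempt to parse the entries, since the log's exact format is not described here.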
Confirm issue
Navigate to the machines to confirm there is a problem.
- One approach is to have a secondary browser open all of the machines as home-page tabs.
- For example, if Chrome is your main browser and Firefox is your secondary, go to Preferences/General in Firefox, set "When Firefox starts" to "Show my homepage", and paste the following into "Home page:":
hgw0.soe.ucsc.edu/cgi-bin/hgTracks?db=hg38|hgw1.soe.ucsc.edu/cgi-bin/hgTracks?db=hg38|hgw2.soe.ucsc.edu/cgi-bin/hgTracks?db=hg38|hgw3.soe.ucsc.edu/cgi-bin/hgTracks?db=hg38|hgw4.soe.ucsc.edu/cgi-bin/hgTracks?db=hg38|hgw5.soe.ucsc.edu/cgi-bin/hgTracks?db=hg38|hgw6.soe.ucsc.edu/cgi-bin/hgTracks?db=hg38|genome-euro.ucsc.edu/cgi-bin/hgTracks?db=hg38|http://genome-asia.ucsc.edu/cgi-bin/hgTracks?db=hg38|http://hgwdev.cse.ucsc.edu/cgi-bin/hgTracks?db=monDom5&hubUrl=http://genome-test.cse.ucsc.edu/~hiram/hubs/rrCGIStats/hub.txt&position=chr1%3A460068880-469555993
- These open hgw0-hgw6, genome-euro, genome-asia, and Hiram's cool monitoring hub on hgwdev.
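The tab-based check above can also be approximated from a script. The following is a rough sketch rather than an official monitoring tool: the function names are hypothetical, the http:// scheme is an assumption (most of the hostnames above are listed without one), and an HTTP 200 only shows that the web server answered, not that the browser page rendered correctly. The hgwdev monitoring hub is left out because it needs a human to look at it.

```python
import urllib.error
import urllib.request

# Hostnames taken from the home-page list above: hgw0-hgw6 plus the two mirrors.
HOSTS = [f"hgw{i}.soe.ucsc.edu" for i in range(7)] + [
    "genome-euro.ucsc.edu",
    "genome-asia.ucsc.edu",
]
PATH = "/cgi-bin/hgTracks?db=hg38"


def build_urls(hosts=HOSTS, path=PATH):
    """Build one hgTracks URL per host (the http scheme is an assumption)."""
    return [f"http://{host}{path}" for host in hosts]


def fetch_status(url, timeout=30):
    """Return the HTTP status code for `url`, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 500.
        return error.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar problems.
        return None


def check_machines(urls, fetch=fetch_status):
    """Map each URL to its status; `fetch` can be swapped out for testing."""
    return {url: fetch(url) for url in urls}


def down_machines(results):
    """List the URLs that did not return HTTP 200."""
    return [url for url, status in results.items() if status != 200]
```

Calling down_machines(check_machines(build_urls())) would give the list of machines worth naming in the email to cluster-admin. Confirm in a real browser before sending anything, since a script like this can miss pages that load but display an error.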
Send email
If things look serious, send an email to cluster-admin and qateam saying that the RR (or a specific machine, say hgw5, if that is what your checking shows) is down.
Things are bad: update twitter
If cluster-admin does not come back with a fix within half an hour, it is probably a good idea to start thinking about notifying the greater community. If the error is minor, for example only one machine is out (say hgw5), then it may not be as important to notify the community. But if it is bad, for example mailing list questions start coming in, it might be time to update twitter.
Be sure to say genome-asia and genome-euro are available (if they are).
See this note about our twitter account. Here are some example twitter updates:
- The Genome Browser is unexpectedly down. Please rest assured we are working on having it back up ASAP!
- Our mirrors in Europe and Asia (http://genome-euro.ucsc.edu http://genome-asia.ucsc.edu) are up and available while we work on returning our main site.
- We have now resolved the problem on our main site. We apologize for any inconvenience and thank you for your understanding.
Things are really bad (over an hour offline): Ask cluster-admin to display the maintenance page
This RM (http://redmine.soe.ucsc.edu/issues/9608#note-40) has some history about this page. There is a file maintenance.html at /usr/local/apache/htdocs/ that gets turned on when an admin touches another file (perhaps maintenance.enable).