Apache error log output: Difference between revisions
(Adding hints on log locations) |
(Adding Lou's tips about seeing an IP's actions) |
||
Line 31: | Line 31: | ||
Ask Matt for more ways to use these logs, such as building a temporary directory of simlinks to the logs for specific date ranges to do more complicated querries. | Ask Matt for more ways to use these logs, such as building a temporary directory of simlinks to the logs for specific date ranges to do more complicated querries. | ||
===Example of accessing /hive/data/inside/wwwstats/RR/ for IP=== | |||
Error log location for hgw2: | |||
/hive/data/inside/wwwstats/RR/2019/hgw2 | |||
The last two directories are for $year and $machine | |||
Within that directory: | |||
<pre> | |||
-rw-rw-r-- 1 hiram genecats 498M Mar 28 10:40 access_log | |||
-rw-rw-r-- 1 hiram genecats 891M Mar 28 10:40 error_log | |||
-rw-r--r-- 1 hiram protein 54M Jan 14 03:10 access_log.20190106.gz | |||
-rw-rw-r-- 1 hiram genecats 48M Jan 20 03:30 access_log.20190114.gz | |||
-rw-rw-r-- 1 hiram genecats 58M Jan 28 03:41 access_log.20190120.gz | |||
... | |||
</pre> | |||
There is an '''access_log''', which is the latest log (within the last week). Same for the error_log. Once a log is >week old, it gets compressed and the file name is 2019-month-day(start of week). Ex. access_log.20190120.gz. | |||
If you wanted to research what a recent offending IP was doing, you would check the different machines, in this case we saw there is some activity on hgw2, and do the following: | |||
<pre> | |||
$cat access_log | awk '{print $1}' | sort | uniq -c | sort -r | head | |||
33386 54.82.172.170 | |||
28818 130.91.194.123 | |||
16993 130.223.121.12 (Offender) (16993 hits on hgw2) | |||
</pre> | |||
====Ex. Summarizing specific 130.223.121.12 IP actions==== | |||
<pre> | |||
$cat access_log | grep "130.223.121.12" | awk '{print $7}' | cut -f 3 -d '/' | cut -f 1 -d "?" | sort | uniq -c | sort -r | head | |||
16905 hgTables | |||
18 hgt | |||
14 hgTracks | |||
</pre> | |||
So this 130.223.121.12 IP is hammering hgTables. | |||
====Ex. Checking specific 130.223.121.12 IP requests==== | |||
<pre> | |||
$cat access_log | grep "130.223.121.12" | head -n 3 | |||
130.223.121.12 - - [25/Mar/2019:08:12:27 -0700] "POST /cgi-bin/hgTables? HTTP/1.1" 200 1222212 "-" "Python-urllib/2.7" 1257605 microseconds | |||
130.223.121.12 - - [25/Mar/2019:08:12:30 -0700] "POST /cgi-bin/hgTables? HTTP/1.1" 200 167045 "-" "Python-urllib/2.7" 2063636 microseconds | |||
130.223.121.12 - - [25/Mar/2019:08:12:31 -0700] "POST /cgi-bin/hgTables? HTTP/1.1" 200 681933 "-" "Python-urllib/2.7" 2170569 microseconds | |||
</pre> | |||
The results show Python-urllib/2.7 which shows they are using python to programmatically query, and the POST is a different way to input parameters without using the URL, so we don't know what they're searching for. We do see that they are doing multiple queries a minute without waiting for a response, which is more evident as they currently have an almost 60 minute bottleneck. | |||
[[Category:Browser QA]] | [[Category:Browser QA]] | ||
[[Category:Browser QA CGI]] | [[Category:Browser QA CGI]] |
Revision as of 18:19, 28 March 2019
Here are some examples of errors we often see but that are of no concern:
This is seen right after a push of new CGIs and is caused by browsers having a stale version of the CGIs/css file. A bunch of these will show up right after the push, but they should subside after several minutes as browser grab new versions.
Cannot find correct version of file 'HGStyle.css'; this is due to an installation error, referer: http://hgw0.gi.ucsc.edu/
File does not exist: /data/apache/htdocs/js/jquery.imgareaselect-v291.js, referer: http://genome.ucsc.edu/cgi-bin/hgTracks
We also see these often but they tend to right themselves as the load on the browser waxes and wanes.
Too many popAbortHandlers, referer: http://genome.ucsc.edu/cgi-bin/hgTables?position=chr4:150237197-150248873&hgsid=353372405&refGene=pack&hgFind.matches=NM_023119,
How to look at logs
When you are on a machine you can look at the logs by following this file:
- /usr/local/apache/logs/error_log
Older logs with names like this:
- /usr/local/apache/logs/error_log.1
- /usr/local/apache/logs/error_log.2
Example tail command:
- tail -f /usr/local/apache/logs/error_log | grep -v "CGI_TIME\|trackLog"
How to find historic logs
If you wanted to look at historical stuff there is a place here on hive:
- /hive/data/inside/wwwstats/RR/2018/
- /hive/data/inside/wwwstats/RR/2019/
Let's say you wanted to look at specific events on Feb. 14th on the hgw1 machine for 2019:
- zgrep "Feb 14" /hive/data/inside/wwwstats/RR/2019/hgw1/error_log.20190211.gz | less
Ask Matt for more ways to use these logs, such as building a temporary directory of simlinks to the logs for specific date ranges to do more complicated querries.
Example of accessing /hive/data/inside/wwwstats/RR/ for IP
Error log location for hgw2:
/hive/data/inside/wwwstats/RR/2019/hgw2
The last two directories are for $year and $machine
Within that directory:
-rw-rw-r-- 1 hiram genecats 498M Mar 28 10:40 access_log -rw-rw-r-- 1 hiram genecats 891M Mar 28 10:40 error_log -rw-r--r-- 1 hiram protein 54M Jan 14 03:10 access_log.20190106.gz -rw-rw-r-- 1 hiram genecats 48M Jan 20 03:30 access_log.20190114.gz -rw-rw-r-- 1 hiram genecats 58M Jan 28 03:41 access_log.20190120.gz ...
There is an access_log, which is the latest log (within the last week). Same for the error_log. Once a log is >week old, it gets compressed and the file name is 2019-month-day(start of week). Ex. access_log.20190120.gz.
If you wanted to research what a recent offending IP was doing, you would check the different machines, in this case we saw there is some activity on hgw2, and do the following:
$cat access_log | awk '{print $1}' | sort | uniq -c | sort -r | head 33386 54.82.172.170 28818 130.91.194.123 16993 130.223.121.12 (Offender) (16993 hits on hgw2)
Ex. Summarizing specific 130.223.121.12 IP actions
$cat access_log | grep "130.223.121.12" | awk '{print $7}' | cut -f 3 -d '/' | cut -f 1 -d "?" | sort | uniq -c | sort -r | head 16905 hgTables 18 hgt 14 hgTracks
So this 130.223.121.12 IP is hammering hgTables.
Ex. Checking specific 130.223.121.12 IP requests
$cat access_log | grep "130.223.121.12" | head -n 3 130.223.121.12 - - [25/Mar/2019:08:12:27 -0700] "POST /cgi-bin/hgTables? HTTP/1.1" 200 1222212 "-" "Python-urllib/2.7" 1257605 microseconds 130.223.121.12 - - [25/Mar/2019:08:12:30 -0700] "POST /cgi-bin/hgTables? HTTP/1.1" 200 167045 "-" "Python-urllib/2.7" 2063636 microseconds 130.223.121.12 - - [25/Mar/2019:08:12:31 -0700] "POST /cgi-bin/hgTables? HTTP/1.1" 200 681933 "-" "Python-urllib/2.7" 2170569 microseconds
The results show Python-urllib/2.7 which shows they are using python to programmatically query, and the POST is a different way to input parameters without using the URL, so we don't know what they're searching for. We do see that they are doing multiple queries a minute without waiting for a response, which is more evident as they currently have an almost 60 minute bottleneck.