Linux Log Monitoring - Monitor log files and alert on log entries. Monitor single or multiple logs efficiently on Linux, AIX and SunOS, get alert notifications on errors and patterns, graph log file behavior and metrics, analyze log file content, and alert on log timestamps, update times, size and growth.

Monitoring Logs with Timestamps

Monitor log files with timestamps similar to 'Aug 21 15:21:58 ...'
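To decide whether a line stamped like 'Aug 21 15:21:58 ...' falls inside the requested time frame, a tool has to turn that stamp into a full date. The Python below is a minimal sketch of one way to do it, not logxray's own code. Syslog-style stamps carry no year, so the sketch assumes the current year and steps back one year when that would place the entry in the future. The names parse_syslog_time and within_window are hypothetical, and the sample lines are made up.

```python
from datetime import datetime, timedelta

def parse_syslog_time(line, now):
    """Return the datetime of a line that starts with 'Mmm dd HH:MM:SS'.

    Syslog-style stamps have no year, so assume the year of `now` and
    step back one year if that would put the entry in the future
    (e.g. a 'Dec 31' line read on January 1st). Leap-day edge cases
    are not handled in this sketch.
    """
    stamp = line[:15]  # 'Aug 21 15:21:58' and 'Aug  1 15:21:58' are both 15 chars
    ts = datetime.strptime(f"{now.year} {stamp}", "%Y %b %d %H:%M:%S")
    if ts > now:
        ts = ts.replace(year=now.year - 1)
    return ts

def within_window(line, minutes, now):
    """True if the line's timestamp is no older than `minutes` before `now`."""
    return parse_syslog_time(line, now) >= now - timedelta(minutes=minutes)

now = datetime(2024, 8, 21, 15, 30, 0)
print(within_window("Aug 21 15:21:58 web01 app: ERROR Client timeout", 30, now))  # True
print(within_window("Aug 21 14:21:58 web01 app: ERROR Client timeout", 30, now))  # False
```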

Monitor log files for user-defined patterns & EXCLUDE specific lines from the results

Case Scenario:

Within the last 30 minutes, find out how many lines in the log file [ /var/log/app.log ] contain "ERROR" followed later on the same line by "Client" (the regular expression ERROR.*Client), and keep a note of those lines.

From those lines, remove any that also contain the keywords "error 404" OR "updateNumber", then show me what is left. If the number of remaining lines is between 5 and 9, alert as WARNING. If it is 10 or more, alert as CRITICAL. If it is below 5, do not alert!

Command:

logxray  autofig  /var/log/app.log  30m  'ERROR.*Client' '(error 404|updateNumber)'  5  10  -showexcl
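The include/exclude step of this scenario can be sketched in a few lines of Python. This only illustrates the filtering idea, not how logxray implements it, and it assumes the patterns behave like Python regular expressions (for simple patterns such as these, that matches egrep-style syntax). The time-window and alerting steps are left out; filter_lines is a hypothetical name and the sample lines are made up.

```python
import re

def filter_lines(lines, include, exclude=None):
    """Keep lines matching `include`, then drop any that also match `exclude`."""
    inc = re.compile(include)
    exc = re.compile(exclude) if exclude else None
    return [
        line for line in lines
        if inc.search(line) and not (exc is not None and exc.search(line))
    ]

sample = [
    "Aug 21 15:01:02 web01 app: ERROR Client 10.0.0.5 timed out",
    "Aug 21 15:02:10 web01 app: ERROR Client 10.0.0.6 error 404",
    "Aug 21 15:03:44 web01 app: ERROR Client updateNumber=7 rejected",
    "Aug 21 15:04:00 web01 app: INFO Client connected",
]

remaining = filter_lines(sample, r"ERROR.*Client", r"(error 404|updateNumber)")
for line in remaining:  # what is "left" after the exclusions
    print(line)
print(len(remaining), "line(s) left")
```

Counting the remaining lines is what the WARNING (5) and CRITICAL (10) thresholds are then compared against.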

Monitor log files for certain entries - ALERT IF those entries are NOT found

Case Scenario:

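For instance, within the last 30 minutes, if logxray does not find at least 2 lines in the log file that contain the words "Success" and "Client" plus "returned 200" OR "update:OK", it must alert. In other words, each line searched for MUST contain both Success and Client (Success.*Client) AND one or both of the strings "returned 200" and "update:OK". Note that the command below spells the pattern as 'SUCCESS.*Client'; regular expressions are normally case-sensitive, so make sure the case matches how the word actually appears in your log.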

Command:

logxray  autofig  /var/log/app.log  30  'SUCCESS.*Client' '(returned 200|update:OK)'   2  2  -notfoundn
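The "alert if NOT found" check inverts the usual test: it counts the lines that match both patterns and raises the alarm when the count falls short. Below is a minimal Python sketch assuming a single minimum of 2, as in the scenario; how logxray uses separate warning and critical values in -notfoundn mode is not described here. count_required is a hypothetical name and the sample lines are made up.

```python
import re

def count_required(lines, pattern1, pattern2, minimum):
    """Count lines matching both patterns; alert is True when too few match."""
    p1, p2 = re.compile(pattern1), re.compile(pattern2)
    count = sum(1 for line in lines if p1.search(line) and p2.search(line))
    return count, count < minimum

sample = [
    "SUCCESS Client 10.0.0.5 returned 200",
    "SUCCESS Client 10.0.0.6 update:OK",
    "SUCCESS Client 10.0.0.7 returned 500",
]

count, alert = count_required(sample, r"SUCCESS.*Client", r"(returned 200|update:OK)", 2)
print(count, "matching line(s); alert =", alert)
```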

Monitor log files for specific entries & display results to the screen

This is particularly helpful in cases where you might want to see the actual lines that contain the patterns you instructed the tool to search for.

Example:

logxray  autofig  /var/log/app.log  30  'ERROR.*Client' '(error 404|updateNumber:OK)'  5  10  -show

Example:

logxray  autofig  /var/log/app.log  30  'SUCCESS.*Client' '(returned 200|update:OK)'   5  10  -show

Scan log files for minutes', hours', days', weeks' or months' worth of data

For instance, to pull out 2 days of information from within a large log file and to find out how many lines contain certain strings and patterns, you can run a command similar to this:

Example:

logxray  autofig  /var/log/app.log  2d  'ERROR|error|panic|fail' '.'  5  10  -foundn

In this example, I'm telling logxray that I care about EVERY single line that contains any of the keywords I provided. The second pattern, '.', matches any character, so it does not narrow the results any further. The [ 2d ] means 2 days.

See below for the different ways of specifying a preferred time frame:

5m = 5 minutes (changeable to any number of minutes)

10h = 10 hours (changeable to any number of hours)

2d = 2 days (changeable to any number of days)

2w = 2 weeks (changeable to any number of weeks)

3mo = 3 months (changeable to any number of months)
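Each of these suffixes boils down to a number of minutes, the same unit a bare number such as 1440 is given in. The Python below sketches that conversion; it treats a month as 30 days, which is an assumption, since the exact length logxray uses for "mo" is not stated here. timeframe_to_minutes is a hypothetical helper.

```python
import re

# Minutes per unit. "mo" as 30 days is an assumption of this sketch.
UNIT_MINUTES = {"m": 1, "h": 60, "d": 1440, "w": 10080, "mo": 43200}

def timeframe_to_minutes(spec):
    """Convert '30', '5m', '10h', '2d', '2w' or '3mo' to minutes.

    A bare number is taken to be minutes, as in the 1440 example.
    """
    match = re.fullmatch(r"(\d+)(mo|m|h|d|w)?", spec.strip())
    if match is None:
        raise ValueError(f"unrecognized timeframe: {spec!r}")
    number, unit = match.groups()
    return int(number) * UNIT_MINUTES[unit or "m"]

for spec in ("30", "5m", "10h", "2d", "2w", "3mo"):
    print(spec, "=", timeframe_to_minutes(spec), "minutes")
```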

Understanding the 'autofig' syntax and interpreting the numerical response it provides

Syntax:

./logxray autofig (logfile) (timeframe-in-minutes) '(string1)' '(string2)' (warn) (critical) (-foundn)

Basic Usage:

[root@monitor jbowman]# logxray autofig /var/log/messages 1440 'ntpd' 'stratum' 5 10 -foundn

2---240---108---ATWFILF---(Apr/13)-(03:35)---(Apr/14)-(03:35:23)


Now let's break this down:

logxray is the tool name.

autofig is an option passed to logxray to tell it what to do. In this case, autofig instructs logxray to "automatically figure out" what type of log file /var/log/messages is and, if the log format is supported, perform the remaining functions.

/var/log/messages is of course the log file.

1440 is the number of minutes back from now that you want the log file searched. 1440 minutes = the last 24 hours.

"ntpd" is one of the strings that appears in the log lines you're interested in.

"stratum" is another string that you expect to find on the same line as "ntpd". Specifying these two strings ("ntpd" and "stratum") isolates the lines you want and processes them much more quickly, particularly if you're dealing with a huge log file.

5 specifies Warning. By specifying 5, you're telling the program to alert as WARNING if there are at least 5 occurrences of the search strings you specified in the log file within the last 24 hours (1440 minutes).

10 specifies Critical. By specifying 10, you're telling the program to alert as CRITICAL if there are at least 10 occurrences of the search strings you specified in the log file within the last 24 hours (1440 minutes).

-foundn specifies what type of response you'll get. By specifying -foundn, you're saying that if anything matching the specified strings is found within the 1440-minute time frame, it should be regarded as a problem and reported.

Summarized Explanation:

As you can see, logxray is monitoring a log file. The arguments passed to the tool instruct it to do the following:

Within the last 24 hours (1440 minutes), if the tool finds fewer than 5 occurrences of the specified strings in the log file, DO NOT alert. If it finds between 5 and 9 occurrences, it'll alert with a WARNING. If it finds 10 or more occurrences, it'll alert with a CRITICAL.
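The three-way decision above is a plain threshold comparison. A minimal Python sketch (classify is a hypothetical name, not part of logxray) that returns the 0 (OK), 1 (WARNING) and 2 (CRITICAL) codes logxray reports in the first column of its output:

```python
def classify(count, warn, crit):
    """Map a match count to 0 (OK), 1 (WARNING) or 2 (CRITICAL)."""
    if count >= crit:
        return 2
    if count >= warn:
        return 1
    return 0

for n in (4, 5, 9, 10, 108):
    print(n, ["OK", "WARNING", "CRITICAL"][classify(n, 5, 10)])
```

Checking the critical threshold first matters: a count of 108 also satisfies the warning threshold, but the worse status should win.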

Now, let us look at the result of the command:

2---240---108---ATWFILF---(Apr/13)-(03:35)---(Apr/14)-(03:35:23)

There are 6 columns, separated by 3 hyphens (---). The first column shows the exit code of the command you just ran: 0 means all is well; 1 means WARNING, i.e. logxray found conditions that fall under the WARNING threshold you provided; 2 means CRITICAL, i.e. the CRITICAL threshold you provided has been reached.
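If you feed this output into another script, it can be split on the three-hyphen separator; the single hyphens inside the date fields are left alone. The Python below is a sketch that assumes the six-column layout shown here. The field names are my own labels based on the explanation of this example (the meaning of the fourth column, ATWFILF, is not explained on this page), and parse_result is hypothetical.

```python
from typing import NamedTuple

class LogxrayResult(NamedTuple):
    status: int           # 0 OK, 1 WARNING, 2 CRITICAL
    secs_since_last: int  # seconds since the strings were last found
    matches: int          # matching lines in the scanned time frame
    flag: str             # e.g. 'ATWFILF'; meaning not documented here
    window_start: str
    window_end: str

def parse_result(line):
    """Split one line of output on '---' into its six columns."""
    fields = line.strip().split("---")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    status, secs, count, flag, start, end = fields
    return LogxrayResult(int(status), int(secs), int(count), flag, start, end)

result = parse_result("2---240---108---ATWFILF---(Apr/13)-(03:35)---(Apr/14)-(03:35:23)")
print(result.status, result.matches, result.secs_since_last)  # 2 108 240
```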

In this particular example, here's what the output is telling us: 

You requested to have the /var/log/messages file scanned as far back as 24 hours ago (1440 minutes).

The time frame that was scanned was from [ April 13, 03:35 ] to [ April 14, 03:35 ]. After scanning the records written to the log in that time frame, logxray found 108 lines that contained both "ntpd" and "stratum". Since 108 is at or above the CRITICAL threshold of 10, the first column is 2 (CRITICAL). Also, as an FYI, the last time those strings were found in the log file was 240 seconds ago.

Other common log monitoring scenarios

Log File Content

Scan content of log files for new occurrences (or lack thereof) of specific keywords, strings or patterns.

Log File Size

Monitor the sizes of single or multiple log files - alert if log size breaches predefined thresholds.
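A size check of this kind is straightforward to sketch. The Python below is a generic illustration, not logxray's implementation; thresholds are in bytes, the function names are hypothetical, and with several files the overall status is taken to be the worst individual status.

```python
import os

def check_size(path, warn_bytes, crit_bytes):
    """Return (exit_code, size) for one log: 0 OK, 1 WARNING, 2 CRITICAL."""
    size = os.path.getsize(path)
    if size >= crit_bytes:
        return 2, size
    if size >= warn_bytes:
        return 1, size
    return 0, size

def check_sizes(paths, warn_bytes, crit_bytes):
    """Check several logs; the overall status is the worst individual one."""
    results = {p: check_size(p, warn_bytes, crit_bytes) for p in paths}
    worst = max((code for code, _ in results.values()), default=0)
    return worst, results
```

For example, check_sizes(["/var/log/messages", "/var/log/app.log"], 50_000_000, 100_000_000) would warn at 50 MB and go critical at 100 MB for either file.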

Log File Growth

Monitor the growth of single or multiple log files - alert when the monitored logs stop receiving new data.
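Detecting that a log has stopped growing needs the size from the previous check, so the caller has to keep it somewhere between runs (for example in a small state file; that storage is left out here). Below is a minimal Python sketch with a hypothetical check_growth function. Treating a shrinking file as a rotated log rather than an alert is a design choice of this sketch, not documented logxray behavior.

```python
import os

def check_growth(path, previous_size):
    """Return (alert, current_size) for one log file.

    alert is True when the file is exactly the same size as on the
    previous check, i.e. most likely no new data arrived in between.
    A file that got smaller is assumed to have been rotated and is
    not alerted on.
    """
    current = os.path.getsize(path)
    if current < previous_size:
        return False, current  # assumed rotation: track from the new size
    return current == previous_size, current
```

The returned current_size is what the caller would store for the next check.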

Log File Timestamp

Monitor the timestamps of single or multiple logs. Alert if logs are older than X minutes or hours.
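Assuming "timestamp" here means each file's last modification time, the age check can be sketched as below. check_age is a hypothetical name, the thresholds are in minutes, and the optional now argument only exists to make the function easy to test.

```python
import os
import time

def check_age(path, warn_minutes, crit_minutes, now=None):
    """Return (exit_code, age_minutes) based on the file's modification time."""
    now = time.time() if now is None else now
    age_minutes = (now - os.path.getmtime(path)) / 60
    if age_minutes >= crit_minutes:
        return 2, age_minutes
    if age_minutes >= warn_minutes:
        return 1, age_minutes
    return 0, age_minutes
```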