Linux Log File Analysis, Analyze Logs, Monitor & Alert on any Unix Log File on any Unix Host/Server, Get Email Alerts on Log Checks; System Logs, Database Logs, Application Logs, Custom Log Files and much more

Monitoring logs in time frames (if format is supported)

1). General LoGrobot Syntax: How it is commonly used to monitor log files


logrobot autofig (logfile) (time-in-minutes) '(string1)' '(string2)' (warn) (critical) (-foundn)

Basic Usage: 

[root@monitor jbowman]#
[root@monitor jbowman]#
[root@monitor jbowman]# logrobot autofig /var/log/messages 1440 'ntpd' 'stratum' 5 10 -foundn
 
2---240---108---ATWFILF---(Apr/13)-(03:35)---(Apr/14)-(03:35:23)

[root@monitor jbowman]#
[root@monitor jbowman]#

Let's break this down:

logrobot is the tool name.

autofig is an option that is passed to the logrobot tool to tell it what to do.  In this particular case, autofig is instructing logrobot to "automatically figure out" what type of log file /var/log/messages is, and if the format of the log file is supported, perform the remaining functions.

/var/log/messages is of course the log file.

1440 is the number of minutes to look back in the log file. 1440 minutes = the last 24 hours.

"ntpd" is one of the strings that appears in the log lines you're interested in.

"stratum" is another string that you expect to find on the same line as "ntpd". Specifying these two strings (ntpd and stratum) isolates and processes the lines you want much more quickly, particularly if you're dealing with a huge log file.

5 specifies Warning. By specifying 5, you're telling the program to alert as WARNING if there are at least 5 occurrences of the search strings you specified in the log file within the requested time frame (here, the last 1440 minutes).

10 specifies Critical. By specifying 10, you're telling the program to alert as CRITICAL if there are at least 10 occurrences of the search strings you specified in the log file within the requested time frame (here, the last 1440 minutes).

-foundn specifies what type of response you'll get. By specifying -foundn, you're saying that anything found matching the specified strings within the 1440-minute time frame should be regarded as a problem and reported.

Summarized Explanation:

As you can see, the logrobot tool is monitoring a log file. The arguments that are passed to the tool instruct it to do the following:

Within the last 1440 minutes, if the tool finds fewer than 5 occurrences of the specified strings in the log file, DO NOT alert. If the tool finds between 5 and 9 occurrences of the specified strings in the log, it'll alert with a WARNING. If the tool discovers 10 or more instances of the strings in the log within the last 1440 minutes, it'll alert with a CRITICAL.
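The threshold rule described here can be sketched in a few lines of Python. This is only an illustration of the logic as the text states it, not LoGrobot's own implementation; the function name classify is hypothetical.

```python
def classify(count, warn, crit):
    """Map a match count to an alert level using WARNING/CRITICAL thresholds."""
    if count >= crit:
        return "CRITICAL"
    if count >= warn:
        return "WARNING"
    return "OK"


# Thresholds from the example command: warn=5, crit=10.
for n in (4, 5, 9, 10, 108):
    print(n, classify(n, 5, 10))
```

With the example thresholds, any count below 5 stays quiet, 5 through 9 is a WARNING, and 10 or more is a CRITICAL.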

Now, let us look at the result of the command:

2---240---108---ATWFILF---(Apr/13)-(03:35)---(Apr/14)-(03:35:23)

There are 6 columns which are separated by 3 hyphens (---).  The first column shows the exit code of the command you just ran.  0 means all is well. 1 means WARNING, which means, LoGrobot discovered conditions that fell under the WARNING specification you provided.  2 means CRITICAL, which means, the worst case scenario has been reached.

In this particular example, here's what the output is telling us: 

You requested to have the /var/log/messages file scanned as far back as 24 hours ago (1440 minutes).

The timeframe that was scanned was from [ April 13, 03:35 ] to [ April 14, 03:35 ].  After scanning through the records that were written to the log in that time frame, LoGrobot found 108 lines that contained both strings, "ntpd" and "stratum".  Also, as an FYI, the last time those specific strings were found in the log file was 240 seconds ago.
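If you want to consume this summary line from a script, a minimal Python sketch such as the following could split it into its six columns. The function name parse_result and the dictionary keys are hypothetical labels chosen for this illustration, and the sketch only covers the six-column layout shown in this example. The meaning of the fourth column (ATWFILF) is not explained in this text, so it is passed through untouched.

```python
def parse_result(line):
    """Split a six-column LoGrobot summary line on the '---' separator."""
    cols = line.strip().split("---")
    if len(cols) != 6:
        raise ValueError("expected 6 columns, got %d" % len(cols))
    levels = {0: "OK", 1: "WARNING", 2: "CRITICAL"}
    code = int(cols[0])
    return {
        "exit_code": code,
        "level": levels.get(code, "UNKNOWN"),
        "seconds_since_last_match": int(cols[1]),
        "matching_lines": int(cols[2]),
        "marker": cols[3],  # meaning not documented in this text
        "window_start": cols[4],
        "window_end": cols[5],
    }


result = parse_result(
    "2---240---108---ATWFILF---(Apr/13)-(03:35)---(Apr/14)-(03:35:23)"
)
print(result["level"], result["matching_lines"], result["seconds_since_last_match"])
```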

2). Scan log files for user-defined entries & EXCLUDE specific lines from the results


Case Scenario:

Within the last 30 minutes, find out how many lines in the log file [ /var/log/app.log ] contain both entries of "ERROR" and "Client". If any lines are found containing these two strings (ERROR.*Client), take note of that.

From the list of lines found, see if there are any lines that also contain the keywords "error 404" OR "updateNumber".  If there are, remove them from the list.  After removing them, show me what is left.  If the number of lines left is between 5 and 9, alert as WARNING.  If equal to or over 10, alert as CRITICAL.  If below 5, do not alert!

Command:

logrobot autofig /var/log/app.log 30 "ERROR.*Client" '(error 404|updateNumber)' 5 10 -showexcl
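The include-then-exclude logic of this scenario can be pictured with a short Python sketch. It is an illustration of the filtering idea only and says nothing about how LoGrobot works internally; the function name and the sample lines are made up for the example, and the 30-minute time window is not modelled.

```python
import re


def remaining_after_exclude(lines, include, exclude):
    """Keep lines matching `include`, then drop those that also match `exclude`."""
    inc = re.compile(include)
    exc = re.compile(exclude)
    return [ln for ln in lines if inc.search(ln) and not exc.search(ln)]


# Made-up sample lines, for illustration only.
sample = [
    "ERROR Client timeout while saving order",
    "ERROR Client request failed: error 404",
    "ERROR Client updateNumber rejected",
    "INFO Client connected",
]
left = remaining_after_exclude(
    sample, r"ERROR.*Client", r"(error 404|updateNumber)"
)
print(len(left), left)
```

The number of lines left over is what would then be compared against the WARNING (5) and CRITICAL (10) thresholds.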

3). Monitor log files for certain entries - ALERT IF those entries are NOT found


Case Scenario:

For instance, within the last 30 minutes, if LoGrobot does not find at least 2 lines containing the words "SUCCESS" and "Client" together with "returned 200" OR "update:OK" in the log file, it must alert.  In other words, the lines to search for MUST contain both SUCCESS and Client (SUCCESS.*Client) AND one or both of the strings returned 200 and update:OK.

Command:

logrobot autofig /var/log/app.log 30 "SUCCESS.*Client" '(returned 200|update:OK)' 2 2 -notfoundn

4). Scan Log files for specific entries & display all offending lines in alert


This is particularly helpful in cases where you might want to see the actual lines that contain the patterns you instructed the tool to search for.

Example:

logrobot  autofig  /var/log/app.log  30  "ERROR.*Client"  '(error 404|updateNumber:OK)'  5  10  -show

Example:

logrobot  autofig  /var/log/app.log  30  "SUCCESS.*Client"  '(returned 200|update:OK)'   5  10  -show

5). Scan log files for minutes, hours, days, weeks or months worth of data


For instance, to pull out 2 weeks of information from within a large log file and to find out how many lines contain certain strings and patterns, you can run a command similar to this:

Example:

logrobot autofig /var/log/app.log 2w "ERROR|error|panic|fail" "ERROR|error|panic|fail" 5 10 -foundn

Notice the [ 2w ], and notice the strings being searched for.  I repeated the string "ERROR|error|panic|fail" because there was no second, different search term to look for.  You don't have to repeat the first string, though.  You can just enter a dot in place of the second string, i.e.:

logrobot  autofig  /var/log/app.log  2w  "ERROR|error|panic|fail"  "."  5  10  -foundn

From this specific example, I'm telling LoGrobot that I care about EVERY single line that contains any of the keywords I provided.  The [ 2w ] of course means 2 weeks. 
 
See below for the different ways of specifying the date range:

5m = 5 minutes (changeable to any number of minutes)

10h = 10 hours (changeable to any number of hours)

2d = 2 days (changeable to any number of days)

2w = 2 weeks (changeable to any number of weeks)

3mo = 3 months (changeable to any number of months)
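To make the suffixes concrete, here is a small Python sketch that converts each form to minutes. It is not part of LoGrobot; the function name to_minutes is hypothetical, and the length of a month (30 days) is an assumption made purely for illustration, since the text does not define it.

```python
import re

# Minutes per unit. The 30-day month is an assumption for illustration.
UNIT_MINUTES = {"m": 1, "h": 60, "d": 1440, "w": 10080, "mo": 43200}


def to_minutes(spec):
    """Convert a range such as '5m', '10h', '2d', '2w' or '3mo' to minutes."""
    match = re.fullmatch(r"(\d+)(mo|m|h|d|w)?", spec)
    if match is None:
        raise ValueError("unrecognised time range: %r" % spec)
    number, unit = match.groups()
    # A bare number is treated as minutes, as in the earlier '1440' example.
    return int(number) * UNIT_MINUTES[unit or "m"]


for spec in ("1440", "5m", "10h", "2d", "2w", "3mo"):
    print(spec, "=", to_minutes(spec), "minutes")
```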

6). Revealing hidden/unusual entries within the logs of an App/DB/System or Network


Suppose you inherited a Unix environment at your new job and don't know what to search for in the logs. Here's an idea: instead of worrying about what to watch for, why not force the logs to reveal their hidden contents?

In the example below, LoGrobot was instructed to search the entire messages file (denoted with the '.').  Then, it is to ignore every line that contains any one of these specific strings: 'nagios-primary nagios' OR 'not responding' OR 'synchronized to'.  Whatever lines are left after these THREE patterns are ignored should be printed to the screen.  The logic here is: if you can identify which entries in the logs are of NO importance to you, you can exclude them from being monitored.  Therefore, if a log file is stripped of the familiar/unwanted, whatever is left will be unfamiliar and thus worth investigating.
 
[root@nagios-primary ~]# logrobot sanal /var/log/messages 24h '.' 'nagios-primary nagios|not responding|synchronized to' 1 5 -showexcl

Jun 13 13:40:04 nagios-primary abrt[8269]: saved core dump of pid 8128 (/prod/nagios-core/sbin/status.cgi)
Jun 13 13:40:04 nagios-primary abrtd: Directory 'ccpp-2012-06-13-13:40:04-8128' creation detected
Jun 13 13:40:04 nagios-primary abrtd: Executable '/prod/nagios-core/sbin/status.cgi' doesn't belong to any
Jun 13 13:40:04 nagios-primary abrtd: Corrupted or bad dump /var/spool/abrt/ccpp-2012-06-13-13:40:04
Jun 14 02:20:41 nagios-primary auditd[5813]: Audit daemon rotating log files

2---0---(93)-(41064)-(0.226476%)-(28.4323)-(422.97)---ATWFILF---(Jun/13)-(13:23)---(Jun/14)-(13:23:26)

7). Simplifying usage of LoGrobot with real-life case scenarios


Instead of forcing users to read complex documentation, LoGrobot provides real-life examples of its usage right from the command line. Yes, REAL-LIFE EXAMPLES! No guessing, no confusion, no scratching of the head. We strongly believe in simplicity and we take the extra steps many utilities refuse to take.

In the output below, let's say you forgot how to use the LoGrobot tool. Instead of having to find the documentation and then read it as well, you can just run the tool from the command line and pass it the option you're interested in, i.e. autofig (or you can type 'auto' to get more information on other available features).

Example:

[root@nagios-primary ~]#  ./logrobot  autofig

-------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------
Scan log file for 30 minutes worth of information. Show all lines found containing 'ERROR'
-------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------

EXAMPLE:

./logrobot  autofig  /var/log/messages  30m   'ERROR'   '.'   5  10  -show

-------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------

8). Artificial Intelligence - Reveal a log's hidden contents (by specifying entries to ignore)


Scan the /var/log/messages log file for 24 hours worth of information.  Exclude all lines that contain 'nagios-primary nagios | not responding, timed out| synchronized to'

[root@nagios-primary ~]# logrobot  sanal  /var/log/messages  24h  '.'  'nagios-primary nagios|not responding, timed out| synchronized to'  1  5  -showexcl


Jun 13 13:40:04 nagios-primary abrt[8269]: saved core dump of pid 8128 (/prod/nagios-core/sbin/status.cgi) to /var/spool/abrt/ccpp-2012-06-13-13:40:04-8128.new/coredump (2490368 bytes)
Jun 13 13:40:04 nagios-primary abrtd: Directory 'ccpp-2012-06-13-13:40:04-8128' creation detected
Jun 13 13:40:04 nagios-primary abrtd: Executable '/prod/nagios-core/sbin/status.cgi' doesn't belong to any package
Jun 13 13:40:04 nagios-primary abrtd: Corrupted or bad dump /var/spool/abrt/ccpp-2012-06-13-13:40:04-8128 (res:2), deleting
Jun 14 02:20:41 nagios-primary auditd[5813]: Audit daemon rotating log files

2---0---(93)-(41064)-(0.226476%)-(28.4323)-(422.97)---ATWFILF---(Jun/13)-(13:23)---(Jun/14)-(13:23:26) ZEAGMK

[root@nagios-primary ~]#
[root@nagios-primary ~]#
[root@nagios-primary ~]#

Scan the /var/log/messages log file for 1 week's worth of information.  Show me all lines that contain the strings: 'nagios-primary abrtd:'

[root@nagios-primary ~]# logrobot sanal /var/log/messages 1w '.' 'nagios-primary abrtd:' 1 5 -show

Jun 10 19:45:34 nagios-primary abrtd: Directory 'ccpp-2012-06-10-19:45:34-19662' creation detected
Jun 10 19:45:35 nagios-primary abrtd: Executable '/prod/nagios-core/sbin/status.cgi' doesn't belong to any package
Jun 10 19:45:35 nagios-primary abrtd: Corrupted or bad dump /var/spool/abrt/ccpp-2012-06-10-19:45:34-19662 (res:2), deleting
Jun 12 07:07:03 nagios-primary abrtd: Directory 'ccpp-2012-06-12-07:07:02-30780' creation detected
Jun 12 07:07:03 nagios-primary abrtd: Executable '/prod/nagios-core/sbin/status.cgi' doesn't belong to any package
Jun 12 07:07:03 nagios-primary abrtd: Corrupted or bad dump /var/spool/abrt/ccpp-2012-06-12-07:07:02-30780 (res:2), deleting
Jun 13 13:40:04 nagios-primary abrtd: Directory 'ccpp-2012-06-13-13:40:04-8128' creation detected
Jun 13 13:40:04 nagios-primary abrtd: Executable '/prod/nagios-core/sbin/status.cgi' doesn't belong to any package
Jun 13 13:40:04 nagios-primary abrtd: Corrupted or bad dump /var/spool/abrt/ccpp-2012-06-13-13:40:04-8128 (res:2), deleting

2---81900---(9)-(176115)-(0.0051103%)-(3)-(0)---(Jun/7)-(13:27)---(Jun/14)-(13:27:26)---ETWNFILF---(Jun/10)-(03:37:03)---(Jun/14)-(13:27:26) NAGCGKiv

[root@nagios-primary ~]#
[root@nagios-primary ~]#
[root@nagios-primary ~]#
[root@nagios-primary ~]#

9). Show all entries logged in the [ kern.log ] log file within the last 2 hours


root@nagios-primary ~#
root@nagios-primary ~#
root@nagios-primary ~#
root@nagios-primary ~# logrobot autofig /var/log/kern.log 2h '.' '.' 1 2 -show

Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.160050] usb 5-1: new full-speed USB device number 26 using uhci_hcd
Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.388215] hub 5-1:1.0: USB hub found
Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.390118] hub 5-1:1.0: 4 ports detected
Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.673128] usb 5-1.2: new low-speed USB device number 27 using uhci_hcd
Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.831895] input: Logitech USB Receiver as /devices/pci0000:00/0000:00:1d.0/usb5/5-1/5-1.2/5-1.2:1.0/input/input34
Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.832071] logitech 0003:046D:C517.001B: input,hidraw0: USB HID v1.10 Keyboard [Logitech USB Receiver] on usb-0000:00:1d.0-1.2/input0
Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.863133] logitech 0003:046D:C517.001C: fixing up Logitech keyboard report descriptor
Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.865367] input: Logitech USB Receiver as /devices/pci0000:00/0000:00:1d.0/usb5/5-1/5-1.2/5-1.2:1.1/input/input35
Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.865633] logitech 0003:046D:C517.001C: input,hiddev0,hidraw3: USB HID v1.10 Mouse [Logitech USB Receiver] on usb-0000:00:1d.0-1.2/input1
Sep 20 17:55:08 jake-XPS-M1530 kernel: [87312.249129] usb 5-1.3: new low-speed USB device number 28 using uhci_hcd
Sep 20 17:55:08 jake-XPS-M1530 kernel: [87312.436287] input: No brand 4 Port KVMSwicther as /devices/pci0000:00/0000:00:1d.0/usb5/5-1/5-1.3/5-1.3:1.0/input/input36
Sep 20 17:55:08 jake-XPS-M1530 kernel: [87312.436429] generic-usb 0003:10D5:55A4.001D: input,hidraw4: USB HID v1.10 Keyboard [No brand 4 Port KVMSwicther] on usb-0000:00:1d.0-1.3/input0
Sep 20 17:55:08 jake-XPS-M1530 kernel: [87312.442165] usbhid 5-1.3:1.1: couldn't find an input interrupt endpoint
     
2---3240---13---(Sep/20)-(16:49)---(Sep/20)-(17:55:08)---ETWNFILF---(Sep/20)-(17:55)---(Sep/20)-(17:55:08) NAGC

root@nagios-primary ~#
root@nagios-primary ~#
root@nagios-primary ~#


Scan through the above output and show ONLY lines that contain the strings "USB HID":
 
root@nagios-primary ~#
root@nagios-primary ~#
root@nagios-primary ~#
root@nagios-primary ~# logrobot autofig /var/log/kern.log 2h '.' 'USB HID' 1 2 -show

Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.832071] logitech 0003:046D:C517.001B: input,hidraw0: USB HID v1.10 Keyboard [Logitech USB Receiver] on usb-0000:00:1d.0-1.2/input0
Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.865633] logitech 0003:046D:C517.001C: input,hiddev0,hidraw3: USB HID v1.10 Mouse [Logitech USB Receiver] on usb-0000:00:1d.0-1.2/input1
Sep 20 17:55:08 jake-XPS-M1530 kernel: [87312.436429] generic-usb 0003:10D5:55A4.001D: input,hidraw4: USB HID v1.10 Keyboard [No brand 4 Port KVMSwicther] on usb-0000:00:1d.0-1.3/input0

2---3420---3---(Sep/20)-(16:52)---(Sep/20)-(17:55:08)---ETWNFILF---(Sep/20)-(17:55)---(Sep/20)-(17:55:08) NAGC

root@nagios-primary ~#
root@nagios-primary ~#
root@nagios-primary ~#

10). Identify which Hour/Minute within the last 8 hours had the most entries logged



root@nagios-primary ~#
root@nagios-primary ~#
root@nagios-primary ~# logrobot sanal /var/log/kern.log 8h '.' '.' 1 2 -exceldh

frq=19,zsc=1.41421,asc=[Sep-20-(16)]
frq=13,zsc=-0.707106,asc=[Sep-20-(17)]
frq=13,zsc=-0.707106,asc=[Sep-20-(15)]

root@nagios-primary ~#
root@nagios-primary ~#

Search the [ kern.log ] file once again. Find which MINUTE(S) within the last 8 hours had the most entries logged:

root@nagios-primary ~#
root@nagios-primary ~#
root@nagios-primary ~# logrobot sanal /var/log/kern.log 8h '.' '.' 1 2 -exceldm

frq=13,zsc=0.816496,asc=[Sep-20-(17:55)]
frq=13,zsc=0.816496,asc=[Sep-20-(16:16)]
frq=13,zsc=0.816496,asc=[Sep-20-(15:31)]
frq=3,zsc=-1.22474,asc=[Sep-20-(16:24)]
frq=3,zsc=-1.22474,asc=[Sep-20-(16:15)]

root@nagios-primary ~#
root@nagios-primary ~#
root@nagios-primary ~#
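The zsc values in the two listings above are numerically consistent with a population z-score of each bucket's frq value, that is, (frq - mean) / standard deviation. This reading is inferred from the numbers shown and is not a documented definition. The Python sketch below reproduces the figures to about five decimal places; the function name z_scores is hypothetical.

```python
from statistics import mean, pstdev


def z_scores(freqs):
    """Return the population z-score of each frequency in the list."""
    mu = mean(freqs)
    sigma = pstdev(freqs)
    if sigma == 0:
        # All buckets have the same frequency, so nothing stands out.
        return [0.0 for _ in freqs]
    return [(f - mu) / sigma for f in freqs]


hourly = [19, 13, 13]          # frq values from the -exceldh output
minutely = [13, 13, 13, 3, 3]  # frq values from the -exceldm output

print([round(z, 5) for z in z_scores(hourly)])
print([round(z, 5) for z in z_scores(minutely)])
```

A bucket with a high positive score logged noticeably more entries than the average bucket in the scanned window, which is what makes it worth a closer look.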

11). Search entries in kern.log from within the last 2 hours, then exclude certain log lines


root@nagios-primary ~#
root@nagios-primary ~#
root@nagios-primary ~# logrobot autofig /var/log/kern.log 2h '.' '.' 1 2 -show

Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.160050] usb 5-1: new full-speed USB device number 26 using uhci_hcd
Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.388215] hub 5-1:1.0: USB hub found
Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.390118] hub 5-1:1.0: 4 ports detected
Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.673128] usb 5-1.2: new low-speed USB device number 27 using uhci_hcd
Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.831895] input: Logitech USB Receiver as /devices/pci0000:00/0000:00:1d.0/usb5/5-1/5-1.2/5-1.2:1.0/input/input34
Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.832071] logitech 0003:046D:C517.001B: input,hidraw0: USB HID v1.10 Keyboard [Logitech USB Receiver] on usb-0000:00:1d.0-1.2/input0
Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.863133] logitech 0003:046D:C517.001C: fixing up Logitech keyboard report descriptor
Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.865367] input: Logitech USB Receiver as /devices/pci0000:00/0000:00:1d.0/usb5/5-1/5-1.2/5-1.2:1.1/input/input35
Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.865633] logitech 0003:046D:C517.001C: input,hiddev0,hidraw3: USB HID v1.10 Mouse [Logitech USB Receiver] on usb-0000:00:1d.0-1.2/input1
Sep 20 17:55:08 jake-XPS-M1530 kernel: [87312.249129] usb 5-1.3: new low-speed USB device number 28 using uhci_hcd
Sep 20 17:55:08 jake-XPS-M1530 kernel: [87312.436287] input: No brand 4 Port KVMSwicther as /devices/pci0000:00/0000:00:1d.0/usb5/5-1/5-1.3/5-1.3:1.0/input/input36
Sep 20 17:55:08 jake-XPS-M1530 kernel: [87312.436429] generic-usb 0003:10D5:55A4.001D: input,hidraw4: USB HID v1.10 Keyboard [No brand 4 Port KVMSwicther] on usb-0000:00:1d.0-1.3/input0
Sep 20 17:55:08 jake-XPS-M1530 kernel: [87312.442165] usbhid 5-1.3:1.1: couldn't find an input interrupt endpoint

2---3960---13---(Sep/20)-(17:01)---(Sep/20)-(17:55:08)---ETWNFILF---(Sep/20)-(17:55)---(Sep/20)-(17:55:08) NAGC

root@nagios-primary ~#
root@nagios-primary ~#

From the above output, exclude all lines that contain 'Logitech' and show me what is left:

root@nagios-primary ~#
root@nagios-primary ~#
root@nagios-primary ~# logrobot sanal /var/log/kern.log 2h '.' 'Logitech' 1 2 -showexcl

Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.160050] usb 5-1: new full-speed USB device number 26 using uhci_hcd
Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.388215] hub 5-1:1.0: USB hub found
Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.390118] hub 5-1:1.0: 4 ports detected
Sep 20 17:55:06 jake-XPS-M1530 kernel: [87310.673128] usb 5-1.2: new low-speed USB device number 27 using uhci_hcd
Sep 20 17:55:08 jake-XPS-M1530 kernel: [87312.249129] usb 5-1.3: new low-speed USB device number 28 using uhci_hcd
Sep 20 17:55:08 jake-XPS-M1530 kernel: [87312.436287] input: No brand 4 Port KVMSwicther as /devices/pci0000:00/0000:00:1d.0/usb5/5-1/5-1.3/5-1.3:1.0/input/input36
Sep 20 17:55:08 jake-XPS-M1530 kernel: [87312.436429] generic-usb 0003:10D5:55A4.001D: input,hidraw4: USB HID v1.10 Keyboard [No brand 4 Port KVMSwicther] on usb-0000:00:1d.0-1.3/input0
Sep 20 17:55:08 jake-XPS-M1530 kernel: [87312.442165] usbhid 5-1.3:1.1: couldn't find an input interrupt endpoint

2---4320---(8)-(13)-(61.5385%)-(8)-(0)-(frq=8,zsc=0,asc=[Sep-20-(17:55)])---(Sep/20)-(17:07)---(Sep/20)-(17:55:08)---ETWNFILF---(Sep/20)-(17:55)---(Sep/20)-(17:55:08) NAGCzzmm

root@nagios-primary ~#
root@nagios-primary ~#
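In the summary line for the exclusion run, the third column packs several values in parentheses. Comparing the figures suggests that the third value is the first expressed as a percentage of the second: 8 of 13 is 61.5385%. The same relationship holds for the other summary lines of this shape in this document (93 of 41064, and 9 of 176115). This is an inference from the numbers shown rather than a documented definition, and the remaining values in that column are not explained here. A quick Python check, using a hypothetical helper name:

```python
def percent_field(part, total):
    """Format part/total as a percentage with six significant digits."""
    return format(part / total * 100, ".6g") + "%"


# (part, total) pairs taken from summary lines in this document.
for part, total in ((8, 13), (93, 41064), (9, 176115)):
    print(part, total, percent_field(part, total))
```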

Monitoring logs of any format or size (has no limitations!)

1a). Full syntax/usage with more control


./logrobot localhost <default-dir> <feature> <logfile> <age> <str-1> <str-2> <WARNING> <CRITICAL> <tag> <option>

Example:

logrobot  localhost  /tmp/logXray  autonda  /var/log/kern.log  60m  'error'  '.'  1  2  app_err_monitor  -ndfoundn


Explanation of Parameters:

logrobot - This is the tool that does the work for you 

/tmp/logXray - This is the designated default directory where logrobot will process its data

autonda - This is the feature that allows logrobot to perform this particular auto-resolve task for you

/var/log/kern.log - This is the log file which is going to be scanned

To scan a directory, simply specify the directory path instead...i.e. /var/log

60m - This is the age limit: the log file must have been modified/created within this time for it to be scanned

'error' - This is where you specify the string/pattern to look for in the log

Make sure there are no spaces in the patterns you specify.

For instance, to search for the pattern "error found in data", you can specify it this way:

'error.*found.*in.*data'

'.' - This is where you specify an additional pattern you wish to look for on the same line as the previous string

Useful if you want to filter out specific log entries

1 - This is the WARNING number of entries that must be found in the log before an alert is generated.

2 - This is the CRITICAL number of entries that must be found in the log before an alert is generated.

app_err_monitor - This is the tag name given to this particular log check

The name should describe the application/database or function that's writing to the log - Basically, give this a deserving name

-ndshow - When entries are found in the log, this option will show you those entries

-ndfoundn - When entries are found in the log, this option will NOT show them - It will tell you the total count of the newest entries found matching your criteria
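Because the full syntax is purely positional, it is easy to put a value in the wrong slot. One way to guard against that when generating checks from a script is to name each position, as in this Python sketch. The helper build_check is hypothetical and does nothing more than arrange the arguments in the order given by the syntax line above; it does not run the tool.

```python
import shlex


def build_check(host, workdir, feature, logfile, age,
                str1, str2, warn, crit, tag, option):
    """Assemble the argument list in the positional order of the full syntax."""
    return ["./logrobot", host, workdir, feature, logfile, age,
            str1, str2, str(warn), str(crit), tag, option]


argv = build_check("localhost", "/tmp/logXray", "autonda",
                   "/var/log/kern.log", "60m",
                   "error.*found.*in.*data", ".", 1, 2,
                   "app_err_monitor", "-ndfoundn")
# shlex.join (Python 3.8+) quotes arguments that contain shell metacharacters.
print(shlex.join(argv))
```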

2a). Scan the same log file in multiple different directories

Example 1 - (this shows the matching entries found in each log):

Command:

./logrobot localhost /var/tmp/logXray,tail=10 autonda /usr/WebSphere/AppServer_ast_/profiles/paposa_ast_AppServer_ast_/logs/rmcosCluster1-paposa_ast_-node_ast_-server_ast_/SystemOut.log 60m 'Total.*time.*taken' '.' 1 1 testing1 -ndshow

CRITICAL: [/usr/WebSphere/AppServer_ast_/profiles/paposa_ast_AppServer_ast_/logs/rmcosCluster1-paposa_ast_-node_ast_-server_ast_/SystemOut.log][4]
/usr/WebSphere/AppServer2/profiles/paposa01AppServer02/logs/rmcosCluster1-paposa01-node2-server1/SystemOut.log:P=(2)_F=(13s,1s)_R=(39232,39253=21)
/usr/WebSphere/AppServer1/profiles/paposa01AppServer01/logs/rmcosCluster1-paposa01-node1-server2/SystemOut.log:P=(2)_F=(13s,6s)_R=(75789,75811=22)
/usr/WebSphere/AppServer2/profiles/paposa01AppServer02/logs/rmcosCluster1-paposa01-node2-server2/SystemOut.log:P=(2)_F=(13s,0s)_R=(105911,105932=21)

usr_WebSphere_AppServer2_profiles_paposa01AppServer02_logs_rmcosCluster1-paposa01-node2-server2_SystemOut.log:::
[11/16/16 13:48:41:722 PST] 000004e3 SystemOut O TOK : Total time taken to De-Tokenize a number is [12] ms.
[11/16/16 13:48:53:265 PST] 000004b6 SystemOut O TOK : Total time taken to De-Tokenize a number is [15] ms. 2

usr_WebSphere_AppServer2_profiles_paposa01AppServer02_logs_rmcosCluster1-paposa01-node2-server1_SystemOut.log:::
[11/16/16 13:48:43:915 PST] 000004f6 SystemOut O TOK : Total time taken to De-Tokenize a number is [17] ms.
[11/16/16 13:48:52:317 PST] 000004f6 SystemOut O TOK : Total time taken to De-Tokenize a number is [17] ms. 2

usr_WebSphere_AppServer1_profiles_paposa01AppServer01_logs_rmcosCluster1-paposa01-node1-server2_SystemOut.log:::
[11/16/16 13:48:45:693 PST] 000002e3 SystemOut O TOK : Total time taken to De-Tokenize a number is [14] ms.
[11/16/16 13:48:47:873 PST] 000002b2 SystemOut O TOK : Total time taken to De-Tokenize a number is [26] ms. 2

usr_WebSphere_AppServer1_profiles_paposa01AppServer01_logs_rmcosCluster1-paposa01-node1-server1_SystemOut.log::: 0

Example 2 - (this shows the total count of each matching entry in each log)

Command:

./logrobot localhost /var/tmp/logXray,tail=10 autonda /usr/WebSphere/AppServer_ast_/profiles/paposa_ast_AppServer_ast_/logs/rmcosCluster1-paposa_ast_-node_ast_-server_ast_/SystemOut.log 60m 'Total.*time.*taken' '.' 1 1 testing3 -ndfoundmul

CRITICAL: [/usr/WebSphere/AppServer_ast_/profiles/paposa_ast_AppServer_ast_/logs/rmcosCluster1-paposa_ast_-node_ast_-server_ast_/SystemOut.log][4] 

/usr/WebSphere/AppServer1/profiles/paposa01AppServer01/logs/rmcosCluster1-paposa01-node1-server2/SystemOut.log:P=(Total__time__taken=8)_F=(25s)_R=(76970,77031=61)
/usr/WebSphere/AppServer2/profiles/paposa01AppServer02/logs/rmcosCluster1-paposa01-node2-server1/SystemOut.log:P=(Total__time__taken=4)_F=(25s)_R=(40355,40503=148)
/usr/WebSphere/AppServer1/profiles/paposa01AppServer01/logs/rmcosCluster1-paposa01-node1-server1/SystemOut.log:P=(Total__time__taken=3)_F=(25s)_R=(23434,23467=33)
/usr/WebSphere/AppServer2/profiles/paposa01AppServer02/logs/rmcosCluster1-paposa01-node2-server2/SystemOut.log:P=(Total__time__taken=9)_F=(25s)_R=(106908,106997=89)

NOTE:

The '_P_' represents the pipe "|"(OR) symbol.  If using this tool as a log monitoring alert system, specifying "_P_" instead of "|" prevents unnecessary errors.

The default log file age limit is 60 minutes.  That means, the above commands will only scan log files that were modified/created within the last 60 minutes.

To change the age limit, see the full syntax in section 1a above and simply replace the 60m with whichever age you prefer.

If no entries are found matching the patterns you specified, but you believe there should be, simply add a ".*" to the beginning and end of each pattern, i.e.:

'.*error.*_P_.*panic.*_P_.*fail.*_P_.*fault.*'
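When a list of keywords is maintained elsewhere, a pattern of this shape can be generated rather than typed by hand. The Python sketch below simply joins the terms with the "_P_" stand-in and wraps each one in ".*"; the function name alert_pattern is hypothetical.

```python
def alert_pattern(terms):
    """Join search terms with '_P_' (the stand-in for '|'), wrapping each in '.*'."""
    return "_P_".join(".*%s.*" % term for term in terms)


print(alert_pattern(["error", "panic", "fail", "fault"]))
```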

3a). Scan an entire directory of logs, search those logs for specific patterns/entries



[root@localhost jserver]# 
[root@localhost jserver]# time ./logrobot localhost /var/tmp/logXray autonda /var/log 60m 'error' '.' 1 2 appmon -ndfoundn
CRITICAL: [/var/log] maillog:P=(25)_F=(107s)_R=(0,281=281) up2date:P=(5)_F=(51s)_R=(0,73=73), Xorg.0.log:P=(1)_F=(197s)_R=(0,659=659) 

real 0m1.571s
user 0m0.694s
sys 0m0.637s

[root@localhost jserver]# 
[root@localhost jserver]# time ./logrobot localhost /var/tmp/logXray autonda /var/log 60m 'error' '.' 1 2 appmon -ndfoundn
OK: [/var/log] up2date:P=(0)_F=(5s)_R=(73,73=0) boot.log:P=(0)_F=(5s)_R=(58,58=0) cron:P=(0)_F=(5s)_R=(214,214=0) messages:P=(0)_F=(5s)_R=(643,643=0) dmesg:P=(0)_F=(5s)_R=(502,502=0) Xorg.0.log:P=(0)_F=(5s)_R=(659,659=0) maillog:P=(0)_F=(5s)_R=(281,281=0) pm-powersave.log:P=(0)_F=(5s)_R=(2,2=0) secure:P=(0)_F=(5s)_R=(13,13=0)

real 0m1.604s
user 0m0.674s
sys 0m0.634s

[root@localhost jserver]# 
[root@localhost jserver]# time ./logrobot localhost /var/tmp/logXray autonda /var/log/messages 60m 'error' '.' 1 2 appmsg -ndfoundn
OK: [/var/log/messages] /var/log/messages:P=(0)_F=(383s)_R=(0,643=643) 

real 0m1.331s
user 0m0.734s
sys 0m0.622s
[root@localhost jserver]#

4a). Monitor a single log file - Alert if log's size breaches Warning or Critical thresholds


[root@nagios-primary ~]# ./logrobot localhost /var/tmp/logXray autodoc /wms/prod/jdf/data/log/error 1GB 1.6GB filesize

OK: File [ /wms/prod/jdf/data/log/error ]. Current Size = [ 682.637MB 7 ]. Thresholds: [ W=1GB ] and [ C=1.6GB ].

[root@nagios-primary ~]# ./logrobot localhost /var/tmp/logXray autodoc /var/lib/nagios/retention.dat 80MB 100MB filesize

CRITICAL: File [ /var/lib/nagios/retention.dat ]. Current Size = [ 179.734MB ]. Thresholds: [ W=80MB ] and [ C=100MB ].
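The comparison behind these two results can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's implementation: the helper names are hypothetical, the units are assumed to be binary multiples (1 KB = 1024 bytes), and a threshold is assumed to be breached only when the size is strictly greater than it.

```python
import re

# Binary multiples are an assumption; the text does not say whether
# the tool counts 1000 or 1024 bytes per KB.
UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(size):
    """Convert a size string such as '80MB' or '1.6GB' to a number of bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(B|KB|MB|GB)", size)
    if match is None:
        raise ValueError("unrecognised size: %r" % size)
    return float(match.group(1)) * UNITS[match.group(2)]


def size_status(current, warn, crit):
    """Compare a file size against the WARNING and CRITICAL thresholds."""
    now = to_bytes(current)
    # Strictly-greater comparison is an assumption made for this sketch.
    if now > to_bytes(crit):
        return "CRITICAL"
    if now > to_bytes(warn):
        return "WARNING"
    return "OK"


print(size_status("682.637MB", "1GB", "1.6GB"))
print(size_status("179.734MB", "80MB", "100MB"))
```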

# Sending metrics to a graphite/graphing server:

[root@nagios001 ~]# ./logrobot localhost /var/tmp/logXray,graphite,52.88.12.122:2003,typical autonda /var/log/messages 60m 'nothing-to-search-for' '.' 1 2 LogGrowthChk -ndfoundn

5a). Monitor the size of a directory of logs - Specify which logs (.FDC) to Include in monitoring


The following command will alert if files are found with size greater than zero.

[root@nagios-primary ~]# ./logrobot localhost /var/tmp/logXray autodoc /var/mqm/errors,.FDC,12m 0B 0B filesize

CRITICAL: File [ /var/mqm/errors,.FDC,12m ]. Current Size = [ /var/mqm/errors/AMQ24835.0.FDC(repeat),27053(bytes),11 /var/mqm/errors/AMQ24834.0.FDC(repeat),27053(bytes),11 /var/mqm/errors/AMQ24821.0.FDC(repeat),81673(bytes),11 /var/mqm/errors/AMQ24832.0.FDC(repeat),26973(bytes),11 /var/mqm/errors/AMQ24827.0.FDC(repeat),27053(bytes),11 /var/mqm/errors/AMQ24826.0.FDC(repeat),26973(bytes),11 /var/mqm/errors/AMQ24833.0.FDC(repeat),27053(bytes),11 /var/mqm/errors/AMQ24828.0.FDC(repeat),27053(bytes),11 /var/mqm/errors/AMQ24836.0.FDC(repeat),27053(bytes),11 /var/mqm/errors/AMQ24825.0.FDC(repeat),26973(bytes),11 /var/mqm/errors/AMQ24831.0.FDC(repeat),26973(bytes),11 /var/mqm/errors/AMQ24830.0.FDC(repeat),27053(bytes),11 /var/mqm/errors/AMQ24829.0.FDC(repeat),27053(bytes),11 ]. Thresholds: [ W=0B ] and [ C=0B ].

6a). Monitor all files in a directory with size greater than 0 and newer than 1440 minutes


[root@nagios001 ~]# ./logrobot localhost /var/tmp/logXray autodoc /apps/scope/GAP/wmswave/cbs/logs/cores,1,*,1440m 0B 0B filesize

OK: File [ /apps/scope/GAP/wmswave/cbs/logs/cores,1,*,1440m ]. Current Size = [ no_problem_files_detected ]. Thresholds: [ W=0B ] and [ C=0B ].


[root@nagios001 ~]# ./logrobot localhost /var/tmp/logXray autodoc /apps/scope/GAP/wmswave/cbs/logs/cores,1,*,1440m 0B 0B filesize

CRITICAL: File [ /apps/scope/GAP/wmswave/cbs/logs/cores,1,*,1440m ]. Current Size = [ /apps/scope/GAP/wmswave/cbs/logs/cores/PkShipWaveS/core.10114,533901312(bytes),3m ]. Thresholds: [ W=0B ] and [ C=0B ].

The next time the check runs, you'll see the word 'repeat' next to each file that has already been reported/alerted on:

CRITICAL: File [ /apps/scope/GAP/wmswave/cbs/logs/cores,1,*,1440m ]. Current Size = [ /apps/scope/GAP/wmswave/cbs/logs/cores/PkShipWaveS/core.12263(repeat),592871424(bytes),7m ]. Thresholds: [ W=0B ] and [ C=0B ].

7a). Monitor a single log - Alert if the log stops growing in size...i.e. no new data being logged


[root@nagios-primary ~]# ./logrobot localhost /var/tmp/logXray autodoc /opt/apps/iptuibatch/logs/iptconflictCheck.log 1 5 filegrowth

CRITICAL: File [ /opt/apps/iptuibatch/logs/iptconflictCheck.log ]. Size Now = [ 744KB (Wed Dec 30 17:35:56 2015) ]. Size Before = [ 744KB (Wed Dec 30 17:35:55 2015) ].


[root@nagios-primary ~]# ./logrobot localhost /var/tmp/logXray autodoc /opt/apps/iptuibatch/logs/iptconflictCheck.log 1 5 filegrowth

OK: File [ /opt/apps/iptuibatch/logs/iptconflictCheck.log ]. Size Now = [ 752KB (752) (Wed Dec 30 17:37:55 2015) ]. Size Before = [ 744KB (Wed Dec 30 17:35:55 2015) ].

8a). Poll log files, graph log sizes over time - send recorded metrics to a graphing server


[root@nagios001 ~]# ./logrobot localhost /tmp/logXray,graphite,52.88.12.122:2003,typical autonda /var/log/messages 60m 'nothing-to-search-for' '.'  1 2 LogGrowthChk -ndfoundn

9a). Monitor local/remote logs - Alert if log's timestamp is older than X number of minutes


[root@monitor jbowman]#
[root@monitor jbowman]#
[root@monitor jbowman]# ./logxray localhost /var/tmp/logXray autodoc /var/log/syslog 10 20 -timestamp

OK: File = [ /var/log/syslog ]. Timestamp = [ 4s ] = [ 0d, 0h, 0.066m ago ]. Thresholds: [ W=(10m) / C=(20m) ].

[root@monitor jbowman]#
[root@monitor jbowman]#


[root@monitor jbowman]#
[root@monitor jbowman]# ./logxray logrobot001.phx.logrobot.com /var/tmp/logXray autodoc /var/log/syslog 10 20 -timestamp

OK: File = [ /var/log/syslog ]. Timestamp = [ 4s ] = [ 0d, 0h, 0.066m ago ]. Thresholds: [ W=(10m) / C=(20m) ].

[root@monitor jbowman]#
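The timestamp check boils down to comparing a file's age with two thresholds given in minutes. A minimal Python sketch of that comparison follows. The function names are hypothetical, the sketch reads the modification time of a local file only, and it says nothing about how the tool reaches remote hosts.

```python
import os
import time


def file_age_seconds(path):
    """Seconds elapsed since the file at `path` was last modified."""
    return time.time() - os.path.getmtime(path)


def age_status(age_seconds, warn_minutes, crit_minutes):
    """Classify an age against WARNING/CRITICAL thresholds given in minutes."""
    age_minutes = age_seconds / 60
    if age_minutes >= crit_minutes:
        return "CRITICAL"
    if age_minutes >= warn_minutes:
        return "WARNING"
    return "OK"


print(age_status(4, 10, 20))        # a file modified 4 seconds ago
print(age_status(15 * 60, 10, 20))  # a file modified 15 minutes ago
print(age_status(25 * 60, 10, 20))  # a file modified 25 minutes ago
```

For a real file, the age would come from file_age_seconds("/var/log/syslog") or a similar path on the host being checked.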

10a). Monitor time stamp of log files in a directory - Alert if older than X number of minutes



Case Scenario:

Monitor all files that have the pattern "gap_inc" in their names, under the /opt/apache/httpd-2.4.2/htdocs/pkicrlpub directory.

Alert as Warning if any of the discovered files is at least 4 hours old but less than 8 hours old.

Alert as Critical if any of the discovered files is at least 8 hours old.

The _ast_ is used to denote "*"

Asterisks have the potential to cause problems, therefore, we allow users to use a predetermined string to reference them.

In other words, when having to specify the path to a log file with asterisks in it, replace the asterisks with "_ast_"

For example,

	This:

		/opt/apache/httpd-2.4.2/htdocs/pkicrlpub/*gap_inc*

	Becomes:

		/opt/apache/httpd-2.4.2/htdocs/pkicrlpub,_ast_gap_inc__ast_


[root@monitor jbowman]#
[root@monitor jbowman]#
[root@monitor jbowman]# ./logxray localhost /var/tmp/logXray autodoc /opt/apache/httpd-2.4.2/htdocs/pkicrlpub,_ast_gap_inc__ast_ 4h 8h timestamp

OK: [ /opt/apache/httpd-2.4.2/htdocs/pkicrlpub/gap_inc_stores_issuing_ca_g1.crl,age=(0d/0h/39.6m ago) /opt/apache/httpd-2.4.2/htdocs/pkicrlpub/gap_inc_corp_root_ca_g1.crl,age=(0d/0h/39.6m ago) /opt/apache/httpd-2.4.2/htdocs/pkicrlpub/gap_inc_corp_issuing_ca_g1.crl,age=(0d/0h/39.6m ago) /opt/apache/httpd-2.4.2/htdocs/pkicrlpub/gap_inc_corp_intermediate_ca_g1.crl,age=(0d/0h/39.6m ago) ].

[root@monitor jbowman]#
[root@monitor jbowman]#
