Search multiple log files at once - Alert / Show detected entries from each log

Monitoring multiple logs (on Linux, AIX, SunOS systems)

Installation:

1. Right-click and Copy the download URL to the right
2. Go to your Unix box and wget the URL:
	wget <the-generated-zip-url-link>
3. After download, unzip the zip file
	unzip <the-downloaded-zip-file>
4. Change directory to EnScryption.com
	cd EnScryption.com
5. To automatically install the script, simply run it:
	./<script-name>

NOTE:	The first run of the script sets it up under the directory /var/tmp/EnScryption.com/SHIELDX-<script-name>
	After the first run, you can run your script as normal.

Example 1 - (this shows the matching entries found in each log under /var/log):

./logrobot.kl.sh /var/log 'error_P_panic_P_fail_P_fault' -ndshow

Example 2 - (this shows the total count of each matching entry in each log found)

./logrobot.kl.sh /var/log 'error_P_panic_P_fail_P_fault' -ndfoundmul

NOTE:

The '_P_' represents the pipe "|" (OR) symbol.  If you use this tool as a log monitoring alert system, specifying "_P_" instead of "|" prevents unnecessary errors.

The default log file age limit is 60 minutes.  That means the above commands will only scan log files that were modified/created within the last 60 minutes.

To change the age limit, see the full syntax example below and simply replace the 60m with whichever age you prefer.

If no entries are found matching the patterns you specified, but you believe there should be, simply add a ".*" to the beginning and end of each pattern, e.g.:

'.*error.*_P_.*panic.*_P_.*fail.*_P_.*fault.*'
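To make the two notes above concrete, here is a minimal sketch of what the '_P_' separator and the age limit amount to in plain shell.  It is only an illustration, not how logrobot works internally: expand_pattern and scan_recent are hypothetical helper names, and the -mmin test is supported by GNU and BSD find but is not part of POSIX.

```shell
#!/bin/sh
# Hypothetical helper (not part of logrobot): turn the "_P_" separator
# into the regular-expression OR symbol "|".
expand_pattern() {
    printf '%s\n' "$1" | sed 's/_P_/|/g'
}

# Hypothetical helper: print matching lines from every file under a
# directory that was modified within the last N minutes.
# Assumption: find supports -mmin (GNU/BSD find; not in POSIX).
scan_recent() {
    dir=$1 pattern=$2 minutes=$3
    # /dev/null forces grep to prefix each match with its file name.
    find "$dir" -type f -mmin "-$minutes" \
        -exec grep -E -- "$(expand_pattern "$pattern")" /dev/null {} +
}

expand_pattern 'error_P_panic_P_fail_P_fault'
```

For example, scan_recent /var/log 'error_P_panic' 60 would list the matching lines from files changed in the last hour, each prefixed with the name of the file it came from.
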

Example 1 - (this shows the matching entries found in each log):

Command:

	./logrobot.kl.sh localhost /var/tmp/logXray,tail=10 autonda /usr/WebSphere/AppServer_ast_/profiles/paposa_ast_AppServer_ast_/logs/rmcosCluster1-paposa_ast_-node_ast_-server_ast_/SystemOut.log 60m 'Total.*time.*taken' '.' 1 1 testing3 -ndshow

CRITICAL: [/usr/WebSphere/AppServer_ast_/profiles/paposa_ast_AppServer_ast_/logs/rmcosCluster1-paposa_ast_-node_ast_-server_ast_/SystemOut.log][4]
/usr/WebSphere/AppServer2/profiles/paposa01AppServer02/logs/rmcosCluster1-paposa01-node2-server1/SystemOut.log:P=(2)_F=(13s,1s)_R=(39232,39253=21)
/usr/WebSphere/AppServer1/profiles/paposa01AppServer01/logs/rmcosCluster1-paposa01-node1-server2/SystemOut.log:P=(2)_F=(13s,6s)_R=(75789,75811=22)
/usr/WebSphere/AppServer2/profiles/paposa01AppServer02/logs/rmcosCluster1-paposa01-node2-server2/SystemOut.log:P=(2)_F=(13s,0s)_R=(105911,105932=21)

usr_WebSphere_AppServer2_profiles_paposa01AppServer02_logs_rmcosCluster1-paposa01-node2-server2_SystemOut.log:::
[11/16/16 13:48:41:722 PST] 000004e3 SystemOut O TOK : Total time taken to De-Tokenize a number is [12] ms.
[11/16/16 13:48:53:265 PST] 000004b6 SystemOut O TOK : Total time taken to De-Tokenize a number is [15] ms. 2

usr_WebSphere_AppServer2_profiles_paposa01AppServer02_logs_rmcosCluster1-paposa01-node2-server1_SystemOut.log:::
[11/16/16 13:48:43:915 PST] 000004f6 SystemOut O TOK : Total time taken to De-Tokenize a number is [17] ms.
[11/16/16 13:48:52:317 PST] 000004f6 SystemOut O TOK : Total time taken to De-Tokenize a number is [17] ms. 2

usr_WebSphere_AppServer1_profiles_paposa01AppServer01_logs_rmcosCluster1-paposa01-node1-server2_SystemOut.log:::
[11/16/16 13:48:45:693 PST] 000002e3 SystemOut O TOK : Total time taken to De-Tokenize a number is [14] ms.
[11/16/16 13:48:47:873 PST] 000002b2 SystemOut O TOK : Total time taken to De-Tokenize a number is [26] ms. 2

usr_WebSphere_AppServer1_profiles_paposa01AppServer01_logs_rmcosCluster1-paposa01-node1-server1_SystemOut.log::: 0

Example 2 - (this shows the total count of each matching entry in each log)

Command: 

	./logrobot.kl.sh localhost /var/tmp/logXray,tail=10 autonda /usr/WebSphere/AppServer_ast_/profiles/paposa_ast_AppServer_ast_/logs/rmcosCluster1-paposa_ast_-node_ast_-server_ast_/SystemOut.log 60m 'Total.*time.*taken' '.' 1 1 testing3 -ndfoundmul

CRITICAL: [/usr/WebSphere/AppServer_ast_/profiles/paposa_ast_AppServer_ast_/logs/rmcosCluster1-paposa_ast_-node_ast_-server_ast_/SystemOut.log][4] 

/usr/WebSphere/AppServer1/profiles/paposa01AppServer01/logs/rmcosCluster1-paposa01-node1-server2/SystemOut.log:P=(Total__time__taken=8)_F=(25s)_R=(76970,77031=61)
/usr/WebSphere/AppServer2/profiles/paposa01AppServer02/logs/rmcosCluster1-paposa01-node2-server1/SystemOut.log:P=(Total__time__taken=4)_F=(25s)_R=(40355,40503=148)
/usr/WebSphere/AppServer1/profiles/paposa01AppServer01/logs/rmcosCluster1-paposa01-node1-server1/SystemOut.log:P=(Total__time__taken=3)_F=(25s)_R=(23434,23467=33)
/usr/WebSphere/AppServer2/profiles/paposa01AppServer02/logs/rmcosCluster1-paposa01-node2-server2/SystemOut.log:P=(Total__time__taken=9)_F=(25s)_R=(106908,106997=89)

./logrobot.kl.sh localhost <default-dir> <feature> <logfile> <age> <str-1> <str-2> <WARNING> <CRITICAL> <tag> <option>
Example:

logrobot.kl.sh  localhost  /tmp/logXray  autonda  /var/log/kern.log  60m  'error'  '.'  1  2  app_err_monitor  -ndfoundn
Explanation of Parameters:

logrobot.kl.sh - This is the tool that does the work for you - (it is a very limited free version of the logXray tool)

/tmp/logXray - This is the designated default directory where logrobot will process its data

autonda - This is the feature that allows logrobot to perform this particular auto-resolve task for you

/var/log/kern.log - This is the log file which is going to be scanned

To scan a directory, simply specify the directory path instead...i.e. /var/log

60m - This is the age limit for the log file; only log files modified/created within this period (here, the last 60 minutes) are scanned

'error' - This is where you specify the string/pattern to look for in the log

Make sure there are no spaces in the patterns you specify.

For instance, to search for the pattern "error found in data", you can specify it this way:

'error.*found.*in.*data'

'.' - This is where you specify an additional pattern you wish to look for on the same line as the previous string

Useful if you want to filter out specific log entries

1 - This is the WARNING threshold: the number of entries that must be found in the log before any action, script or command can be run on a host

If this threshold is not breached, the command specified for the WARNING will not run

2 - This is the CRITICAL threshold: the number of entries that must be found in the log before any action, script or command can be run on a host

If this threshold is not breached, the command specified for the CRITICAL will not run

app_err_monitor - This is the tag name given to this particular log check

The name should describe the application/database or function that's writing to the log - Basically, give this a deserving name

-ndshow - When entries are found in the log, this option will show you those entries

-ndfoundn - When entries are found in the log, this option will NOT show them - It will tell you the total count of the newest entries found matching your criteria
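
Since the patterns must not contain spaces, a phrase has to be rewritten before it is passed to the tool.  The sketch below shows one way to do that rewrite in shell.  to_pattern is a hypothetical helper name used only for illustration; it is not shipped with logrobot.

```shell
#!/bin/sh
# Hypothetical helper (not part of logrobot): replace every run of
# spaces in a phrase with ".*" so the result contains no spaces.
to_pattern() {
    printf '%s\n' "$1" | sed 's/  */.*/g'
}

to_pattern 'error found in data'   # prints error.*found.*in.*data
```

The result can then be quoted and used as the <str-1> or <str-2> argument.
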

[root@localhost jserver]# 
[root@localhost jserver]# 
[root@localhost jserver]# time ./logrobot.kl.sh localhost /var/tmp/logXray autonda /var/log 60m 'error' '.' 1 2 appmon -ndfoundn
CRITICAL: [/var/log] maillog:P=(25)_F=(107s)_R=(0,281=281) up2date:P=(5)_F=(51s)_R=(0,73=73), Xorg.0.log:P=(1)_F=(197s)_R=(0,659=659) 

real 0m1.571s
user 0m0.694s
sys 0m0.637s

[root@localhost jserver]# 
[root@localhost jserver]# 
[root@localhost jserver]# time ./logrobot.kl.sh localhost /var/tmp/logXray autonda /var/log 60m 'error' '.' 1 2 appmon -ndfoundn
OK: [/var/log] up2date:P=(0)_F=(5s)_R=(73,73=0) boot.log:P=(0)_F=(5s)_R=(58,58=0) cron:P=(0)_F=(5s)_R=(214,214=0) messages:P=(0)_F=(5s)_R=(643,643=0) dmesg:P=(0)_F=(5s)_R=(502,502=0) Xorg.0.log:P=(0)_F=(5s)_R=(659,659=0) maillog:P=(0)_F=(5s)_R=(281,281=0) pm-powersave.log:P=(0)_F=(5s)_R=(2,2=0) secure:P=(0)_F=(5s)_R=(13,13=0)

real 0m1.604s
user 0m0.674s
sys 0m0.634s

[root@localhost jserver]# 
[root@localhost jserver]# 
[root@localhost jserver]# 
[root@localhost jserver]# time ./logrobot.kl.sh localhost /var/tmp/logXray autonda /var/log/messages 60m 'error' '.' 1 2 appmsg -ndfoundn
OK: [/var/log/messages] /var/log/messages:P=(0)_F=(383s)_R=(0,643=643) 

real 0m1.331s
user 0m0.734s
sys 0m0.622s
[root@localhost jserver]#
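
If you want to post-process the summary line, each per-log entry can be split into its parts with sed.  The sketch below is only an illustration: parse_entry is a hypothetical helper name, and the meaning of the P, F and R fields is not documented in this section.  The only thing assumed is the layout visible in the sample output above, where the last number inside R=() equals the difference between the first two (for example 0,281=281).

```shell
#!/bin/sh
# Hypothetical helper (not part of logrobot): split one summary entry of
# the form  name:P=(n)_F=(t)_R=(a,b=c)  into space-separated fields:
#   name  P-value  F-value  R-first  R-second  R-difference
parse_entry() {
    printf '%s\n' "$1" | sed -n \
        's/^\(.*\):P=(\([0-9]*\))_F=(\([^)]*\))_R=(\([0-9]*\),\([0-9]*\)=\([0-9]*\))$/\1 \2 \3 \4 \5 \6/p'
}

parse_entry 'maillog:P=(25)_F=(107s)_R=(0,281=281)'   # prints maillog 25 107s 0 281 281
```

Entries that do not follow this layout (such as the -ndfoundmul form, where P=() holds a pattern name and a count) print nothing.
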
About the Tool:

Syntax:

./autodrgrep.kl.sh <type-of-log> <logfile> <date-range-or-how-long-ago> '<string1>' '<string2>' 5 10 -show

Usage:

./autodrgrep.kl.sh notchef /tmp/client.log '2016-05-08_19:12:00,2016-05-08_21:13:00' 'INFO' 'a2ensite' 5 10 -show

2---78720---23---ATWFILF---(2016-05-08)-(19:12)---(2016-05-08)-(21:13) SEAGM

[root@monitor jbowman]#

So now let's break this down:

autodrgrep.kl.sh is the tool name.

notchef is an option that is passed to the logrobot tool to tell it what to do. In this particular case, it is telling the tool what type of log file /tmp/client.log is.

/tmp/client.log is of course the log file.

2016-05-08_19:12:00,2016-05-08_21:13:00 is the date range within the log that you wish to scan.

"INFO" is one of the strings that is in the lines of logs that you're interested in.

"a2ensite" is another string on the same line that you expect to find the "INFO" string on. Specifying these two strings (INFO and a2ensite) isolates and processes the lines you want a lot quicker, particularly if you're dealing with a huge log file.

5 specifies Warning. By specifying 5, you're telling the program to alert as WARNING if there are at least 5 occurrences of the search strings you specified.

10 specifies Critical. By specifying 10, you're telling the program to alert as CRITICAL if there are at least 10 occurrences of the search strings you specified.

-show specifies what type of response you'll get. By specifying -show, you're saying that if anything is found that matches the specified patterns, it should be output to the screen.

Summarized Explanation:

As you can see, the logrobot tool is monitoring a log file. The arguments that are passed to the tool instruct it to do the following:

Get all entries written to the log between the dates '2016-05-08_19:12:00' AND '2016-05-08_21:13:00'.  If the tool finds fewer than 5 occurrences of the specified strings in the log file, DO NOT alert. If the tool finds between 5 and 9 occurrences of the specified strings in the log, it'll alert with a WARNING. If the tool discovers 10 or more instances of the strings in the log within the specified date range, it'll alert with a CRITICAL.
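
The threshold rule just described can be written out as a few lines of shell.  This is a minimal sketch of the rule as stated here, not the tool's own code: classify_count is a hypothetical helper name, and its return values 0, 1 and 2 are chosen to line up with the exit codes that the result line reports.

```shell
#!/bin/sh
# Hypothetical helper (not part of the tool): apply the WARNING and
# CRITICAL thresholds to a match count.
# Prints the status and returns 0 (OK), 1 (WARNING) or 2 (CRITICAL).
classify_count() {
    count=$1 warn=$2 crit=$3
    if [ "$count" -ge "$crit" ]; then
        echo CRITICAL
        return 2
    elif [ "$count" -ge "$warn" ]; then
        echo WARNING
        return 1
    fi
    echo OK
    return 0
}

# 23 matches against thresholds 5 (WARNING) and 10 (CRITICAL)
classify_count 23 5 10 || echo "exit code: $?"
```

With the thresholds 5 and 10, a count of 4 gives OK, 5 through 9 give WARNING, and 10 or more give CRITICAL.
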

Now, let us look at the result of the command:

2---78720---23---ATWFILF---(2016-05-08)-(19:12)---(2016-05-08)-(21:13) SEAGM

There are 6 columns, which are separated by 3 hyphens (---). The first column shows the exit code of the command you just ran. 0 means all is well. 1 means WARNING, which means LOGROBOT discovered conditions that fell under the WARNING specification you provided. 2 means CRITICAL, which means the worst case scenario has been reached.
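
If you need to pick individual columns out of the result line in a script, awk can split on the three-hyphen separator.  The sketch below is an illustration only; result_field is a hypothetical helper name.  Column 1 is the exit code and column 3 is the number of matching lines (23 in this example); the meaning of the second column is not described in this section, so it is left uninterpreted.

```shell
#!/bin/sh
# Hypothetical helper (not part of the tool): print column N of a
# result line whose columns are separated by three hyphens.
result_field() {
    printf '%s\n' "$1" | awk -v n="$2" 'BEGIN { FS = "---" } { print $n }'
}

line='2---78720---23---ATWFILF---(2016-05-08)-(19:12)---(2016-05-08)-(21:13) SEAGM'
result_field "$line" 1   # exit code column, prints 2
result_field "$line" 3   # number of matching lines, prints 23
```

The single hyphens inside the date columns are left alone because only a run of three hyphens is treated as a separator.
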

In this particular example, here's what the output is telling us: 

You requested to have the /tmp/client.log file scanned for a specific date range.

The date range that was scanned was from '2016-05-08_19:12:00' to '2016-05-08_21:13:00'. After scanning through the records that were written to the log in that time frame, LOGROBOT found 23 lines that contained both of the strings "INFO" and "a2ensite".

ATWFILF means that the actual date range or time frame you asked to have searched was found in the log.  So this is very good.

ETWNFILF means the actual date range or time frame you asked to have searched was NOT found in the log.  In this case, the closest time to the time you specified will be detected and used instead.