Home

Most Common Log Monitoring Scenarios (on Linux, AIX, SunOS systems)

root@jbowman-prodmark001:~#
root@jbowman-prodmark001:~#
root@jbowman-prodmark001:~# time ./logrobot localhost /var/tmp/logXray,tail=20 autonda /var/log/kern.log 60m 'kernel'  '.'  3 5 HealthAAA  -ndshow

CRITICAL: [/var/log/kern.log]  /var/log/kern.log:P=(526)_F=(1039s)_R=(0,526=526) 
==========================
var_log_kern.log
==========================
Aug 22 18:04:57 jbowman-prodmark001 kernel: [   13.658169] floppy0: no floppy controllers found
Aug 22 18:04:57 jbowman-prodmark001 kernel: [   13.916595] bridge: automatic filtering via arp/ip/ip6tables has been deprecated. Update your scripts to load br_netfilter if you need this.
Aug 22 18:04:57 jbowman-prodmark001 kernel: [   13.982259] Bridge firewalling registered
Aug 22 18:04:57 jbowman-prodmark001 kernel: [   14.086714] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
Aug 22 18:04:57 jbowman-prodmark001 kernel: [   14.407732] ip_tables: (C) 2000-2006 Netfilter Core Team
Aug 22 18:04:59 jbowman-prodmark001 kernel: [   16.125814] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
Aug 22 18:04:59 jbowman-prodmark001 kernel: [   16.473851] audit_printk_skb: 27 callbacks suppressed
Aug 22 18:04:59 jbowman-prodmark001 kernel: [   16.473853] audit: type=1400 audit(1471914299.905:21): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="docker-default" pid=1804 comm="apparmor_parser"
Aug 22 18:06:45 jbowman-prodmark001 kernel: [  119.749920] VbglR0HGCMInternalCall: vbglR0HGCMInternalDoCall failed. rc=-2
Aug 22 18:06:45 jbowman-prodmark001 kernel: [  119.749940] VBOXGUEST_IOCTL_HGCM_CALL: 64 Failed. rc=-2.
Aug 22 18:06:48 jbowman-prodmark001 kernel: [  123.458137] VbglR0HGCMInternalCall: vbglR0HGCMInternalDoCall failed. rc=-2
Aug 22 18:06:48 jbowman-prodmark001 kernel: [  123.458157] VBOXGUEST_IOCTL_HGCM_CALL: 64 Failed. rc=-2.
Aug 22 18:06:50 jbowman-prodmark001 kernel: [  125.363359] VbglR0HGCMInternalCall: vbglR0HGCMInternalDoCall failed. rc=-2
Aug 22 18:06:50 jbowman-prodmark001 kernel: [  125.363376] VBOXGUEST_IOCTL_HGCM_CALL: 64 Failed. rc=-2.
Aug 22 18:08:15 jbowman-prodmark001 kernel: [  210.462603] VbglR0HGCMInternalCall: vbglR0HGCMInternalDoCall failed. rc=-2
Aug 22 18:08:15 jbowman-prodmark001 kernel: [  210.462650] VBOXGUEST_IOCTL_HGCM_CALL: 64 Failed. rc=-2.
Aug 22 18:13:12 jbowman-prodmark001 kernel: [  507.043987] VbglR0HGCMInternalCall: vbglR0HGCMInternalDoCall failed. rc=-2
Aug 22 18:13:12 jbowman-prodmark001 kernel: [  507.044004] VBOXGUEST_IOCTL_HGCM_CALL: 64 Failed. rc=-2.
Aug 22 21:49:24 jbowman-prodmark001 kernel: [13479.849309] chrome[4578]: segfault at fbadbeef ip 82b625c1 sp bfdfab40 error 7 in chrome[8002e000+6484000]
526

real    0m0.137s
user    0m0.008s
sys     0m0.000s
root@jbowman-prodmark001:~#
root@jbowman-prodmark001:~#
root@jbowman-prodmark001:~# time ./logrobot localhost /var/tmp/logXray,tail=20 autonda /var/log/kern.log 60m 'STDERR|kernel|panic|failure|ERROR|segfault'  '.'  3 5 HealthAAA  -ndshow
OK: [/var/log/kern.log]  /var/log/kern.log:P=(0)_F=(3s,1042s)_R=(526,526=0) 

real    0m0.184s
user    0m0.004s
sys     0m0.004s
root@jbowman-prodmark001:~#
root@jbowman-prodmark001:~#
root@jbowman-prodmark001:~#

[root@nagios001 ~]# time ./logrobot localhost /tmp/logXrayTesting,tail=10 autonda /var/log/nagios 60m 'error_P_fatal_P_panic_P_defaulted' '.'  1 2 errorchk -ndshow    

CRITICAL: [/var/log/nagios][7]  nagios.debug:P=(239)_F=(0)_R=(lnsrd=(46387)=lnct~(2953260)) nagios.log:P=(=6)_F=(5)_R=(lnsrd=(43247)=lnct~(478954))
==========================
var_log_nagios_apache_error.log
==========================
0
==========================
var_log_nagios_eventhandler.log
==========================
0
==========================
var_log_nagios_apache_access.log
==========================
0
==========================
var_log_nagios_servicenow.log
==========================
0
==========================
var_log_nagios_nagios.log
==========================
[1471453524] SERVICE ALERT: mosmq001.logXray.com;MQ Events MOS;OK;HARD;1;(gearman001.logXray.com) - OK: Queue defaulted to OK
[1471453524] SERVICE NOTIFICATION: HQ-GID-PROD-OPS-FULFILLMENT-DG;mosmq001.logXray.com;MQ Events MOS;OK;service-notify-by-email;(gearman001.logXray.com) - OK: Queue defaulted to OK
[1471453524] SERVICE NOTIFICATION: bigpanda;mosmq001.logXray.com;MQ Events MOS;OK;notify-service-by-bigpanda;(gearman001.logXray.com) - OK: Queue defaulted to OK
[1471472594] SERVICE ALERT: mosmq001.logXray.com;MQ Events MOS;OK;HARD;1;(gearman001.logXray.com) - OK: Queue defaulted to OK
[1471472594] SERVICE NOTIFICATION: HQ-GID-PROD-OPS-FULFILLMENT-DG;mosmq001.logXray.com;MQ Events MOS;OK;service-notify-by-email;(gearman001.logXray.com) - OK: Queue defaulted to OK
[1471472594] SERVICE NOTIFICATION: bigpanda;mosmq001.logXray.com;MQ Events MOS;OK;notify-service-by-bigpanda;(gearman001.logXray.com) - OK: Queue defaulted to OK
6
==========================
var_log_nagios_nagios.debug
==========================
[1471472348.855956] [032.2] [pid=22873] Default interval: 0.000000
[1471472348.856004] [032.2] [pid=22873] Default interval: 0.000000
[1471472594.070888] [032.2] [pid=22873] Processed notification command: /usr/bin/printf %b "Wed Aug 17 15:23:14 PDT 2016 - MQ Events MOS OK\n\nmosmq001.logXray.com  RECOVERY\n\n(gearman001.logXray.com) - OK: Queue defaulted to OK\n\n\n\nHOST: mosmq001.logXray.com" | /bin/mail -s '** RECOVERY - mosmq001.logXray.com - MQ Events MOS - OK' HQ-GID-PROD-OPS-FULFILLMENT-DGlogXray.com
[1471472594.229644] [032.2] [pid=22873] Processed notification command: /usr/bin/bigpanda-notification -queue_max_file_count 10000 HOSTNAME="mosmq001.logXray.com" HOSTSTATE="UP" HOSTOUTPUT="(gearman001.logXray.com) - OK - 10.107.69.173: rta 0.153ms, lost 0%" LONGHOSTOUTPUT="" SERVICEDESC="MQ Events MOS" SERVICESTATE="OK" SERVICEOUTPUT="(gearman001.logXray.com) - OK: Queue defaulted to OK" HOSTGROUPNAMES="splunk-forwarder,servers,prod,mqm_mos,monitoring_mqm_v8,monitoring_gid_support,gapMQSeriesServer,chefclient12_4_1,base,all" LASTSERVICESTATECHANGE="1471472589" LASTHOSTSTATECHANGE="1468177826" LONGSERVICEOUTPUT="" NOTIFICATIONTYPE="RECOVERY" SERVICEGROUPNAMES=""
[1471472594.365237] [032.2] [pid=22873] Default interval: 60.000000
[1471472824.855164] [032.2] [pid=22873] Default interval: 60.000000
[1471473314.250327] [032.2] [pid=22873] Default interval: 0.000000
[1471473479.284913] [032.2] [pid=22873] Default interval: 0.000000
[1471473744.558849] [032.2] [pid=22873] Default interval: 0.000000
239
==========================
var_log_nagios_livestatus.log
==========================
0
 
real    0m1.345s
user    0m0.139s
sys     0m0.057s
[root@nagios001 ~]#
[root@nagios001 ~]#
[root@nagios001 ~]# ./nlm logxray host:port /tmp/logXrayTestingJ autonda /var/log/messages 60m 'error_P_fatal_P_panic_P_defaulted' '.'  1 2 systemAAerrorsA -ndshow

Showing just the total count:

[root@nagios001 ~]# 
[root@nagios001 ~]# ./logrobot localhost /var/tmp/logXray autonda /var/log/messages 60m 'error_P_fatal_P_panic_P_defaulted' '.'  1 2 systemAAerrorsA -ndfoundmul

OK: [/var/log/messages][1] /var/log/messages:P=(error=0 panic=0 defaulted=0 fatal=0)_F=(0)_R=(339(lnsrd)=0,339(rnge),alnct(686867)) 

[root@nagios001 ~]# 
[root@nagios001 ~]# 
[root@nagios001 ~]# tail /var/log/messages
Aug 28 10:43:23 nagios001 xinetd[3050]: START: livestatus pid=32321 from=::ffff:10.107.68.135
Aug 28 10:43:23 nagios001 xinetd[3050]: EXIT: livestatus status=0 pid=32321 duration=0(sec)
Aug 28 10:43:23 nagios001 xinetd[3050]: START: livestatus pid=32324 from=::ffff:10.107.68.135
Aug 28 10:43:23 nagios001 xinetd[3050]: EXIT: livestatus status=0 pid=32324 duration=0(sec)
Aug 28 10:43:23 nagios001 xinetd[3050]: START: livestatus pid=32327 from=::ffff:10.107.68.135
Aug 28 10:43:23 nagios001 xinetd[3050]: EXIT: livestatus status=0 pid=32327 duration=0(sec)
Aug 28 10:43:23 nagios001 xinetd[3050]: START: livestatus pid=32330 from=::ffff:10.107.68.135
Aug 28 10:43:24 nagios001 xinetd[3050]: EXIT: livestatus status=0 pid=32330 duration=1(sec)
Aug 28 10:43:24 nagios001 xinetd[3050]: START: livestatus pid=32333 from=::ffff:10.107.68.135
Aug 28 10:43:24 nagios001 xinetd[3050]: EXIT: livestatus status=0 pid=32333 duration=0(sec)
[root@nagios001 ~]# 
[root@nagios001 ~]# 
[root@nagios001 ~]# 
[root@nagios001 ~]# ./logrobot localhost /var/tmp/logXray autonda /var/log/messages 60m 'error_P_fatal_P_panic_P_defaulted_P_.*xinetd.*' '.'  1 2 SystemsErrCheck -ndfoundmul 

CRITICAL: [/var/log/messages][1] /var/log/messages:P=(_xinetd_=71129)_F=(0s)_R=(0,0=0) 

[root@nagios001 ~]#
Case Scenario:

In all log files discovered by the command 'ls -ltcr /wms/prod/*/data/download_files/Rcv*.out', search for the pattern 'Orace.*Error'.  When found, consider this as critical and output all lines found to the screen. Search only files that were created or updated within the last 30 minutes.

Command to run for the above scenario:

[root@nagios001 ~]# time ./nlm logxray host:port /var/tmp/logXray autonda /wms/prod/*/data/download_files/Rcv*.out 30m 'Oracle.*Error' '.' 1 1 rcv_out_sdc -ndshow

(OR)

[root@nagios001 ~]# time ./nlm logxray host:port /var/tmp/logXray autonda /wms/prod/_ast_/data/download_files/Rcv_ast_.out 30m 'Oracle.*Error' '.' 1 1 rcv_out_sdc -ndshow
From: Reshma XXXXXX <XXXXXX@XXXXXX>
Date: Friday, August 12, 2016 at 2:53 PM
To: LoGrobot Support <Support@LoGrobot.com>
Cc: XXXXX XXXXXXX <XXXXXXX@XXXXX>
Subject: RE: MONOPS-3356 - Main Frame RH Log Monitoring

Hi

Currently our logs have these two error messages which happen for two different reasons

1.      'INVALID PREPACK NUMBER INPUT'

2.      'INVALID PREPACK NUMBER'.

The second one is what we want to capture , we do not  want alert for the first one . Since Second one is the substring of the first , I was not sure if you will be able to capture the exact string . Because of which I provided the second option ((PGM ERROR CODE | A25)) to uniquely define.

If we can do the exact match on the string and create the alert we would be good , need not match on the second option. If exact match is not possible we would need both

On the log along with the next 8 lines which follow the pattern can we also have the first 6 lines which precede this pattern?
 
Let me know in case of any queries

Thanks
Reshma


From: LoGrobot Support <Support@LoGrobot.com>
Date: Monday, August 15, 2016 at 3:39 PM
To: Reshma XXXXXX <XXXXXX@XXXXXX>
Cc: XXXXX XXXXXXX <XXXXXXX@XXXXX>
Subject: Re: MONOPS-3356 - MFRH Log Monitoring

Hey Reshma,

As per our discussion on Lync.  I ran the tests on the old log and it pulled out the expected lines.

So basically, it looks like your log file contains spaces after the pattern 'NUMBER'.  So that needed to be accounted for.  Therefore, LoGrobot will be checking for two scenarios here:

1.      It will check to see if the log file contains any line that contains this pattern (ERROR.*MESSAGE.*INVALID.*PREPACK.*NUMBER) , with NUMBER being the last set of patterns/characters on the line
2.      Then it will check to see if any lines are found containing the same above pattern, but with spaces or tabs at the end.

If either of the two conditions above are met, you'll get an alert similar to the following:

bash-4.1#
bash-4.1# time ./logxray localhost /tmp/logXrayMyTest autonda /tmp/test.log 30m 'ERROR.*MESSAGE.*INVALID.*PREPACK.*NUMBER_DS__P_ERROR.*MESSAGE.*INVALID.*PREPACK.*NUMBER_LB__LB_:blank:_RB__RB__ast__DS_' '.' 1:2 1:2 apip_lgchecker -ndshowgetlineba

NEW CHECK: Should recover on next run. [ 52M ] instance(s) of string(s) found.  Tag = [ apip_lgchecker ]. Date & Time = [ Mon Aug 15 15:33:56 2016 ]. Logfile = [ /tmp/test.log ]. Log Size [ 52M ]. Actual Logfile Line Count = [ lnct~(24132) ].                            Est.Type = [ overestimated ]. Estimation off by = [ ~(0.001%-3%) ].  OUTPUT: [ 

PRJ2P01**2016-07-07-17.00.56.5454** ERROR KEY          _ 2c09e8-582d-4dae-8425-6K6b4e7_263410641 00Z2 S
PRJ2P01**2016-07-07-17.00.56.5454** ERROR MESSAGE      _ INVALID PREPACK NUMBER                  
PRJ2P01**2016-07-07-17.00.56.5454** ERROR CODE         _ 100 
PRJ2P01**2016-07-07-17.00.56.5454** ERROR KEY          _ 2c09e8-582d-4dae-8425-6K6b4e7_263410641 00Z2 S
PRJ2P01**2016-07-07-17.00.56.5454** ERROR MESSAGE      _ INVALID PREPACK NUMBER                  
PRJ2P01**2016-07-07-17.00.56.5454** ERROR CODE         _ 100 
PRJ2P01**2016-07-07-17.00.56.5454** ERROR KEY          _ 2c09e8-582d-4dae-8425-6K6b4e7_263410641 00Z2 S        
PRJ2P01**2016-07-07-17.00.56.5454** ERROR MESSAGE      _ INVALID PREPACK NUMBER                  
PRJ2P01**2016-07-07-17.00.56.5454** ERROR CODE         _ 100 
PRJ2P01**2016-07-07-17.00.56.5454** ERROR KEY          _ 2c09e8-582d-4dae-8425-6K6b4e7_263410641 00Z2 S  
PRJ2P01**2016-07-07-17.00.56.5454** ERROR MESSAGE      _ INVALID PREPACK NUMBER     
PRJ2P01**2016-07-07-17.00.56.5454** ERROR CODE         _ 100 

4 ]. | New_Patterns=lnsrd=(24026);1:2;1:2;0; New_Entries=lnct~(24132);;;;lnct~(24132) Error_Percent=% Time_Frame=0s;;;;0s Total_Log_Entries=lnct~(24132);;;;lnct~(24132) Rate_of_Update=0lps;;;;0lps Log_Size=estdiff~(3);;;;estdiff~(3)

real    0m0.907s
user    0m0.329s
sys     0m0.095s

bash-4.1#
bash-4.1# 
bash-4.1# 
bash-4.1# time ./logxray localhost /tmp/logXrayMyTest autonda /tmp/test.log 30m 'ERROR.*MESSAGE.*INVALID.*PREPACK.*NUMBER_LB__LB_:blank:_RB__RB__ast__DS_' '.' 1:2 1:2 apip_lgcheck -ndshowgetlineba

OK: [ 0 ] instance(s) of [ ERROR.*MESSAGE.*INVALID.*PREPACK.*NUMBER_LB__LB_:blank:_RB__RB__ast__DS_.*. ] found in log [ /tmp/test.log ]. Scan time = [ Aug-15-(15:30)-2016 (to) Aug-15-(15:31)-2016 ; 15s(time)s ]. Scan range =  [ 0,0(rnge),alnct(24132) ; 0(lnsrd) ].  Log Size = [ 52M ]. Tag = [ apip_lgcheck ]. OUTPUT: [  ]. | New_Patterns=0;1:2;1:2;0; New_Entries=0;;;;0 Error_Percent=0% Time_Frame=15s;;;;15s Total_Log_Entries=0;;;;0 Rate_of_Update=0lps;;;;0lps Log_Size=52M;;;;52M

real    0m0.620s
user    0m0.246s
sys     0m0.066s
bash-4.1# uptime
15:31:16 up 65 days,  1:42, 28 users,  load average: 1.68, 5.29, 5.73
bash-4.1# 
bash-4.1#

In the command I ran above, here are the important parameters:
 
ERROR.*MESSAGE.*INVALID.*PREPACK.*NUMBER_DS__P_ERROR.*MESSAGE.*INVALID.*PREPACK.*NUMBER_LB__LB_:blank:_RB__RB__ast__DS_
   ---  This is the pattern/string being searched for.

_DS_ = The _DS_ means dollar sign.  It tells the log monitor to look for lines containing ERROR.*MESSAGE.*INVALID.*PREPACK.*NUMBER, where NUMBER is the LAST word/string/character on the lines

_P_  = The _P_ means OR. It's an alternative/safer way to specify multiple patterns to look for.  We allow users to specify OR this way in cases where their applications may be misusing the pipe "|" symbol.

ERROR.*MESSAGE.*INVALID.*PREPACK.*NUMBER_LB__LB_:blank:_RB__RB__ast__DS_ = This is the second string being look for.

Without the substitutions, it would look like this:

ERROR.*MESSAGE.*INVALID.*PREPACK.*NUMBER[[:blank:]]*$

With this string, we're basically saying if there only spaces or tabs after the string 'ERROR.*MESSAGE.*INVALID.*PREPACK.*NUMBER', then alert.

1:2   =  This is the warning value.  If at least 1 entry is found matching the above string, grab the 2 lines that came BEFORE the string (the lines to be grabbed in the actual check will be 6, as you requested)
1:2   =  This is the critical value.  If at least 1 entry is found matching the above string, grab the 2 lines that came AFTER the string (the lines to be grabbed in the actual check will be 8).

If WARNING and CRITICAL are the same value, CRITICAL takes precedence...which means, check will go straight to CRITICAL if either the WARNING or CRITICAL threshold is breached.
[root@nagios001 ~]# time ./nlm logxray host:port /var/tmp/logXray autonda /opt/apps/tokemon/logs/_ast_.log 120m 'User' 'Tokemon.*:.*2.*:.*De-tokenization.*Requested' 10 10 tokenization_logchk -ndfoundauditast

Other common log monitoring scenarios