Real-time Eventlog Monitoring with Nagios and NSClient++

Introduction to real-time eventlog monitoring

NSClient++ has a very powerful component that enables you to achieve real-time eventlog monitoring on Windows systems. This feature requires passive monitoring of Windows eventlogs via NSCA or NRDP.

The biggest benefits of real-time eventlog monitoring are:

  • It can help you find problems faster (real-time), as NSClient++ sends each event with NSCA the moment it occurs.
  • It is much more resource-efficient than using active checks for monitoring eventlogs. It actually requires fewer resources on both the Nagios server and the client where NSClient++ is running!
  • There is no need to search through every application’s documentation, as you can simply catch all errors and filter out the ones you don’t need.

The biggest drawbacks of real-time eventlog monitoring are:

  • Because these are passive services, each new event overwrites the previous one, which could cause you to miss a problem on your Nagios dashboards.
  • You need a dedicated database table to store the real-time eventlog exclusions.
  • You will need some basic scripting skills to automate building the real-time eventlog exclusion string in the NSClient++ configuration file.

General requirements for using real-time eventlog monitoring

NSCA Configuration of your NSClient++

As NSClient++’s real-time eventlog monitoring component sends the events passively to your Nagios server, you will need to set up NSCA. Please read through this documentation for configuring NSCA in NSClient++.

NSCA Configuration of your Nagios server

NSCA also requires some configuration on your Nagios server. Please read through this documentation for configuring NSCA in Nagios Core or this documentation for configuring NSCA in Nagios XI.

Passive services for each Windows host on your Nagios server

Each Windows host needs at least one passive service that can accept the filtered Windows eventlog entries. You can create as many as you need. I chose to use one for all application eventlog errors and one for all system eventlog errors:

Real-Time Eventlog Monitoring Passive Services

A database to store your real-time eventlog exclusions

If you want to generate a real-time eventlog exclusion filter, you need to store combinations of hostnames, event IDs and event sources somewhere. We are currently using MSSQL and generate the exclusions with Powershell. This database needs at least a servername, eventlog, eventid, eventsource and comment column. Together, these columns allow you to create an exclusion for almost any type of Windows event.
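To make the table layout concrete, here is a minimal Python sketch. It uses an in-memory SQLite database as a stand-in for the MSSQL table described above. Two things are assumptions: the table name `eventlog_exclusions`, and the rule that an empty (NULL) eventsource means "exclude this event ID for every source". Only the column names come from the list above.

```python
import sqlite3

# Stand-in for the MSSQL table. The table name is hypothetical;
# the column names are the minimum set listed in the post.
SCHEMA = """
CREATE TABLE IF NOT EXISTS eventlog_exclusions (
    servername  TEXT    NOT NULL,  -- host the exclusion applies to
    eventlog    TEXT    NOT NULL,  -- e.g. 'Application' or 'System'
    eventid     INTEGER NOT NULL,
    eventsource TEXT,              -- NULL = exclude this id for every source (assumption)
    comment     TEXT
)
"""


def add_exclusion(conn, servername, eventlog, eventid, eventsource=None, comment=""):
    """Store one exclusion row; host names are lower-cased (assumption)."""
    conn.execute(
        "INSERT INTO eventlog_exclusions VALUES (?, ?, ?, ?, ?)",
        (servername.lower(), eventlog, eventid, eventsource, comment),
    )
    conn.commit()


def exclusions_for(conn, servername, eventlog):
    """Return the (eventid, eventsource) pairs for one host and eventlog."""
    cursor = conn.execute(
        "SELECT eventid, eventsource FROM eventlog_exclusions "
        "WHERE servername = ? AND eventlog = ? ORDER BY eventid",
        (servername.lower(), eventlog),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
add_exclusion(conn, "SRV01", "Application", 1509, "Userenv", "noisy profile event")
add_exclusion(conn, "SRV01", "Application", 1000, comment="known, harmless")
print(exclusions_for(conn, "srv01", "Application"))  # [(1000, None), (1509, 'Userenv')]
```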

Real-time Eventlog Monitoring Exclusion Database

Some sort of automation software which can be called with a Nagios XI quick action

Thanks to Nagios XI quick actions, you can quickly exclude noisy events by updating the NSClient++ configuration file with the correct filter. With the right customization and scripts, this allows you to build a self-learning system. For this to work, you basically need two scripts:

  • one that stores a new real-time eventlog exclusion in the database;
  • another that generates the NSClient++ configuration file with the latest set of real-time eventlog exclusions.

We use Rundeck, a free and open-source automation tool, to execute these jobs.
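The second script mostly boils down to text substitution. Here is a minimal Python sketch, assuming a template that contains the DUMMYAPPLICATIONFILTER and DUMMYSYSTEMFILTER placeholders mentioned later in this post. The function names are hypothetical, and this is not the Rundeck job itself. Writing to a temporary file first and then swapping it into place is a design choice, so the nscp service never reads a half-written file after a restart.

```python
import os
import tempfile


def render_config(template, filters):
    """Replace each placeholder (e.g. DUMMYAPPLICATIONFILTER) with its filter string."""
    for placeholder, expression in filters.items():
        if placeholder not in template:
            raise KeyError(f"placeholder {placeholder!r} not found in template")
        template = template.replace(placeholder, expression)
    return template


def write_atomically(path, text):
    """Write to a temporary file in the same folder, then swap it into place."""
    folder = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=folder, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        os.replace(tmp_path, path)
    except Exception:
        os.remove(tmp_path)  # never leave stray temp files behind
        raise


# Hypothetical one-line template; the real one would be the full nsclient.ini.
template = "filter = DUMMYAPPLICATIONFILTER\n"
print(render_config(template, {"DUMMYAPPLICATIONFILTER": "(id NOT IN (1000))"}))
```

After writing the file, the job still has to restart the nscp service before NSClient++ picks up the new filter.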

Detailed NSClient++ configuration

Minimal nsclient.ini ‘modules’ settings:

Minimal nsclient.ini ‘NSCA’ settings:

The above configuration doesn’t use any encryption. Once your tests work, I advise you to configure some form of encryption to prevent attackers from sniffing your NSCA packets. Please note that at the time of writing (31/05/17), the official Nagios NSCA project does not support AES, only Rijndael. This GitHub issue has been created to fix this problem. For now, you’ll have to use one of the other, weaker encryption methods.

Example nsclient.ini ‘eventlog’ settings:

This is an example configuration for getting real-time eventlog monitoring to work. Please note that it has been tested on NSClient++ 0.5.1.28; I’m not sure it works on earlier versions.

The above configuration template is just an example. As you can see it contains a DUMMYAPPLICATIONFILTER and a DUMMYSYSTEMFILTER. You can easily replace these with the generated exclusion filter. A few examples of how such a filter might look:

(id NOT IN (1,3,10,12,13,23,26,33,37,38,58,67,101,103,104,107,108,110,112,274,502,511,1000,1002,1004,1005,1009,1010,1026,1027,1053,1054,1085,1101,1107,1116,1301,1325,1334,1373,1500,1502,1504,1508,1511,1515,1521,1533)) AND (id NOT IN (1509) OR source NOT IN ('Userenv')) AND (id NOT IN (1055) OR source NOT IN ('Userenv')) AND (id NOT IN (1030) OR source NOT IN ('Userenv')) AND (id NOT IN (1006) OR source NOT IN ('Userenv')) 

Or

(id NOT IN (1,3,4,5,8,9,10,11,12,15,19,27,37,39,50,54,56,137,1030,1041,1060,1066,1069,1071,1111,1196,3621,4192,4224,4243,4307,5722,5723)) AND (id NOT IN (36888) OR source NOT IN ('Schannel')) AND (id NOT IN (36887) OR source NOT IN ('Schannel')) AND (id NOT IN (36874) OR source NOT IN ('Schannel')) AND (id NOT IN (36870) OR source NOT IN ('Schannel')) AND (id NOT IN (12292) OR source NOT IN ('VSS')) AND (id NOT IN (7030) OR source NOT IN ('ServiceControlManager')) 
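Both examples follow the same pattern:

  • one big `id NOT IN (...)` list for events that are excluded whatever their source;
  • one `(id NOT IN (x) OR source NOT IN ('y'))` clause per ID/source combination.

This is not the exclusion-generating Powershell code mentioned at the end of this post. It is just a Python sketch that reproduces that shape from a list of (event ID, source) pairs, assuming `None` as source means "any source". I don’t know how the NSClient++ filter parser escapes quotes, so the sketch simply refuses source names that contain one.

```python
def build_filter(exclusions):
    """Build a filter expression shaped like the examples above.

    exclusions: iterable of (event_id, source) tuples; source None means
    "exclude this id whatever the source" (an assumption about the data model).
    """
    exclusions = list(exclusions)
    any_source_ids = sorted({eid for eid, src in exclusions if src is None})
    id_source_pairs = sorted({(eid, src) for eid, src in exclusions if src is not None})

    clauses = []
    if any_source_ids:
        id_list = ",".join(str(eid) for eid in any_source_ids)
        clauses.append(f"(id NOT IN ({id_list}))")
    for eid, src in id_source_pairs:
        if "'" in src:
            # Escaping rules of the NSClient++ filter parser are unknown here, so refuse.
            raise ValueError(f"unsupported quote in source name: {src!r}")
        clauses.append(f"(id NOT IN ({eid}) OR source NOT IN ('{src}'))")
    return " AND ".join(clauses)  # empty string when there is nothing to exclude


print(build_filter([(1000, None), (3, None), (1509, "Userenv"), (1006, "Userenv")]))
# (id NOT IN (3,1000)) AND (id NOT IN (1006) OR source NOT IN ('Userenv')) AND (id NOT IN (1509) OR source NOT IN ('Userenv'))
```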

Only errors which are not filtered by the real-time eventlog filters such as the examples above will be sent to your Nagios passive services.
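Before pushing a new filter to a server, it can help to dry-run the exclusion list against a few sample events. This hypothetical Python helper mirrors the logic of the filters above. An event is sent unless an exclusion matches its ID and, when the exclusion names a source, its source as well. Source names are compared exactly here; whether NSClient++ compares them case-insensitively is not verified.

```python
def is_forwarded(event_id, source, exclusions):
    """Return True if an event would still be sent to the Nagios passive service.

    exclusions: (event_id, source) tuples; a source of None matches any source.
    """
    for excluded_id, excluded_source in exclusions:
        if excluded_id == event_id and excluded_source in (None, source):
            return False
    return True


exclusions = [(1000, None), (1509, "Userenv")]
for event in [(1000, "AnyApp"), (1509, "Userenv"), (1509, "Winlogon"), (42, "MyApp")]:
    print(event, "sent" if is_forwarded(*event, exclusions) else "filtered")
```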

Multiple NSCA Targets

This is an nsclient.ini config file where two NSCA targets are defined. This can be useful when a backup Nagios server needs to be identical to the primary Nagios server:

How to generate errors in your Windows eventlogs?

To test your filters, you need a way to generate errors with specific sources or IDs. You can do this very easily with Powershell:

If you get an error saying that the source passed with the above command does not exist, you can create it like this:

Or another way:
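If you want to test many ID/source combinations from the exclusion database, you can generate the test commands instead of typing them. Here is a small Python sketch that prints Windows PowerShell lines using the New-EventLog and Write-EventLog cmdlets. These *-EventLog cmdlets exist in Windows PowerShell 5.1 but not in PowerShell 7. The helper names and the test message are hypothetical.

```python
def ps_quote(value):
    """Quote a string for PowerShell: wrap it in single quotes, double any inside."""
    return "'" + value.replace("'", "''") + "'"


def event_test_commands(log_name, source, event_id, message="Nagios eventlog filter test"):
    """Return PowerShell lines that register the source if needed and write one Error event."""
    log, src = ps_quote(log_name), ps_quote(source)
    return [
        # New-EventLog fails when the source already exists, hence the check.
        f"if (-not [System.Diagnostics.EventLog]::SourceExists({src})) "
        f"{{ New-EventLog -LogName {log} -Source {src} }}",
        f"Write-EventLog -LogName {log} -Source {src} -EventId {int(event_id)} "
        f"-EntryType Error -Message {ps_quote(message)}",
    ]


for line in event_test_commands("Application", "Userenv", 1509):
    print(line)
```

Run the printed lines in an elevated Windows PowerShell session, because registering a new event source requires administrator rights.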

(Almost) Final Words

I can already hear some people thinking: “Why don’t you post the code that generates the real-time eventlog exclusion filter?” The answer is simple: I don’t have the time to clean up all the code and strip out the sensitive information. But as a special gift for all my blog readers who made it to the end of this post, I’ll post a snippet of the exclusion-generating Powershell code here. The rest you will have to write yourself for now.

I will open the comments section for now, but please only use it for constructive information. 

Grtz

Willem

Monitoring Microsoft Windows Updates

Introduction

Monitoring WSUS updates on Microsoft Windows Server is critical to ensure you get alerted when your systems need to be patched. Installing Windows Updates on high-priority servers requires proper planning to avoid post-installation problems. If we could trust Microsoft patches 100%, WSUS updates could be installed on a system as soon as a maintenance schedule had been created for it. Unfortunately, in my personal experience, WSUS updates are more often a cause of problems than a solution. That’s why we prefer not to install them too quickly, as you might run into major issues with your production systems or with the software running on them. As a recent example, a colleague accidentally patched some production SharePoint servers, which prevented the creation of new site collections and caused issues with some icons. The only solution was to restore a backup…

Ideally, updates are tested on QA systems first. If the QA servers have run for some time without issues, the production systems can be patched. This is one of the reasons I spent some time combining the best features of the available Windows Update plugins on the Nagios Exchange, such as Christian Kaufmann’s idea to cache the list of Windows Updates in a file. This results in a much lower performance impact of the plugin on the servers you are monitoring. If you have any experience with WSUS updates, you will have noticed that the ‘TrustedInstaller.exe’ process can put a noticeable load on the server. This Microsoft Windows system process takes care of querying the WSUS server and installing updates when requested.

The plugin counts all available WSUS updates and outputs the count for every possible state. However, it only alerts when a configured number of days has passed since the last successful update was installed. With this method, you can define a policy and agree to patch all systems that have had no updates for a certain time. You could use different policies for QA and PR (production) systems to prevent problems.
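The alerting rule boils down to comparing the age of the last successful update with two thresholds. This is a minimal Python sketch of that idea, not the plugin itself. It does three things:

  • it uses the default thresholds of 120 and 150 days mentioned further on;
  • it returns the standard Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL);
  • it appends the update counts as performance data.

The count names (`missing`, `hidden`) are assumptions, and so is whether the boundary day itself alerts.

```python
from datetime import datetime, timezone

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def update_age_state(last_success, now, days_warning=120, days_critical=150):
    """Map days since the last successful update to a Nagios state.

    Alerting on the boundary day itself (>=) is an assumption.
    """
    age_days = (now - last_success).days
    if age_days >= days_critical:
        return CRITICAL, age_days
    if age_days >= days_warning:
        return WARNING, age_days
    return OK, age_days


def plugin_output(last_success, now, counts, days_warning=120, days_critical=150):
    """Build the status line plus performance data for the update counts."""
    state, age_days = update_age_state(last_success, now, days_warning, days_critical)
    label = ("OK", "WARNING", "CRITICAL")[state]
    perfdata = [f"days_since_update={age_days};{days_warning};{days_critical};0"]
    perfdata += [f"{name}={count}" for name, count in sorted(counts.items())]
    return state, f"{label} - last successful update {age_days} days ago | " + " ".join(perfdata)


now = datetime(2017, 6, 1, tzinfo=timezone.utc)
state, line = plugin_output(datetime(2017, 2, 1, tzinfo=timezone.utc), now, {"missing": 5, "hidden": 1})
print(state, line)
# 1 WARNING - last successful update 120 days ago | days_since_update=120;120;150;0 hidden=1 missing=5
```

A real plugin would print the line and then exit with the state as its return code.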

WSUS

 

Details

Some things you need to know about Windows Updates. Microsoft saves the date of the ‘last successful update’ in the registry. The location of the String Value is:

This date, however, is stored in Coordinated Universal Time (UTC), which is practically the same as Greenwich Mean Time (GMT). My plugin tries to convert it to local time with a function called Get-LocalTime. This function uses the [System.TimeZoneInfo] .NET class, which is only available with .NET 3.5 or higher. So keep in mind that the ‘Last Successful Update’ date remains in UTC on servers where .NET 3.5 or higher is not installed.
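The conversion itself is simple once you know the stored value is UTC: parse the string, mark it as UTC and convert it to the local time zone. This Python sketch shows the same idea. It is not the Get-LocalTime function, which uses [System.TimeZoneInfo] in Powershell. The `%Y-%m-%d %H:%M:%S` format is an assumption; check how the value is actually stored on your servers.

```python
from datetime import datetime, timedelta, timezone


def utc_string_to_local(value, local_tz=None, fmt="%Y-%m-%d %H:%M:%S"):
    """Interpret a stored timestamp as UTC and convert it to local time.

    fmt is an assumed storage format. local_tz=None converts to the
    machine's own local time zone.
    """
    utc_time = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    return utc_time.astimezone(local_tz)


# A fixed +02:00 offset keeps the example deterministic; in practice you would
# use zoneinfo.ZoneInfo("Europe/Brussels") (Python 3.9+) or the default None.
cest = timezone(timedelta(hours=2))
print(utc_string_to_local("2017-05-31 10:00:00", cest))  # 2017-05-31 12:00:00+02:00
```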

The plugin will also check this registry key:

It gives a warning if a required reboot is pending.
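This sketch folds such a reboot check into the overall state. It assumes a pending reboot should raise OK to WARNING without hiding a worse state. The registry path is deliberately a placeholder, since the key is not spelled out here. `winreg` is the Windows-only Python standard library module. Returning False on other platforms only keeps the sketch runnable elsewhere.

```python
import sys

OK, WARNING, CRITICAL = 0, 1, 2


def registry_key_exists(path):
    """True if HKEY_LOCAL_MACHINE\\<path> exists; always False outside Windows."""
    if sys.platform != "win32":
        return False
    import winreg  # Windows-only standard library module

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path):
            return True
    except FileNotFoundError:
        return False


def apply_reboot_check(state, reboot_pending):
    """A pending reboot raises OK to WARNING; a worse state is kept as-is."""
    return max(state, WARNING) if reboot_pending else state


# REBOOT_KEY is a hypothetical placeholder: use the key the plugin actually checks.
REBOOT_KEY = r"SOFTWARE\Hypothetical\RebootRequired"
print(apply_reboot_check(OK, registry_key_exists(REBOOT_KEY)))
```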

PSWindowsUpdate

Starting with Windows 10, Microsoft apparently decided to no longer use the above registry key. The only way I found to retrieve the last successful update date and time is with the PSWindowsUpdate module. So I added another argument that allows you to select a different method, named ‘PSWindowsUpdate’, to retrieve the necessary information. Please note that the default method is still the original one, which I called ‘UpdateSearcher’.

In order for this method to work, you will need to install the PSWindowsUpdate module in this location: C:\Windows\System32\WindowsPowerShell\v1.0\Modules. If you are using Powershell 5 you can just do:

I’ve included versions 1.5.1.11 and 1.5.2 of the module in the GitHub repository. You can also download it from the Microsoft Script Center Repository.

How to monitor your WSUS updates?

  1. Please note that the default DaysBeforeWarning and DaysBeforeCritical parameters are set to 120 and 150. Feel free to adjust them as required or pass them as an argument.
  2. Put the script in the NSClient++ scripts folder, preferably in a subfolder named ‘Powershell’.
  3. In the nsclient.ini configuration file, define the script like this:
  4. Make a command in Nagios like this:
  5. Configure your service in Nagios using the command created above. Configure something like this as $ARG1$:
    QA servers =>

    PR servers =>

  6. If you want to make use of the new ‘PSWindowsUpdate’ method you will need to have an argument like this:

(Almost) Final words

So why did I create yet another plugin to check WSUS updates? Because I’m using a system that completely automates Windows Update installation with the help of Nagios XI and Rundeck, and the existing plugins did not meet my requirements.

Please note that there are several known issues with WSUS on some operating systems. It’s recommended to always update to the latest ‘Windows Update Client’. Please check the Windows 8.1 and Windows Server 2012 R2 update history for more information. More specifically, when using Windows Server 2012 R2, you will really want the following KBs:

  • KB3172614 => “July 2016 update rollup for Windows 8.1 and Windows Server 2012 R2”
  • KB3179574 => “August 2016 update rollup for Windows 8.1 and Windows Server 2012 R2”
  • KB3185279 => “September 2016 update rollup for Windows 8.1 and Windows Server 2012 R2”

Without these update rollups, checking for updates and updating your Windows Server 2012 R2 systems can be very slow. In our case, an update check could take up to 40 minutes instead of 10 seconds.

Let me know on the Nagios Exchange what you think of my plugin by rating it or submitting a review. Please also consider starring the project on GitHub.

NSClient++ 0.5.0.7

Introduction

Michael Medin has been doing quite a lot of development work on NSClient++ lately, as you can see in this GitHub commit graph.

nsclient

As I’m still on 0.4.1.105, I thought it was time for a little review of the latest (nightly) version of NSClient++, which is currently 0.5.0.7. To be honest, I’m not looking forward to migrating all our old NSClient++ installations to later versions. As the nsclient.ini configuration has changed drastically, migrating everything without issues will take some work.

There aren’t really any alternatives at the moment. As far as I know NSClient++ is still the only client offering real-time eventlog monitoring capabilities and this is imho a must-have. 

So in this review I will go through all my old 0.4.1.105 checks and verify whether they still work in 0.5.0.7. Please note that this is a nightly build and is not yet fit for production environments.

Installation

So download the latest version of NSClient++ and start the installation.
Choose the generic monitoring option. 

nsclient-0.5.0.7-installation-01

Choose custom setup:

nsclient-0.5.0.7-installation-02

Set the IP address of your Nagios server in the allowed hosts field and a strong password in the password field. You will need this password later to log in to the web interface. For now, choose the ‘Insecure legacy mode’ option. To use the ‘Safe mode’ and ‘Secure’ mode, you will have to install NSClient++ on your Nagios server too. If you are only monitoring internal servers (not over the Internet), the ‘Insecure legacy mode’ should be OK. I’ll try to write a post about the other modes in the future.

nsclient-0.5.0.7-installation-03

Click next and NSClient++ will finish the installation. To understand all the default options and settings, it’s generally a good idea to add the default settings to the nsclient.ini file. This can be done with the following command from the NSClient++ installation folder:

In my case, with a fresh install, this generated some errors, but I’m quite sure this won’t be a real issue. They are probably just related to the 0.5.0.7 nightly build.

Webserver

So I was curious to see if I could get the NSClient++ webserver working, as in my previous tests (0.4.3.x) I never got it to work properly.

As I had checked the ‘Enable Web server’ checkbox, I expected it to work out of the box, which was not the case. Browsing to https://localhost:8443/ resulted in an ‘ERR_CONNECTION_REFUSED’ error.

So I had a look through the nsclient.ini and noticed that, although I had enabled it during the installation, I still found this:

After setting it to 1 and restarting the nscp service, I was able to log in with the password I configured during the installation. The web server uses a self-signed certificate, which is better than nothing. If you have a certificate authority, you should be able to generate trusted certificates, so you don’t get the ‘red cross’ in your browser.
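If you need to flip module switches like this on many servers, a script is less error-prone than editing by hand. Here is a minimal Python sketch using the standard configparser module. Two things to check:

  • `set_module` is a hypothetical helper name;
  • `WEBServer` is the name of the NSClient++ web server module as far as I know, so double-check it against your own nsclient.ini.

Note that configparser drops comments when it writes the file.

```python
import configparser
import io


def set_module(ini_text, module, enabled=True):
    """Return nsclient.ini text with one module switched on or off in [/modules].

    Caveat: configparser drops comments on write, so run this on a copy.
    """
    # Only "=" as delimiter, so values such as nsca://host:5667 stay intact.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",), strict=False)
    parser.optionxform = str  # keep key case, e.g. "WEBServer" rather than "webserver"
    parser.read_string(ini_text)
    if not parser.has_section("/modules"):
        parser.add_section("/modules")
    parser.set("/modules", module, "1" if enabled else "0")
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


print(set_module("[/modules]\nWEBServer = 0\n", "WEBServer"))
```

Restart the nscp service afterwards, as described above.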

nsclient-0.5.0.7-webserver-01

Home

After logging in, you immediately arrive on the Home page with some basic information, such as CPU load, processes, threads, handles and uptime.

nsclient-0.5.0.7-webserver-02

It looks a bit like a remake of the Windows Task Manager, but with a little less information.

nsclient-0.5.0.7-webserver-03

There seems to be no X-axis information on the NSClient CPU webpage. The interval used in the Task Manager is one second, while in the NSClient webpage it is  seconds, which of course results in slightly different values.

The NSClient webpage is of course accessible from ‘anywhere’ if set up correctly, which is definitely a plus. 

One more small remark: Michael seems to have chosen ‘CPU Load’ as the name for CPU utilization. Imho this is quite confusing, as on Linux servers CPU load represents the length of the CPU run queue rather than utilization. As NSClient++ is supposed to also work on Linux servers now, I think it should be named ‘CPU Usage’ (which is a bit shorter than ‘CPU Utilization’).

Besides CPU info, there is also some memory information:

nsclient-0.5.0.7-webserver-04

And a list of 38 metrics. I think these are all the metrics NSClient++ is caching, enabling it to calculate nice averages instead of current values.
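A rolling window makes it easy to see how current values differ from averages, and why different sampling intervals give different numbers. This Python sketch only illustrates the idea; it is not how NSClient++ implements its metric cache.

```python
from collections import deque


class MetricWindow:
    """Keep the most recent samples of one metric and report their average."""

    def __init__(self, size):
        self.samples = deque(maxlen=size)  # the oldest sample drops out automatically

    def add(self, value):
        self.samples.append(value)

    def current(self):
        return self.samples[-1]

    def average(self):
        return sum(self.samples) / len(self.samples)


cpu = MetricWindow(size=3)
for usage in (10, 90, 20, 40):  # one sample per interval
    cpu.add(usage)
print(cpu.current(), cpu.average())  # 40 50.0
```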

nsclient-0.5.0.7-webserver-05

Modules

The second menu item ‘Modules’ lists all the available modules and their state. 

nsclient-0.5.0.7-webserver-06

So I tried checking an extra module to see whether the nsclient.ini would change, but apart from the checkbox being checked, nothing really changed.

nsclient-0.5.0.7-webserver-07

As there is almost no documentation about the new webserver, I tried some things myself, but to no effect. I’m not quite sure what the reload and shutdown actions are supposed to do.

nsclient-0.5.0.7-webserver-08

I’ve tested reload and it does not restart the nscp service. Shutdown doesn’t really seem to do anything yet.

And then I suddenly noticed that a new menu item, ‘Changes’, had appeared, which allows me to save or undo the configuration changes.

nsclient-0.5.0.7-webserver-10

It felt a bit weird that this menu item just appeared out of nowhere. Maybe it would be better if it were always there, but with a green icon when no changes are detected. Something else I noticed is that after loading a module, you cannot enable it unless you save first.

In the nsclient.ini, the modules I activated were properly adjusted. The only weird thing is the inconsistent notation:

  • changes made with the web GUI use ‘enabled’ or ‘disabled’;
  • changes made on the command line, such as generating the defaults, use ‘1’ or ‘0’ to enable or disable a module.

It would be nice if this were more consistent.
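Until that is consistent, any script that reads nsclient.ini should accept both spellings. This Python sketch lists the enabled modules in [/modules], whether they were written as ‘1’/‘0’ or ‘enabled’/‘disabled’. Anything else is rejected rather than guessed. The helper names are hypothetical.

```python
import configparser

# Only the spellings mentioned above; extend these sets if your file uses others.
ENABLED = {"1", "enabled"}
DISABLED = {"0", "disabled"}


def is_enabled(value):
    normalized = value.strip().lower()
    if normalized in ENABLED:
        return True
    if normalized in DISABLED:
        return False
    raise ValueError(f"unrecognised module value: {value!r}")


def enabled_modules(ini_text):
    """List the modules switched on in [/modules], whichever spelling was used."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",), strict=False)
    parser.optionxform = str  # keep module names as written
    parser.read_string(ini_text)
    if not parser.has_section("/modules"):
        return []
    return sorted(name for name, value in parser.items("/modules") if is_enabled(value))


sample = "[/modules]\nCheckSystem = enabled\nCheckDisk = 1\nWEBServer = 0\nNSCAClient = disabled\n"
print(enabled_modules(sample))  # ['CheckDisk', 'CheckSystem']
```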

Settings

The settings menu seems to need some work, as I saw a lot of ‘TODO’ and ‘Unknown’ strings for several items.

Also, I’m not quite sure what the ‘Changed’, ‘Basic’, ‘Advanced’ tabs are supposed to do.

nsclient-0.5.0.7-webserver-11

Queries

The queries menu gives an overview of all possible queries. 

nsclient-0.5.0.7-webserver-12

When you click on a query, you are linked to the module that enables you to use it, and you can see a ‘Help’ file with the usable arguments for the selected query.

nsclient-0.5.0.7-webserver-13

And it seems Michael also enables us to test a query:

nsclient-0.5.0.7-webserver-16

Which is a very nice feature. It would be nice to see a list of more complex working examples.

Log

The Log menu gives a nice filterable overview of the NSClient logfile:

nsclient-0.5.0.7-webserver-14

Console

Similar to the Log menu, the Console menu also gives a filterable overview of all console messages.

nsclient-0.5.0.7-webserver-15

(Almost) Final words

The features I just listed are just a few of the many exciting new features in the new NSClient++. The web server has a nice GUI and is a nice preview of things to come. Thanks a lot, Michael, for sharing your work with the world.

I will continue writing on this review when I find the time.

Monitoring Microsoft Exchange 2010 Mailbox

Introduction

I based this script on the MS Exchange 2010 DAG Mailbox check by Matt Haynes, but made the following changes:

  • Recovery mailbox databases are excluded. This way we don’t get any false alerts when colleagues are restoring a backup to a temporary recovery database.
  • For MS Exchange, backups are crucial, as your logs will grow very fast when the databases don’t get backed up. Therefore, a warning is generated if the last backup is more than 26 hours old, and an error if it is more than 50 hours old. (These time periods can be altered, depending on the total time your backup software requires to back up all your mailbox databases.)
  • Performance data is gathered from the number of mounted, healthy and unhealthy mailbox databases.
  • If no mailbox databases are found on the Exchange server, an error is also generated.
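The rules above translate into a fairly small piece of logic. Here is a hypothetical Python sketch; the real check is the Powershell script ‘check_ms_exchange_2010_health.ps1’. It applies these rules:

  • recovery databases are skipped;
  • no databases at all is CRITICAL;
  • a last backup older than 26 or 50 hours gives WARNING or CRITICAL;
  • the mounted/healthy/unhealthy counts become performance data.

Treating a dismounted, unhealthy or never backed-up database as CRITICAL is my own assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

OK, WARNING, CRITICAL = 0, 1, 2


@dataclass
class MailboxDatabase:
    name: str
    is_recovery: bool
    mounted: bool
    healthy: bool
    last_backup: Optional[datetime]  # None = never backed up


def evaluate(databases, now, warn_hours=26, crit_hours=50):
    """Return (exit_code, status_line) for a list of MailboxDatabase objects."""
    dbs = [db for db in databases if not db.is_recovery]  # ignore recovery DBs
    if not dbs:
        return CRITICAL, "CRITICAL - no mailbox databases found"
    state, problems = OK, []
    for db in dbs:
        if not (db.mounted and db.healthy):
            state = CRITICAL  # assumption: any dismounted/unhealthy DB is critical
            problems.append(f"{db.name} not mounted or unhealthy")
        if db.last_backup is None:
            state = CRITICAL  # assumption: "never" counts as too old
            problems.append(f"{db.name} never backed up")
            continue
        age_hours = (now - db.last_backup).total_seconds() / 3600
        if age_hours > crit_hours:
            state = CRITICAL
            problems.append(f"{db.name} last backup {age_hours:.0f}h ago")
        elif age_hours > warn_hours:
            state = max(state, WARNING)
            problems.append(f"{db.name} last backup {age_hours:.0f}h ago")
    mounted = sum(db.mounted for db in dbs)
    healthy = sum(db.healthy for db in dbs)
    label = ("OK", "WARNING", "CRITICAL")[state]
    detail = "; ".join(problems) or f"{len(dbs)} mailbox databases OK"
    perfdata = f"mounted={mounted} healthy={healthy} unhealthy={len(dbs) - healthy}"
    return state, f"{label} - {detail} | {perfdata}"


now = datetime(2017, 4, 28, 12, 0, tzinfo=timezone.utc)
dbs = [
    MailboxDatabase("DB01", False, True, True, now - timedelta(hours=10)),
    MailboxDatabase("DB02", False, True, True, now - timedelta(hours=30)),
    MailboxDatabase("RDB01", True, False, False, None),  # recovery DB, skipped
]
print(evaluate(dbs, now))
# (1, 'WARNING - DB02 last backup 30h ago | mounted=2 healthy=2 unhealthy=0')
```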

I would like to thank my colleague, De Clerck John, who created the exclusion for the recovery mailbox database and the check for the last backup. His knowledge of MS Exchange and Powershell seems to be unlimited. As mentioned above, the script counts all healthy, failed and mounted mailbox databases and outputs performance data. This way, you can easily see when exactly something went wrong with your mailbox databases:

check-ms-exchange-2010-health-graph-01

In this screenshot it appears we had an issue on 28 April. In fact, our Exchange administrators were busy migrating the mailbox databases to different MS Exchange servers, which is also visible in the following days of the graph.

exchange

How to monitor your MS Exchange databases?

  1. Copy the ‘check_ms_exchange_2010_health.ps1’ Powershell script to the NSClient++ scripts folder on all the Exchange 2010 servers you wish to monitor, preferably in a sub-folder ‘Powershell’.
  2. In the nsclient.ini configuration file, define the external command like this (and restart the NSClient++ service (nscp) afterwards):
  3. Make a command in Nagios like this:
  4. Configure your service in Nagios using the command you made previously. Argument 1 should be the hostname of the MS Exchange server.

Grtz Willem