Monitoring Linux Processes


As I had some issues with my Linode server related to mistuned MariaDB settings, I was forced to find a way to monitor Linux processes such as httpd, mysqld and php. I needed to know not only whether they were running and how many of them were running, but also their CPU and memory usage, so I could tune my Apache settings (located at /etc/httpd/conf/httpd.conf). I hoped to find a plugin which did all of the above, but couldn’t find one. The plugin that came closest to what I needed was this one, written by Eli Keimig.

As the last release date was 08/11/2010 and it missed some crucial features, I decided to improve it. So far I have added the following features:

  • Performance data for Linux process CPU usage.
  • Performance data for Linux process Memory usage.
  • Added Linux process count with performance data.
  • Improved the plugin output.
  • Added minimum and maximum Linux process count.
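
As a rough illustration of the minimum and maximum process count, a check along these lines compares a count against both limits and reports the result in Nagios plugin style. This is a minimal sketch, not the plugin's actual code: the function name check_process_count, its arguments and the output wording are made up for this example.

```shell
# Hypothetical helper, not taken from the plugin.
# Usage: check_process_count NAME COUNT MIN MAX
# Prints a status line with perfdata and returns the Nagios exit code
# (0 = OK, 2 = CRITICAL).
check_process_count() {
    cpc_name=$1
    cpc_count=$2
    cpc_min=$3
    cpc_max=$4
    # Perfdata: label=value;warn;crit, with crit given as a min:max range.
    cpc_perf="count=$cpc_count;;$cpc_min:$cpc_max"
    if [ "$cpc_count" -lt "$cpc_min" ] || [ "$cpc_count" -gt "$cpc_max" ]; then
        echo "CRITICAL: $cpc_count $cpc_name processes (expected $cpc_min to $cpc_max) | $cpc_perf"
        return 2
    fi
    echo "OK: $cpc_count $cpc_name processes | $cpc_perf"
    return 0
}
```

For example, check_process_count httpd 3 1 10 prints an OK line, while a count of 0 with the same limits prints a CRITICAL line and returns exit code 2.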

How to monitor a Linux process?

The plugin uses ‘ps’ to retrieve the Linux process information. Logged in as root, type ps aux in your terminal to show the active processes on the server.

The a option tells ps to list the processes of all users on the system rather than just those of the current user, with the exception of group leaders and processes not associated with a terminal. A group leader is the first member of a group of related processes.

The u option tells ps to provide detailed information about each process.

The x option adds to the list processes that have no controlling terminal, such as daemons, which are programs that are launched during boot and run unobtrusively in the background until they are activated by a particular event or condition.

As the list of processes can be quite long and occupy more than a single screen, the output of ps aux can be piped (transferred) to the less command, which lets it be viewed one screen full at a time. The output can be advanced one screen forward by pressing the SPACE bar and one screen backward by pressing the b key.
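
The commands described above, as a small sketch. The show_processes helper and its default of ten lines are only there for illustration; ps aux, head and less are the standard tools.

```shell
# Print the header line plus the first few processes of all users,
# including daemons without a controlling terminal.
# The line count defaults to 10 (an arbitrary choice for this example).
show_processes() {
    ps aux | head -n "${1:-10}"
}

# For interactive use, page through the complete list instead:
#   ps aux | less
```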

With the -C parameter you can specify the Linux process for which to show information.

And you can specify what specific information to show with the -o parameter:
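
Combining both options might look like the sketch below. It assumes the procps version of ps found on most Linux distributions. The default process name httpd and the chosen columns (pid, pcpu, pmem, comm) are just examples, not necessarily what the plugin requests, and show_process_usage is a made-up name.

```shell
# Show PID, CPU %, memory % and command name for every process whose
# command name matches the first argument (httpd if none is given).
show_process_usage() {
    ps -C "${1:-httpd}" -o pid,pcpu,pmem,comm
}

# Appending "=" to a column name suppresses its header, which is handy
# when the numbers are processed further:
#   ps -C httpd -o pcpu=
```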

After joining the results with paste and making the sum with bc, we get the result we want.
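
The summing step could look roughly like this. It assumes bc is installed and that the input is one number per line, as ps -o pcpu= prints it; sum_lines is a name made up for this sketch.

```shell
# Join all input lines with "+" (paste) and let bc evaluate the sum.
sum_lines() {
    paste -sd+ - | bc
}

# Total CPU percentage of all httpd processes (requires procps ps):
#   ps -C httpd -o pcpu= | sum_lines
```

If no process matches, the input is empty and nothing is printed, so a real check would have to handle that case separately.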

Check out this screenshot which shows information about the httpd, mysqld, nagios and php processes.

Linux process

This information can really help troubleshoot LAMP configuration issues. I haven’t got a lot of time to produce a decent post, but I’ll extend this post when I find some more time. As it’s a Bash script, I’m guessing it doesn’t need too much explanation to get it working in Nagios.



There seems to be quite some development work done on NSClient++ lately by Michael Medin, as you can see in this GitHub commit graph.


As I’m still on an older version, I thought it was time to write a short review of the latest (nightly) version of NSClient++. To be honest, I’m not looking forward to migrating all our old NSClient installations to later versions. As the nsclient.ini configuration has changed drastically, migrating everything without issues will take some work.

There aren’t really any alternatives at the moment. As far as I know NSClient++ is still the only client offering real-time eventlog monitoring capabilities and this is imho a must-have. 

So in this review I will go through all my old checks and see whether they still work in this version. Please note that this is a nightly build and is not fit for production environments yet.


So download the latest version of NSClient++ and start the installation.
Choose the generic monitoring option. 


Choose custom setup:


Set the IP address of your Nagios server in the allowed hosts field and a strong password in the password field. You will need this password later to log in to the website. For now, choose the ‘Insecure legacy mode’ option. In order to use the ‘Safe mode’ and ‘Secure’ mode, you will have to install NSClient on your Nagios server too, but if you are only monitoring internal servers (not over the Internet), the ‘Insecure legacy mode’ should be ok. I’ll try to make a post about the other modes in the future.


Click next and NSClient++ will finish the installation. In order to understand all the default options and settings, it’s generally a good idea to add the default settings to the nsclient.ini file. This can be done with the following command from the NSClient++ installation folder:

In my case, with a fresh install, this generated some errors, but I’m quite sure this won’t be a real issue. It is probably just related to the nightly build.


So I was curious to see if I could get the NSClient++ webserver working, as in my previous tests (0.4.3.x) I never got it to work properly.

As I had checked the ‘Enable Web server’ checkbox, I was expecting it to work out of the box, which was not the case. Browsing to https://localhost:8443/ resulted in an ‘ERR_CONNECTION_REFUSED’ error.

So I had a look through the nsclient.ini and noticed that although I did enable it in the installation I still found this:

So after setting it to 1 and restarting the nscp service, I was able to log in with the password I configured during the installation. The webserver is using a self-signed certificate, which is better than nothing. If you have a certificate authority, you should be able to generate trusted certificates so you don’t get the ‘red cross’ in your browser.



After logging in, you immediately arrive on the Home page, which shows some basic information such as CPU load, processes, threads, handles and uptime.


It looks a bit like a remake of the Windows Task Manager, but with a little less information.


There seems to be no X-axis information on the NSClient CPU page. The interval used in the Task Manager is one second, while the NSClient page uses a different interval, which of course gives slightly different results.

The NSClient webpage is of course accessible from ‘anywhere’ if set up correctly, which is definitely a plus. 

One more small remark is that Michael seems to have chosen ‘CPU Load’ as the name for the CPU utilization. Imho this is quite confusing, as on Linux servers CPU load is a value representing the current CPU queue. As NSClient is supposed to also work on Linux servers now, I think it should be named ‘CPU Usage’ (which is a bit shorter than ‘CPU Utilization’).

Besides CPU info, there is also some memory information:


And a list of 38 metrics. I think these are all the metrics NSClient++ is caching, enabling it to calculate nice averages instead of current values.



The second menu item ‘Modules’ lists all the available modules and their state. 


So I tried checking an extra module to see if it is changed in the nsclient.ini, but apart from the checkbox being checked, nothing really changed. 


As there is almost no documentation about the new webserver, I tried some things myself, but to no effect. I’m not quite sure what the reload and shutdown actions are supposed to do.


I’ve tested this and it does not restart the nscp service. Shutdown doesn’t really seem to do anything yet.

And then I suddenly noticed that a new menu item, ‘Changes’, had appeared, which allows me to Save or Undo the configuration.


It felt a bit weird that this menu item just appeared out of nowhere. Maybe it would be better if it was always there, but with a green icon or similar when there are no detected changes. Something else I noticed is that when loading a module, you cannot enable this module unless you save it first.

In the nsclient.ini the modules I activated were properly adjusted. The only weird thing is that changes made with the web GUI use ‘enabled’ or ‘disabled’, while changes made on the command line, such as generating the defaults, use ‘0’ or ‘1’ to disable or enable a module. It would be nice if this was more consistent.


The settings menu seems to need some work, as I saw a lot of ‘TODO’ and ‘Unknown’ strings for several items.

Also, I’m not quite sure what the ‘Changed’, ‘Basic’, ‘Advanced’ tabs are supposed to do.



The queries menu gives an overview of all possible queries. 


When you click on a query, you are linked to the module which enables you to use this query, and you can see a ‘Help’ file with the usable arguments for the selected query.


And it seems Michael also enables us to test a query:


Which is a very nice feature. It would be nice to see a list of more complex working examples.


The Log menu gives a nice filterable overview of the NSClient logfile:



Similar to the Log menu, the Console menu also gives a filterable overview of all console messages.


(Almost) Final words

The features I just listed are just a few of the many exciting new features in the new NSClient++. The webserver has a nice GUI and is a nice preview of things to come. Thanks a lot Michael for sharing your work with the world.

I will continue writing on this review when I find the time.

Nagios XI Docker Container


I think most of you have heard of Docker. It’s free, it’s fast, there are a lot of prebuilt packages, in short: it’s the playground we’ve all been waiting for.
But when I searched for a Nagios XI Docker container, there seemed to be no such thing...
Therefore I built one myself to experiment with.

So if you want to play with Nagios XI 5, check out Docker Hub and fire up a container within minutes.

For those not that familiar with Docker, there is a bunch of helpful information on the Docker site itself:
Make sure to check out how to install Docker and take some time to look at the different Docker Run options.



Monitoring Windows Disk Load


Monitoring disk load is one of the harder things to monitor, but also one of the most crucial things you should monitor. Disk load problems can really give your applications a hard time, slowing them down or crippling them completely. On Linux servers it’s easy, as the CPU wait counter gives clear hints of issues with your disk io.
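
On Linux, that wait counter can be read from the aggregate cpu line in /proc/stat, where the fifth value after the cpu label is the time spent waiting for I/O. A minimal sketch, with iowait_percent as a hypothetical helper name; note that it reports the share accumulated since boot, whereas tools like top or iostat show it per interval.

```shell
# Read /proc/stat-style input and print iowait as a percentage of the
# total CPU time (user through steal) accumulated since boot.
iowait_percent() {
    awk '/^cpu / {
        total = 0
        for (i = 2; i <= 9 && i <= NF; i++) total += $i
        if (total > 0) printf "%.1f\n", 100 * $6 / total
    }'
}

# Usage on a Linux host:
#   iowait_percent < /proc/stat
```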

I rolled out check_diskstat on our Linux servers in September 2014 and really missed a similar plugin for monitoring disk load on Windows servers. Hence, I started thinking about a new PowerShell script, which would use the PowerShell cmdlet ‘Get-Counter’ to gather all disk-related information from the Performance Monitor. I started by making a list of the requirements:

  • The main requirement was that it had to be multilingual, as I work on English and Dutch versions of Windows Server 2003, 2003 R2, 2008 and 2008 R2. 
  • Another requirement was that the script had to accept an argument specifying the number of samples over which an average is calculated.
  • The perfdata had to be output in such a way that all disk load related values are visible in one graph. I had to deal with very high values, e.g. 8763098004, and very small decimals, e.g. 0,00014. This meant I had to find a way to make the graphs visually attractive and correct in Highcharts, for example by outputting milliseconds instead of seconds or megabytes instead of bytes.
  • The plugin also had to work independently of culture. Some cultures use ‘,’ and others use ‘.’ as the decimal separator. I solved this by temporarily setting [System.Threading.Thread]::CurrentThread.CurrentCulture to ‘en-US’ and restoring the original value once I’m done.
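
The plugin itself is written in PowerShell, but the two ideas from the last bullets (scaling values to friendlier units and always emitting ‘.’ as the decimal separator) can be illustrated in a few lines of shell. This is only a sketch: the helper names and the number of decimals are assumptions, and a megabyte is taken as 1048576 bytes.

```shell
# Convert seconds to milliseconds. A decimal comma in the input is
# accepted, and LC_ALL=C makes awk print a decimal point regardless of
# the system locale.
to_milliseconds() {
    LC_ALL=C awk -v v="$(printf '%s' "$1" | tr ',' '.')" 'BEGIN { printf "%.3f\n", v * 1000 }'
}

# Convert bytes to megabytes (1 MB = 1048576 bytes in this sketch).
to_megabytes() {
    LC_ALL=C awk -v v="$1" 'BEGIN { printf "%.2f\n", v / 1048576 }'
}
```

With the two example values from the list, to_milliseconds 0,00014 and to_megabytes 8763098004 both produce numbers that fit comfortably on one graph axis.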

Monitoring disk load may be useful in finding the cause of performance issues. If a component of an application starts writing huge logs or large amounts of data to a database on your Windows disks, a bottleneck could be created in your application’s flow. This bottleneck could quickly result in lag, latency or slowness for end users, resulting in more incidents, calls or complaints. An integral part of the job of a monitoring engineer is to avoid situations like the one described above. Here Nagios can help you, by alerting you before applications start getting slow. Up until now, the only way to monitor performance counters on Windows servers was to use an agent like NSClient++ (or NCPA?) to retrieve one performance counter at a time. My check_ms_windows_disk_load plugin enables you to combine several disk load related performance counters in only one service. This method has several advantages:

  • You don’t need to worry what counters to monitor. The plugin will do that for you.
  • As the plugin monitors 8 performance counters, and you only need one service, this would save you 7 services for each disk. So your Nagios server has less work, which enables you to monitor other stuff instead or increase the monitor interval on your checks.
  • As you can pass maxsamples (-ms or –MaxSamples) as a parameter, you can choose for yourself how long the plugin should run before calculating averages. Each sample should take about one second.
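
Averaging over the collected samples comes down to something like the following. The plugin does this in PowerShell; this shell version is only an illustration, and average_samples is a made-up name.

```shell
# Read one sample value per line and print the average with two decimals.
# Prints nothing when there is no input.
average_samples() {
    LC_ALL=C awk '{ sum += $1; n++ } END { if (n > 0) printf "%.2f\n", sum / n }'
}

# Usage:
#   printf '10\n20\n60\n' | average_samples
```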

You could also prove to your application engineers that the storage is or is not the cause of their application’s performance problems. You can use comprehensive graphs visualizing a collection of disk performance related information. You also need knowledge about your disk load in order to choose the right disk type for the job. Are your 3TB SATA disks strong enough to handle the job, or will you have to buy more expensive SSDs to achieve the performance you need?

How to monitor your disk load?

  1. Put the script in the NSClient++ scripts folder, preferably in a subfolder Powershell.
  2. In the nsclient.ini configuration file, define the script like this:

  3. Make a command in Nagios like this:

  4. Configure your service in Nagios. Make use of the above created command. Configure something similar like this as $ARG1$:


One day after everything is configured correctly, your Highcharts graphs should look like this:

disk load graph 01

If you want to test the load on your Windows disks, you can experiment with DiskSPD, a storage load generator from Microsoft. (Yes, Microsoft has a GitHub account!!)

I hope this plugin can help you monitor the disk load on your Windows hosts. Please rate it on the Nagios Exchange if you like my work.

Monitoring MS SharePoint Health




    SharePoint is a web application platform in the Microsoft Office server suite. Launched in 2001, SharePoint combines various functions which are traditionally separate applications: intranet, extranet, content management, document management, personal cloud, enterprise social networking, enterprise search, business intelligence, workflow management, web content management, and an enterprise application store.
    SharePoint Health Analyzer is a feature in Microsoft SharePoint Foundation 2010 that enables administrators to schedule regular, automatic checks for potential configuration, performance, and usage problems in the server farm. Any errors that SharePoint Health Analyzer finds are identified in status reports that are made available to farm administrators in Central Administration. Status reports explain each issue, list the servers where the problem exists, and outline the steps that an administrator can take to remedy the problem.
    SharePoint Health Analyzer monitors the farm by applying a set of health rules. A number of these rules ship with SharePoint Foundation. You can create and deploy additional rules by writing code that uses the SharePoint Foundation object model. When a health rule executes, SharePoint Health Analyzer creates a status report and adds it to the Health Analyzer Reports list in the Monitoring section of Central Administration.
    This plugin will create a PSObject for each item in the status report whose severity is not ‘Success’ and return a critical state if any problems are found, together with information about each problem in the status report, such as the failed service and date modified.


    How to use check_ms_sharepoint_health?

    1. Put the script in the NSClient++ scripts folder, preferably in a subfolder Powershell, enabling you to use this Reactor action to update your plugins folder without having to edit the script.
    2. In the nsclient.ini configuration file, define the script like this:
      check_ms_sharepoint_health=cmd /c echo scripts/powershell/check_ms_sharepoint_health.ps1; exit $LastExitCode | powershell.exe -command -
    3. Make a command in Nagios like this:
      check_ms_sharepoint_health => $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -t 60 -c check_ms_sharepoint_health 
    4. Configure your service in Nagios, make use of the above created command.
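
Before wiring the command into a service, it can help to run check_nrpe by hand on the Nagios server and look at the exit code. The sketch below maps the standard Nagios plugin exit codes to state names; state_name is a hypothetical helper, and the plugin directory and host address in the usage comment are placeholders.

```shell
# Translate a Nagios plugin exit code into its state name.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Usage (path and address are placeholders):
#   /path/to/plugins/check_nrpe -H 192.0.2.10 -p 5666 -t 60 -c check_ms_sharepoint_health
#   state_name $?
```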