
Monitoring Windows Network Connections

Introduction

Monitoring the network connections on your Windows servers can be crucial for examining server load and investigating bottlenecks and anomalies. There are many ways to monitor your network connections. This blog post goes into detail on some of the tools you can use to monitor your Windows network connections.

How to monitor your Windows network connections?

PerfMon

In the Windows Performance Monitor, you can find several counters for all kinds of network connections. The set of counters below is available for both TCPv4 and TCPv6 connections.

Counter name: counter description

  • Connection Failures: the number of times TCP connections have made a direct transition to the CLOSED state from the SYN-SENT state or the SYN-RCVD state, plus the number of times TCP connections have made a direct transition to the LISTEN state from the SYN-RCVD state.
  • Connections Active: the number of times TCP connections have made a direct transition to the SYN-SENT state from the CLOSED state. In other words, it shows the number of connections initiated by the local computer. The value is a cumulative total.
  • Connections Established: the number of TCP connections for which the current state is either ESTABLISHED or CLOSE-WAIT.
  • Connections Passive: the number of times TCP connections have made a direct transition to the SYN-RCVD state from the LISTEN state. In other words, it shows the number of connections to the local computer that were initiated by remote computers. The value is a cumulative total.
  • Connections Reset: the number of times TCP connections have made a direct transition to the CLOSED state from either the ESTABLISHED state or the CLOSE-WAIT state.
  • Segments Received/sec: the rate at which segments are received, including those received in error. This count includes segments received on currently established connections.
  • Segments Retransmitted/sec: the rate at which segments are retransmitted, that is, segments transmitted containing one or more previously transmitted bytes.
  • Segments Sent/sec: the rate at which segments are sent, including those on current connections, but excluding those containing only retransmitted bytes.
  • Segments/sec: the rate at which TCP segments are sent or received using the TCP protocol.

At the moment there seems to be no Performance Monitor counter available in Windows to show the UDP connection count. Although the Windows Performance Monitor is an easy choice for a quick glance at how many TCP connections are currently active, it is not an optimal tool for debugging or alerting. The PerfMon user interface also hasn’t changed much over the years.

[Image: UDP Connection Count]

This means that we will have to look at other options, such as Netstat:

Netstat

Netstat is a command-line tool that displays very detailed information about your network connections, both incoming and outgoing, routing tables, network interfaces and network protocol statistics.
It is mostly used for finding problems in the network and for determining the amount of traffic on the network as a performance measurement.

Although Netstat is the perfect tool for looking in real-time at your network connections, you will need some way to graph the Netstat values. Being able to analyze the connection count over time really helps with getting a better understanding of what your servers and applications are doing.

Nagios

I found multiple plugins that check network connections with Netstat on Linux hosts, but none for Windows hosts, so I decided to write a PowerShell script which uses Netstat to monitor your TCP and UDP network connections on Windows hosts.

How to monitor your network connections with Nagios?

  1. Download the latest version of check_ms_win_network_connections on GitHub.
  2. Put the script in the NSClient++ scripts folder, preferably in a subfolder Powershell.
  3. In the nsclient.ini configuration file, define the script like this:

  4. Make a command in Nagios like this:

  5. Configure your service in Nagios. Make use of the command created above. Configure something similar to this as $ARG1$:

Additional Information

The script runs ‘netstat -ano’, which displays all active network connections with their respective IP addresses, port numbers and the corresponding process IDs. It then parses the results and applies the optional filters.
For TCP, this could of course also be accomplished by just retrieving the ‘\TCPv4\Connections Established’ performance counter, but the real strength of the script is its parameters. If you think your systems have been compromised by a virus or other malicious software, you can distribute the check_ms_win_network_connections plugin to all Windows servers and then check your network connections for a given process, port or IP address. This can quickly give you an overview of all impacted systems.
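To make the parse-and-filter step a bit more concrete, here is a minimal Python sketch of the idea. It is not the plugin itself (which is a PowerShell script): the sample output, the `Connection` type and the `parse_netstat` and `matches` helpers are all made up for illustration, and the sketch filters on the process ID instead of the process name. The filters are combined with ‘or’, as described in the usage notes.

```python
from typing import List, NamedTuple, Optional

# Abbreviated sample of `netstat -ano` output (addresses and PIDs are invented).
SAMPLE = """\
Active Connections

  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       892
  TCP    10.0.0.5:49712         10.0.0.9:1433          ESTABLISHED     4120
  TCP    10.0.0.5:49713         10.0.0.7:443           ESTABLISHED     2264
  UDP    0.0.0.0:123            *:*                                    1016
"""


class Connection(NamedTuple):
    proto: str
    local: str
    remote: str
    state: str  # empty for UDP, which has no connection state
    pid: int


def parse_netstat(text: str) -> List[Connection]:
    """Turn raw netstat text into a list of Connection records."""
    connections = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] not in ("TCP", "UDP"):
            continue  # skip headers and blank lines
        if fields[0] == "TCP" and len(fields) == 5:
            proto, local, remote, state, pid = fields
        elif fields[0] == "UDP" and len(fields) == 4:
            proto, local, remote, pid = fields
            state = ""
        else:
            continue  # unexpected layout, ignore the line
        connections.append(Connection(proto, local, remote, state, int(pid)))
    return connections


def matches(conn: Connection, ip: Optional[str] = None,
            port: Optional[int] = None, pid: Optional[int] = None) -> bool:
    """Keep a connection if ANY supplied filter matches ('or', not 'and')."""
    host, _, remote_port = conn.remote.rpartition(":")
    checks = []
    if ip is not None:
        checks.append(host == ip)
    if port is not None:
        checks.append(remote_port == str(port))
    if pid is not None:
        checks.append(conn.pid == pid)
    return any(checks) if checks else True  # no filters: keep everything
```

With the sample above, `[c for c in parse_netstat(SAMPLE) if matches(c, ip="10.0.0.9", port=443)]` keeps both ESTABLISHED connections, because each of them satisfies one of the two filters.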

Usage

Because the PowerShell command get-process doesn’t return file extensions, the -P parameter doesn’t need the file extension (e.g. ‘.exe’) either. For example, to look for all connections made by svchost.exe, the parameters would look like this: -H server.fqdn -P svchost

Another usage example is monitoring a server that needs a continuous link with another server. By specifying the -wl and -cl parameters like this: -H server.fqdn -wl 2 -cl 0 -wh 10 -ch 15, you should get a warning alert when the number of TCP connections drops below 2 and a critical alert when there is no TCP connection with the remote server.
Please note that when using different filter parameters, ‘or’ is used, not ‘and’. So if any of the filters applies, the connection is counted.
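The low and high thresholds can be thought of roughly like the Python sketch below. This is only an illustration of the idea, not the plugin’s code: the `evaluate` function is hypothetical, and the exact boundary comparisons are my assumption, chosen so that the example above (-wl 2 -cl 0 -wh 10 -ch 15) behaves as described. The return values are the standard Nagios exit codes.

```python
from typing import Optional, Tuple


def evaluate(count: int,
             warn_low: Optional[int] = None, crit_low: Optional[int] = None,
             warn_high: Optional[int] = None, crit_high: Optional[int] = None
             ) -> Tuple[int, str]:
    """Map a connection count to a Nagios exit code and label.

    Assumed boundaries (not confirmed by the plugin source):
      - critical when count <= crit_low or count >= crit_high
      - warning  when count <  warn_low or count >= warn_high
    """
    # Critical is checked first so it always wins over warning.
    if crit_low is not None and count <= crit_low:
        return 2, "CRITICAL"
    if crit_high is not None and count >= crit_high:
        return 2, "CRITICAL"
    if warn_low is not None and count < warn_low:
        return 1, "WARNING"
    if warn_high is not None and count >= warn_high:
        return 1, "WARNING"
    return 0, "OK"
```

For example, `evaluate(1, warn_low=2, crit_low=0, warn_high=10, crit_high=15)` gives a warning, while a count of 0 is critical and a count of 5 is OK.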

If you don’t want to filter on IP address or port, I suggest you use the ‘-c’ parameter, which improves performance a lot. If you are running the plugin on a server with a very high number of connections, I also suggest using the ‘-c’ parameter.
The ‘-c’ parameter will execute (netstat -abn -proto TCP).count, which is much faster than looping through each individual connection. The trade-off is that you get less information, as it only counts the active TCP connections.

Results

The result of using Nagios XI to monitor your network connections looks like this:

[Image: Monitoring Network Connections Nagios]

TIG

A third option is to use a TIG stack (Telegraf, InfluxDB, Grafana). Telegraf queries the counters from PerfMon and sends them to an InfluxDB time series database. Visualization is done with Grafana.

The Telegraf agent configuration file needs this input:

[Image: TIG Network Connections]

Grafana allows you to create a query which will show all values for all hosts with a certain tag. With the help of templates, it becomes very easy to create beautiful graphs with filterable, sortable min, max, avg and current values of all your network connection counters, all at a one-second interval.

[Image: TIG-Windows-Network-Connections-Top-Avg]

A disadvantage of using Telegraf is that you are limited to PerfMon counters. This means it’s not possible to get the UDP connection count. There seems to be a way to execute PowerShell scripts with Telegraf, but my guess is that the resulting load would be too high to do this at a one-second interval.

Final Words

As you can see, there are multiple options to monitor your Windows network connections. I’ll try to extend this documentation with some alerting examples.


Monitoring Windows Disk Load

Introduction

I rolled out check_diskstat on our Linux servers in September 2014 and really missed a similar plugin for monitoring disk load on Windows servers. Hence, I started thinking about a new PowerShell script, which would use the PowerShell command ‘get-counter’ to gather all disk related information from the Performance Monitor. I started by making a list of the requirements:

  • The main requirement was that it had to be multilingual, as I work on English and Dutch versions of Windows Server 2003, 2003 R2, 2008 and 2008 R2.
  • Another requirement was that the script had to accept an argument that specifies the number of samples over which an average is calculated.
  • The perfdata had to be output in a way that makes all disk load related values visible in one graph. I had to deal with very high values, e.g. 8763098004, and very small decimals, e.g. 0,00014. This implied I had to find some way to make it visually attractive and correct in Highcharts, for example by outputting milliseconds instead of seconds or megabytes instead of bytes.
  • The plugin also had to work independently of the culture. Some cultures use ‘,’ and others use ‘.’ as the decimal separator. I solved this by temporarily setting [System.Threading.Thread]::CurrentThread.CurrentCulture to ‘en-US’ and setting it back to the original value once I’m done.
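The last two requirements can be sketched in a few lines of Python. This is only an illustration of the idea, not what the PowerShell plugin does internally: the helper names and the perfdata labels are invented, and the parser assumes the input contains no thousands separators.

```python
def parse_decimal(text: str) -> float:
    """Parse a number that uses either ',' or '.' as decimal separator.

    Assumption: the input has no thousands separators.
    """
    return float(text.strip().replace(",", "."))


def to_milliseconds(seconds: float) -> float:
    """Scale a tiny value in seconds to milliseconds for readable graphs."""
    return seconds * 1000.0


def to_megabytes(num_bytes: float) -> float:
    """Scale a huge byte count to megabytes (1 MB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)


def perfdata(label: str, value: float, unit: str = "") -> str:
    """Format one Nagios perfdata item.

    Python's number formatting always uses '.' as the decimal separator,
    whatever the system locale is, so the output is culture independent.
    """
    return f"'{label}'={value:.2f}{unit}"


# The two example values from the requirements above (labels are hypothetical).
latency = perfdata("avg_sec_per_read", to_milliseconds(parse_decimal("0,00014")), "ms")
throughput = perfdata("read_bytes", to_megabytes(8763098004), "MB")
```

Both values now end up in a range that fits on the same graph: 0,00014 seconds becomes 0.14ms and 8763098004 bytes becomes roughly 8357MB.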

Monitoring disk load can help you find the cause of performance issues. If a component of an application starts writing huge logs or large amounts of data to a database on your Windows disks, a bottleneck could be created in your application’s flow. This bottleneck could quickly result in lag, latency or slowness for end-users, resulting in more incidents, calls or complaints. An integral part of the job of a monitoring engineer is to avoid situations like the one described above. Here Nagios can help you, by alerting you before applications start getting slow. Up until now, the only way to monitor performance counters for Windows servers was to use an agent like NSClient++ (or NCPA?) to retrieve one performance counter per service. My check_ms_windows_disk_load plugin enables you to combine several disk load related performance counters in only one service. This method has several advantages:

  • You don’t need to worry what counters to monitor. The plugin will do that for you.
  • As the plugin monitors 8 performance counters, and you only need one service, this would save you 7 services for each disk. So your Nagios server has less work, which enables you to monitor other stuff instead or increase the monitor interval on your checks.
  • As you can pass maxsamples (-ms or -MaxSamples) as a parameter, you can choose for yourself how long the plugin should run before calculating averages. Each sample should take one second.
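The sampling idea behind maxsamples can be sketched in Python as follows. Again, this is a simplified illustration and not the plugin’s code: `average_samples` and its `read_counters` callback are hypothetical names, and the real plugin collects its values with get-counter.

```python
import time
from statistics import mean
from typing import Callable, Dict, List


def average_samples(read_counters: Callable[[], Dict[str, float]],
                    max_samples: int,
                    interval: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep
                    ) -> Dict[str, float]:
    """Take max_samples readings, one per interval, and average each counter.

    read_counters is a caller-supplied function returning one reading as
    {counter_name: value}. max_samples must be at least 1.
    """
    samples: List[Dict[str, float]] = []
    for i in range(max_samples):
        samples.append(read_counters())
        if i < max_samples - 1:
            sleep(interval)  # no need to wait after the last sample
    return {name: mean(s[name] for s in samples) for name in samples[0]}
```

With a one-second interval, the number of samples is roughly the time in seconds the check will run, so keep it well below the check timeout of your agent.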

You can also prove to your application engineers that the storage is or is not the cause of their application’s performance problems, using comprehensive graphs that visualize a collection of disk performance related information. You also need to know your disk load in order to choose the right disk type for the job. Are your 3TB SATA disks strong enough to handle the job, or will you have to buy more expensive SSDs to achieve the performance you need?

How to monitor your disk load?

  1. Put the script in the NSClient++ scripts folder, preferably in a subfolder Powershell.
  2. In the nsclient.ini configuration file, define the script like this:
  3. Make a command in Nagios like this:
  4. Configure your service in Nagios. Make use of the command created above. Configure something similar to this as $ARG1$:

Examples:

One day after everything is configured correctly, your Highcharts graphs should look like this:

[Image: disk load graph 01]

If you want to test the load on your Windows disks, you can play with DiskSPD, a storage load generator from Microsoft. (Yes, Microsoft has a GitHub account!!)

I hope this plugin can help you monitor the disk load on your Windows hosts. Please rate it on the Nagios Exchange if you like my work.

Monitoring Microsoft Exchange 2010 Mailbox

 
 


    Introduction

    I based this script on the MS Exchange 2010 DAG Mailbox check by Matt Haynes, but made the following changes:

    • Recovery mailbox databases are excluded. This way we don’t get any false alerts when colleagues are restoring a backup to a temporary recovery database.
    • For MS Exchange, backups are crucial, as your logs will grow very fast when they don’t get backed up. Therefore a warning is generated if the last backup is more than 26 hours old and an error is generated if it is more than 50 hours old. (These time periods can be altered, depending on the total time your backup software requires to back up all your mailbox databases.)
    • Performance data is gathered for the number of mounted, healthy, and unhealthy mailbox databases.
    • If no mailbox databases are found on the Exchange server, an error is also generated.
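The backup age and recovery database rules from the list above boil down to something like the following Python sketch. It only illustrates the logic; the real check is a PowerShell script, and the dictionary keys, function names and the handling of a database that has never been backed up are assumptions of mine.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios exit codes


def backup_status(last_backup: Optional[datetime], now: datetime,
                  warn_hours: int = 26, crit_hours: int = 50) -> int:
    """Rate the age of the last backup of one mailbox database."""
    if last_backup is None:
        return CRITICAL  # assumption: "never backed up" is treated as an error
    age = now - last_backup
    if age > timedelta(hours=crit_hours):
        return CRITICAL
    if age > timedelta(hours=warn_hours):
        return WARNING
    return OK


def check_databases(databases: List[Dict], now: datetime) -> int:
    """Return the worst backup status over all non-recovery databases.

    Each database is a dict with the hypothetical keys 'name',
    'recovery' (bool) and 'last_backup' (datetime or None).
    """
    # Recovery databases are skipped so a restore doesn't raise false alerts.
    candidates = [db for db in databases if not db["recovery"]]
    if not candidates:
        return CRITICAL  # no mailbox databases found is also an error
    return max(backup_status(db["last_backup"], now) for db in candidates)
```

A database that was last backed up 30 hours ago would give a warning here, while a recovery database with a backup from weeks ago is simply ignored.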

    I would like to thank my colleague, De Clerck John, who created the exclusion for the recovery mailbox database and the check for the last backup. His knowledge of MS Exchange and PowerShell seems to be unlimited. As previously said, the script will count all healthy, failed and mounted mailbox databases and output performance data. This way, you can easily see when exactly something went wrong with your mailbox databases:

    [Image: check-ms-exchange-2010-health-graph-01]

    In this screenshot it appears we had an issue on 28 April. In fact, our Exchange administrators were busy migrating the mailbox databases to different MS Exchange servers, which is also visible in the following days of the graph.


    How to monitor your MS Exchange databases?

    1. Copy the ‘check_ms_exchange_2010_health.ps1’ PowerShell script to the NSClient++ scripts folder, preferably in a sub-folder ‘Powershell’, on all the Exchange 2010 servers you wish to monitor.
    2. In the nsclient.ini configuration file, define the external command like this, and restart the NSClient++ service (nscp) afterwards:
    3. Make a command in Nagios like this:
    4. Configure your service in Nagios. Use the command you previously made. Argument 1 should be the hostname of the MS Exchange server.

    Grtz Willem