Monitoring Windows Network Connections

Introduction

Monitoring the network connections on your Windows servers can be crucial to examine server load and investigate bottlenecks and anomalies. There are many ways to monitor your network connections. This blog post will go into detail of some of the tools that can be used to achieve optimal monitoring of your Windows network connections.

How to Monitor Your Windows Network Connections

PerfMon

In the Windows Performance Monitor, you can find several counters for all kinds of network connections. The set of counters below is available for both TCPv4 and TCPv6 connections.

Counter Name | Counter Description
Connection Failures | Connection Failures is the number of times TCP connections have made a direct transition to the CLOSED state from the SYN-SENT state or the SYN-RCVD state, plus the number of times TCP connections have made a direct transition to the LISTEN state from the SYN-RCVD state.
Connections Active | Connections Active is the number of times TCP connections have made a direct transition to the SYN-SENT state from the CLOSED state. In other words, it shows the number of connections initiated by the local computer. The value is a cumulative total.
Connections Established | Connections Established is the number of TCP connections for which the current state is either ESTABLISHED or CLOSE-WAIT.
Connections Passive | Connections Passive is the number of times TCP connections have made a direct transition to the SYN-RCVD state from the LISTEN state. In other words, it shows the number of connections to the local computer that were initiated by remote computers. The value is a cumulative total.
Connections Reset | Connections Reset is the number of times TCP connections have made a direct transition to the CLOSED state from either the ESTABLISHED state or the CLOSE-WAIT state.
Segments Received/sec | Segments Received/sec is the rate at which segments are received, including those received in error. This count includes segments received on currently established connections.
Segments Retransmitted/sec | Segments Retransmitted/sec is the rate at which segments are retransmitted, that is, segments transmitted containing one or more previously transmitted bytes.
Segments Sent/sec | Segments Sent/sec is the rate at which segments are sent, including those on current connections, but excluding those containing only retransmitted bytes.
Segments/sec | Segments/sec is the rate at which TCP segments are sent or received using the TCP protocol.
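
Because counters such as Connections Active and Connections Passive are cumulative totals, a graph is usually more useful when it shows the change between two samples rather than the raw value. Below is a minimal Python sketch of that conversion. It assumes you already collected (timestamp, value) samples in some way; the function name and the sample data are hypothetical and not part of PerfMon.

```python
from typing import List, Tuple


def counter_rates(samples: List[Tuple[float, int]]) -> List[float]:
    """Turn cumulative counter samples into per-second rates.

    samples: (unix_timestamp, cumulative_value) pairs, oldest first.
    A drop in value is treated as a counter reset (for example after a
    reboot) and yields a rate of 0.0 for that interval.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0 or v1 < v0:
            rates.append(0.0)
        else:
            rates.append((v1 - v0) / elapsed)
    return rates


# Hypothetical samples of "Connections Active", taken every 10 seconds.
samples = [(0.0, 1000), (10.0, 1050), (20.0, 1050), (30.0, 1200)]
print(counter_rates(samples))
```

Treating a decreasing value as a reset is a design choice made for this sketch; a monitoring system may handle resets differently.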

At the moment there seems to be no Performance Monitor counter available in Windows to show the UDP connection count. Although the Windows Performance Monitor is an easy choice for a quick glance at how many TCP connections are currently active, it is not an optimal tool for debugging or alerting. The PerfMon user interface also hasn’t changed much over the years.

[Image: UDP Connection Count]

This means that we will have to look at other options, such as Netstat:

Netstat

Netstat is a command-line tool that displays very detailed information about your incoming and outgoing network connections, routing tables, network interfaces and network protocol statistics.
It is mostly used to find problems in the network and to measure the amount of traffic on the network as a performance indicator.

Although Netstat is the perfect tool for looking in real-time at your network connections, you will need some way to graph the Netstat values. Being able to analyze the connection count over time really helps with getting a better understanding of what your servers and applications are doing.
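
To make the idea of turning Netstat output into graphable values concrete, here is a minimal sketch in Python (not the PowerShell plugin discussed below). It assumes the column layout that ‘netstat -ano’ prints on an English-language Windows system; the sample output, function names and field names are made up for illustration.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical sample of 'netstat -ano' output on Windows.
SAMPLE_OUTPUT = """
Active Connections

  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       888
  TCP    192.168.1.10:49712     93.184.216.34:443      ESTABLISHED     4321
  TCP    192.168.1.10:49713     93.184.216.34:443      ESTABLISHED     4321
  UDP    0.0.0.0:123            *:*                                    1044
"""


def parse_netstat(output: str) -> List[Dict[str, str]]:
    """Parse netstat text into one dict per connection line."""
    connections = []
    for line in output.splitlines():
        fields = line.split()
        if not fields or fields[0] not in ("TCP", "UDP"):
            continue  # skip headers and blank lines
        if fields[0] == "TCP" and len(fields) >= 5:
            proto, local, remote, state, pid = fields[:5]
        elif fields[0] == "UDP" and len(fields) >= 4:
            # UDP is connectionless, so netstat prints no state column.
            proto, local, remote, pid = fields[:4]
            state = ""
        else:
            continue
        connections.append(
            {"proto": proto, "local": local, "remote": remote,
             "state": state, "pid": pid}
        )
    return connections


def count_by_state(connections: List[Dict[str, str]]) -> Counter:
    """Count connections per (protocol, state) pair."""
    return Counter((c["proto"], c["state"]) for c in connections)


print(count_by_state(parse_netstat(SAMPLE_OUTPUT)))
```

On a real host the text could come from something like subprocess.run(["netstat", "-ano"], capture_output=True, text=True).stdout, and the resulting counts could be written to whatever graphing backend you use.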

Nagios

I saw multiple plugins that check network connections with Netstat on Linux hosts, but none for Windows hosts, so I decided to write a PowerShell script that uses Netstat to monitor the TCP and UDP network connections on Windows hosts.

How to monitor your network connections with Nagios?

  1. Download the latest version of check_ms_win_network_connections on GitHub.
  2. Put the script in the NSClient++ scripts folder, preferably in a subfolder Powershell.
  3. In the nsclient.ini configuration file, define the script like this:

  4. Make a command in Nagios like this:

  5. Configure your service in Nagios, using the command created above. Configure something like this as $ARG1$:

Additional Information

The script runs ‘netstat -ano’, which displays all active network connections with their respective IP addresses, port numbers and corresponding process IDs. It then parses the results and applies the optional filters.
This could of course also be accomplished by just retrieving the ‘\TCPv4\Connections Established’ performance counter and its UDP variant, but the real strength of the script is its parameters. If you think your systems have been compromised by a virus or other malicious software, you can distribute the check_ms_win_network_connections plugin to all Windows servers and then check your network connections for a given process, port or IP address. This can quickly give you an overview of all impacted systems.

Usage

Because the PowerShell command get-process doesn’t return file extensions, the -P parameter doesn’t need a file extension such as ‘.exe’ either. For example, to look for all connections made by svchost.exe, the parameters would look like this: -H server.fqdn -P svchost

Another usage example is monitoring a server that needs a continuous link with another server. By specifying the -wl and -cl parameters like this: -H server.fqdn -wl 2 -cl 0 -wh 10 -ch 15, you should get a warning alert when the number of TCP connections drops below 2 and a critical alert when there is no TCP connection with the remote server.
Please note that when different filter parameters are combined, ‘or’ is used, not ‘and’. So if any of the filters applies, the connection is added.
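
The ‘or’ behaviour of the filters can be sketched as follows. This is a Python illustration, not the plugin’s PowerShell code; the record fields and parameter names are hypothetical, and returning everything when no filter is given is an assumption of this sketch.

```python
from typing import Dict, List, Optional

# Hypothetical connection records, as they might look after parsing netstat.
CONNECTIONS = [
    {"remote_ip": "10.0.0.5", "remote_port": 443, "process": "svchost"},
    {"remote_ip": "10.0.0.9", "remote_port": 1433, "process": "sqlservr"},
    {"remote_ip": "10.0.0.7", "remote_port": 443, "process": "chrome"},
]


def filter_connections(
    connections: List[Dict],
    ip: Optional[str] = None,
    port: Optional[int] = None,
    process: Optional[str] = None,
) -> List[Dict]:
    """Keep a connection when ANY of the supplied filters matches it."""
    checks = []
    if ip is not None:
        checks.append(lambda c: c["remote_ip"] == ip)
    if port is not None:
        checks.append(lambda c: c["remote_port"] == port)
    if process is not None:
        checks.append(lambda c: c["process"] == process)
    if not checks:
        return list(connections)  # assumption: no filter means keep all
    return [c for c in connections if any(check(c) for check in checks)]


# Port 443 OR process sqlservr: all three sample connections match.
print(len(filter_connections(CONNECTIONS, port=443, process="sqlservr")))
```

With ‘and’ semantics the same call would return nothing, because no sample connection is both on port 443 and owned by sqlservr.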

If you don’t want to filter on IP address or port, I suggest you use the ‘-c’ parameter, which improves performance a lot. I also suggest using it when you run the plugin on a server with a very high number of connections.
The ‘-c’ parameter will execute (netstat -abn -proto TCP).count, which is way faster than looping through each individual connection. The trade-off is that you get less information, as it only counts the active TCP connections.

Results

The result of using Nagios XI to monitor your network connections looks like this:

[Image: Monitoring Network Connections Nagios]

TIG

A third option is to use a TIG stack, which uses Telegraf to query the counters from PerfMon and send them to an InfluxDB time series database. Visualization is done with Grafana.

The Telegraf agent configuration file needs this input:

[Image: TIG Network Connections]

Grafana allows you to create a query that shows all values for all hosts with a certain tag. With the help of templates, it becomes very easy to create beautiful graphs with filterable, sortable min, max, avg and current values of all your network connection counters, and all of this at a one-second interval.

[Image: TIG-Windows-Network-Connections-Top-Avg]

A disadvantage of using Telegraf is that you are limited to PerfMon counters, which means it’s not possible to get the UDP connection count. There seems to be a way to execute PowerShell scripts with Telegraf, but my guess is that the resulting load would be too high to do this at a one-second interval.

Final Words

As you can see, there are multiple options to monitor your Windows network connections. I’ll try to extend this documentation with some alerting examples.

Rundeck 2.10 – Ultimate Open Source Job scheduler

Rundeck Review

In June 2016, Nagios announced they were stopping development on Nagios Reactor, so I had to start looking for a replacement. After playing with Foreman, Jenkins, Rundeck and StackStorm, I decided the best solution for my needs was definitely Rundeck. In this Rundeck review, I’ll try to go into detail on some of the most useful Rundeck features I’ve been using over the last years.

Rundeck was definitely a hidden gem in the open source automation landscape, which has been dominated by configuration management oriented tools, such as Ansible, Chef, Puppet and Salt. But imho we don’t always need full configuration management. Usage of a job scheduler and orchestrator is in a lot of cases a more suitable option. And an added bonus is that Rundeck integrates with Ansible thanks to this plugin.

Rundeck is being very actively developed, meaning they regularly release new features. The nice thing is that they truly listen to their community, by allowing us to vote for popular features in a Trello board. Feel free to create an account and vote for the features you think deserve priority development time.

So what if you want professional support? Then you can opt for Rundeck Pro, which has some additional features and pro plugins available. I hope this Rundeck review helps you make a better-informed decision on which automation platform to start using in your digital transformation.

Rundeck Projects and Jobs

Rundeck projects contain definitions of nodes, as well as a set of jobs that reference these nodes. Access control policies allow you to choose which teams can perform actions on jobs. Each node in a Rundeck project can be customized with tags, allowing you to target each kind of node rather than reference specific host names or IP addresses. All these Rundeck features allow you to create job libraries with useful scripts. Integrating the Rundeck access, job and execution logs into an Elastic stack gives you full visibility of what’s happening on your Rundeck server.

You can group Rundeck jobs in folders and subfolders. A collapsed view of all jobs in my DAF project:

 

Rundeck Security

Please note I’m just listing a few security-related topics in this Rundeck review. Please refer to the official Rundeck documentation for all the information you need to set up a secure Rundeck instance.

Active Directory integration

Active Directory integration is a basic requirement for any automation tool. Using Active Directory groups allows you to group users and assign specific permissions to them. Please refer to the official Rundeck documentation if you want more information on how to configure this.

Agentless SSH based automation

A critical feature of any automation tool is a way to encrypt its traffic. Because RunDeck uses SSH to execute commands on nodes, it already has a big advantage over tools that rely on other protocols. SSH is a secure protocol used as the primary means of connecting to Linux servers remotely. When you connect, you are dropped into a shell session, which is a text-based interface where you can interact with your server. For the duration of your SSH session, any commands that you type into your local terminal are sent through an encrypted tunnel and executed on your server. Clients generally authenticate either with passwords (less secure and not recommended) or with SSH keys, which are very secure.

SSL / HTTPS

The RunDeck URL also needs to be protected, otherwise attackers could easily sniff your network and extract usernames, passwords, job options and more from API calls or logins. This procedure describes the steps needed to configure SSL for your RunDeck server. I decided to create my own version of the official documentation, but it’s only applicable to Microsoft .pfx certificates.

How to configure SSL for RunDeck?

  • Generate a .pfx server certificate with your private root CA
  • Copy the generated server certificate <servername>.pfx to /etc/rundeck/ssl
  • Create a keystore to hold the server certificate <servername>.pfx

  • Retrieve the alias from the <servername>.pfx file

  • Import the Certificate and Private Key into the Java keystore

  • Create a keystore for the CA certificate

  • Add the CA certificate to the CA keystore

  • Edit /etc/rundeck/ssl/ssl.properties and update all properties with their current values:

  • Edit /etc/rundeck/profile and uncomment:

  • Edit /etc/rundeck/rundeck-config.properties

  • Edit /etc/rundeck/framework.properties

  • Make sure port 4443 is opened in the firewall:

  • Restart the rundeckd daemon

  • Tail the RunDeck logs to make sure everything works fine:

Final words

I’d love to give a big thanks to the Rundeck developers for making Rundeck available to the public. I’m sorry if important stuff is missing in this (basic) Rundeck review; I’ll try to add more information over time. It’s also on my to-do list to open source my Elastic pipeline configurations, which enable analytics on the access, job and execution logs.

Linux Vulnerabilities Overview

Introduction

Linux is considered to be much more secure than Windows. Over the last years, however, several big Linux vulnerabilities were discovered. This definitely doesn’t mean that Linux is suddenly an insecure operating system. What it does mean is that you need to monitor and patch your systems. The same goes of course for Windows Server, but I’ll try to go into detail about WSUS updates in another post.

When you look at the latest Red Hat security advisories, it becomes very clear that you need to implement a system which automatically installs security updates. Doing this manually on 500+ servers would be crazy and a big waste of time. You also need to make sure you always have a recent snapshot or backup in place, preferably taken right before the security updates are installed.

RunDeck allows you to do exactly that. After adding your Linux servers as nodes to RunDeck, you can easily schedule a job containing a workflow that first takes a VMware snapshot and then safely starts the installation of the security updates.

I’ll try to go over the most famous Linux vulnerabilities and summarize some very basic information about them.

Heartbleed

Security bug in the OpenSSL cryptography library, reported on 01/04/2014 by Neel Mehta (Google). It is classified as a buffer over-read, a situation where more data can be read than should be allowed.

  • CVE-2014-0160
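
To illustrate what a buffer over-read means here, the following Python sketch simulates a heartbeat handler that trusts the length claimed by the client. It is a simplified model, not OpenSSL code: the memory layout, function names and secret value are invented for the example.

```python
# Simulated process memory: a 4-byte heartbeat payload, followed by
# unrelated data that happens to sit right behind it.
MEMORY = b"bird" + b"SECRET_KEY=hunter2"


def heartbeat_unsafe(memory: bytes, offset: int, claimed_length: int) -> bytes:
    """Echo back as many bytes as the client CLAIMS to have sent."""
    return memory[offset:offset + claimed_length]


def heartbeat_checked(
    memory: bytes, offset: int, actual_length: int, claimed_length: int
) -> bytes:
    """Discard the request when the claimed length exceeds the real payload."""
    if claimed_length > actual_length:
        return b""  # silently drop the malformed heartbeat
    return memory[offset:offset + claimed_length]


# An honest client sends 4 bytes and claims 4 bytes.
print(heartbeat_unsafe(MEMORY, 0, 4))
# A malicious client sends 4 bytes but claims 22, leaking adjacent memory.
print(heartbeat_unsafe(MEMORY, 0, 22))
# With a bounds check, the same malicious request returns nothing.
print(heartbeat_checked(MEMORY, 0, 4, 22))
```

The checked variant mirrors the general idea of the fix, which is to compare the claimed length with the data actually received before replying.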

[Image: Linux vulnerabilities Heartbleed]

Shellshock (Bashdoor)

Everybody must have heard of Shellshock, discovered by Stephane Chazelas and disclosed on 24/09/14. Shellshock allows attackers to execute arbitrary code smuggled in through environment variables. Anything that invokes the flawed Bash shell and passes in malicious variables, which seems to be surprisingly easy to do, is vulnerable to being hijacked.

To check whether specific CGI scripts are vulnerable, you could use Shellshock Tester or Shellshock Test Tool.

  • CVE-2014-6271
  • CVE-2014-6277
  • CVE-2014-6278
  • CVE-2014-7169
  • CVE-2014-7186
  • CVE-2014-7187
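
The payloads used against Shellshock typically start with a Bash function definition, ‘() {’, followed by extra commands. The Python sketch below flags variables whose value has that shape. It is only a rough heuristic for illustration, not a replacement for patching Bash; the function names and the sample CGI environment are hypothetical.

```python
from typing import Dict, List


def looks_like_shellshock(value: str) -> bool:
    """Return True when a value starts like a Bash function definition."""
    return value.lstrip().startswith("() {")


def suspicious_variables(env: Dict[str, str]) -> List[str]:
    """List the names of variables that carry a function-like value."""
    return sorted(
        name for name, value in env.items() if looks_like_shellshock(value)
    )


# Hypothetical CGI environment built from incoming HTTP headers.
cgi_env = {
    "HTTP_HOST": "example.com",
    "HTTP_USER_AGENT": "() { :;}; /bin/cat /etc/passwd",
}
print(suspicious_variables(cgi_env))
```

A web server passes request headers to CGI scripts as environment variables, which is why a crafted User-Agent header is enough to reach a vulnerable shell.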

[Image: Linux vulnerabilities Shellshock]

Ghost

The last critical security flaw to hit the news, on 16/02/2016, was Ghost. Strictly speaking, the name Ghost belongs to an earlier glibc flaw (CVE-2015-0235); the bug described here is CVE-2015-7547. It’s a stack-based buffer overflow in the glibc DNS client-side resolver that puts Linux machines at risk of remote code execution. It was discovered by a Google engineer. The glibc maintainers had previously been alerted to the issue via their bug tracker in July 2015. The issue was solved by a combined effort of two engineers of the Red Hat team, the Google team and the glibc team. Check out the Google blog post.

  • CVE-2015-7547: glibc getaddrinfo stack-based buffer overflow

[Image: Linux vulnerabilities Ghost]

Kernel Zero-Day Flaw

On 19/01/2016, a new critical zero-day Linux vulnerability was found in the kernel that could allow attackers to gain root privileges. It was discovered by a research group named Perception Point. The issue had apparently been present since 2012 and is the result of a reference leak in the keyrings facility built into Linux. The keyrings facility is a way to encrypt and store login data, encryption keys and certificates and make them available to applications.

A PoC with example exploit code was released on GitHub.

  • CVE-2016-0728

Patch your impacted systems against Linux vulnerabilities

Ensure that you are running the latest patch level. If it’s a virtual machine, take a VMware snapshot first, so that in a worst-case scenario you can roll back.

CentOS / Red Hat / Fedora

Ubuntu / Debian
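
The exact update commands depend on the package manager of each distribution family. As a rough sketch in Python, assuming yum on CentOS / Red Hat / Fedora and apt-get on Ubuntu / Debian, a helper could select the commands per family. The function name and family labels are hypothetical, and the commands are only built here, not executed.

```python
from typing import List


def update_commands(family: str) -> List[List[str]]:
    """Return the package update commands for a distribution family."""
    if family == "redhat":  # CentOS / Red Hat / Fedora
        return [["yum", "-y", "update"]]
    if family == "debian":  # Ubuntu / Debian
        return [["apt-get", "update"], ["apt-get", "-y", "upgrade"]]
    raise ValueError(f"unknown distribution family: {family}")


for family in ("redhat", "debian"):
    for command in update_commands(family):
        print(family, "->", " ".join(command))
```

Each command could then be run as root with subprocess.run(command, check=True), ideally right after the snapshot has been taken.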

You can schedule this easily with, for example, Nagios Reactor. It allows you to execute commands over SSH at scheduled intervals. In combination with the VMware snapshot chain, you can easily create a robust patching ecosystem. Please note that Nagios Reactor is completely free, but is still in beta. It also only seems to work on CentOS 6.

RunDeck

You can use an inline script such as this to start a yum update on your Linux servers:

The job requires only one variable, which I called reboot. It can be set to true or false.
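
How such a reboot variable could drive the job is sketched below in Python. This is not the original inline script. It assumes the job option reaches the script as the environment variable RD_OPTION_REBOOT, which is how Rundeck usually exposes job options to script steps, and the one-minute delay before rebooting is an arbitrary choice for the example.

```python
import os
from typing import List, Mapping


def should_reboot(environ: Mapping[str, str] = os.environ) -> bool:
    """Read the 'reboot' job option; anything other than 'true' means no."""
    return environ.get("RD_OPTION_REBOOT", "false").strip().lower() == "true"


def plan_steps(reboot: bool) -> List[List[str]]:
    """Build the commands the job would run, without executing them."""
    steps = [["yum", "-y", "update"]]
    if reboot:
        # Assumption: reboot one minute after the update has finished.
        steps.append(["shutdown", "-r", "+1"])
    return steps


for step in plan_steps(should_reboot({"RD_OPTION_REBOOT": "true"})):
    print(" ".join(step))
```

Defaulting to no reboot when the option is missing keeps the job on the safe side if someone forgets to set it.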

This is a screenshot of the Log Output of a RunDeck job:

[Image: DAF Linux Yum]