CentOS 7 – An Enterprise Ready Problemless OS

Introduction

It must be about 8 years now since we choose CentOS as our default operating system for Linux servers. A lot has changed since then and it has always been on my to do to write a blog post about it. Karanbir Singh announced the release of CentOS 7.4.1708 on 13/09/17. As with all CentOS 7 components, this release was built from sources hosted at git.centos.org. It also supersedes all previously released content for CentOS Linux 7, and users are highly encouraged to upgrade all systems running CentOS 7. Make sure to read the release notes before upgrading.

One month later, we were able to patch all our CentOS 7 systems and did not run into a single upgrade problem. I would say that merits a big congratulations to the whole CentOS team, and of course also all Red Hat engineers for producing a problemless and stable distribution.

In this post I’ll try to give a general overview of what CentOS is about and why you should choose for this partcular operating system.

centos 7

CentOS Lifecycle

It’s very important to keep an eye on the lifecycles of the operating systems you are managing. Good planning ensures you have enough time to migrate your applications in time before your operating systems are no longer supported. 

CentOS VersionRelease DateFull UpdatesMaintenance Updates
319 March 200420 July 200631 October 2010
49 March 200531 March 200929 February 2012
512 April 200731 January 201431 March 2017
610 July 201110 May 201730 November 2020
77 July 2014Q4 202030 June 2024

Source: https://en.wikipedia.org/wiki/CentOS 

CentOS 7 Repositories

There are three primary CentOS repositories (also known as channels), containing software packages that make up the main CentOS distribution. 

  • base – Contains packages that form CentOS point releases, and gets updated when the actual point release is formally made available in form of ISO images.
  • updates – Contains packages that serve as security, bugfix or enhancement updates, issued between the regular update sets for point releases. 

  • extras – Provides additional packages that may be useful.

  • centosplus – Provides additional packages that extend functionality of existing packages. Please note that this repository is disabled by default. Using this repository is more dangerous than using other CentOS repositories, as it is designed to have several updated packages and it is not really designed to be completely enabled. You should only pick the packages you are looking for and use exclude= and includepkgs= (or exclude= and yum-plugin-priorities) to load only those packages from the centosplus repository. (also check the official centosplus documentation)

CentOS vs Red Hat Enterprise Linux

While CentOS is derived from the Red Hat Enterprise Linux codebase, CentOS and Red Hat Enterprise Linux are distinguished by divergent build environments, QA processes, and, in some editions, different kernels and other open source components. For this reason, the CentOS binaries are not the same as the Red Hat Enterprise Linux binaries. Red Hat Enterprise Linux (RHEL) is actually also open source. But although the code is available for Red Hat users, it is not free to use. Red Hat and the CentOS project announced 7 January 2014 they were actually joining forces.

 CentOSRHEL
License FOSS – GPL and othersCommercial – RedHat EULA
SecuritySELinux, NSS, Linux PAM, firewalld SELinux, NSS, Linux PAM, firewalld
Patches/fixesAs promptly as possible given available project resources.SLA through Red Hat
SupportSelf-support24x7 support through Red Hat
Package managementYumYum
Enterprise package managementSpacewalk / KatelloRed Hat Satellite
ClusteringLinux-HARed Hat Cluster Suite (RHCS)
BootloaderGRUB 2GRUB 2
Graphical user interface (GUI)GNOME 3 / KDE SC 4.10GNOME 3 / KDE SC 4.10
Service managementsystemdsystemd
Storage managementLVM / SSM LVM / SSM
Default file systemXFSXFS
ContainerizationDocker, KubernetesRed Hat OpenShift
Virtual device interface (VDI)SPICESPICE

red hat

There are a lot of advantages in choosing Red Hat 7 over CentOS 7. 

  • Enterprise-level support
  • Access to engineering resources
  • Red Hat’s Customer Portal
  • Certifications
  • Latest features

But choosing Red hat also has some considerable disadvantages:

  • Not free
  • Administration overhead for license management

And yes, I do mention the administration overhead as a problem. This problem might not apply for everyone though. In my case though the process of ordering new Red Hat licenses or prolonging expiring licenses just takes a lot of (unnecessary) time. 

Final words

So I hope my blog post gave you some additional information to make a better informed decision which operating systems are best suited for your use case. If you need professional support, Red Hat is there for you, if you feel comfortable supporting your own Linux servers, follow the CentOS rabbit. 

Monitoring Infoblox DDI

Introduction

Infoblox is a DDI (DNS, DHCP, and IP address management solution) which simplifies network management a lot. Over the past 8 years I was able to work with it and never looked into another solution, as it completely fulfills all our DNS and DHCP needs. During that time, I’ve been finetuning my Infoblox Logstash grok patterns and index template mappings. As I didn’t found any existing Infoblox Logstash grok patterns, I decided to make them open source. You can download the Logstash configuration file on GitHub here. There is also a template included with the mappings for Elasticsearch. 

Infoblox Logstash

Infoblox Logging

Thanks to Infoblox, we can:

  • Consolidate DNS, DHCP, IP address management, and other core network services into a single platform, managed from a common console
  • Centrally orchestrate DDI functions across diverse infrastructure
  • Boost IT efficiency and automation by seamlessly integrating with other IT systems (such as Rundeck) through RESTful APIs

Infoblox has integrated reporting & analytics capabilities, but imho DNS and DHCP related logs are on the top priority list for sending to a log aggregator, such as Elasticsearch or NLS. DHCP and DNS logs allow us to link ip addresses to device hostnames and mac addresses. As ip addresses are logged everywhere, this is a vital log source in order to trace what happened by who on your network. A good Logstash filter is able to parse all the relevant fields, so they can be used in aggregations. 

Infoblox Logstash filter (named, dhcpd and httpd)

Please note hat I’m not using a syslog input, but a tcp input. I’ve had considerable issue with the default syslog patterns used by Elasticsearch.  Apart from that I prefer to apply my own field names for syslog data. Using my own custom syslog grok pattern allows me to match the parsed field to our internally used naming conventions. Feel free to adjust the field names as needed.

 

 

 

 

Monitoring Network Connections Nagios 2

Monitoring Windows Network Connections

Introduction

Monitoring the network connections on your Windows servers can be crucial to examine server load and investigate bottlenecks and anomalies. There are many ways to monitor your network connections. This blog post will go into detail of some of the tools that can be used to achieve optimal monitoring of your Windows network connections.

How To monitor your Windows Network Connections?

PerfMon

In the Windows Performance Monitor, you can find several counters for all kinds network connections. This set of counters is available for TCPv4 and TCPv6 connections.

Counter NameCounter Description
Connection FailuresConnection Failures is the number of times TCP connections have made a direct transition to the CLOSED state from the SYN-SENT state or the SYN-RCVD state, plus the number of times TCP connections have made a direct transition to the LISTEN state from the SYN-RCVD state.
Connections ActiveConnections Active is the number of times TCP connections have made a direct transition to the SYN-SENT state from the CLOSED state. In other words, it shows a number of connections which are initiated by the local computer. The value is a cumulative total.
Connections EstablishedConnections Established is the number of TCP connections for which the current state is either ESTABLISHED or CLOSE-WAIT.
Connections PassiveConnections Passive is the number of times TCP connections have made a direct transition to the SYN-RCVD state from the LISTEN state. In other words, it shows a number of connections to the local computer, which are initiated by remote computers. The value is a cumulative total.
Connections ResetConnections Reset is the number of times TCP connections have made a direct transition to the CLOSED state from either the ESTABLISHED state or the CLOSE-WAIT state.
Segments Received/secSegments Received/sec is the rate at which segments are received, including those received in error. This count includes segments received on currently established connections.
Segments Retransmitted/secSegments Retransmitted/sec is the rate at which segments are retransmitted, that is, segments transmitted containing one or more previously transmitted bytes.
Segments Sent/secSegments Sent/sec is the rate at which segments are sent, including those on current connections, but excluding those containing only retransmitted bytes.
Segments/secSegments/sec is the rate at which TCP segments are sent or received using the TCP protocol.

At the moment there seems to be no Performance Monitor counter available  in Windows to show the UDP connection count.  Although the Windows Performance Monitor is an easy choice to have a quick glance at how many TCP connections are currently active, it is not an optimal tool to use for debugging or alerting. The PerfMon user interface also hasn’t changed much over the years. 

UDP Connection Count

This means that we will have to look at other options, such as Netstat:

Netstat

Netstat is a command-line tool that displays very detailed information about your network connections, both incoming and outgoing, routing tables, network interfaces and network protocol statistics.
It is mostly used for finding problems in the network and to determine the amount of traffic on the network as a performance measurement. 

Although Netstat is the perfect tool for looking in real-time at your network connections, you will need some way to graph the Netstat values. Being able to analyze the connection count over time really helps with getting a better understanding of what your servers and applications are doing.

Nagios

As I saw multiple plugins to check network connections with Netstat on Linux hosts, but not on Windows hosts, I decided to write a Powershell script which uses Netstat to monitor your TCP and UDP network connections on Windows hosts.

How to monitor your network connections with Nagios?

  1. Download the latest version of check_ms_win_network_connections on GitHub.
  2. Put the script in the NSClient++ scripts folder, preferably in a subfolder Powershell.
  3. In the nsclient.ini configuration file, define the script like this:

  4. Make a command in Nagios like this:

  5. Configure your service in Nagios. Make use of the above created command. Configure something similar like this as $ARG1$:

Additional Information

The script initiates a ‘netstat -ano’ , which will display all active network connections with their respective ip addresses, port number and the corresponding process id’s, parse the results and apply the optional filters.
This could of course also be accomplished by just retrieving the ‘\TCPv4Connections Established’ performance countera and it’s UDP variant, but the real strength of the script are it’s parameters. If you think your systems have been compromised by a virus or other malicious software, you can distribute the check_ms_network_connections plugin to all Windows servers and then check your network connections for a given process, port or ip address. This could quickly result in an overview of all impacted systems.

Usage

Because the Powershell command  get-process  doesn’t add file extensions, the -P parameter also does not need it’s file extensions eg ‘.exe’. For example in order to look for all connections made by svchost.exe, the parameters would look like this: -H server.fqdn -P svchost 

Another usage example could be the need to monitor a server that needs a continuous link with another server. By specifying, the -wl and -cl parameters like this -H server.fqdn -wl 2 -cl 0 -wh 10 -ch 15  , you should get a warning alert when the amount of TCP connections drops below 2 and a critical alert when there is no TCP connection with the remote server.
Please note that when using different filter parameters, ‘or’ is used, not ‘and’. So if any of the filters apply’s, the connection should be added. 

If you don’t want to filter on IP address or port, I suggest you use the ‘-c’ parameter, which improves performance a lot. If you are running the plugin on a server with a very high amount of connections, I also suggest using the -c parameter.
The ‘-c’ parameter will execute  (netstat -abn -proto TCP).count which is way faster then having to loop through each individual connection. It does imply you will get less information, as it only counts the active TCP connections.

Results

The result of using Nagios XI to monitor your network connections looks like this:

Monitoring Network Connections Nagios

TIG

A third option is to use a TIG stack, which will use Telegraf to query the counters from PerfMon and sends them to an InfluxDB time series database. Visualization is done with Grafana.

The Telegraf agent configuration file needs this input:

TIG Network Connections

Grafana allows you to create a query which will show all values for all hosts with a certain tag. With the help of templates, it becomes very easy to create beautiful graphs with filterable, sortable min, max, avg and current values o all your network connections counters. And this with a one second granular interval.

TIG-Windows-Network-Connections-Top-Avg

A disadvantage of using Telegraf is that you are limited to using PerfMon counters. This means it’s not possible to get the UDP connection count. There seems to be a way to execute Powershell scripts with telegraf, but my guess is that the resulting load will be too high to execute this with a one second interval.

Final Words

As you can seen there are multiple options to monitor your Windows network connections. I’ll try to extend this documentation with some alerting examples.

Real-time Eventlog Monitoring with Nagios and NSClient++

Introduction to real-time eventlog monitoring

NSClient++ has a very powerful component that enables you to achieve real-time eventlog monitoring on Windows systems. This feature requires passive monitoring of Windows eventlogs via NSCA or NRDP.

The biggest benefits of real-time eventlog monitoring are:

  • It can help you find problems faster (real-time), as NSClient++ will send the events with NSCA the moment it occurs.
  • It is much more resource efficient then using active checks for monitoring eventlogs. It actually requires fewer resources on both the Nagios server, as on the client where NSClient is running!
  • There is no need to search through every application’s documentation, as you can just catch all the errors and filter them out if not needed.

The biggest drawbacks of real-time eventlog monitoring are:

  • As it are passive services, new events will overwrite the previous event, which could cause you to miss a problem on your Nagios dashboards. 
  • You need  a dedicated database table to store the real-time eventlog exclusions. 
  • You will need some basic scripting skills to automate building the real-time eventlog exclusion string in the NSClient configuration file.

General requirements for using real-time eventlog monitoring

NSCA Configuration of your NSClient++

As NSClient++’s real-time eventlog monitoring component will send the events passively to you Nagios server, you will need to setup NSCA. Please read through this documentation for configuring NSCA in NSClient++.

NSCA Configuration of your Nagios server

NSCA also requires some configuration on your Nagios server. Please read through this documentation for configuring NSCA in Nagios Core or this documentation for configuring NSCA in Nagios XI.

Passive services for each Windows host on your Nagios server

Each Windows host needs at least one passive service, which is able to accept the filtered Windows eventlogs. You can make as much of them as you require. I choose to use one for all application eventlog errors and one for all system eventlog errors:

Real-Time Eventlog Monitoring Passive Services

A database to store your real-time eventlog exclusions

If you want to generate a real-time eventlog exclusion filter, you need to somehow store a combination of hostnames, event id’s and event sources. We are using MSSQL at the moment and generate the exclusions with Powershell. This database needs at least a servername, eventlog, eventid, eventsource and comment column. The combination of those allow you to make an exclusion for almost any type of Windows event.

Real-time Eventlog Monitoring Exclusion Database

Some sort of automation software which can be called with a Nagios XI quick action

Thanks to Nagios XI quick actions, you can quickly exclude noisy events by updating the NSClient++ configuration file with the correct filter. With the correct customization and scripts, this allows you to create a self-learning system. For this to work, you basically need one script which will store a new real-time eventlog exclusion in a database and another which generates the NSClient++ configuration file with the latest combination of real-time eventlog exclusions. We are using Rundeck, a free and open source automation tool to execute the above jobs.

Detailed NSClient ++ configuration

Minimal nsclient.ini ‘modules’ settings:

Minimal nsclient.ini ‘NSCA’ settings:

The above configuration doesn’t use any encryption. Once your tests work out, I advise you to configure some sort of encryption to prevent hackers from sniffing your NSCA packets. Please note that at this moment (31/05/17) the official Nagios NSCA project does not support aes, only Rijndael. This GitHub issue has been created to fix this problem. You’ll have to use one of the other less strong encryption methods at the moment.

Example nsclient.ini ‘eventlog’ settings:

This is an example configuration for getting real-time eventlog monitoring to work. Please note that this has been tested on NSClient++ 0.5.1.28. I’m not 100 % sure it works on earlier versions.

The above configuration template is just an example. As you can see it contains a DUMMYAPPLICATIONFILTER and a DUMMYSYSTEMFILTER. You can easily replace these with the generated exclusion filter. A few examples of how such a filter might look:

(id NOT IN (1,3,10,12,13,23,26,33,37,38,58,67,101,103,104,107,108,110,112,274,502,511,1000,1002,1004,1005,1009,1010,1026,1027,1053,1054,1085,1101,1107,1116,1301,1325,1334,1373,1500,1502,1504,1508,1511,1515,1521,1533)) AND (id NOT IN (1509) OR source NOT IN ('Userenv')) AND (id NOT IN (1055) OR source NOT IN ('Userenv')) AND (id NOT IN (1030) OR source NOT IN ('Userenv')) AND (id NOT IN (1006) OR source NOT IN ('Userenv')) 

Or

(id NOT IN (1,3,4,5,8,9,10,11,12,15,19,27,37,39,50,54,56,137,1030,1041,1060,1066,1069,1071,1111,1196,3621,4192,4224,4243,4307,5722,5723)) AND (id NOT IN (36888) OR source NOT IN ('Schannel')) AND (id NOT IN (36887) OR source NOT IN ('Schannel')) AND (id NOT IN (36874) OR source NOT IN ('Schannel')) AND (id NOT IN (36870) OR source NOT IN ('Schannel')) AND (id NOT IN (12292) OR source NOT IN ('VSS')) AND (id NOT IN (7030) OR source NOT IN ('ServiceControlManager')) 

Only errors which are not filtered by the real-time eventlog filters such as the examples above will be sent to your Nagios passive services.

Multiple NSCA Targets

This is an nsclient.ini config file where two NSCA targets are defined. This can be useful in scenarios where a backup Nagios server needs to be identical as the primary Nagios server:

How to generate errors in your Windows eventlogs?

In order to test, you will need a way to debug and hence a way to generate errors with specific sources or id’s. You can do this very easily with Powershell:

If you get an error saying that the source passed with the above command does not exist, you can create it like this:

Or another way:

(Almost) Final Words

As I can hear some people think “why don’t you post the code to generate the real-time eventlog exclusion filter?”. Well, the answer is simple, I don’t have the time to clean up all the code, so it doesn’t contain any sensitive information. But as a special gift for all my blog readers who got to the end of this post, I’ll post a snippet of the exclusion generating Powershell code here. The rest you will have to make your self for now.

I will open the comments section for now, but please only use it for constructive information. 

Grtz

Willem

Monitoring Windows Scheduled Tasks

Introduction

Tasks scheduler is a Microsoft Windows component that allows you to schedule programs or scripts to start at pre-defined intervals. There are two major versions of the task scheduler: In version 1.0, definitions and schedules are stored in binary .job files. Every task corresponds to a single action. This plugin will not work on version 1.0 of the task scheduler, which is running on Windows Server 2000 and 2003. In version 2.0, the Windows task scheduler got a redesigned user interface based on Management console. Version 2.0 also supports calendar and event-based triggers, such as starting a task when a particular event is logged to the event log, or when a combination of events has occurred. Also, several tasks that are triggered by the same event can be configured to run either simultaneously or in a pre-determined chained sequence of a series of actions.

Tasks can also be configured to run based on system status such as being idle for a pre-configured amount of time, on startup, logoff, or only during or for a specified time. Other new features are a credential manager to store passwords so they cannot be retrieved easily. Also, scheduled tasks are executed in their own session, instead of the same session as system services or the current user. You can find a list of all task scheduler 2.0 interfaces here.

Requirements

Starting from Windows Powershell 4.0, you can use a whole range of Powershell cmdlets to manage your scheduled tasks with Powershell. This plugin for Nagios does not use these cmdlets, as it has to be Powershell 2.0 compatible. Maybe in a few years, when Powershell 2.0 becomes obsolete, I’ll patch the script to make use of the new cmdlets. You can find the complete list of cmdlets here. Failing tasks will always end with some sort of error code. You can find the complete list of error codes here. This plugin will output the exitcodes for failing tasks in the Nagios service description. Output will also notify you on tasks that are still running. We have multiple Windows servers at work with a growing amount of scheduled tasks and each scheduled task needs to be monitored. With the help of Nagios and this plugin you can find out:

  • How many are running at the same time?
  • How many are failing?
  • How long are they running?
  • Who created them?

Versions

Disabled scheduled tasks are excluded by default from 3.14.12.06. In earlier versions, you had to manually exclude them by excluding them with -EF or -ET. It seemed like a logical decision to exclude disabled tasks by default and was suggested by someone on the Nagios Exchange reviewing the plugin.. Maybe one day I’ll make a switch to include them again if specified. As some scheduled tasks do not need to be monitored, the script enables you to exclude complete folders.

Since v5.13.160614 it is possible to include hidden tasks. Just add the ‘–Hidden 1’ switch to your parameters and your hidden tasks will be monitored.

One of the folders I tend to exclude almost all the time is the “Microsoft” folder. It seems like several tasks in the Microsoft folder tend to fail sometimes. So unless you absolutely need to know the state of every single scheduled task running on your Windows Server, I can advise you to exclude it too. You can find the folder and tasks in this locations: C:\Windows\System32\Tasks
It is possible to include tasks or task folders with the ‘–InclFolders’ and ‘–InclTasks’ parameters. This filter will get applied after the exclude parameter. Please note that including a folder is not recursive. Only tasks in the root of the folder will be included.

Help

This is the help of the plugin, which lists all valid parameters:

You could put every scheduled task  you don’t want to monitor in a separate  folder and exclude it with the -EF parameter. Alternatvely, you can use the -ET parameter to exclude based on name patterns. One quite important thing to know is that in order to exclude or include the root folder, you need to escape the backslash, like this: “\\”.

How to monitor your scheduled tasks?

  1. Put the script in the NSClient++ scripts folder, preferably in a subfolder Powershell.
  2. In the nsclient.ini configuration file, define the script like this:

    For more information about external scripts configuration, please review the NSClient documentation. You can also consider defining a wrapped script in nsclient.ini to simplify configuration.

  3. Make a command in Nagios like this:

  4. Configure your service in Nagios. Make use of the above created command. Configure something similar like this as $ARG1$:

Some things to consider to make it work:

  • “set-exectionpolicy remotesigned”
  • Nscp service account permissions => Running with local system should suffice, but I had users telling me it only worked with a local admin. I found out that on some NSClient++ versions, more specific version 0.4.3.88 and probably some earlier versions too, the following error occured when running nscp service as local system: “CHECK_NRPE: Invalid packet type received from server”. After filing an issue on the GitHub project page of NSClient++, Michael Medin quickly acknowledged the issue and solved it from version 0.4.3.102, so the plugin should work again as local system.

Examples

If you would run the script in cli from you Nagios plugin folder, this would be the command:

If you would want to exclude one noisy unimportant scheduled task, the command used in cli would look like this:

If you only want the scheduled tasks in the root to be monitored, you can use this command:

This would only give you the scheduled tasks available in the root folder. The output look like this now.

Final Words

It seems the perfdata in the Highcharts graphs sometimes contains decimal numbers (see screenshot), which is kind of strange as I’m sure I only pass rounded numbers. Seems this is related to the way RRD files are working. To reduce the amount of storage space used, NPCD and RRD while average out the data, resulting in decimals, even when you don’t expect them.

This is a small to do list:

  • Add switches to change returned values and output.
  • Add array parameter with exit codes that should be excluded.
  • Test remote execution. In some cases it might be useful to be able to check remotely for failed windows tasks.
  • Include a warning / critical threshold when discovered tasks exceed a certain duration.
  • I was hoping to add some more exit codes to check, which would make failed tasks easier to troubleshoot. You can find the list of scheduled task exit codes here. The constants that begin with SCHED_S_ are success constants, and the constants that begin with SCHED_E_ are error constants.

Screenshots:

These are some screenshots of the Nagios XI Graph Explorer for two of our servers making use of the plugin to monitor scheduled tasks: Tasks 01 check_ms_win_tasks_graph_02 Let me know on the Nagios Exchange what you think of my plugin by rating it or submitting a review. Please also consider starring the project on GitHub.

Willem