Infoblox Logstash filter (named, dhcpd and httpd)

Introduction

Infoblox is a DDI (DNS, DHCP, and IP address management solution) which simplifies network management a lot. Over the past 8 years I was able to work with it and never looked into another solution, as it completely fulfills all our DNS and DHCP needs. During that time, I’ve been finetuning my Infoblox Logstash grok patterns and index template mappings. As I didn’t found any existing Infoblox Logstash grok patterns, I decided to make them open source. You can download the Logstash configuration file on GitHub here. There is also a template included with the mappings for Elasticsearch. 

Infoblox Logstash

Infoblox Logging

Thanks to Infoblox, we can:

  • Consolidate DNS, DHCP, IP address management, and other core network services into a single platform, managed from a common console
  • Centrally orchestrate DDI functions across diverse infrastructure
  • Boost IT efficiency and automation by seamlessly integrating with other IT systems (such as Rundeck) through RESTful APIs

Infoblox has integrated reporting & analytics capabilities, but imho DNS and DHCP related logs are on the top priority list for sending to a log aggregator, such as Elasticsearch or NLS. DHCP and DNS logs allow us to link ip addresses to device hostnames and mac addresses. As ip addresses are logged everywhere, this is a vital log source in order to trace what happened by who on your network. A good Logstash filter is able to parse all the relevant fields, so they can be used in aggregations. 

Infoblox Logstash Configuration

Please note hat I’m not using a syslog input, but a tcp input. I’ve had considerable issue with the default syslog patterns used by Elasticsearch.  Apart from that I prefer to apply my own field names for syslog data. Using my own custom syslog grok pattern allows me to match the parsed field to our internally used naming conventions. Feel free to adjust the field names as needed.

 

 

 

 

Realmd and SSSD Active Directory Authentication

Introduction to SSSD and Realmd

Starting from Red Hat 7 and CentOS 7, SSSD or ‘System Security Services Daemon’  and realmd have been introduced. SSSD’s main function is to access a remote identity and authentication resource through a common framework that provides caching and offline support to the system. SSSD provides PAM and NSS integration and a database to store local users, as well as core and extended user data retrieved from a central server. 

The main reason to transition from Winbind to SSSD is that SSSD can be used for both direct and indirect integration and allows to switch from one integration approach to another without significant migration costs. The most convenient way to configure SSSD or Winbind in order to directly integrate a Linux system with AD is to use the realmd service. Because it allows callers to configure network authentication and domain membership in a standard way. The realmd service automatically discovers information about accessible domains and realms and does not require advanced configuration to join a domain or realm.

The realmd system provides a clear and simple way to discover and join identity domains. It does not connect to the domain itself but configures underlying Linux system services, such as SSSD or Winbind, to connect to the domain.

Realmd Pam SSSD

Please read through this Windows integration guide from Red Hat if you want more information. This extensive guide contains a lot of useful information about more complex situations.

Realmd / SSSD Use Cases

How to join an Active Directory domain?

  1. First of all start you will need to install the required packages:
  2. Configure ntp to prevent time sync issues:
  3. Join the server to the domain:
  4. Also add the default domain suffix to the sssd configuration file:

    Add the following beneath [sssd]

  5. Finally move the computer object to an organizational unit in Active Directory.

How to leave an Active Directory domain?

I saw multiple times that although the computer object was created in Active Directory it was still not possible to login with an ad account. The solution was each time to remove the server from the domain and then just add it back.

How to permit only one Active Directory group to logon

As it can be very useful to only allow one Active Directory group. For example a group with Linux system administrators.

 How to give sudo permissions to an Active Directory group

Add

Or

Example sssd.conf Configuration

The following is an example sshd.conf configuration file. I’ve seen it happen once that somehow access_provider was set to ad. I haven’t got the chance to play with that setting, as simple worked almost every time for now.

Required security permissions in AD

A few months ago, we had a problem where some users were no longer able to authenticate. After an extended search we discovered the reason was a hardening change in permissions on some ou’s in our AD. My colleague Jenne and I discovered that the Linux server computer objects need minimal permissions on the ou which contains the users that want to authenticate on your Linux servers. After testing almost all obvious permissions, we came to the conclusions that the computer objects need “Read remote access information”!

sssd-permissions-ras

How to debug SSSD and realmd?

The logfile which contains information about successful or failed login attempts is /var/log/secure. It contains information related to authentication and authorization privileges. For example, sshd logs all the messages there, including unsuccessful login. Be sure to check that logfile if you experience problems logging in with an Active Directory user. 

How to clear the SSSD cache?

As suggested by AP in the comments, you can manage your cache with the sss_cache command.  It can be used to clear the cache and update all records:

The sss_cache command can also clear all cached entries for a particular domain:
If the administrator knows that a specific record (user, group, or netgroup) has been updated, then sss_cachecan purge the records for that specific account and leave the rest of the cache intact:

Please refer to the official documentation for more information.

In case the above doesn’t help, you can also remove the cache ‘hte hard way’:

Just wanted to add this command which also helped me in one case somehow. 

Final Words

I hope this guide helps people towards a better Windows Linux integration. Let me know if you think there is a better way to do the above or if you have some useful information you think I should add to this guide.

Greetings.

Willem

Monitoring Network Connections Nagios 2

Monitoring Windows Network Connections

Introduction

Monitoring the network connections on your Windows servers can be crucial to examine server load and investigate bottlenecks and anomalies. There are many ways to monitor your network connections. This blog post will go into detail of some of the tools that can be used to achieve optimal monitoring of your Windows network connections.

How To monitor your Windows Network Connections?

PerfMon

In the Windows Performance Monitor, you can find several counters for all kinds network connections. This set of counters is available for TCPv4 and TCPv6 connections.

Counter NameCounter Description
Connection FailuresConnection Failures is the number of times TCP connections have made a direct transition to the CLOSED state from the SYN-SENT state or the SYN-RCVD state, plus the number of times TCP connections have made a direct transition to the LISTEN state from the SYN-RCVD state.
Connections ActiveConnections Active is the number of times TCP connections have made a direct transition to the SYN-SENT state from the CLOSED state. In other words, it shows a number of connections which are initiated by the local computer. The value is a cumulative total.
Connections EstablishedConnections Established is the number of TCP connections for which the current state is either ESTABLISHED or CLOSE-WAIT.
Connections PassiveConnections Passive is the number of times TCP connections have made a direct transition to the SYN-RCVD state from the LISTEN state. In other words, it shows a number of connections to the local computer, which are initiated by remote computers. The value is a cumulative total.
Connections ResetConnections Reset is the number of times TCP connections have made a direct transition to the CLOSED state from either the ESTABLISHED state or the CLOSE-WAIT state.
Segments Received/secSegments Received/sec is the rate at which segments are received, including those received in error. This count includes segments received on currently established connections.
Segments Retransmitted/secSegments Retransmitted/sec is the rate at which segments are retransmitted, that is, segments transmitted containing one or more previously transmitted bytes.
Segments Sent/secSegments Sent/sec is the rate at which segments are sent, including those on current connections, but excluding those containing only retransmitted bytes.
Segments/secSegments/sec is the rate at which TCP segments are sent or received using the TCP protocol.

At the moment there seems to be no Performance Monitor counter available  in Windows to show the UDP connection count.  Although the Windows Performance Monitor is an easy choice to have a quick glance at how many TCP connections are currently active, it is not an optimal tool to use for debugging or alerting. The PerfMon user interface also hasn’t changed much over the years. 

UDP Connection Count

This means that we will have to look at other options, such as Netstat:

Netstat

Netstat is a command-line tool that displays very detailed information about your network connections, both incoming and outgoing, routing tables, network interfaces and network protocol statistics.
It is mostly used for finding problems in the network and to determine the amount of traffic on the network as a performance measurement. 

Although Netstat is the perfect tool for looking in real-time at your network connections, you will need some way to graph the Netstat values. Being able to analyze the connection count over time really helps with getting a better understanding of what your servers and applications are doing.

Nagios

As I saw multiple plugins to check network connections with Netstat on Linux hosts, but not on Windows hosts, I decided to write a Powershell script which uses Netstat to monitor your TCP and UDP network connections on Windows hosts.

How to monitor your network connections with Nagios?

  1. Download the latest version of check_ms_win_network_connections on GitHub.
  2. Put the script in the NSClient++ scripts folder, preferably in a subfolder Powershell.
  3. In the nsclient.ini configuration file, define the script like this:

  4. Make a command in Nagios like this:

  5. Configure your service in Nagios. Make use of the above created command. Configure something similar like this as $ARG1$:

Additional Information

The script initiates a ‘netstat -ano’ , which will display all active network connections with their respective ip addresses, port number and the corresponding process id’s, parse the results and apply the optional filters.
This could of course also be accomplished by just retrieving the ‘\TCPv4Connections Established’ performance countera and it’s UDP variant, but the real strength of the script are it’s parameters. If you think your systems have been compromised by a virus or other malicious software, you can distribute the check_ms_network_connections plugin to all Windows servers and then check your network connections for a given process, port or ip address. This could quickly result in an overview of all impacted systems.

Usage

Because the Powershell command  get-process  doesn’t add file extensions, the -P parameter also does not need it’s file extensions eg ‘.exe’. For example in order to look for all connections made by svchost.exe, the parameters would look like this: -H server.fqdn -P svchost 

Another usage example could be the need to monitor a server that needs a continuous link with another server. By specifying, the -wl and -cl parameters like this -H server.fqdn -wl 2 -cl 0 -wh 10 -ch 15  , you should get a warning alert when the amount of TCP connections drops below 2 and a critical alert when there is no TCP connection with the remote server.
Please note that when using different filter parameters, ‘or’ is used, not ‘and’. So if any of the filters apply’s, the connection should be added. 

If you don’t want to filter on IP address or port, I suggest you use the ‘-c’ parameter, which improves performance a lot. If you are running the plugin on a server with a very high amount of connections, I also suggest using the -c parameter.
The ‘-c’ parameter will execute  (netstat -abn -proto TCP).count which is way faster then having to loop through each individual connection. It does imply you will get less information, as it only counts the active TCP connections.

Results

The result of using Nagios XI to monitor your network connections looks like this:

Monitoring Network Connections Nagios

TIG

A third option is to use a TIG stack, which will use Telegraf to query the counters from PerfMon and sends them to an InfluxDB time series database. Visualization is done with Grafana.

The Telegraf agent configuration file needs this input:

TIG Network Connections

Grafana allows you to create a query which will show all values for all hosts with a certain tag. With the help of templates, it becomes very easy to create beautiful graphs with filterable, sortable min, max, avg and current values o all your network connections counters. And this with a one second granular interval.

TIG-Windows-Network-Connections-Top-Avg

A disadvantage of using Telegraf is that you are limited to using PerfMon counters. This means it’s not possible to get the UDP connection count. There seems to be a way to execute Powershell scripts with telegraf, but my guess is that the resulting load will be too high to execute this with a one second interval.

Final Words

As you can seen there are multiple options to monitor your Windows network connections. I’ll try to extend this documentation with some alerting examples.

Real-time Eventlog Monitoring with Nagios and NSClient++

Introduction to real-time eventlog monitoring

NSClient++ has a very powerful component that enables you to achieve real-time eventlog monitoring on Windows systems. This feature requires passive monitoring of Windows eventlogs via NSCA or NRDP.

The biggest benefits of real-time eventlog monitoring are:

  • It can help you find problems faster (real-time), as NSClient++ will send the events with NSCA the moment it occurs.
  • It is much more resource efficient then using active checks for monitoring eventlogs. It actually requires fewer resources on both the Nagios server, as on the client where NSClient is running!
  • There is no need to search through every application’s documentation, as you can just catch all the errors and filter them out if not needed.

The biggest drawbacks of real-time eventlog monitoring are:

  • As it are passive services, new events will overwrite the previous event, which could cause you to miss a problem on your Nagios dashboards. 
  • You need  a dedicated database table to store the real-time eventlog exclusions. 
  • You will need some basic scripting skills to automate building the real-time eventlog exclusion string in the NSClient configuration file.

General requirements for using real-time eventlog monitoring

NSCA Configuration of your NSClient++

As NSClient++’s real-time eventlog monitoring component will send the events passively to you Nagios server, you will need to setup NSCA. Please read through this documentation for configuring NSCA in NSClient++.

NSCA Configuration of your Nagios server

NSCA also requires some configuration on your Nagios server. Please read through this documentation for configuring NSCA in Nagios Core or this documentation for configuring NSCA in Nagios XI.

Passive services for each Windows host on your Nagios server

Each Windows host needs at least one passive service, which is able to accept the filtered Windows eventlogs. You can make as much of them as you require. I choose to use one for all application eventlog errors and one for all system eventlog errors:

Real-Time Eventlog Monitoring Passive Services

A database to store your real-time eventlog exclusions

If you want to generate a real-time eventlog exclusion filter, you need to somehow store a combination of hostnames, event id’s and event sources. We are using MSSQL at the moment and generate the exclusions with Powershell. This database needs at least a servername, eventlog, eventid, eventsource and comment column. The combination of those allow you to make an exclusion for almost any type of Windows event.

Real-time Eventlog Monitoring Exclusion Database

Some sort of automation software which can be called with a Nagios XI quick action

Thanks to Nagios XI quick actions, you can quickly exclude noisy events by updating the NSClient++ configuration file with the correct filter. With the correct customization and scripts, this allows you to create a self-learning system. For this to work, you basically need one script which will store a new real-time eventlog exclusion in a database and another which generates the NSClient++ configuration file with the latest combination of real-time eventlog exclusions. We are using Rundeck, a free and open source automation tool to execute the above jobs.

Detailed NSClient ++ configuration

Minimal nsclient.ini ‘modules’ settings:

Minimal nsclient.ini ‘NSCA’ settings:

The above configuration doesn’t use any encryption. Once your tests work out, I advise you to configure some sort of encryption to prevent hackers from sniffing your NSCA packets. Please note that at this moment (31/05/17) the official Nagios NSCA project does not support aes, only Rijndael. This GitHub issue has been created to fix this problem. You’ll have to use one of the other less strong encryption methods at the moment.

Example nsclient.ini ‘eventlog’ settings:

This is an example configuration for getting real-time eventlog monitoring to work. Please note that this has been tested on NSClient++ 0.5.1.28. I’m not 100 % sure it works on earlier versions.

The above configuration template is just an example. As you can see it contains a DUMMYAPPLICATIONFILTER and a DUMMYSYSTEMFILTER. You can easily replace these with the generated exclusion filter. A few examples of how such a filter might look:

(id NOT IN (1,3,10,12,13,23,26,33,37,38,58,67,101,103,104,107,108,110,112,274,502,511,1000,1002,1004,1005,1009,1010,1026,1027,1053,1054,1085,1101,1107,1116,1301,1325,1334,1373,1500,1502,1504,1508,1511,1515,1521,1533)) AND (id NOT IN (1509) OR source NOT IN ('Userenv')) AND (id NOT IN (1055) OR source NOT IN ('Userenv')) AND (id NOT IN (1030) OR source NOT IN ('Userenv')) AND (id NOT IN (1006) OR source NOT IN ('Userenv')) 

Or

(id NOT IN (1,3,4,5,8,9,10,11,12,15,19,27,37,39,50,54,56,137,1030,1041,1060,1066,1069,1071,1111,1196,3621,4192,4224,4243,4307,5722,5723)) AND (id NOT IN (36888) OR source NOT IN ('Schannel')) AND (id NOT IN (36887) OR source NOT IN ('Schannel')) AND (id NOT IN (36874) OR source NOT IN ('Schannel')) AND (id NOT IN (36870) OR source NOT IN ('Schannel')) AND (id NOT IN (12292) OR source NOT IN ('VSS')) AND (id NOT IN (7030) OR source NOT IN ('ServiceControlManager')) 

Only errors which are not filtered by the real-time eventlog filters such as the examples above will be sent to your Nagios passive services.

Multiple NSCA Targets

This is an nsclient.ini config file where two NSCA targets are defined. This can be useful in scenarios where a backup Nagios server needs to be identical as the primary Nagios server:

How to generate errors in your Windows eventlogs?

In order to test, you will need a way to debug and hence a way to generate errors with specific sources or id’s. You can do this very easily with Powershell:

If you get an error saying that the source passed with the above command does not exist, you can create it like this:

Or another way:

(Almost) Final Words

As I can hear some people think “why don’t you post the code to generate the real-time eventlog exclusion filter?”. Well, the answer is simple, I don’t have the time to clean up all the code, so it doesn’t contain any sensitive information. But as a special gift for all my blog readers who got to the end of this post, I’ll post a snippet of the exclusion generating Powershell code here. The rest you will have to make your self for now.

I will open the comments section for now, but please only use it for constructive information. 

Grtz

Willem