Realmd and SSSD Active Directory Authentication

Introduction to SSSD and Realmd

Red Hat Enterprise Linux 7 and CentOS 7 ship with SSSD, the System Security Services Daemon, and realmd. SSSD’s main function is to access a remote identity and authentication resource through a common framework that provides caching and offline support to the system. SSSD provides PAM and NSS integration and a database to store local users, as well as core and extended user data retrieved from a central server.

The main reason to transition from Winbind to SSSD is that SSSD can be used for both direct and indirect integration and allows you to switch from one integration approach to the other without significant migration costs. The most convenient way to configure SSSD or Winbind to directly integrate a Linux system with AD is the realmd service, because it allows callers to configure network authentication and domain membership in a standard way. The realmd service automatically discovers information about accessible domains and realms and does not require advanced configuration to join a domain or realm.

The realmd system provides a clear and simple way to discover and join identity domains. It does not connect to the domain itself but configures underlying Linux system services, such as SSSD or Winbind, to connect to the domain.

Realmd Pam SSSD

Please read through this Windows integration guide from Red Hat if you want more information. This extensive guide contains a lot of useful information about more complex situations.

Realmd / SSSD Use Cases

How to join an Active Directory domain?

  1. First, install the required packages:
  2. Configure ntp to prevent time sync issues:
  3. Join the server to the domain:
  4. Also add the default domain suffix to the sssd configuration file:

    Add the following beneath [sssd]

  5. Finally move the computer object to an organizational unit in Active Directory.
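
Step 4 can also be scripted. The following is a minimal sketch, assuming the file is plain INI that Python’s configparser can parse; the function name and the sample input are made up for illustration, and comments in the original file are lost when it is rewritten this way.

```python
import configparser
import io


def set_default_domain_suffix(conf_text: str, domain: str) -> str:
    """Return sssd.conf text with default_domain_suffix set under [sssd]."""
    # interpolation=None keeps any literal '%' in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    # Preserve option-name case instead of lower-casing it.
    parser.optionxform = str
    parser.read_string(conf_text)
    if not parser.has_section("sssd"):
        parser.add_section("sssd")
    parser.set("sssd", "default_domain_suffix", domain)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical, shortened input; a real sssd.conf contains more options.
sample = "[sssd]\ndomains = example.com\nservices = nss, pam\n"
updated = set_default_domain_suffix(sample, "example.com")
print(updated)
```

SSSD has to be restarted before it picks up the changed configuration file.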

How to leave an Active Directory domain?

I have seen several times that, although the computer object had been created in Active Directory, it was still not possible to log in with an AD account. Each time, the solution was to remove the server from the domain and then simply join it again.

How to permit only one Active Directory group to logon

It can be very useful to allow only one Active Directory group to log on, for example a group of Linux system administrators.

How to give sudo permissions to an Active Directory group

Add

Or

Example sssd.conf Configuration

The following is an example sssd.conf configuration file. I’ve seen it happen once that access_provider was somehow set to ad. I haven’t had the chance to play with that setting, as simple has worked almost every time so far.

Required security permissions in AD

A few months ago, we had a problem where some users were no longer able to authenticate. After an extended search, we discovered that the reason was a hardening change in the permissions on some OUs in our AD. My colleague Jenne and I discovered that the Linux server computer objects need minimal permissions on the OU that contains the users who want to authenticate on your Linux servers. After testing almost all the obvious permissions, we came to the conclusion that the computer objects need “Read remote access information”!

sssd-permissions-ras

How to debug SSSD and realmd?

The logfile that contains information about successful and failed login attempts is /var/log/secure. It contains information related to authentication and authorization privileges. For example, sshd logs all its messages there, including unsuccessful logins. Be sure to check that logfile if you experience problems logging in with an Active Directory user.
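
When the logfile is large, it helps to filter out only the failed attempts. The snippet below is a rough sketch: the two search strings are an assumption about what failed logins commonly look like in /var/log/secure, not an exhaustive list, and the sample lines (host, user, address) are made up.

```python
import re

# Substrings that commonly mark a failed login in /var/log/secure.
# This list is an assumption for illustration, not an exhaustive set.
FAILURE_PATTERN = re.compile(r"Failed password|authentication failure")


def failed_logins(lines):
    """Return only the log lines that look like failed login attempts."""
    return [line.rstrip("\n") for line in lines if FAILURE_PATTERN.search(line)]


# Illustrative sample lines; host, user and address are made up.
sample_log = [
    "May 31 10:00:01 web01 sshd[1234]: Accepted password for jdoe from 192.0.2.10 port 50000 ssh2\n",
    "May 31 10:00:05 web01 sshd[1240]: Failed password for jdoe from 192.0.2.10 port 50002 ssh2\n",
    "May 31 10:00:09 web01 sshd[1240]: pam_sss(sshd:auth): authentication failure; "
    "logname= uid=0 euid=0 tty=ssh ruser= rhost=192.0.2.10 user=jdoe\n",
]

for line in failed_logins(sample_log):
    print(line)
```

To run it against the real file, pass an open file handle instead of the sample list, for example failed_logins(open("/var/log/secure")). Reading that file normally requires root.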

How to clear the SSSD cache?

As suggested by AP in the comments, you can manage your cache with the sss_cache command.  It can be used to clear the cache and update all records:

The sss_cache command can also clear all cached entries for a particular domain:

If the administrator knows that a specific record (user, group, or netgroup) has been updated, then sss_cache can purge the records for that specific account and leave the rest of the cache intact:
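
The three variants above can be wrapped in a small helper. This is only a sketch: the function names are made up, and it relies on the sss_cache options -E (everything), -d (domain), -u (user), -g (group) and -n (netgroup). The example domain and user are placeholders.

```python
import subprocess


def sss_cache_args(everything=False, domain=None, user=None, group=None, netgroup=None):
    """Build the argument list for an sss_cache call."""
    if not (everything or user or group or netgroup):
        raise ValueError("nothing selected to invalidate")
    args = ["sss_cache"]
    if everything:
        args.append("-E")        # invalidate all cached entries
    if domain:
        args += ["-d", domain]   # restrict the invalidation to one domain
    if user:
        args += ["-u", user]     # invalidate a single user
    if group:
        args += ["-g", group]    # invalidate a single group
    if netgroup:
        args += ["-n", netgroup] # invalidate a single netgroup
    return args


def run_sss_cache(**kwargs):
    """Run sss_cache (needs root); raises CalledProcessError on failure."""
    subprocess.run(sss_cache_args(**kwargs), check=True)


# Placeholder domain and user names.
print(sss_cache_args(everything=True))
print(sss_cache_args(everything=True, domain="example.com"))
print(sss_cache_args(user="jdoe"))
```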

Please refer to the official documentation for more information.

In case the above doesn’t help, you can also remove the cache ‘the hard way’:

Just wanted to add this command which also helped me in one case somehow. 

Final Words

I hope this guide helps people towards a better Windows Linux integration. Let me know if you think there is a better way to do the above or if you have some useful information you think I should add to this guide.

Greetings.

Willem

Real-time Eventlog Monitoring with Nagios and NSClient++

Introduction to real-time eventlog monitoring

NSClient++ has a very powerful component that enables you to achieve real-time eventlog monitoring on Windows systems. This feature requires passive monitoring of Windows eventlogs via NSCA or NRDP.

The biggest benefits of real-time eventlog monitoring are:

  • It can help you find problems faster (real-time), as NSClient++ will send the events with NSCA the moment they occur.
  • It is much more resource efficient than using active checks for monitoring eventlogs. It actually requires fewer resources on both the Nagios server and the client where NSClient++ is running!
  • There is no need to search through every application’s documentation, as you can just catch all the errors and filter them out if not needed.

The biggest drawbacks of real-time eventlog monitoring are:

  • Because these are passive services, a new event overwrites the previous one, which could cause you to miss a problem on your Nagios dashboards.
  • You need a dedicated database table to store the real-time eventlog exclusions.
  • You will need some basic scripting skills to automate building the real-time eventlog exclusion string in the NSClient++ configuration file.

General requirements for using real-time eventlog monitoring

NSCA Configuration of your NSClient++

As NSClient++’s real-time eventlog monitoring component sends the events passively to your Nagios server, you will need to set up NSCA. Please read through this documentation for configuring NSCA in NSClient++.

NSCA Configuration of your Nagios server

NSCA also requires some configuration on your Nagios server. Please read through this documentation for configuring NSCA in Nagios Core or this documentation for configuring NSCA in Nagios XI.

Passive services for each Windows host on your Nagios server

Each Windows host needs at least one passive service that is able to accept the filtered Windows eventlogs. You can create as many of them as you require. I chose to use one for all application eventlog errors and one for all system eventlog errors:

Real-Time Eventlog Monitoring Passive Services

A database to store your real-time eventlog exclusions

If you want to generate a real-time eventlog exclusion filter, you need to store a combination of hostnames, event IDs and event sources somewhere. We are using MSSQL at the moment and generate the exclusions with PowerShell. This database needs at least a servername, eventlog, eventid, eventsource and comment column. The combination of those allows you to make an exclusion for almost any type of Windows event.
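
To make the table layout concrete, here is a small sketch that uses an in-memory SQLite database as a stand-in for MSSQL. The table name, the column types, the sample rows and the convention that an empty eventsource means "exclude this ID for every source" are all assumptions for illustration, not our production schema.

```python
import sqlite3

# In-memory SQLite stands in for the MSSQL database mentioned above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE eventlog_exclusions (
        servername  TEXT NOT NULL,
        eventlog    TEXT NOT NULL,     -- 'application' or 'system'
        eventid     INTEGER NOT NULL,
        eventsource TEXT,              -- assumption: NULL excludes the id for every source
        comment     TEXT
    )
    """
)

# Made-up rows for a hypothetical host 'srv01'.
rows = [
    ("srv01", "application", 1000, None, "noisy application crash"),
    ("srv01", "application", 1509, "Userenv", "roaming profile warning"),
    ("srv01", "system", 36888, "Schannel", "TLS alert"),
]
conn.executemany("INSERT INTO eventlog_exclusions VALUES (?, ?, ?, ?, ?)", rows)


def exclusions_for(conn, servername, eventlog):
    """Return (eventid, eventsource) pairs for one host and one eventlog."""
    cur = conn.execute(
        "SELECT eventid, eventsource FROM eventlog_exclusions "
        "WHERE servername = ? AND eventlog = ? ORDER BY eventid",
        (servername, eventlog),
    )
    return cur.fetchall()


print(exclusions_for(conn, "srv01", "application"))
```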

Real-time Eventlog Monitoring Exclusion Database

Some sort of automation software which can be called with a Nagios XI quick action

Thanks to Nagios XI quick actions, you can quickly exclude noisy events by updating the NSClient++ configuration file with the correct filter. With the right customization and scripts, this allows you to create a self-learning system. For this to work, you basically need one script that stores a new real-time eventlog exclusion in a database and another that generates the NSClient++ configuration file with the latest combination of real-time eventlog exclusions. We are using Rundeck, a free and open source automation tool, to execute the above jobs.

Detailed NSClient++ configuration

Minimal nsclient.ini ‘modules’ settings:

Minimal nsclient.ini ‘NSCA’ settings:

The above configuration doesn’t use any encryption. Once your tests work out, I advise you to configure some sort of encryption to prevent hackers from sniffing your NSCA packets. Please note that at this moment (31/05/17) the official Nagios NSCA project does not support AES, only Rijndael. This GitHub issue has been created to fix this problem. Until then, you’ll have to use one of the other, weaker encryption methods.

Example nsclient.ini ‘eventlog’ settings:

This is an example configuration for getting real-time eventlog monitoring to work. Please note that this has been tested on NSClient++ 0.5.1.28. I’m not 100 % sure it works on earlier versions.

The above configuration template is just an example. As you can see it contains a DUMMYAPPLICATIONFILTER and a DUMMYSYSTEMFILTER. You can easily replace these with the generated exclusion filter. A few examples of how such a filter might look:

(id NOT IN (1,3,10,12,13,23,26,33,37,38,58,67,101,103,104,107,108,110,112,274,502,511,1000,1002,1004,1005,1009,1010,1026,1027,1053,1054,1085,1101,1107,1116,1301,1325,1334,1373,1500,1502,1504,1508,1511,1515,1521,1533)) AND (id NOT IN (1509) OR source NOT IN ('Userenv')) AND (id NOT IN (1055) OR source NOT IN ('Userenv')) AND (id NOT IN (1030) OR source NOT IN ('Userenv')) AND (id NOT IN (1006) OR source NOT IN ('Userenv')) 

Or

(id NOT IN (1,3,4,5,8,9,10,11,12,15,19,27,37,39,50,54,56,137,1030,1041,1060,1066,1069,1071,1111,1196,3621,4192,4224,4243,4307,5722,5723)) AND (id NOT IN (36888) OR source NOT IN ('Schannel')) AND (id NOT IN (36887) OR source NOT IN ('Schannel')) AND (id NOT IN (36874) OR source NOT IN ('Schannel')) AND (id NOT IN (36870) OR source NOT IN ('Schannel')) AND (id NOT IN (12292) OR source NOT IN ('VSS')) AND (id NOT IN (7030) OR source NOT IN ('ServiceControlManager')) 

Only errors which are not filtered by the real-time eventlog filters such as the examples above will be sent to your Nagios passive services.
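
Filter strings like the two examples above are easy to generate from a list of exclusions. The following is a minimal sketch, not the code we run in production: the function name is made up, and it assumes that each exclusion is either an event ID alone or an event ID combined with a source. Source names are inserted as-is, without any escaping.

```python
def build_exclusion_filter(exclusions):
    """Build a filter string from (event_id, source_or_None) pairs."""
    # IDs that are excluded regardless of source go into one NOT IN list.
    id_only = sorted({eid for eid, src in exclusions if src is None})
    # ID and source combinations each get their own clause.
    with_source = [(eid, src) for eid, src in exclusions if src is not None]

    clauses = []
    if id_only:
        ids = ",".join(str(eid) for eid in id_only)
        clauses.append(f"(id NOT IN ({ids}))")
    for eid, src in with_source:
        clauses.append(f"(id NOT IN ({eid}) OR source NOT IN ('{src}'))")
    return " AND ".join(clauses)


example = [(3, None), (1, None), (1509, "Userenv"), (1055, "Userenv")]
print(build_exclusion_filter(example))
```

For the example list this prints (id NOT IN (1,3)) AND (id NOT IN (1509) OR source NOT IN ('Userenv')) AND (id NOT IN (1055) OR source NOT IN ('Userenv')), which has the same shape as the filters shown above.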

Multiple NSCA Targets

This is an nsclient.ini config file where two NSCA targets are defined. This can be useful in scenarios where a backup Nagios server needs to be identical to the primary Nagios server:

How to generate errors in your Windows eventlogs?

In order to test, you will need a way to debug and hence a way to generate errors with specific sources or IDs. You can do this very easily with PowerShell:

If you get an error saying that the source passed with the above command does not exist, you can create it like this:

Or another way:

(Almost) Final Words

I can hear some people thinking: “Why don’t you post the code to generate the real-time eventlog exclusion filter?” The answer is simple: I don’t have the time to clean up all the code so that it no longer contains any sensitive information. But as a special gift for all my blog readers who got to the end of this post, I’ll post a snippet of the exclusion-generating PowerShell code here. The rest you will have to write yourself for now.

I will open the comments section for now, but please only use it for constructive information. 

Grtz

Willem

Monitoring Microsoft Windows Updates

Introduction

Monitoring WSUS updates on Microsoft Windows Server is critical to ensure you get alerted when your systems need to be patched. Updating high-priority servers requires proper planning to avoid post-installation problems. If we could trust Microsoft patches 100 %, we would install WSUS updates as soon as a maintenance window could be scheduled for a system. Unfortunately, in my personal experience, WSUS updates are more often a cause of problems than a solution. That’s why we prefer not to install them too quickly, as you might experience major issues with your production systems or with the software running on them. A recent example: a colleague accidentally patched some production SharePoint servers, which prevented the creation of new site collections and caused issues with some icons. The only solution was to restore a backup…

Ideally, the updates are first tested on QA systems. If the QA servers run for some time without issues, the production systems can be patched. This is one of the reasons I spent some time combining the best features from the available Windows Update plugins on the Nagios Exchange, such as Christian Kaufmann’s idea to cache the list of Windows Updates in a file. This results in a much lower performance impact of the plugin on the servers you are monitoring. If you have any experience with WSUS updates, you will have noticed how heavily the ‘TrustedInstaller.exe’ process, the Windows system process that takes care of querying the WSUS server and installing updates if requested, can load a server.

The plugin will count all available WSUS updates and output the count for every possible state. However, it will only alert when a set number of days has passed since the last successful update was installed. With this method, you can define a policy and agree to patch all systems that have had no updates for a certain time. You could use different policies for QA and PR (production) systems to prevent problems.
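
The alerting rule can be sketched in a few lines. This is an illustration of the idea, not the plugin’s actual code (the plugin is written in PowerShell): the function name is made up, the default thresholds of 120 and 150 days match the plugin’s defaults, and treating the thresholds as inclusive is an assumption.

```python
from datetime import datetime, timedelta

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2


def update_age_status(last_success, now, days_before_warning=120, days_before_critical=150):
    """Map the age of the last successful update to a Nagios status code."""
    age_days = (now - last_success).days
    if age_days >= days_before_critical:   # inclusive threshold is an assumption
        return CRITICAL
    if age_days >= days_before_warning:
        return WARNING
    return OK


now = datetime(2017, 5, 31)
print(update_age_status(now - timedelta(days=10), now))    # recently patched
print(update_age_status(now - timedelta(days=130), now))   # past the warning threshold
print(update_age_status(now - timedelta(days=200), now))   # past the critical threshold
```

A stricter QA policy would simply pass smaller values for days_before_warning and days_before_critical.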

WSUS

 

Details

Some things you need to know about Windows Updates. Microsoft saves the date of the ‘last successful update’ in the registry. The location of the String Value is:

This date, however, is saved in Coordinated Universal Time (UTC), also known as Greenwich Mean Time (GMT). My plugin will try to translate this time to local time with the help of a function called Get-LocalTime. This function uses the [System.TimeZoneInfo] .NET class, which is only usable if you have .NET 3.5 or higher. So keep in mind that the ‘Last Successful Update’ date is reported in UTC on servers where .NET 3.5 or higher is not installed.
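
The conversion itself is simple. The sketch below shows the same idea in Python rather than the plugin’s PowerShell Get-LocalTime function; the function name and the timestamp format (year-month-day hour:minute:second) are assumptions about how the value is written.

```python
from datetime import datetime, timedelta, timezone


def utc_string_to_local(value, local_tz=None, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a UTC timestamp string and return it as an aware local datetime."""
    # The string carries no zone information, so mark it as UTC explicitly.
    utc_time = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    # With local_tz=None, astimezone() converts to the system's local time zone.
    return utc_time.astimezone(local_tz)


# A fixed UTC+2 offset keeps the example output identical on every machine.
example_tz = timezone(timedelta(hours=2))
print(utc_string_to_local("2017-05-31 08:15:00", example_tz))
# prints 2017-05-31 10:15:00+02:00
```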

The plugin will also check this registry key:

And give a warning if the system has a required reboot pending.

PSWindowsUpdate

Starting from Windows 10, Microsoft apparently decided to no longer make use of the above registry key. The only way I found to retrieve the date and time of the last successful update is with the help of the PSWindowsUpdate module. So I added another argument which allows you to select a different method, named ‘PSWindowsUpdate’, to retrieve the necessary information. Please note that the default method is still the original one, which I called ‘UpdateSearcher’.

In order for this method to work, you will need to install the PSWindowsUpdate module in this location: C:\Windows\System32\WindowsPowerShell\v1.0\Modules. If you are using PowerShell 5, you can just do:

I’ve included versions 1.5.1.11 and 1.5.2 of the module in the GitHub repository. Alternatively, you can download it from the Microsoft Script Center Repository.

How to monitor your WSUS updates?

  1. Please note that the default DaysBeforeWarning and DaysBeforeCritical parameters are set to 120 and 150. Feel free to adjust them as required or pass them as an argument.
  2. Put the script in the NSClient++ scripts folder, preferably in a subfolder Powershell.
  3. In the nsclient.ini configuration file, define the script like this:
  4. Make a command in Nagios like this:
  5. Configure your service in Nagios, using the command created above. Configure something similar to this as $ARG1$:
    QA servers =>

    PR servers =>

  6. If you want to make use of the new ‘PSWindowsUpdate’ method you will need to have an argument like this:

(Almost) Final words

So why did I create another plugin to check WSUS updates? Because I’m using a system which completely automates Windows Update installation with the help of Nagios XI and Rundeck. The existing plugins did not meet my requirements.

Please note that there are several known issues with WSUS on some operating systems. It’s recommended to always update to the latest ‘Windows Update Client’. Please check the Windows 8.1 and Windows Server 2012 R2 update history for more information. More specifically, when using Windows Server 2012 R2, you will really want the following KBs:

  • KB3172614 => “July 2016 update rollup for Windows 8.1 and Windows Server 2012 R2”
  • KB3179574 => “August 2016 update rollup for Windows 8.1 and Windows Server 2012 R2”
  • KB3185279 => “September 2016 update rollup for Windows 8.1 and Windows Server 2012 R2”

Without these update rollups, checking for updates and updating your Windows Server 2012 R2 systems can be very slow. In our case, an update check could take up to 40 minutes instead of 10 seconds.

Let me know on the Nagios Exchange what you think of my plugin by rating it or submitting a review. Please also consider starring the project on GitHub.