Time is an issue.

It has been some time since I created a new post or updated old posts. Some information on my website might be outdated or no longer relevant. Other priorities have come up which require my attention. I’m keeping my website online for now, as some info might still be relevant.

Please always check my GitHub profile for the latest version and information.

Monitoring @ FOSDEM 2018 => Observability, Tracing and the RED method


Yesterday 03/02/18, I went to FOSDEM in Brussels. FOSDEM is a two-day non-commercial event organised by volunteers to promote the widespread use of free and open source software. The goal is to provide free and open source software developers and communities a place to meet in order to:

  • get in touch with other developers and projects
  • be informed about the latest developments in the free software world
  • be informed about the latest developments in the open source world
  • attend interesting talks and presentations on various topics by project leaders and committers
  • promote the development and benefits of free software and open source solutions

FOSDEM seems to get more popular each year, resulting in rooms and lecture halls filling up fast.


Take a look at the Saturday schedule to get an idea of the wide range of topics that were discussed. I’ll try to give a quick overview of the most interesting monitoring-related topics I learned about.


Christine Yen held an interesting lecture about observability, a term that is quickly gaining traction in the DevOps monitoring world. Wikipedia describes observability as a measure of how well internal states of a system can be inferred from knowledge of its external outputs.

But is there a significant difference between monitoring and observability? Monitoring vendors seem to be quickly updating their websites to promise observability, while a few months ago the word was nowhere to be found.

It seems to me that observability belongs in the same list as other system attributes, such as:

  • functionality
  • performance
  • testability
  • debuggability
  • operability
  • maintainability
  • efficiency
  • teachability
  • usability

By making systems observable, we can make it easier to come to decisions and take action, making sure the OODA loop is actually a loop, instead of having a break in it.

Observability OODA Loop

Observability isn’t a substitute for monitoring, nor does it obviate the need for monitoring; they are complementary. Monitoring is best suited to report the overall health of systems. Monitoring, as such, is best limited to key business and systems metrics derived from time-series based instrumentation, known failure modes as well as blackbox tests. Observability, on the other hand, aims to provide highly granular insights into the behavior of systems along with rich context, perfect for debugging purposes. It can be used as a way to better understand system performance and behavior, even during what can be perceived as “normal” operation of a system. Since it’s impossible to predict every single failure mode a system could potentially run into or predict every possible way in which a system could misbehave, it becomes important that we build systems that can be debugged armed with evidence and not conjecture.

How can we make microservices observable?

  • Observable systems should emit events, metrics, logs and traces
  • All components (also non-critical) should be instrumented
  • Instrumentation should not be ‘opt-in’, manual or ‘hard to do’
  • Need to find the right balance of metrics, logs and traces for a given service
  • Using advanced analytics for diagnosing services, such as anomaly detection and forecasting
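As a minimal sketch of what "emitting events" with low-friction instrumentation could look like, the Python snippet below wraps a unit of work and emits one structured JSON event with its duration, outcome and some context. The helper name `observed`, the field names and the example context are assumptions made for illustration; they do not come from the talk.

```python
import json
import time
from contextlib import contextmanager


@contextmanager
def observed(operation, emit=print, **context):
    """Emit one JSON event per operation: name, context, duration and outcome."""
    event = {"operation": operation, **context}
    start = time.perf_counter()
    try:
        # The caller can attach extra fields to the event while the work runs.
        yield event
        event["outcome"] = "ok"
    except Exception as exc:
        event["outcome"] = "error"
        event["error"] = repr(exc)
        raise
    finally:
        # Always emit, whether the wrapped block succeeded or failed.
        event["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
        emit(json.dumps(event, sort_keys=True))


if __name__ == "__main__":
    # Hypothetical usage: one event for a "checkout" operation.
    with observed("checkout", user_id=42) as event:
        event["items"] = 3
```

Because the event is emitted in the `finally` clause, failed operations are recorded with the same rich context as successful ones, which is the part that helps when debugging.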


Another interesting topic, trace buffering with LTTng, was discussed by Julien Desfossez. Some LTTng features:

  • System-wide insights: LTTng lets you understand the interactions between multiple components of a given system. By producing a unified log of events, LTTng provides great insight into the system’s behavior.
  • High performance: Tracers achieve great performance through a combination of essential techniques such as per-CPU buffering, RCU data structures, a compact and efficient binary trace format, and more.
  • Flexibility by offering multiple modes:
    • Post-processing mode: Gather as much data as possible and take the time to investigate
    • Snapshot mode: When an error is detected, extract the last trace buffer from memory
    • Live mode: Continuously monitor a stream of events
    • Rotation mode: Periodically, or on a specific trigger, process a chunk of trace without stopping the tracing

Tracing tends to be quite expensive. Think long and hard about whether the added complexity is warranted. You might be falling into the trap of premature optimisation. Is optimisation that important when you could just scale horizontally?

Tracing to disk with all kernel events enabled can quickly generate huge traces:

  • 54k events/sec on an idle 4-core laptop, 2.2 MB/sec
  • 2.7M events/sec on a busy 8-core server, 95 MB/sec

So make sure you watch your available storage closely.
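To put those rates in perspective, here is a small back-of-the-envelope calculation in Python. It only uses the two throughput figures quoted above; the 500 GB of free disk space is a made-up example value, and 1 GB is taken as 1000 MB.

```python
SECONDS_PER_HOUR = 3600


def trace_volume_gb(mb_per_sec, hours):
    """Approximate trace size in GB (1 GB = 1000 MB) after `hours` of tracing."""
    return mb_per_sec * SECONDS_PER_HOUR * hours / 1000


def hours_until_full(mb_per_sec, free_gb):
    """How many hours of tracing fit in `free_gb` of free disk space."""
    return free_gb * 1000 / (mb_per_sec * SECONDS_PER_HOUR)


if __name__ == "__main__":
    # Rates from the talk; the 500 GB disk is a hypothetical example.
    for label, rate in (("idle laptop", 2.2), ("busy server", 95.0)):
        print(
            f"{label}: {trace_volume_gb(rate, 24):.0f} GB per day, "
            f"{hours_until_full(rate, 500):.1f} hours to fill 500 GB"
        )
```

At 95 MB/sec a day of tracing adds up to roughly 8 TB, which is why the snapshot and rotation modes are worth a look.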

Some service instrumentation methods

Tom Wilkie gave a presentation about the RED Method, while also talking about the USE Method and the Four Golden Signals. He explained why consistency is important for reducing cognitive load.

The USE Method

For every resource, monitor:

  • Utilisation: % time the resource was busy
  • Saturation: Amount of work the resource has to do (often queue length)
  • Errors: Count of error events
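The three USE numbers can be derived from periodic samples of a resource. The following Python sketch assumes a hypothetical `ResourceSample` record taken at a fixed interval; the field names and the choice of average queue length as the saturation measure are my own illustration, not something prescribed by the method.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    busy: bool         # was the resource busy during this sampling interval?
    queue_length: int  # amount of work waiting for the resource
    errors: int        # error events seen during the interval


def use_summary(samples):
    """Return utilisation (%), average queue length and total error count."""
    if not samples:
        raise ValueError("at least one sample is required")
    count = len(samples)
    return {
        "utilisation_pct": 100.0 * sum(s.busy for s in samples) / count,
        "saturation_avg_queue": sum(s.queue_length for s in samples) / count,
        "errors": sum(s.errors for s in samples),
    }


if __name__ == "__main__":
    samples = [
        ResourceSample(busy=True, queue_length=0, errors=0),
        ResourceSample(busy=True, queue_length=3, errors=1),
        ResourceSample(busy=False, queue_length=0, errors=0),
        ResourceSample(busy=True, queue_length=1, errors=0),
    ]
    print(use_summary(samples))
```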

The RED Method

For every service, monitor request:

  • Rate: Number of requests per second
  • Errors: The number of those requests that are failing
  • Duration: The amount of time those requests take
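As a rough illustration, the RED numbers for one service can be computed from the requests seen in a time window. The `Request` record, the nearest-rank percentile and the choice of p50 and p99 are assumptions for this sketch; in practice these values usually come from counters and histograms in your metrics system rather than from raw request lists.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration_s: float  # how long the request took, in seconds
    failed: bool       # whether the request ended in an error


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list (None if empty)."""
    if not sorted_values:
        return None
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def red_summary(requests, window_s):
    """Return rate, errors and duration figures for a window of `window_s` seconds."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    total = len(requests)
    failures = sum(r.failed for r in requests)
    durations = sorted(r.duration_s for r in requests)
    return {
        "rate_per_s": total / window_s,
        "errors_per_s": failures / window_s,
        "error_ratio": failures / total if total else 0.0,
        "duration_p50_s": percentile(durations, 50),
        "duration_p99_s": percentile(durations, 99),
    }


if __name__ == "__main__":
    window = [Request(0.1, False), Request(0.2, False),
              Request(0.3, True), Request(0.4, False)]
    print(red_summary(window, window_s=2))
```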

Google’s Four Golden Signals

  • Latency: The time it takes to service a request, with a focus on distinguishing between the latency of successful requests and the latency of failed requests.
  • Traffic: A measure of how much demand is being placed on the service. This is measured using a high-level service-specific metric, like HTTP requests per second in the case of an HTTP REST API.
  • Errors: The rate of requests that fail. The failures can be explicit (e.g., HTTP 500 errors) or implicit (e.g., an HTTP 200 OK response with a response body having too few items).
  • Saturation: How “full” is the service. This is a measure of the system utilization, emphasizing the resources that are most constrained (e.g., memory, I/O or CPU). Services degrade in performance as they approach high saturation.


FOSDEM 2018 seems to be a very popular event which covers a wide spectrum of free and open source software projects. I learned some new things and met several interesting people. Hopefully I’m able to return next year. If open source software piques your interest, I can certainly recommend attending this event. You will not be disappointed.

Realmd and SSSD Active Directory Authentication

Introduction to SSSD and Realmd

Starting from Red Hat 7 and CentOS 7, SSSD or ‘System Security Services Daemon’  and realmd have been introduced. SSSD’s main function is to access a remote identity and authentication resource through a common framework that provides caching and offline support to the system. SSSD provides PAM and NSS integration and a database to store local users, as well as core and extended user data retrieved from a central server. 

The main reason to transition from Winbind to SSSD is that SSSD can be used for both direct and indirect integration and allows switching from one integration approach to another without significant migration costs. The most convenient way to configure SSSD or Winbind in order to directly integrate a Linux system with AD is to use the realmd service, because it allows callers to configure network authentication and domain membership in a standard way. The realmd service automatically discovers information about accessible domains and realms and does not require advanced configuration to join a domain or realm.

The realmd system provides a clear and simple way to discover and join identity domains. It does not connect to the domain itself but configures underlying Linux system services, such as SSSD or Winbind, to connect to the domain.

Realmd Pam SSSD

Please read through this Windows integration guide from Red Hat if you want more information. This extensive guide contains a lot of useful information about more complex situations.

Realmd / SSSD Use Cases

How to join an Active Directory domain?

  1. First, install the required packages:
  2. Configure ntp to prevent time sync issues:
  3. Join the server to the domain:
  4. Also add the default domain suffix to the sssd configuration file:

    Add the following beneath [sssd]

  5. Finally move the computer object to an organizational unit in Active Directory.

How to leave an Active Directory domain?

Several times I saw that, although the computer object was created in Active Directory, it was still not possible to log in with an AD account. Each time, the solution was to remove the server from the domain and then add it back.

How to permit only one Active Directory group to log on

It can be very useful to allow only one Active Directory group to log on, for example a group of Linux system administrators.

How to give sudo permissions to an Active Directory group



Example sssd.conf Configuration

The following is an example sssd.conf configuration file. I once saw access_provider set to ad for some reason. I haven’t had the chance to experiment with that setting, as simple has worked almost every time so far.

As Sean suggests in the comments, it’s not a good idea to set krb5_store_password_if_offline to True, since the passwords are stored in the keyring in plaintext. Alternatively, “cache_credentials = Yes” stores a SHA512 hash of the password in the db, which may be more appropriate if this functionality is needed.



Required security permissions in AD

A few months ago, we had a problem where some users were no longer able to authenticate. After an extended search we discovered that the cause was a hardening change in the permissions on some OUs in our AD. My colleague Jenne and I found that the Linux server computer objects need minimal permissions on the OU which contains the users who want to authenticate on your Linux servers. After testing almost all obvious permissions, we came to the conclusion that the computer objects need “Read remote access information”.


How to debug SSSD and realmd?

The logfile which contains information about successful or failed login attempts is /var/log/secure. It contains information related to authentication and authorization privileges. For example, sshd logs all its messages there, including unsuccessful login attempts. Be sure to check that logfile if you experience problems logging in with an Active Directory user.

How to clear the SSSD cache?

As suggested by AP in the comments, you can manage your cache with the sss_cache command.  It can be used to clear the cache and update all records:

The sss_cache command can also clear all cached entries for a particular domain:
If the administrator knows that a specific record (user, group, or netgroup) has been updated, then sss_cache can purge the records for that specific account and leave the rest of the cache intact:
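If you want to script the three cases described above, a small Python helper that builds the sss_cache argument list could look like the sketch below. The -E (everything), -d (domain), -u (user) and -g (group) options are documented in the sss_cache man page; the helper names, the domain and the login are hypothetical examples. Note that sss_cache marks entries as expired so that they are refreshed on the next lookup, and it needs to run as root.

```python
import subprocess


def sss_cache_args(domain=None, user=None, group=None):
    """Build the argument list for sss_cache.

    Without a user or group, everything is invalidated (-E).
    A domain restricts the invalidation to that domain (-d).
    """
    args = ["sss_cache"]
    if user is not None:
        args += ["-u", user]
    elif group is not None:
        args += ["-g", group]
    else:
        args.append("-E")
    if domain is not None:
        args += ["-d", domain]
    return args


def invalidate(**kwargs):
    """Run sss_cache; requires root and the SSSD tools to be installed."""
    subprocess.run(sss_cache_args(**kwargs), check=True)


if __name__ == "__main__":
    # Only print the commands here; call invalidate() to actually run them.
    print(sss_cache_args())                      # whole cache
    print(sss_cache_args(domain="example.com"))  # one domain (hypothetical name)
    print(sss_cache_args(user="jdoe"))           # one user (hypothetical login)
```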

Please refer to the official documentation for more information.

In case the above doesn’t help, you can also remove the cache ‘the hard way’:

Just wanted to add this command which also helped me in one case somehow. 

Final Words

I hope this guide helps people towards a better Windows-Linux integration. Let me know if you think there is a better way to do the above or if you have some useful information you think I should add to this guide.



Monitoring F5 BIG-IP Platform


The F5 BIG-IP platform consists of software and hardware that acts as a reverse proxy and distributes network or application traffic across a number of servers. Load balancers are used to increase capacity and reliability of applications. They improve the overall performance of applications by decreasing the burden on servers associated with managing and maintaining application and network sessions, as well as by performing application-specific tasks. As it’s a critical part of your network, monitoring F5 BIG-IP health is essential to ensure operations are working as expected. This can be achieved via traditional SNMP polling, but in order to get a detailed view of the performance of F5 network services, you will require a combination of SNMP polling and F5 syslog message analysis.

In my experience, the best platform for syslog message analysis is still the Elastic stack. Over the past years I’ve been working on a set of F5 Logstash filters, which can be used to create beautiful Kibana dashboards that give you detailed insight into the workings and processes of your F5 BIG-IP load balancer.

Load balancers are generally grouped into two categories: Layer 4 and Layer 7. Layer 4 load balancers act upon data found in network and transport layer protocols (IP, TCP, UDP). Layer 7 load balancers distribute requests based upon data found in application layer protocols such as HTTP. Requests are received by both types of load balancers and they are distributed to a particular server based on a configured algorithm. Some industry standard algorithms are:

  • Round robin
  • Weighted round robin
  • Least connections
  • Least response time
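To make the four algorithms above concrete, here is a deliberately simplified Python sketch of each selection strategy. The server names, weights and measurements are invented, and real load balancers (including BIG-IP) implement these with more state and smarter tie-breaking, so treat this as an illustration of the idea only.

```python
import itertools


def round_robin(servers):
    """Hand out servers in a fixed, repeating order."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Repeating order in which each server appears `weight` times per cycle."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_connections(active_connections):
    """Pick the server with the fewest active connections (first one wins ties)."""
    return min(active_connections, key=active_connections.get)


def least_response_time(avg_response_ms):
    """Pick the server with the lowest average response time."""
    return min(avg_response_ms, key=avg_response_ms.get)


if __name__ == "__main__":
    rr = round_robin(["web1", "web2", "web3"])
    print([next(rr) for _ in range(4)])

    wrr = weighted_round_robin({"web1": 2, "web2": 1})
    print([next(wrr) for _ in range(6)])

    print(least_connections({"web1": 5, "web2": 2, "web3": 2}))
    print(least_response_time({"web1": 12.0, "web2": 8.5}))
```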

Layer 7 load balancers can further distribute requests based on application specific data such as HTTP headers, cookies, or data within the application message itself, such as the value of a specific parameter. Load balancers ensure reliability and availability by monitoring the “health” of applications and only sending requests to servers and applications that can respond in a timely manner.

Monitoring F5 BIG-IP Platform


Nagios allows you to actively monitor the health of your F5 Load Balancer with SNMP. I’ll add some examples here asap.


You can find the required configuration files on GitHub. The project includes F5 Logstash filters, F5 Elasticsearch templates and F5 Logstash patterns.

Logstash configuration

F5 Logstash input

F5 Logstash filters

dcc => ASM related messages. BIG-IP Application Security Manager (ASM) enables organizations to protect against OWASP top 10 threats, application vulnerabilities, and zero-day attacks. Leading Layer 7 DDoS defenses, detection and mitigation techniques, virtual patching, and granular attack visibility thwart even the most sophisticated threats before they reach your servers.

apd => Access Policy Daemon. The apd process runs a BIG-IP APM access policy for a user session.

tmm => The traffic management microkernel is the process running on the BIG-IP host O/S that performs all of the local / global traffic management for the system.

sshd => The ssh daemon provides remote access to the BIG-IP system command line interface.

logger => If a BIG-IP high-availability redundant pair has the Detect ConfigSync Status feature enabled, each unit in the pair sends periodic iControl queries to its peer to determine if the redundant pair configuration is synchronized. These iControl requests occur approximately every 30 seconds on each unit. Each inbound request generates an entry in both the local /var/log/httpd/ssl_access_log file and the /var/log/httpd/ssl_request_log file. As I never saw anything useful coming out of it, I asked our F5 engineer to have a look at this F5 article, which describes how to exclude these messages in the F5 syslog configuration.

F5 Logstash custom grok patterns

You will need to add these F5 Logstash custom grok patterns to your Logstash patterns directory. For me it’s located in /etc/logstash/patterns.

Elasticsearch configuration

Included in the GitHub project you can find my F5 Elasticsearch template, with the correct mappings for each field. This enables you to use your data more efficiently and allows for advanced IP aggregations. You can find more information about mapping types here. If you have ideas about better mappings (I know they need some work), please let me know on GitHub by opening an issue.

F5 Remote Logging Configuration

You will need to configure your F5 with one or more remote syslog servers to send logs to your Logstash nodes. Ideally you will want to specify a custom port dedicated to F5 syslog traffic. You can find the official F5 remote syslog documentation here.

You can use the F5 Configuration Utility to add a remote syslog server like this:

  1. Log on to the Configuration utility.
  2. Navigate to System > Logs > Configuration > Remote Logging.
  3. Enter the destination syslog server IP address in the Remote IP text box.
  4. Enter the remote syslog server UDP port (default is 514) in the Remote Port text box.
  5. Enter the local IP address of the BIG-IP system in the Local IP text box (optional).

    Note: For BIG-IP systems in a high availability (HA) configuration, the non-floating self IP address is recommended if using a Traffic Management Microkernel (TMM) based IP address.

  6. Click Add.
  7. Click Update.

Kibana Dashboards

The Logstash filters I created allow you to do some awesome things in Kibana. I’m working on a set of dashboards with a menu which will allow you to drill down to interesting stuff, such as apd processors, session, dcc scraping and other violations.

Elastic F5 Home Dashboard

F5 Logstash filter Kibana dashboard

Elastic F5 dcc scraping dashboard

F5 Logstash filter dcc scraping

I will open source them when I consider them ready for public release. If you are truly interested in helping me develop and expand them, send me an email (willem.dhaeseatgmail), and I’ll consider sending you my Kibana 5 dashboard JSON exports. The only requirement is that you use f5-* as index pattern.