Monitoring Linux Process Count, CPU and Memory


Introduction

As I had some issues with my Linode server related to mistuned MariaDB settings, I was forced to find a way to monitor Linux processes such as httpd, mysqld and php. I needed to know not only whether they were running and how many of them were running, but also their CPU and memory usage, so I could tune my Apache settings (located at /etc/httpd/conf/httpd.conf). I hoped to find a plugin which did all of the above, but couldn’t find one. The plugin that came closest to what I needed was this one, written by Eli Keimig.

As its last release date was 08/11/2010 and it was missing some crucial features, I decided to improve it. So far I have added the following features:

  • Performance data for Linux process CPU usage.
  • Performance data for Linux process Memory usage.
  • Added Linux process count with performance data.
  • Improved the plugin output.
  • Added minimum and maximum Linux process count.

How to monitor a Linux process?

The plugin uses ‘ps’ to retrieve the Linux process information. Logged in as root, type ps aux in your terminal to show the active processes on the server.

The a option tells ps to list the processes of all users on the system rather than just those of the current user, with the exception of group leaders and processes not associated with a terminal. A group leader is the first member of a group of related processes.

The u option tells ps to provide detailed information about each process.

The x option adds to the list processes that have no controlling terminal, such as daemons, which are programs that are launched during boot and run unobtrusively in the background until they are activated by a particular event or condition.

As the list of processes can be quite long and occupy more than a single screen, the output of ps aux can be piped (transferred) to the less command (ps aux | less), which lets it be viewed one screenful at a time. The output can be advanced one screen forward by pressing the SPACE bar and one screen backward by pressing the b key.


With the -C parameter you can specify the Linux process for which to show information, for example ps -C httpd.

And with the -o parameter you can specify which information to show, for example pcpu and pmem for the CPU and memory usage.

After joining the results with paste and making the sum with bc, we get the result we want.
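A minimal sketch of that pipeline, not the plugin itself. The function names sum_lines, ps_column and process_summary are hypothetical, and the awk fallback for systems without bc is my own addition. It assumes the procps version of ps, where -C selects by command name and a trailing "=" after a column name suppresses the header line.

```shell
#!/usr/bin/env bash
# Sketch: total CPU %, memory % and process count for one command name.

# Sum numbers arriving one per line on stdin.
sum_lines() {
    if command -v bc >/dev/null 2>&1; then
        # paste joins the lines with "+", bc evaluates the resulting expression.
        paste -sd+ - | bc
    else
        # Fallback (not in the original post) for systems without bc.
        awk '{ total += $1 } END { print total + 0 }'
    fi
}

# Print one ps column for every process named $1, without a header line.
ps_column() {
    ps -C "$1" -o "$2=" 2>/dev/null || true
}

# Print a one-line summary; returns 1 when no matching process is found.
process_summary() {
    local name="$1" count cpu mem
    count=$(ps_column "$name" pid | wc -l | tr -d ' ')
    if [ "$count" -eq 0 ]; then
        echo "$name: not running"
        return 1
    fi
    cpu=$(ps_column "$name" pcpu | sum_lines)
    mem=$(ps_column "$name" pmem | sum_lines)
    echo "$name: count=$count cpu=${cpu}% mem=${mem}%"
}
```

Calling process_summary httpd prints a single line of the form "httpd: count=N cpu=X% mem=Y%", which is roughly the information a Nagios plugin needs for its status text and performance data.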

Check out this screenshot which shows information about the httpd, mysqld, nagios and php processes.

[Screenshot: Linux process]

This information can really help troubleshoot LAMP configuration issues. I haven’t got a lot of time to produce a decent post, but I’ll extend this post when I find some more time. As it’s a Bash script, I’m guessing it doesn’t need too much explanation to get it working in Nagios.

Let’s Encrypt – How to generate your SSL certificates?

Introduction

I found some time to start working on migrating my ‘Free SSL Certificate for 90 Days’ SSL certificate from Comodo to a free Let’s Encrypt SSL certificate, so I will try to document some of the steps in this blog post. You will probably be better off reading through this Quick Start Guide though. 

Let’s Encrypt is a free, automated, and open certificate authority (CA), run for the public’s benefit. Let’s Encrypt is a service provided by the Internet Security Research Group (ISRG). ISRG is a California public benefit corporation.

The key principles behind Let’s Encrypt are:

  • Free: Anyone who owns a domain name can use Let’s Encrypt to obtain a trusted certificate at zero cost.
  • Automatic: Software running on a web server can interact with Let’s Encrypt to painlessly obtain a certificate, securely configure it for use, and automatically take care of renewal.
  • Secure: Let’s Encrypt will serve as a platform for advancing TLS security best practices, both on the CA side and by helping site operators properly secure their servers.
  • Transparent: All certificates issued or revoked will be publicly recorded and available for anyone to inspect.
  • Open: The automatic issuance and renewal protocol will be published as an open standard that others can adopt.
  • Cooperative: Much like the underlying Internet protocols themselves, Let’s Encrypt is a joint effort to benefit the community, beyond the control of any one organization.

Do you need any more convincing? For years people have been paying far too much for their SSL certificates to security companies such as Comodo, GlobalSign, GoDaddy, Thawte and others. I have never really understood why they cost so much. And why would we trust them? Remember DigiNotar?

DigiNotar was a Dutch certificate authority owned by VASCO Data Security International. On September 3, 2011, after it had become clear that a security breach had resulted in the fraudulent issuing of certificates, the Dutch government took over operational management of DigiNotar’s systems. That same month, the company was declared bankrupt.

After more than 500 fake DigiNotar certificates were found, major web browser makers reacted by blacklisting all DigiNotar certificates. The scale of the incident was used by some organizations, such as ENISA and AccessNow.org, to call for a deeper reform of HTTPS in order to remove the weakest-link problem, in which a single compromised CA can affect that many users.

But what can other Certificate Authorities offer that Let’s Encrypt can’t?

There are three types of SSL certificates: Domain Validated (DV), Organization Validated (OV) and Extended Validation (EV). To get a DV cert you only need to prove that you control the domain for which the certificate is issued. For an OV cert the CA checks with third parties to ensure that the name of the applying organization is the same as that of the organization which owns the domain. For an EV cert, the kind that turns your browser address bar green, you need to provide much more extensive documentation, and there are no personal EV certs.
The very fact that the Let’s Encrypt process is automated means that they will not be able to offer anything other than DV certificates. To many companies this isn’t enough.

How to use Let’s Encrypt?

The source code of Let’s Encrypt can be found on GitHub, so start by cloning the GitHub repository.

So now you have the necessary files, there are several ways to generate the LetsEncrypt certificates. I used the webroot method and did it like this:
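The exact commands are not preserved in this post, so here is a hedged sketch of what a webroot run looked like with the client of that time. The domain and webroot path are placeholders, the function name webroot_command is hypothetical, and the repository URL is the one the project used back then. The script only prints the command, so it can be reviewed before running it as root.

```shell
#!/usr/bin/env bash
# Sketch only: DOMAIN and WEBROOT are placeholders, not my real values.
DOMAIN="example.com"
WEBROOT="/var/www/html"

# The client is cloned first (repository location at the time of writing):
#   git clone https://github.com/letsencrypt/letsencrypt
#   cd letsencrypt

# Build the webroot command line for a domain and its www alias.
webroot_command() {
    local domain="$1" webroot="$2"
    printf './letsencrypt-auto certonly --webroot -w %s -d %s -d www.%s\n' \
        "$webroot" "$domain" "$domain"
}

# Print the command instead of executing it.
webroot_command "$DOMAIN" "$WEBROOT"
```

With the webroot method the client writes a challenge file below the given directory, so the web server must already be serving that directory for every domain passed with -d.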

After running the above command, you first need to enter your email address for recovery purposes.

[Screenshot: LetsEncrypt]

Next, you need to agree to the ‘Terms of Service’:

[Screenshot: Let's Encrypt]

All generated keys and issued certificates can be found in /etc/letsencrypt/live/$domain. In my case, this means that after running the above command, I received the following files:

Rather than copying, please point your (web) server configuration directly to those files (or create symlinks). During the renewal, /etc/letsencrypt/live is updated with the latest necessary files.

So your Apache configuration file will need these entries:
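My original entries are not preserved here, so the following is a sketch under assumptions: it uses the file names Let’s Encrypt normally places in the live directory (cert.pem, privkey.pem, chain.pem), example.com is a placeholder, and the function name apache_ssl_directives is hypothetical. It prints the directives for a given domain so they can be pasted into the virtual host.

```shell
#!/usr/bin/env bash
# Sketch: print mod_ssl directives pointing at the Let's Encrypt live directory.
apache_ssl_directives() {
    local live_dir="/etc/letsencrypt/live/$1"
    cat <<EOF
SSLEngine on
SSLCertificateFile ${live_dir}/cert.pem
SSLCertificateKeyFile ${live_dir}/privkey.pem
SSLCertificateChainFile ${live_dir}/chain.pem
EOF
}

# example.com is a placeholder domain.
apache_ssl_directives "example.com"
```

On Apache 2.4.8 and later, SSLCertificateChainFile is deprecated; there you can point SSLCertificateFile at fullchain.pem and drop the chain directive.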

Let’s Encrypt certificates are only valid for 90 days, so you will have to renew them regularly, for example every month. It should be easy to do this automatically with a cron job. I’ll see if I can add this to this blog post later.
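Until then, a hedged sketch of the check such a cron job could start with. The function name expires_within_days, the wrapper script path and the schedule in the comment are all hypothetical. It relies on the -checkend option of openssl x509, which exits non-zero when the certificate expires within the given number of seconds.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a certificate is due for renewal.
# Returns 0 if it expires within $2 days, 1 if not, 2 if the file is unreadable.
expires_within_days() {
    local cert="$1" days="$2"
    if [ ! -r "$cert" ]; then
        echo "cannot read $cert" >&2
        return 2
    fi
    # openssl exits 0 when the certificate is still valid after the given seconds.
    if openssl x509 -checkend "$(( days * 86400 ))" -noout -in "$cert" >/dev/null; then
        return 1
    fi
    return 0
}

# Hypothetical crontab entry running a wrapper script every Monday at 03:00:
#   0 3 * * 1 /usr/local/bin/renew-if-needed.sh
```

A wrapper script would call expires_within_days /etc/letsencrypt/live/$domain/cert.pem 30 and, when it returns 0, rerun the webroot command and reload Apache.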

NAF – Nagios – Reconfigure Host – Free Variables


Introduction

There needs to be a method for admins to store information about their infrastructure components in their Nagios configuration without imposing a set of specific variables on others. Nagios attempts to solve this problem by allowing users to define custom variables (or free variables) in their object definitions. Free variables allow users to define additional properties in their host, service, and contact definitions, and use their values in notifications, event handlers, and host and service checks.

Free variables are one of the most underestimated features of Nagios XI. The root cause was that they were only viewable and editable from the CCM (Core Config Manager). But since the release of Nagios XI 5, things have changed: the new free variables component enables you to show these free variables to your users.

There are a few important things that you should note about custom variables:

  • Custom variable names must begin with an underscore (_) to prevent name collision with standard variables
  • Custom variable names are converted to all uppercase before use
  • Custom variables are inherited from object templates like normal variables
  • Scripts can reference custom variable values with macros and environment variables

The list of use cases is endless. A few examples of free variables that could be useful:

  • Serial Number
  • Support Contract
  • Address
  • Primary Contact Name
  • Primary Contact Phone
  • Primary Contact Email
  • Secondary Contact Name
  • Secondary Contact Phone
  • Secondary Contact Email
  • Infrastructure Type (Virtual – Physical)
  • Datastore
  • SNMP Community
  • Rack Number

How to update your Nagios free variables?

  1. Put the Bash script in the Nagios Reactor scripts folder:
  2. Make sure it’s executable with:
  3. Create the event chain in Nagios Reactor. Create a block that will call the Bash script and pass the parameters:

And suddenly you have built yourself a nice web service which you can call from anywhere (for example from PowerShell), so you can automate populating the free variables of your Nagios hosts.

Custom or free variables as Nagios Macros

Custom or free variable values can be referenced in scripts and executables that Nagios runs for checks, notifications, etc. by using macros or environment variables. Custom variable macros are trusted (because you define them) and therefore not cleaned/sanitized before they are made available to scripts.

In order to prevent name collision among custom variables from different object types, Nagios prepends “_HOST”, “_SERVICE”, or “_CONTACT” to the beginning of custom host, service, or contact variables, respectively, in macro and environment variable names. 
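The naming rule can be sketched as a small helper. The function name custom_variable_names and the example variable _rack_number are hypothetical; the environment variable form assumes that environment macros are enabled in the Nagios configuration, in which case Nagios prefixes macro names with NAGIOS_.

```shell
#!/usr/bin/env bash
# Sketch: derive the macro and environment variable names Nagios builds
# for a custom (free) variable defined on a host, service or contact.
custom_variable_names() {
    local object_type variable
    # Names are converted to upper case; the variable's own leading
    # underscore is replaced by the _HOST, _SERVICE or _CONTACT prefix.
    object_type=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    variable=$(printf '%s' "${2#_}" | tr '[:lower:]' '[:upper:]')
    case "$object_type" in
        HOST|SERVICE|CONTACT) ;;
        *) echo "object type must be host, service or contact" >&2; return 1 ;;
    esac
    printf 'macro=$_%s%s$ env=NAGIOS__%s%s\n' \
        "$object_type" "$variable" "$object_type" "$variable"
}

custom_variable_names host _rack_number
# prints: macro=$_HOSTRACK_NUMBER$ env=NAGIOS__HOSTRACK_NUMBER
```

A notification or event handler script can therefore read the rack number either from an argument filled with the macro or from the environment variable.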

[Screenshot: free variables]

I will try to extend this documentation when I find some more time.

Greetings. Willem

Monitoring Scheduled Tasks on Windows


Introduction

Task Scheduler is a Microsoft Windows component that allows you to schedule programs or scripts to start at predefined intervals. There are two major versions of the Task Scheduler. In version 1.0, definitions and schedules are stored in binary .job files, and every task corresponds to a single action. This plugin will not work on version 1.0 of the Task Scheduler, which runs on Windows 2000 and Windows Server 2003. In version 2.0, the Task Scheduler got a redesigned user interface based on the Microsoft Management Console. Version 2.0 also supports calendar and event-based triggers, such as starting a task when a particular event is logged to the event log, or when a combination of events has occurred. Also, several tasks that are triggered by the same event can be configured to run either simultaneously or in a predetermined chained sequence.

Tasks can also be configured to run based on system status, such as being idle for a preconfigured amount of time, on startup or logoff, or only during or for a specified time. Another new feature is a credential manager, which stores passwords so they cannot be retrieved easily. Also, scheduled tasks are executed in their own session, instead of the same session as system services or the current user. You can find a list of all Task Scheduler 2.0 interfaces here.

Starting from Windows PowerShell 4.0, you can use a whole range of PowerShell cmdlets to manage your scheduled tasks. This plugin for Nagios does not use these cmdlets, as it has to be PowerShell 2.0 compatible. Maybe in a few years, when PowerShell 2.0 becomes obsolete, I’ll patch the script to make use of the new cmdlets. You can find the complete list of cmdlets here.

Failing tasks will always end with some sort of error code. You can find the complete list of error codes here. This plugin will output the exit codes of failing tasks in the Nagios service description. The output will also notify you of tasks that are still running.

We have multiple Windows servers at work with a growing number of scheduled tasks, and each scheduled task needs to be monitored. With the help of Nagios and this plugin you can find out:

  • How many are running at the same time?
  • How many are failing?
  • How long are they running?
  • Who created them?

Disabled scheduled tasks are excluded by default since version 3.14.12.06. In earlier versions, you had to exclude them manually with -EF or -ET. Excluding disabled tasks by default seemed like a logical decision, and it was suggested by someone reviewing the plugin on the Nagios Exchange. Maybe one day I’ll add a switch to include them again if specified.

As some scheduled tasks do not need to be monitored, the script enables you to exclude complete folders. One of the folders I exclude almost all the time is the “Microsoft” folder, as several tasks in it tend to fail from time to time. So unless you absolutely need to know the state of every single scheduled task running on your Windows server, I advise you to exclude it too. You can find the folders and tasks in this location: C:\Windows\System32\Tasks

Starting from version 5.6.21, it is also possible to use a parameter to include tasks or task folders. This filter is applied after the exclude parameter. This is the help of the plugin, which lists all valid parameters:

You could put every scheduled task you don’t want to monitor in a separate folder and exclude it with the -EF parameter. Alternatively, you can use the -ET parameter to exclude tasks based on name patterns. One quite important thing to know is that in order to exclude or include the root folder, you need to escape the backslash, like this: “\\”.

How to monitor your scheduled tasks?

  1. Put the script in the NSClient++ scripts folder, preferably in a subfolder Powershell.
  2. In the nsclient.ini configuration file, define the script like this:
  3. Make a command in Nagios like this:
  4. Configure your service in Nagios, making use of the command created above. Configure something similar to this as $ARG1$:

Some things to consider to make it work:

  • “Set-ExecutionPolicy RemoteSigned”
  • Nscp service account permissions => Running as Local System should suffice, but I had users telling me it only worked with a local admin. I found out that on some NSClient++ versions, more specifically version 0.4.3.88 and probably some earlier versions too, the following error occurred when running the nscp service as Local System: “CHECK_NRPE: Invalid packet type received from server”. After I filed an issue on the GitHub project page of NSClient++, Michael Medin quickly acknowledged the issue and solved it in version 0.4.3.102, so the plugin should work again as Local System.

If you want to run the script on the command line from your Nagios plugin folder, this would be the command:

If you want to exclude one noisy, unimportant scheduled task, the command would look like this:

If you only want the scheduled tasks in the root to be monitored, you can use this command:

This would only give you the scheduled tasks available in the root folder. The output now looks like this.

It seems the perfdata in the Highcharts graphs sometimes contains decimal numbers (see screenshot), which is kind of strange, as I’m sure I only pass rounded numbers. This seems to be related to the way RRD files work. To reduce the amount of storage space used, NPCD and RRD will average out the data, resulting in decimals even when you don’t expect them.
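A tiny illustration of that effect, not of RRD’s actual internals: when several whole-number samples are consolidated into one averaged data point, the result is usually no longer a whole number. The function name average is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: average whole-number samples, as an averaging consolidation step would.
average() {
    printf '%s\n' "$@" |
        awk '{ sum += $1 } END { if (NR > 0) printf "%.2f\n", sum / NR }'
}

# Three checks reported 3, 4 and 4 failed tasks.
average 3 4 4   # prints 3.67
```

So a graph can show 3.67 failed tasks even though every individual check reported an integer.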

To do:

  • Add switches to change returned values and output.
  • Add array parameter with exit codes that should be excluded.
  • Test remote execution. In some cases it might be useful to be able to check remotely for failed windows tasks.
  • Include a warning / critical threshold when discovered tasks exceed a certain duration.
  • I was hoping to add some more exit codes to check, which would make failed tasks easier to troubleshoot. You can find the list of scheduled task exit codes here. The constants that begin with SCHED_S_ are success constants, and the constants that begin with SCHED_E_ are error constants.
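As a sketch of how such a lookup could work (written in shell for illustration, not taken from the PowerShell plugin): the function name describe_task_result is hypothetical, and only a handful of well-known SCHED_S_ constants are listed. Success constants fall in the 0x000413xx range and error constants in the 0x8004xxxx range.

```shell
#!/usr/bin/env bash
# Sketch: translate a scheduled task result (decimal or 0x-prefixed hex)
# into a readable description.
describe_task_result() {
    local hex
    # Normalise the input to an 8-digit hex string.
    hex=$(printf '0x%08X' "$1")
    case "$hex" in
        0x00000000) echo "$hex success" ;;
        0x00041300) echo "$hex SCHED_S_TASK_READY (task is ready to run)" ;;
        0x00041301) echo "$hex SCHED_S_TASK_RUNNING (task is currently running)" ;;
        0x00041302) echo "$hex SCHED_S_TASK_DISABLED (task is disabled)" ;;
        0x00041303) echo "$hex SCHED_S_TASK_HAS_NOT_RUN (task has not yet run)" ;;
        0x00041306) echo "$hex SCHED_S_TASK_TERMINATED (task was terminated)" ;;
        0x000413*)  echo "$hex other SCHED_S_ success constant" ;;
        0x8004*)    echo "$hex SCHED_E_ error constant" ;;
        *)          echo "$hex other result, often the exit code of the program the task ran" ;;
    esac
}

describe_task_result 267009
# prints: 0x00041301 SCHED_S_TASK_RUNNING (task is currently running)
```

The decimal value 267009 is what a last task result of 0x41301 looks like when it is reported as a plain number.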

Screenshots:

These are some screenshots of the Nagios XI Graph Explorer for two of our servers making use of the plugin to monitor scheduled tasks:

[Screenshot: Tasks 01]

[Screenshot: check_ms_win_tasks_graph_02]

Let me know on the Nagios Exchange what you think of my plugin by rating it or submitting a review. Please also consider starring the project on GitHub.

Willem