Monitoring NetApp Ontap

Introduction

There are of course numerous way to monitor your NetApp Ontap storage, but this post focusses for now on how to achieve quality monitoring with the help of a Nagios plugin, which was originally developed by John Murphy. The plugin definitely has some flaws, so all help is welcome to improve it. Read the post about debugging Perl scripts, make a fork of the project on Github and start experimenting.

The plugin is able monitor multiple critical NetApp Ontap components, from disk to aggregates to volumes. It can also alert you if it finds any unhealthy components.

NetApp Ontap Logical View

How to monitor Netapp Ontap with Nagios?

  • Download the latest release from GitHub to a temp directory and then navigate to it.
  • Copy the contents of NetApp/* to your /usr/lib/perl5 or /usr/lib64/perl5 directory to install the required version of the NetApp Perl SDK. (confirmed to work with SDK 5.1 and 5.2)
  • Copy check_netapp_ontap.pl script to your nagios libexec folder and configure the correct permissions

Parameters:

–hostname, -H => Hostname or address of the cluster administrative interface.

–node, -n => Name of a vhost or cluster-node to restrict this query to.

–user, -u => Username of a Netapp Ontapi enabled user.

–password, -p => Password for the netapp Ontapi enabled user.

–option, -o => The name of the option you want to check. See the option and threshold list at the bottom of this help text.

–warning, -w => A custom warning threshold value. See the option and threshold list at the bottom of this help text.

–critical, -c => A custom warning threshold value. See the option and threshold list at the bottom of this help text.

–modifier, -m => This modifier is used to set an inclusive or exclusive filter on what you want to monitor.

–help, -h => Display this help text.

Option list:

volume_health:

Check the space and inode health of a vServer volume on a NetApp Ontap cluster. If space % and space in *B are both defined the smaller value of the two will be used when deciding if the volume is in a warning or critical state. This allows you to accomodate large volume monitoring better. thresh: space % used, space in *B (i.e MB) remaining, inode count remaining, inode % used (Usage example: 80%i), “offline” keyword node: The node option restricts this check by vserver name.

aggregate_health:

Check the space and inode health of a cluster aggregate on a NetApp Ontap cluster. If space % and space in *B are both defined the smaller value of the two will be used when deciding if the volume is in a warning or critical state. This allows you to better accomodate large aggregate monitoring. thresh: space % used, space in *B (i.e MB) remaining, inode count remaining, inode % used (Usage example: 80%i), “offline” keyword, “is-home” keyword node: The node option restricts this check by cluster-node name.

snapshot_health:

Check the space and inode health of a vServer snapshot. If space % and space in *B are both defined the smaller value of the two will be used when deciding if the volume is in a warning or critical state. This allows you to better accomodate large snapshot monitoring. thresh: space % used, space in *B (i.e MB) remaining, inode count remaining, inode % used (Usage example: 80%i), “offline” keyword node: The node option restricts this check by vserver name.

quota_health:

Check that the space and file thresholds have not been crossed on a quota. thresh: N/A storage defined. node: The node option restricts this check by vserver name. snapmirror_health: Check the lag time and health flag of the snapmirror relationships. thresh: snapmirror lag time (valid intervals are s, m, h, d). node: The node options restricts this check by snapmirror destination cluster-node name.

filer_hardware_health:

Check the environment hardware health of the filers (fan, psu, temperature, battery). thresh: component name (fan, psu, temperature, battery). There is no default alert level they MUST be defined. node: The node option restricts this check by cluster-node name. port_health: Checks the state of a physical network port. thresh: N/A not customizable. node: The node option restricts this check by cluster-node name.

interface_health desc:

Check that a LIF is in the correctly configured state and that it is on its home node and port. Additionally checks the state of a physical port. thresh: N/A not customizable. node: The node option restricts this check by vserver name.

netapp_alarms:

Check for Netapp console alarms. thresh: N/A not customizable. node: The node option restricts this check by cluster-node name. cluster_health desc: Check the cluster disks for failure or other potentially undesirable states. thresh: N/A not customizable. node: The node option restricts this check by cluster-node name. disk_health: Check the health of the disks in the cluster. thresh: Not customizable yet. node: The node option restricts this check by cluster-node name. For keyword thresholds, if you want to ignore alerts for that particular keyword you set it at the same threshold that the alert defaults to.  

How to Debug Perl Scripts with EPIC Eclipse

Introduction

As about a year ago I took over development of John Murphy’s NetApp Ontap Cluster monitoring plugin, I was in need of some way to debug Perl scripts in Windows. After some online research, it seemed Eclipse with the EPIC plugin was the way to go. One small notice is that Eclipse is built with Java and hence consumes quite a bit of RAM. I wouldn’t recommend using it with less then 4 GB of RAM.

Debug Perl with EPIC Eclipse

 

To make it easier for people to debug Perl scripts from a Wndows client, I’ll list the steps here how to get things running smoothly.

How to debug Perl scripts?

  1. Download and install Eclipse.
  2. Download and install ActivePerl
  3. Verify installation by opening cmd.exe and type perl -v
  4. Open Eclipse, go to Help menu and select Eclipse Marketplace. Search for EPIC or Eclipse Perl Integration and install the EPIC components. You can find more info about EPIC on their website.
  5. In order to see local variables, PadWalker needs to be installed. Before you can use the Perl Package Manager, you client needs a reboot (after installation of ActivePerl).  Open command windows (cmd.exe) and type ‘ppm install PadWalker’, which would result in something similar like this output:
  6. Next thing would be to show line numbers, as looking for line 1085 in a 2000 line script is quite hard without it. Go to the Window menu, and choose Preferences.  Next, choose Perl EPIC in the left column and enable the checkbox left of “Show line numbers”.

Enjoy your debug Perl hunt. I hope it can help you finding and solving issues in the check_netapp_ontap script.

Let me know if there are better free Perl debugging methods on Windows in a comment!

Willem

Linux Vulnerabilities Overview

Introduction

Linux is considered to be much more secure then Windows. Over the last years however, several big Linux vulnerabilities were discovered . This definitely doesn’t mean that Linux is suddenly an insecure operating system. What it does mean is that you need to monitor and patch your systems. The same goes of course for Windows server, but I’l try to go into detail about WSUS updates in another post.

When you look at the latest Red Hat security advisories, it becomes very clear that you need to implement a system which automatically installs security updates. Doing this manually on 500+ servers would be crazy and a big waste of time. You also need make sure you always have a recent snapshot or backup in place, preferably right before the time the security updates are installed.

RunDeck allows you to do such a thing. After adding your Linux server as nodes to RunDeck, you can easily schedule a job containing a workflow where a VMware snapshot could be taken after which the installation of the security updates can be started safely.

I’ll try to go over the most famous Linux vulnerabilities and summarize some very basic information abut them.

Heartbleed

Security bug disclosed 01/04/2014 by Neel Mehta (Google) in the OpenSSL cryptography library, qualified as a buffer over-read situation where more data can be read than should be allowed.

  • CVE-2014-0160

Linux vulnerabilities Hearthbleed

Shellshock (Bashdoor)

Everybody must have heard of Heartbleed, discovered 24/09/14 by Stephane Chazelas. Shellshock allows attackers to execute any kind of code, smuggled in environment variables. Anything that invokes the flawed open-source shell and passes in malicious variables, which seems to be surprisingly easy to do, is vulnerable to being hijacked.

Just in case specific CGI scripts are vulnerable, you could use Shellshock Tester or Shellshock Test Tool.

  • CVE-2014-6271
  • CVE-2014-6277
  • CVE-2014-6278
  • CVE-2014-7169
  • CVE-2014-7186
  • CVE-2014-7187

Linux vulnerabilities Shellshock

Ghost

The last critical security flaw to hit the news 16/01/2016 was Ghost. It’s a stack-based buffer overflow in the glibc DNS client-side resolver that puts Linux machines at risk for remote code execution. It was discovered by a Google engineer. The glibc maintainers had previously been alerted of the issue via their bug tracker in July 2015. The issue was solved by a combined effort of two engineers o the Red Hat team, the Google team and the glibc team. Check out the Google blogpost.

  • CVE-2015-7547: glibc getaddrinfo stack-based buffer overflow

Linux vulnerabilities Ghost

Kernel Zero-Day Flaw

19/01/2016 a new critical zero-day Linux vulnerability has been found in the kernel that could allow attackers to gain root privileges. It has been discovered by a research group named Perception Point. The issue was apparently present since 2012 and is the result of a reference leak in the keyrings facility built into Linux. The keyrings facility is a way to encrypt and store login data, encryption keys and certificates and make them available to applications. 

A PoC was released on GitHub with an example exploit code.

  • CVE-2016-0728

Patch your impacted systems against Linux vulnerabilities

Ensure that you are running the latest patch level. If it’s a virtual machine, take a VMware snapshot first, so that in worst case scenario, you can go back.

CentOS / Red Hat / Fedora

Ubuntu / Debian

You can schedule this easily with for example Nagios Reactor. It allows you execute commands over SSH on scheduled intervals. In combination with the VMware snapshot chain, you easily create a robust patching ecosystem. Please note that Nagios reactor is completely free, but is still in beta. It also only seems to work on CentOS 6.

RunDeck

You can use an inline script such as this to start a yum update on your Linux serves:

The job only requires one variable and that I called reboot. This can be set to true or false.

This is a screenshot of the Log Output of a RunDeck job:

DAF Linux Yum

 

 

 

Monitoring WordPress Updates

Introduction

WordPress regularly releases updates with new features and bug fixes. It is advised to update asap, so monitoring your website for new WordPress updates is critical to ensure your website is fully patched against potential attacks. Automatic background updates were introduced in WordPress 3.7 in an effort to promote better security, and to streamline the update experience overall. By default, only minor releases – such as for maintenance and security purposes – and translation file updates are enabled on most sites. In special cases, plugins and themes may be updated.

In WordPress, there are four types of automatic background updates:

  1. Core updates
  2. Plugin updates
  3. Theme updates
  4. Translation file updates

Core Updates

Core updates are subdivided into three types:

  1. Core development updates, known as the “bleeding edge”
  2. Minor core updates, such as maintenance and security releases
  3. Major core release updates

By default, every site has automatic updates enabled for minor core releases and translation files. 

Update Configuration

Automatic updates can be configured using one of two methods: defining constants in wp-config.php, or adding filters using a Plugin.

Configuration via wp-config.php

Using wp-config.php, automatic updates can be disabled completely, and core updates can be disabled or configured based on update type.

Constant to Disable All WordPress Updates

The core developers made a conscious decision to enable automatic updates for minor releases and translation files out of the box. Going forward, this will be one of the best ways to guarantee your site stays up to date and secure and, as such, disabling these updates is strongly discouraged.

To completely disable all types of automatic updates, core or otherwise, add the following to your wp-config.php file:

Constant to Configure Core WordPress Updates

To enable automatic updates for major releases or development purposes, the place to start is with the WP_AUTO_UPDATE_CORE constant. Defining this constant one of three ways allows you to blanket-enable, or blanket-disable several types of core updates at once.

WP_AUTO_UPDATE_CORE can be defined with one of three values, each producing a different behavior:

  • Value of true – Development, minor, and major updates are all enabled
  • Value of false – Development, minor, and major updates are all disabled
  • Value of 'minor' – Minor updates are enabled, development, and major updates are disabled

Note that only sites already running a development version will receive development updates. For other sites, setting WP_AUTO_UPDATE_CORE to true will mean that it will only get minor and major updates.

For development sites, the default value of WP_AUTO_UPDATE_CORE is true. For other sites sites, the default value of WP_AUTO_UPDATE_CORE is minor.

How to check for WordPress Updates

  1.  Add the IP addresses of the Nagios servers that need access to the Allowed array in check_wordpress_updates.php”
  2. Put check_wordpress_updates.php in the root of you WordPress installation”
  3. Put check_wordpress_updates.sh in you Nagios plugin folder and call it from Nagios interface”

The result should look like this:

Wordpress updates plugin output

Thanks to Kong Jin Jie and hteske on whose scripts this is based.

Monitoring Linux Processes

Introduction

As I had some issues with my Linode server related to mistuned MariaDB settings, I was forced to find a way to monitor a Linux process, such as httpd, mysqld and php. Not only did I need to know if they were running, how many of them were running, but also their cpu and memory usage, so I could tune my Apache settings (located at /etc/httpd/conf/httpd.conf). I hoped to find a plugin which did all of the above, but couldn’t find one. The plugin that came closest to what I needed, was this one written bij Eli Keimig. 

As the last release date was 08/11/2010 and it missed some crucial features, I decided to make it better. At the moment I added the following features:

  • Performance data for Linux process CPU usage.
  • Performance data for Linux process Memory usage.
  • Added Linux process count with performance data.
  • Improved the plugin output.
  • Added minimum and maximum Linux process count.

How to monitor a Linux process?

The plugin uses ‘ps’ to retrieve the Linux process information. Logged in as root, type the following in your terminal to show active processes on the server:

The a option tells ps to list the processes of all users on the system rather than just those of the current user, with the exception of group leaders and processes not associated with a terminal. A group leader is the first member of a group of related processes.

The u option tells ps to provide detailed information about each process.

The x option adds to the list processes that have no controlling terminal, such as daemons, which are programs that are launched during boot and run unobtrusively in the background until they are activated by a particular event or condition.

As the list of processes can be quite long and occupy more than a single screen, the output of ps aux can be piped (transferred) to the less command, which lets it be viewed one screen full at a time. The output can be advanced one screen forward by pressing the SPACE bar and one screen backward by pressing the b key.

With the -C parameter you can specify the Linux process for which to show information.

And you can specify what specific information to show with the -o parameter:

After joining the results with paste and making the sum with bc, we get the result we want.

Check out this screenshot which shows information about the httpd, mysqld, nagios and php processes.

Linux process

This information can really help troubleshoot LAMP configuration issues. I haven’t got a lot of time to produce a decent post, but I’ll extend this post when I find some more time. As it’s a Bash script I’m guessing it doesn’t need to much explanation to get it working in Nagios.