Monitoring NetApp Ontap

Introduction

I’d like to start by thanking the original developer, John Murphy. Thanks to his plugin for monitoring NetApp Ontap storage, we don’t need to buy the expensive NetApp plugin from Quorum. He also inspired me to continue developing Nagios plugins in my free time.

If you want to monitor your NetApp Ontap cluster, this plugin could help you do that. It is written in Perl and is not developed as actively as my PowerShell plugins, because my Perl knowledge is less evolved than my PowerShell knowledge. Sometimes users send me a piece of code to add; if you want to test the latest additions, give the dev branch a try. All help to improve this plugin is welcome. Read the post about debugging Perl scripts, fork the project on GitHub and start experimenting.

The plugin can monitor NetApp Ontap components, from disks to aggregates to volumes, and alerts you if it finds any unhealthy components.

NetApp Ontap Logical View

How to monitor your NetApp Ontap?

  1. Download the latest release from GitHub to a temp directory and navigate to it.
  2. Copy the contents of NetApp/* to your /usr/lib/perl5 or /usr/lib64/perl5 directory to install the required version of the NetApp Perl SDK (confirmed to work with SDK 5.1 and 5.2).
  3. Copy the check_netapp_ontap.pl script to your Nagios libexec folder and set the correct permissions.
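The three steps above can be sketched as a small shell function. This is only an illustration: the function name install_netapp_plugin, the example paths in the comments and the 755 permission mode are my assumptions, not something the plugin prescribes, so adjust them to your distribution and Nagios layout.

```shell
#!/bin/sh
# Sketch of the install steps. All three paths are passed in by the caller;
# nothing here is hard-coded because the locations differ per system.
install_netapp_plugin() {
    src_dir=$1    # unpacked release directory (contains NetApp/ and the script)
    perl_lib=$2   # e.g. /usr/lib/perl5 or /usr/lib64/perl5
    libexec=$3    # your Nagios libexec folder (location varies per install)

    # Step 2: copy the bundled NetApp Perl SDK modules into the Perl library path.
    cp -R "$src_dir"/NetApp/* "$perl_lib"/ || return 1

    # Step 3: copy the plugin and make it executable.
    # Mode 755 is an assumption; use whatever your Nagios user requires.
    cp "$src_dir"/check_netapp_ontap.pl "$libexec"/ || return 1
    chmod 755 "$libexec"/check_netapp_ontap.pl
}

# Example call (hypothetical paths, run with write access to both targets):
# install_netapp_plugin /tmp/check_netapp_ontap /usr/lib/perl5 /usr/local/nagios/libexec
```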

Parameters:

--hostname, -H => Hostname or address of the cluster administrative interface.

--node, -n => Name of a vhost or cluster-node to restrict this query to.

--user, -u => Username of a NetApp ONTAPI-enabled user.

--password, -p => Password for the NetApp ONTAPI-enabled user.

--option, -o => The name of the option you want to check. See the option and threshold list at the bottom of this help text.

--warning, -w => A custom warning threshold value. See the option and threshold list at the bottom of this help text.

--critical, -c => A custom critical threshold value. See the option and threshold list at the bottom of this help text.

--modifier, -m => This modifier is used to set an inclusive or exclusive filter on what you want to monitor.

--help, -h => Display this help text.
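To show how the parameters fit together, here is a sketch that assembles a volume_health check and prints it instead of running it, so you can review the line first. The host name, user, password and plugin path are placeholders I made up; the 80%i and 90%i thresholds follow the inode % used format from the option list.

```shell
#!/bin/sh
# Placeholder values: replace them with your own cluster details.
PLUGIN="./check_netapp_ontap.pl"     # path is an assumption
NETAPP_HOST="cluster1.example.com"   # hypothetical cluster management address
NETAPP_USER="nagios"                 # hypothetical ONTAPI-enabled account
NETAPP_PASS="secret"                 # placeholder password

# Print a volume_health command line with the given warning ($1)
# and critical ($2) thresholds. Nothing is executed.
volume_check_cmd() {
    printf '%s -H %s -u %s -p %s -o volume_health -w %s -c %s\n' \
        "$PLUGIN" "$NETAPP_HOST" "$NETAPP_USER" "$NETAPP_PASS" "$1" "$2"
}

# Warn at 80% inode usage, go critical at 90%.
volume_check_cmd '80%i' '90%i'
```

Copy the printed line into a terminal, or into a Nagios command definition, once the values look right.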

Option list:

volume_health: Check the space and inode health of a vServer volume on a NetApp Ontap cluster. If space % and space in *B are both defined, the smaller value of the two will be used when deciding if the volume is in a warning or critical state. This allows you to better accommodate large volume monitoring.
  thresh: space % used, space in *B (e.g. MB) remaining, inode count remaining, inode % used (usage example: 80%i), “offline” keyword
  node: The node option restricts this check by vserver name.

aggregate_health: Check the space and inode health of a cluster aggregate on a NetApp Ontap cluster. If space % and space in *B are both defined, the smaller value of the two will be used when deciding if the aggregate is in a warning or critical state. This allows you to better accommodate large aggregate monitoring.
  thresh: space % used, space in *B (e.g. MB) remaining, inode count remaining, inode % used (usage example: 80%i), “offline” keyword, “is-home” keyword
  node: The node option restricts this check by cluster-node name.

snapshot_health: Check the space and inode health of a vServer snapshot. If space % and space in *B are both defined, the smaller value of the two will be used when deciding if the snapshot is in a warning or critical state. This allows you to better accommodate large snapshot monitoring.
  thresh: space % used, space in *B (e.g. MB) remaining, inode count remaining, inode % used (usage example: 80%i), “offline” keyword
  node: The node option restricts this check by vserver name.

quota_health: Check that the space and file thresholds have not been crossed on a quota.
  thresh: N/A (storage defined).
  node: The node option restricts this check by vserver name.

snapmirror_health: Check the lag time and health flag of the snapmirror relationships.
  thresh: snapmirror lag time (valid intervals are s, m, h, d).
  node: The node option restricts this check by snapmirror destination cluster-node name.
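To make the s, m, h and d intervals concrete, the sketch below converts a lag value to seconds. It assumes a threshold is written as a plain decimal number followed by one suffix (for example 30m). The helper name lag_to_seconds is mine and this is not the plugin’s own parsing code, which may behave differently.

```shell
#!/bin/sh
# Convert a lag value such as 45s, 30m, 2h or 1d to seconds.
# Assumes <decimal number><suffix> without leading zeros.
lag_to_seconds() {
    value=${1%[smhd]}     # numeric part (a trailing suffix is stripped)
    unit=${1#"$value"}    # whatever follows the numeric part

    # Reject empty or non-numeric values before doing arithmetic.
    case $value in
        ''|*[!0-9]*) echo "invalid lag value: $1" >&2; return 1 ;;
    esac

    case $unit in
        s) echo "$value" ;;
        m) echo $((value * 60)) ;;
        h) echo $((value * 3600)) ;;
        d) echo $((value * 86400)) ;;
        *) echo "missing or unsupported interval: $1" >&2; return 1 ;;
    esac
}

lag_to_seconds 2h   # prints 7200
```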

filer_hardware_health: Check the environment hardware health of the filers (fan, psu, temperature, battery).
  thresh: component name (fan, psu, temperature, battery). There is no default alert level; they MUST be defined.
  node: The node option restricts this check by cluster-node name.

port_health: Checks the state of a physical network port.
  thresh: N/A, not customizable.
  node: The node option restricts this check by cluster-node name.

interface_health: Check that a LIF is in the correctly configured state and that it is on its home node and port. Additionally checks the state of a physical port.
  thresh: N/A, not customizable.
  node: The node option restricts this check by vserver name.

netapp_alarms: Check for NetApp console alarms.
  thresh: N/A, not customizable.
  node: The node option restricts this check by cluster-node name.

cluster_health: Check the cluster disks for failure or other potentially undesirable states.
  thresh: N/A, not customizable.
  node: The node option restricts this check by cluster-node name.

disk_health: Check the health of the disks in the cluster.
  thresh: Not customizable yet.
  node: The node option restricts this check by cluster-node name.

For keyword thresholds, if you want to ignore alerts for a particular keyword, set it at the same threshold that the alert defaults to.
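Several of the options above take no custom thresholds and are filtered by cluster-node name, so they can be generated in a loop. The sketch below prints one command line per option for a single node. The host, credentials, node name and plugin path are hypothetical placeholders, and nothing is executed.

```shell
#!/bin/sh
PLUGIN="./check_netapp_ontap.pl"     # path is an assumption
NETAPP_HOST="cluster1.example.com"   # hypothetical cluster management address
NETAPP_USER="nagios"                 # hypothetical ONTAPI-enabled account
NETAPP_PASS="secret"                 # placeholder password
NODE="cluster1-01"                   # hypothetical cluster-node name

# Options listed above as not customizable and restricted by cluster-node name.
FIXED_OPTIONS="port_health netapp_alarms cluster_health disk_health"

# Print one check command per option, restricted to $NODE with -n.
list_node_checks() {
    for opt in $FIXED_OPTIONS; do
        printf '%s -H %s -u %s -p %s -n %s -o %s\n' \
            "$PLUGIN" "$NETAPP_HOST" "$NETAPP_USER" "$NETAPP_PASS" "$NODE" "$opt"
    done
}

list_node_checks
```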

How to Debug Perl Scripts with EPIC Eclipse

Introduction

About a year ago I took over development of John Murphy’s NetApp Ontap Cluster monitoring plugin, and I needed a way to debug Perl scripts on Windows. After some online research, Eclipse with the EPIC plugin seemed the way to go. One caveat is that Eclipse is built with Java and hence consumes quite a bit of RAM. I wouldn’t recommend using it with less than 4 GB of RAM.

Debug Perl with EPIC Eclipse

 

To make it easier for people to debug Perl scripts from a Windows client, I’ll list the steps to get things running smoothly here.

How to debug Perl scripts?

  1. Download and install Eclipse.
  2. Download and install ActivePerl.
  3. Verify the installation by opening cmd.exe and typing perl -v.
  4. Open Eclipse, go to the Help menu and select Eclipse Marketplace. Search for EPIC or Eclipse Perl Integration and install the EPIC components. You can find more info about EPIC on their website.
  5. In order to see local variables, PadWalker needs to be installed. Before you can use the Perl Package Manager, your client needs a reboot (after installation of ActivePerl). Open a command window (cmd.exe) and type ‘ppm install PadWalker’, which should produce output similar to this:
  6. Next, show line numbers, as looking for line 1085 in a 2000 line script is quite hard without them. Go to the Window menu and choose Preferences. Then choose Perl EPIC in the left column and enable the checkbox left of “Show line numbers”.
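After steps 3 and 5 you may want to confirm that Perl can actually load PadWalker before starting a debug session. The sketch below is written for a POSIX shell such as Git Bash, which is an assumption about your setup; the inner perl -MPadWalker -e 1 command works the same way in cmd.exe. The helper name module_status is my own.

```shell
#!/bin/sh
# Report whether Perl can load a given module.
# "perl -MName -e 1" exits non-zero when the module (or perl itself) is missing.
module_status() {
    if perl -M"$1" -e 1 >/dev/null 2>&1; then
        echo "installed"
    else
        echo "missing"
    fi
}

# Step 3: is perl on the PATH at all?
perl -v >/dev/null 2>&1 || echo "perl not found on PATH"

# Step 5: can Perl load PadWalker?
echo "PadWalker: $(module_status PadWalker)"
```

If PadWalker is reported as missing, rerun ‘ppm install PadWalker’ after the reboot mentioned in step 5.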

Enjoy your Perl bug hunt. I hope this helps you find and solve issues in the check_netapp_ontap script.

Let me know if there are better free Perl debugging methods on Windows in a comment!

Willem