Monitoring Windows Scheduled Tasks

Introduction

Task Scheduler is a Microsoft Windows component that allows you to schedule programs or scripts to start at pre-defined intervals. There are two major versions of the task scheduler. In version 1.0, definitions and schedules are stored in binary .job files, and every task corresponds to a single action. This plugin will not work on version 1.0 of the task scheduler, which runs on Windows 2000 Server and Windows Server 2003.

In version 2.0, the Windows task scheduler got a redesigned user interface based on the Microsoft Management Console. Version 2.0 also supports calendar and event-based triggers, such as starting a task when a particular event is logged to the event log, or when a combination of events has occurred. In addition, several tasks that are triggered by the same event can be configured to run either simultaneously or as a pre-determined chain of actions.

Tasks can also be configured to run based on system status, such as being idle for a pre-configured amount of time, on startup, on logoff, or only during or for a specified time. Another new feature is a credential manager that stores passwords so they cannot be retrieved easily. Scheduled tasks are also executed in their own session, instead of the same session as system services or the current user. You can find a list of all task scheduler 2.0 interfaces here.

Requirements

Starting from Windows Powershell 4.0, you can use a whole range of Powershell cmdlets to manage your scheduled tasks. This plugin for Nagios does not use these cmdlets, as it has to be Powershell 2.0 compatible. Maybe in a few years, when Powershell 2.0 becomes obsolete, I’ll patch the script to make use of the new cmdlets. You can find the complete list of cmdlets here.

Failing tasks will always end with some sort of error code. You can find the complete list of error codes here. This plugin will output the exit codes for failing tasks in the Nagios service description. The output will also notify you of tasks that are still running.

We have multiple Windows servers at work with a growing number of scheduled tasks, and each scheduled task needs to be monitored. With the help of Nagios and this plugin you can find out:

  • How many are running at the same time?
  • How many are failing?
  • How long are they running?
  • Who created them?
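
As an illustration of how such a check can turn task results into a Nagios status line, here is a minimal sketch. It is written in Python rather than Powershell and is not the plugin's code: the `TaskResult` fields, the perfdata labels and the choice to report any failed task as CRITICAL are all assumptions made for the example.

```python
from dataclasses import dataclass

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


@dataclass
class TaskResult:
    name: str         # task name (hypothetical field)
    last_result: int  # last exit code of the task; 0 means success
    running: bool     # True while the task is still executing


def summarize(tasks):
    """Return (exit_code, status_line) for a list of TaskResult objects."""
    failed = [t for t in tasks if t.last_result != 0 and not t.running]
    running = [t for t in tasks if t.running]

    parts = [f"{len(tasks)} tasks, {len(failed)} failed, {len(running)} running"]
    # Show exit codes as unsigned hex, the way Windows error codes are usually listed.
    parts += [f"{t.name} (exit code 0x{t.last_result & 0xFFFFFFFF:X})" for t in failed]
    parts += [f"{t.name} (still running)" for t in running]

    # Assumption: any failed task makes the whole service CRITICAL.
    exit_code = CRITICAL if failed else OK
    label = "CRITICAL" if failed else "OK"
    perfdata = f"'failed'={len(failed)} 'running'={len(running)}"
    return exit_code, f"{label}: " + ", ".join(parts) + " | " + perfdata
```

The text before the `|` is what shows up in the Nagios interface, and the part after it is the perfdata used for graphing.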

Versions

Disabled scheduled tasks are excluded by default since version 3.14.12.06. In earlier versions, you had to exclude them manually with -EF or -ET. Excluding disabled tasks by default seemed like a logical decision and was suggested by someone reviewing the plugin on the Nagios Exchange. Maybe one day I’ll add a switch to include them again if specified. As some scheduled tasks do not need to be monitored, the script also enables you to exclude complete folders.

Since v5.13.160614 it is possible to include hidden tasks. Just add the ‘-Hidden 1’ switch to your parameters and your hidden tasks will be monitored.

One of the folders I tend to exclude almost all the time is the “Microsoft” folder. Several tasks in the Microsoft folder tend to fail from time to time. So unless you absolutely need to know the state of every single scheduled task running on your Windows Server, I advise you to exclude it too. You can find the folders and tasks in this location: C:\Windows\System32\Tasks

It is possible to include tasks or task folders with the ‘-InclFolders’ and ‘-InclTasks’ parameters. This filter is applied after the exclude parameters. Please note that including a folder is not recursive. Only tasks in the root of the folder will be included.
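
The order of the filters described above (exclude first, then include, with folder inclusion matching only the folder itself) can be sketched as follows. This is a Python illustration, not the plugin's Powershell code: the function name, the wildcard syntax for task names and the choice to treat folder exclusion as recursive are assumptions.

```python
from fnmatch import fnmatchcase


def select_tasks(tasks, excl_folders=(), excl_tasks=(), incl_folders=(), incl_tasks=()):
    """Filter an iterable of (folder, name) tuples: exclude first, then include."""

    def in_folder_tree(folder, root):
        # Assumption: excluding a folder also excludes its subfolders.
        return folder == root or folder.startswith(root.rstrip("\\") + "\\")

    def name_matches(name, patterns):
        # Case-insensitive wildcard match on the task name (assumed syntax).
        return any(fnmatchcase(name.lower(), p.lower()) for p in patterns)

    # Step 1: apply the exclude filters.
    kept = [
        (folder, name)
        for folder, name in tasks
        if not any(in_folder_tree(folder, f) for f in excl_folders)
        and not name_matches(name, excl_tasks)
    ]

    # Step 2: apply the include filters to whatever is left.
    if incl_folders or incl_tasks:
        kept = [
            (folder, name)
            for folder, name in kept
            if folder in incl_folders          # exact match only: not recursive
            or name_matches(name, incl_tasks)
        ]
    return kept
```

With this sketch, including the folder `\Custom` keeps the tasks stored directly in `\Custom` but not those in `\Custom\Sub`.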

Help

This is the help of the plugin, which lists all valid parameters:

You could put every scheduled task you don’t want to monitor in a separate folder and exclude it with the -EF parameter. Alternatively, you can use the -ET parameter to exclude tasks based on name patterns. One important thing to know is that in order to exclude or include the root folder, you need to escape the backslash, like this: “\\”.

How to monitor your scheduled tasks?

  1. Put the script in the NSClient++ scripts folder, preferably in a subfolder Powershell.
  2. In the nsclient.ini configuration file, define the script like this:

    For more information about external scripts configuration, please review the NSClient documentation. You can also consider defining a wrapped script in nsclient.ini to simplify configuration.

  3. Make a command in Nagios like this:

  4. Configure your service in Nagios. Make use of the command created above. Configure something similar to this as $ARG1$:

Some things to consider to make it work:

  • “Set-ExecutionPolicy RemoteSigned”
  • Nscp service account permissions => Running as local system should suffice, but I had users telling me it only worked with a local admin. I found out that on some NSClient++ versions, more specifically version 0.4.3.88 and probably some earlier versions too, the following error occurred when running the nscp service as local system: “CHECK_NRPE: Invalid packet type received from server”. After I filed an issue on the GitHub project page of NSClient++, Michael Medin quickly acknowledged the issue and solved it as of version 0.4.3.102, so the plugin should work again as local system.

Examples

If you run the script from the CLI in your Nagios plugin folder, this would be the command:

If you want to exclude one noisy, unimportant scheduled task, the command used in the CLI would look like this:

If you only want the scheduled tasks in the root to be monitored, you can use this command:

This would only give you the scheduled tasks available in the root folder. The output now looks like this.

Final Words

It seems the perfdata in the Highcharts graphs sometimes contains decimal numbers (see screenshot), which is kind of strange as I’m sure I only pass rounded numbers. This seems to be related to the way RRD files work. To reduce the amount of storage space used, NPCD and RRD will average out the data, resulting in decimals, even when you don’t expect them.
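
A tiny Python illustration of why averaging produces decimals from whole numbers. It only mimics the idea of an averaging consolidation; it is not how RRD is implemented, and the sample values are made up.

```python
def consolidate_average(samples, points_per_row):
    """Average consecutive groups of samples, like an averaging consolidation."""
    rows = []
    # Only complete groups are consolidated; a trailing partial group is skipped.
    for start in range(0, len(samples) - points_per_row + 1, points_per_row):
        group = samples[start:start + points_per_row]
        rows.append(sum(group) / len(group))
    return rows


# Six integer measurements, stored as three averaged rows.
print(consolidate_average([3, 4, 4, 5, 2, 2], 2))  # [3.5, 4.5, 2.0]
```

Every input value is an integer, yet two of the three stored values are not.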

This is a small to do list:

  • Add switches to change returned values and output.
  • Add array parameter with exit codes that should be excluded.
  • Test remote execution. In some cases it might be useful to be able to check remotely for failed Windows tasks.
  • Include a warning / critical threshold when discovered tasks exceed a certain duration.
  • I was hoping to add some more exit codes to check, which would make failed tasks easier to troubleshoot. You can find the list of scheduled task exit codes here. The constants that begin with SCHED_S_ are success constants, and the constants that begin with SCHED_E_ are error constants.

Screenshots:

These are some screenshots of the Nagios XI Graph Explorer for two of our servers making use of the plugin to monitor scheduled tasks:

Tasks 01

check_ms_win_tasks_graph_02

Let me know on the Nagios Exchange what you think of my plugin by rating it or submitting a review. Please also consider starring the project on GitHub.

Willem

Monitoring Microsoft IIS Application Pools

Introduction

For those who are not aware, IIS is an HTTP web server from Microsoft which can host both static and dynamic content. Incoming requests are handled by a Windows kernel-mode driver named http.sys. It listens for incoming requests on a configured port, performs some basic security checks and passes the request to a user-mode worker process. The worker fulfills the request and sends the response back to the requester. Web applications are grouped into IIS application pools, each of which has its own process assigned to it.

As we migrated all our IIS applications to a new IIS 8.5 farm on Windows 2012 R2 servers, we needed a way to reliably monitor the state of our most critical IIS application pools. So I created a Powershell script which is able to check the state of an application pool and count the number of web applications using it. As each IIS application pool has one w3wp.exe IIS worker process assigned, I added the % processor usage and memory usage to the perfdata.

The latest version also contains a new method to retrieve the IIS application pool information. As Get-ChildItem IIS:\AppPools has a weird bug where the command sometimes hangs, I had to look for an alternative. This method uses C:\Windows\system32\inetsrv\appcmd.exe instead, which seems to perform much better.
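
To give an idea of what working with appcmd.exe output involves, here is a small Python sketch that extracts the name and state from one line of `appcmd list apppool` output. It is an illustration only, not the plugin's Powershell code, and the sample line format used below is an assumption about what appcmd prints on a typical IIS 8.5 server.

```python
import re

# Assumed line format:
#   APPPOOL "DefaultAppPool" (MgdVersion:v4.0,MgdMode:Integrated,state:Started)
LINE = re.compile(r'^APPPOOL "(?P<name>[^"]+)" \((?P<props>[^)]*)\)$')


def parse_apppool_line(line):
    """Return {'name': ..., 'state': ...} for one output line, or None if it does not match."""
    match = LINE.match(line.strip())
    if not match:
        return None
    # Properties are comma-separated key:value pairs.
    props = dict(p.split(":", 1) for p in match.group("props").split(",") if ":" in p)
    return {"name": match.group("name"), "state": props.get("state")}
```

A check built on top of this would compare the returned state with the expected one and map the result to a Nagios exit code.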

How to monitor your MS IIS Application Pools with Nagios?

  • Put the script in the NSClient++ scripts folder, preferably in a subfolder Powershell.
  • In the nsclient.ini configuration file, define the script like this:
  • Make a command in Nagios like this:
  • Configure your service in Nagios. Make use of the command created above. Configure something similar to this as $ARG1$:

    Or, if you want to monitor an application pool with the OnDemand start mode, where there is no IIS worker process while the pool isn’t in use:

    IIS application pools OnDemand Startmode
    When you want to use the AppCmd.exe method:

Final Words

I only had the chance to test this on Windows Server 2012 R2. It’s very possible you will experience issues on older IIS versions. You need to install the IIS Management Scripts and Tools feature for the script to work properly.

IIS Application Pool

Once you have it up and running, your Nagios server should look like this:

monitoring iis application pools

 

check-ms-win-disk-load-graph-01

Monitoring Windows Disk Load

Introduction

Disk load is one of the harder things to monitor, but also one of the most crucial. Disk load problems can really give your applications a hard time, slowing them down or crippling them completely. On Linux servers it’s easy, as the CPU wait counter gives clear hints of issues with your disk I/O.

I rolled out check_diskstat on our Linux servers in September 2014 and really missed a similar plugin for monitoring disk load on Windows servers. Hence, I started thinking about a new Powershell script, which would use the Powershell command ‘get-counter’ to gather all disk-related information from the Performance Monitor. I started by making a list of the requirements:

  • The main requirement was that it had to be multilingual, as I work on English and Dutch versions of Windows Server 2003, 2003 R2, 2008 and 2008 R2.
  • Another requirement was that the script had to accept an argument that specifies the number of samples over which an average is calculated.
  • The perfdata had to be output in such a way that all disk load related values are visible in a graph. I had to deal with very high values, e.g. 8763098004, and very small decimals, e.g. 0,00014. This meant I had to find some way to make it visually attractive and correct in Highcharts, for example by outputting milliseconds instead of seconds or megabytes instead of bytes.
  • The plugin also had to work independently of culture. Some cultures use ‘,’ and others use ‘.’ as the decimal separator. I solved this by replacing [System.Threading.Thread]::CurrentThread.CurrentCulture with ‘en-US’ and setting it back to the original value once I’m done.
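
The last two requirements, scaling values to friendlier units and always using ‘.’ as the decimal separator, can be illustrated with a short Python sketch. This is not the plugin's Powershell code; the perfdata labels and the four-decimal rounding are assumptions made for the example.

```python
def perf_item(label, value, uom=""):
    """Format one perfdata item as 'label'=value[UOM], always with '.' as decimal mark."""
    # Python's format() ignores the OS locale, so the decimal mark is always '.'.
    text = f"{value:.4f}".rstrip("0").rstrip(".")
    return f"'{label}'={text}{uom}"


def disk_perfdata(avg_sec_per_read, read_bytes_per_sec):
    """Scale raw counter values before output: seconds to ms, bytes to MB."""
    return " ".join([
        perf_item("avg_read_latency", avg_sec_per_read * 1000, "ms"),
        perf_item("read_throughput", read_bytes_per_sec / (1024 * 1024), "MB"),
    ])


print(disk_perfdata(0.00014, 8763098004))
```

After scaling, the very small latency and the very large byte count from the examples above end up in ranges that can share a readable graph.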

Monitoring disk load may be useful in finding the cause of performance issues. If a component of an application starts writing huge logs or large amounts of data to a database on your Windows disks, a bottleneck could be created in your application’s flow. This bottleneck could quickly result in lag, latency or slowness for end-users, resulting in more incidents, calls or complaints. An integral part of the job of a monitoring engineer is to avoid situations like the one described above. Here Nagios can help you, by alerting you before applications start getting slow. Up until now, the only way to monitor performance counters for Windows servers was using an agent like NSClient++ (or NCPA?) to retrieve one performance counter at a time. My check_ms_windows_disk_load plugin enables you to combine several disk load related performance counters in only one service. This method has several advantages:

  • You don’t need to worry about which counters to monitor. The plugin will do that for you.
  • As the plugin monitors 8 performance counters, and you only need one service, this saves you 7 services for each disk. So your Nagios server has less work, which enables you to monitor other stuff instead or increase the monitor interval on your checks.
  • As you can pass maxsamples (-ms or -MaxSamples) as a parameter, you can choose for yourself how long you want the plugin to run before calculating averages. Each sample should take one second.
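
The sampling idea behind the MaxSamples parameter can be sketched like this in Python. It is an illustration under assumptions, not the plugin's code: `read_counter` is a hypothetical callable that returns one counter reading, and the one-second interval is taken from the description above.

```python
import time


def average_samples(read_counter, max_samples, interval=1.0, sleep=time.sleep):
    """Call read_counter() max_samples times, interval seconds apart, and return the average."""
    if max_samples < 1:
        raise ValueError("max_samples must be at least 1")
    total = 0.0
    for i in range(max_samples):
        total += read_counter()
        if i < max_samples - 1:
            sleep(interval)  # no need to wait after the last sample
    return total / max_samples
```

With 10 samples the check takes roughly 9 seconds, so the agent and Nagios timeouts need to be longer than the sampling window.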

You could also prove to your application engineers that the storage is or is not the cause of their application’s performance problems, using comprehensive graphs that visualize a collection of disk performance related information. You also need knowledge about your disk load in order to choose the right disk type for the job. Are your 3TB SATA disks strong enough to handle the job, or will you have to buy more expensive SSDs to achieve the performance you need?

How to monitor your disk load?

  1. Put the script in the NSClient++ scripts folder, preferably in a subfolder Powershell.
  2. In the nsclient.ini configuration file, define the script like this:

  3. Make a command in Nagios like this:

  4. Configure your service in Nagios. Make use of the command created above. Configure something similar to this as $ARG1$:

Examples:

One day after everything is configured correctly, your Highcharts graphs should look like this:

disk load graph 01

If you want to test the load on your Windows disks, you can play with DiskSPD, a storage load generator from Microsoft. (Yes, Microsoft has a GitHub account!)

I hope this plugin can help you monitor the disk load on your Windows hosts. Please rate it on the Nagios Exchange if you like my work.

Monitoring Microsoft Exchange 2010 Mailbox

 
 


    Introduction

    I based this script on the MS Exchange 2010 DAG Mailbox check by Matt Haynes, but made the following changes:

    • Recovery mailbox databases are excluded. This way we don’t get any false alerts when colleagues are restoring a backup to a temporary recovery database.
    • For MS Exchange, backups are crucial, as your logs will grow very fast when they don’t get backed up. Therefore a warning is generated if the last backup is more than 26 hours old and an error is generated if it is more than 50 hours old. (These time periods can be altered, depending on the total time your backup software requires to back up all your mailbox databases.)
    • Performance data is gathered for the number of mounted, healthy, and unhealthy mailbox databases.
    • If no mailbox databases are found on the Exchange server, an error is also generated.
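
The backup age rule from the list above (warning after 26 hours, error after 50 hours) comes down to a simple comparison. The Python sketch below illustrates it; it is not the Powershell from the script, and treating a database that was never backed up as critical is an assumption.

```python
from datetime import datetime, timedelta

# Nagios states used in this sketch.
OK, WARNING, CRITICAL = 0, 1, 2


def backup_state(last_backup, now, warn_hours=26, crit_hours=50):
    """Map the age of the last backup to a Nagios state."""
    if last_backup is None:
        return CRITICAL  # assumption: a database that was never backed up is critical
    age = now - last_backup
    if age > timedelta(hours=crit_hours):
        return CRITICAL
    if age > timedelta(hours=warn_hours):
        return WARNING
    return OK
```

The 26 hour default leaves two hours of slack on top of a daily backup cycle; both thresholds are parameters so they can follow your own backup window.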

    I would like to thank my colleague, De Clerck John, who created the exclusion for the recovery mailbox database and the check for the last backup. His knowledge of MS Exchange and Powershell seems to be unlimited. As said before, the script will count all healthy, failed and mounted mailbox databases and output performance data. This way, you can easily see exactly when something went wrong with your mailbox databases:

    check-ms-exchange-2010-health-graph-01

    In this screenshot it appears we had an issue on 28 April. In fact, our Exchange administrators were busy migrating the mailbox databases to different MS Exchange servers, which is also visible in the following days of the graph.

    exchange

    How to monitor your MS Exchange databases?

    1. Copy the ‘check_ms_exchange_2010_health.ps1’ Powershell script to the NSClient++ scripts folder, preferably in a sub-folder ‘Powershell’ on all the Exchange 2010 servers you wish to monitor.
    2. In the nsclient.ini configuration file, define the external command like this (and restart the NSClient++ service (nscp) afterwards):
    3. Make a command in Nagios like this:
    4. Configure your service in Nagios. Use the command you previously made. Argument 1 should be the hostname of the MS Exchange server.

    Grtz Willem

    Unified Monitoring Framework

    Download Project

    You can find the latest official release of this project on GitHub.



      Unified Monitoring Framework (UMF) – Introduction

      I wanted to make the ‘Nagios XI Downtime Framework‘ (NDF) better for quite some time, but was limited by the forms generated by an old community edition of Powershell Studio. As Sapien Technologies asks $389 for Powershell Studio 2014 and no longer distributes the Powershell Studio Community Edition, I started looking for an alternative. I did not have to search for long, as Microsoft has made the Community Edition of Visual Studio 2013 available since the beginning of 2014.

      Visual Studio means I can use XAML, which provides a way to write a GUI. XAML, which stands for ‘Extensible Application Markup Language’, is a declarative XML-based language used extensively in .NET Framework 3.0 and 4.0 technologies. The ability of Powershell to load XAML into variables makes it the perfect combination for building rich and beautiful GUIs.

      So I started a project from scratch somewhere in November 2014, named ‘Windows Monitoring Framework’. As I recently realized Microsoft would probably not be very happy if I released a project with ‘Windows’ in its name, I decided today to change its name to ‘Unified Monitoring Framework’, or UMF for short. It’s a perfect name, as the plan is to show outputs and issues gathered from multiple sources, such as Nagios, logfiles, backup reports, custom Powershell scripts etc.

      An important note is that my ‘Unified Monitoring Framework’ will only work on Windows boxes with Powershell v4 or later installed. So on servers where you cannot yet install Powershell v4 or later, you will still have to use Nagios Downtime Framework to enable logged-in users to schedule downtime.

      Unified Monitoring Framework (UMF) – Features

      So what’s better in UMF compared to my old ‘Nagios Downtime Framework’ (NDF)? How will it help you to improve your monitoring environment?

      – It is possible to specify the number of days, hours or minutes you would like the selected hosts to go into downtime. NDF was limited, because it only gave you a few possible duration options.

      – It does not make use of a locally stored configuration file, but uses a JSON query to the Nagios server to get all members of a hostgroup. This does require you to make a hostgroup for each host which has related hosts, but at least all the configs are in Nagios and do not have to be saved locally on all your servers. The hostgroup name needs to follow a certain naming convention in order to work properly. The form is “hg_relations_<servername>”. So if your host is named srvwindows01, the hostgroup containing the related hosts should be named “hg_relations_srvwindows01”. All members of this hostgroup will be detected and shown in the host relations listbox. Feel free to customize the Powershell code as you wish to reflect the naming conventions of your Nagios setup.
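
As a rough illustration of the naming convention and the kind of JSON query involved, here is a Python sketch. It is not UMF's Powershell code. The function names are hypothetical, and the URL path and parameter names are assumptions based on the JSON CGIs that ship with Nagios Core 4; check them against your own installation.

```python
from urllib.parse import urlencode


def relations_hostgroup(hostname):
    """Build the hostgroup name that holds the hosts related to hostname."""
    return f"hg_relations_{hostname}"


def hostgroup_query_url(nagios_server, hostname):
    """Build a URL that asks Nagios for the relations hostgroup of hostname."""
    # Assumed endpoint and parameters: adjust to your Nagios setup.
    query = urlencode({"query": "hostgroup", "hostgroup": relations_hostgroup(hostname)})
    return f"https://{nagios_server}/nagios/cgi-bin/objectjson.cgi?{query}"


print(hostgroup_query_url("nagios.example.com", "srvwindows01"))
```

The members of the returned hostgroup are what would fill the host relations listbox.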

      – Besides querying the hostgroup that contains all hosts related to the Windows host on which UMF is executed, it will also query the host itself and list all the hostgroups it is a member of. This seemed like an important feature, as at the moment I’m quite sure several hosts are not in the correct hostgroups. Showing this information when users enable downtime should result in a more consistent and less error-prone host configuration. I hope to eventually find the time to expand the tool so users can edit the hostgroups from within it, but this is a rather long-term project.

      – Besides enabling users to start downtime, the tool will also be able to reset the state of passive services. As we send all critical application and system events to Nagios, thanks to the real-time eventlog monitoring capabilities of NSClient++, this is also an important feature for us. As setting downtime does not prevent events from coming into their respective services, we have multiple support incidents for critical events, which were in fact generated during maintenance windows. Every Windows host in my Nagios setup has one EVT_Application and one EVT_System service, which will receive all errors of their respective Windows eventlogs (minus excluded ones). These services will be reset to an OK state at the moment the “Reset Passive Services” button is pushed.

      – The System tab evolved from a tab showing information queried with Powershell into an integrated web browser XAML component that uses the Nagios XI backend API to show information about the host on which UMF is launched.

      – The config tab enables you to see the Nagios and UMF related settings.

      UMF – Requirements and how to?

      In order for UMF to work, several requirements have to be met.

      1. Nagios Core 4 is required, as UMF uses JSON to query all the required information. Read through this article for more information about the new JSON explorer in Nagios Core 4.
      2. Nagios Remote Data Processor (NRDP) is required, as UMF uses the NRDP token to start certain tasks, such as starting downtime. Please read through this document about configuring inbound checks in Nagios XI.
      3. Powershell v4 or later is required, as some commands in the code will not work on Powershell v3 or lower. You can use Powershell v4 by installing Windows Management Framework 4.0.
      4. A Nagios XI read-only user is required with adequate permissions.
      5. Backend API token for this read-only user is required to enable passwordless login on the System tab.

      You will need a way to distribute and update UMF to your Windows Servers. I will release a script to do this soon, just need some time to clean it up.

      UMF_Tab_Config_01

      All the UMF settings are saved in umf_options.xml, which needs to be in the same folder as the UMF main script. Please start the script on one Windows Server, go to the configuration tab and fill in all the required fields. Alternatively, you can open the XML file containing the settings and edit it in your favorite XML editor.

      <Objs Version="1.1.0.1" xmlns="http://schemas.microsoft.com/powershell/2004/04">
        <Obj RefId="0">
          <TN RefId="0">
            <T>System.Collections.Hashtable</T>
            <T>System.Object</T>
          </TN>
          <DCT>
            <En>
              <S N="Key">MaxJobs</S>
              <S N="Value">4</S>
            </En>
            <En>
              <S N="Key">BackendApiToken</S>
              <Nil N="Value" />
            </En>
            <En>
              <S N="Key">ReadOnlyUser</S>
              <S N="Value">unknown-read-only-user</S>
            </En>
            <En>
              <S N="Key">ReadOnlyPw</S>
              <S N="Value">unknown-read-only-password</S>
            </En>
            <En>
              <S N="Key">ReportPath</S>
              <S N="Value">C:\Users\Willem\Desktop\Test</S>
            </En>
            <En>
              <S N="Key">NrdpToken</S>
              <S N="Value">unknown-nrdp-token</S>
            </En>
            <En>
              <S N="Key">NagiosServer</S>
              <S N="Value">unknown-nagios-server</S>
            </En>
          </DCT>
        </Obj>
      </Objs>

      After hitting the ‘Save Configuration’ button in the UMF gui, all configuration fields are saved to the XML file.
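
If you want to inspect the saved options outside of Powershell, the serialized hashtable is plain XML and can be read with any XML parser. The Python sketch below is an illustration and not part of UMF: the function name is hypothetical and the embedded sample is a shortened copy of the file layout shown above.

```python
import xml.etree.ElementTree as ET

# Namespace used by the serialized Powershell object.
NS = {"ps": "http://schemas.microsoft.com/powershell/2004/04"}

SAMPLE = """<Objs Version="1.1.0.1" xmlns="http://schemas.microsoft.com/powershell/2004/04">
  <Obj RefId="0">
    <DCT>
      <En><S N="Key">MaxJobs</S><S N="Value">4</S></En>
      <En><S N="Key">BackendApiToken</S><Nil N="Value" /></En>
      <En><S N="Key">NagiosServer</S><S N="Value">unknown-nagios-server</S></En>
    </DCT>
  </Obj>
</Objs>"""


def read_options(xml_text):
    """Return the key/value entries of the serialized hashtable as a dict."""
    root = ET.fromstring(xml_text)
    options = {}
    for entry in root.findall("./ps:Obj/ps:DCT/ps:En", NS):
        key = value = None
        for child in entry:
            if child.get("N") == "Key":
                key = child.text
            elif child.get("N") == "Value":
                # <Nil/> elements stand for an empty value and carry no text.
                value = None if child.tag.endswith("}Nil") else child.text
        if key is not None:
            options[key] = value
    return options


print(read_options(SAMPLE))
```

All values come back as strings (or None), so a number such as MaxJobs still needs converting before use.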

      More screenshots:

      UMF_Tab_System_02

       

      UMF_Tab_Nagios_01

      I hope you can make it work with the above information. I will try to add info when I find some time. As you can see in the later version, I added a recent backups listview. We are using NetBackup, so the script reads OpsCenter XML files with backup information and fills the listview with it. If you take the time to analyze the script, you should be able to do the same for your backup solution.

      Willem