Monitoring Windows Scheduled Tasks

Introduction

Task Scheduler is a Microsoft Windows component that allows you to schedule programs or scripts to start at predefined times or intervals. There are two major versions of the Task Scheduler.

In version 1.0, definitions and schedules are stored in binary .job files, and every task corresponds to a single action. This plugin will not work on version 1.0 of the Task Scheduler, which runs on Windows 2000 Server and Windows Server 2003.

In version 2.0, the Windows Task Scheduler got a redesigned user interface based on the Microsoft Management Console. Version 2.0 also supports calendar and event-based triggers, such as starting a task when a particular event is logged to the event log, or when a combination of events has occurred. Several tasks that are triggered by the same event can be configured to run either simultaneously or in a predetermined sequence.

Tasks can also be configured to run based on system status, such as being idle for a preconfigured amount of time, on startup or logoff, or only during or for a specified time. Other new features include a credential manager that stores passwords so they cannot be retrieved easily. Scheduled tasks are also executed in their own session, instead of the same session as system services or the current user. You can find a list of all Task Scheduler 2.0 interfaces here.

Requirements

Starting from Windows PowerShell 4.0, you can use a whole range of PowerShell cmdlets to manage your scheduled tasks. This plugin for Nagios does not use these cmdlets, as it has to be PowerShell 2.0 compatible. Maybe in a few years, when PowerShell 2.0 becomes obsolete, I’ll patch the script to make use of the new cmdlets. You can find the complete list of cmdlets here.

Failing tasks will always end with some sort of error code. You can find the complete list of error codes here. This plugin will output the exit codes for failing tasks in the Nagios service description. The output will also notify you of tasks that are still running.

We have multiple Windows servers at work with a growing number of scheduled tasks, and each scheduled task needs to be monitored. With the help of Nagios and this plugin you can find out:

  • How many are running at the same time?
  • How many are failing?
  • How long are they running?
  • Who created them?
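To make the idea of reporting failed and running tasks more concrete, here is a minimal sketch in Python of how such a summary could be turned into a Nagios status line with perfdata. It is not the plugin's actual code or output format: the task dictionaries, the function name `summarize_tasks` and the rule "a task that is not running and has a non-zero exit code counts as failed" are assumptions made for illustration.

```python
NAGIOS_OK, NAGIOS_CRITICAL = 0, 2  # standard Nagios plugin exit codes


def summarize_tasks(tasks):
    """Build (exit_code, status_line) from a list of task dicts.

    Each dict has hypothetical keys: 'name', 'running' (bool), 'exit_code' (int).
    """
    running = [t for t in tasks if t["running"]]
    # Assumption: a finished task with a non-zero exit code is a failure.
    failed = [t for t in tasks if not t["running"] and t["exit_code"] != 0]

    state = NAGIOS_CRITICAL if failed else NAGIOS_OK
    label = "CRITICAL" if failed else "OK"
    text = f"{label}: {len(tasks)} tasks, {len(running)} running, {len(failed)} failed"
    if failed:
        details = ", ".join(f"{t['name']} (exit code {t['exit_code']:#x})" for t in failed)
        text += f". Failed: {details}"
    if running:
        text += ". Running: " + ", ".join(t["name"] for t in running)

    # Nagios perfdata follows the pipe character as label=value pairs.
    perfdata = f"total={len(tasks)} running={len(running)} failed={len(failed)}"
    return state, f"{text} | {perfdata}"


example_tasks = [
    {"name": "Backup", "running": False, "exit_code": 1},
    {"name": "Sync", "running": True, "exit_code": 0},
    {"name": "Cleanup", "running": False, "exit_code": 0},
]
example_state, example_line = summarize_tasks(example_tasks)
```

The perfdata part after the pipe is what ends up in the graphs, so the counts are kept as plain integers.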

Versions

Disabled scheduled tasks are excluded by default since v3.14.12.06. In earlier versions, you had to exclude them manually with -EF or -ET. Excluding disabled tasks by default seemed like a logical decision and was suggested by someone reviewing the plugin on the Nagios Exchange. Maybe one day I’ll add a switch to include them again if specified. As some scheduled tasks do not need to be monitored, the script enables you to exclude complete folders.

Since v5.13.160614 it is possible to include hidden tasks. Just add the ‘-Hidden 1’ switch to your parameters and your hidden tasks will be monitored.

One of the folders I tend to exclude almost all the time is the “Microsoft” folder. Several tasks in the Microsoft folder tend to fail from time to time. So unless you absolutely need to know the state of every single scheduled task running on your Windows Server, I advise you to exclude it too. You can find the folders and tasks in this location: C:\Windows\System32\Tasks
It is possible to include tasks or task folders with the ‘-InclFolders’ and ‘-InclTasks’ parameters. This filter is applied after the exclude parameters. Please note that including a folder is not recursive: only tasks in the root of the folder will be included.
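The filter order described above (disabled and hidden tasks first, then excludes, then non-recursive includes) can be sketched in Python as follows. This is an illustration under assumptions, not the plugin's implementation: the task dictionaries, the function names, exact-name matching for tasks, and the choice to treat an excluded folder as covering its subfolders are all hypothetical.

```python
def in_folder_tree(folder, root):
    """True if 'folder' is 'root' or lies somewhere below it."""
    root = root.rstrip("\\")
    return folder == root or folder.startswith(root + "\\")


def select_tasks(tasks, excl_folders=(), excl_tasks=(),
                 incl_folders=(), incl_tasks=(), hidden=False):
    """Return the tasks that remain after applying the filters in order."""
    selected = []
    for task in tasks:
        if not task["enabled"]:
            continue  # disabled tasks are skipped by default
        if task["hidden"] and not hidden:
            continue  # hidden tasks only when explicitly requested
        # Excludes run first. Assumption: an excluded folder covers its subfolders.
        if any(in_folder_tree(task["folder"], f) for f in excl_folders):
            continue
        if task["name"] in excl_tasks:
            continue
        # Includes run after the excludes and are NOT recursive:
        # the task's folder must match an included folder exactly.
        if incl_folders or incl_tasks:
            if task["folder"] not in incl_folders and task["name"] not in incl_tasks:
                continue
        selected.append(task)
    return selected


example_tasks = [
    {"name": "Backup", "folder": "\\Jobs", "enabled": True, "hidden": False},
    {"name": "Nightly", "folder": "\\Jobs\\Sub", "enabled": True, "hidden": False},
    {"name": "Defrag", "folder": "\\Microsoft\\Windows\\Defrag", "enabled": True, "hidden": False},
    {"name": "Old", "folder": "\\Jobs", "enabled": False, "hidden": False},
    {"name": "Secret", "folder": "\\Jobs", "enabled": True, "hidden": True},
]
```

With these example tasks, including only `\Jobs` leaves out `Nightly`, because it lives in a subfolder and the include filter does not recurse.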

Help

This is the help of the plugin, which lists all valid parameters:

You could put every scheduled task you don’t want to monitor in a separate folder and exclude it with the -EF parameter. Alternatively, you can use the -ET parameter to exclude tasks based on name patterns. One important thing to know is that in order to exclude or include the root folder, you need to escape the backslash, like this: “\\”.
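The need to double the backslash suggests that the filter values are evaluated as regular expressions. That is an assumption here, not something the plugin's help confirms. Under that assumption, the short Python sketch below shows why a single backslash cannot work as a pattern while the escaped form matches the root folder. The helper names are made up for this example.

```python
import re

ROOT_FOLDER = "\\"  # the root task folder is a single backslash character


def is_valid_pattern(pattern):
    """Return True if the regular expression compiles."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def matches_folder(pattern, folder):
    """Return True if the whole folder path matches the pattern."""
    return re.fullmatch(pattern, folder) is not None


single_backslash = "\\"    # one character: starts an escape sequence in a regex
escaped_backslash = "\\\\"  # two characters: matches one literal backslash
```

A lone backslash is the start of an escape sequence, so the regex engine rejects it. Two backslashes form a complete escape that matches exactly one literal backslash, which is the root folder's path.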

How to monitor your scheduled tasks?

  1. Put the script in the NSClient++ scripts folder, preferably in a subfolder Powershell.
  2. In the nsclient.ini configuration file, define the script like this:

     For more information about external scripts configuration, please review the NSClient++ documentation. You can also consider defining a wrapped script in nsclient.ini to simplify configuration.

  3. Make a command in Nagios like this:

  4. Configure your service in Nagios, using the command created above. Configure something similar to this as $ARG1$:

Some things to consider to make it work:

  • “Set-ExecutionPolicy RemoteSigned”
  • Nscp service account permissions => Running as local system should suffice, but I had users telling me it only worked with a local admin. I found out that on some NSClient++ versions, more specifically version 0.4.3.88 and probably some earlier versions too, the following error occurred when running the nscp service as local system: “CHECK_NRPE: Invalid packet type received from server”. After I filed an issue on the GitHub project page of NSClient++, Michael Medin quickly acknowledged the issue and solved it in version 0.4.3.102, so the plugin should work again as local system.

Examples

If you run the script from the CLI in your Nagios plugin folder, this would be the command:

If you want to exclude one noisy, unimportant scheduled task, the command used in the CLI would look like this:

If you only want the scheduled tasks in the root to be monitored, you can use this command:

This would only give you the scheduled tasks available in the root folder. The output now looks like this.

Final Words

It seems the perfdata in the Highcharts graphs sometimes contains decimal numbers (see screenshot), which is kind of strange as I’m sure I only pass rounded numbers. This seems to be related to the way RRD files work. To reduce the amount of storage space used, NPCD and RRD will average out the data, resulting in decimals, even when you don’t expect them.
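A small arithmetic sketch in Python shows how averaging can turn whole numbers into decimals. It is a simplification of what an RRD archive with an averaging consolidation function does, not a reimplementation of RRDtool. The function name and the sample values are made up.

```python
def consolidate_average(samples, points_per_bucket):
    """Average consecutive groups of samples into one stored value each."""
    buckets = []
    for start in range(0, len(samples), points_per_bucket):
        chunk = samples[start:start + points_per_bucket]
        buckets.append(sum(chunk) / len(chunk))
    return buckets


# Six integer task counts, consolidated two at a time.
task_counts = [3, 4, 4, 3, 5, 5]
stored_values = consolidate_average(task_counts, 2)
```

Every input is an integer, but the first two consolidated values are 3.5, which is exactly the kind of decimal that shows up in the graphs.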

This is a small to do list:

  • Add switches to change returned values and output.
  • Add array parameter with exit codes that should be excluded.
  • Test remote execution. In some cases it might be useful to be able to check remotely for failed Windows tasks.
  • Include a warning / critical threshold when discovered tasks exceed a certain duration.
  • I was hoping to add some more exit codes to check, which would make failed tasks easier to troubleshoot. You can find the list of scheduled task exit codes here. The constants that begin with SCHED_S_ are success constants, and the constants that begin with SCHED_E_ are error constants.
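As an illustration of the SCHED_S_ and SCHED_E_ distinction in the last item, here is a hedged Python sketch that classifies a task's last result. The four SCHED_S_ values are the documented Task Scheduler success constants. The function name, the returned labels and the rule for plain program exit codes are assumptions for this example and not part of the plugin.

```python
# Documented Task Scheduler success constants (SCHED_S_*).
SCHED_SUCCESS_CODES = {
    0x00041300: "SCHED_S_TASK_READY",
    0x00041301: "SCHED_S_TASK_RUNNING",
    0x00041302: "SCHED_S_TASK_DISABLED",
    0x00041303: "SCHED_S_TASK_HAS_NOT_RUN",
}


def classify_result(code):
    """Classify a last-run result as success, informational or failure."""
    code &= 0xFFFFFFFF  # normalise values reported as negative signed 32-bit integers
    if code == 0:
        return "success"
    if code in SCHED_SUCCESS_CODES:
        return SCHED_SUCCESS_CODES[code]
    if code & 0x80000000:
        # The highest bit is the HRESULT severity bit; SCHED_E_* constants have it set.
        return "error HRESULT"
    # Assumption: anything else is the exit code of the program the task started.
    return "non-zero exit code"
```

Checking the severity bit avoids having to list every SCHED_E_ constant, while the small table keeps the informational SCHED_S_ results from being reported as failures.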

Screenshots:

These are some screenshots of the Nagios XI Graph Explorer for two of our servers making use of the plugin to monitor scheduled tasks:

[Screenshots: Tasks 01, check_ms_win_tasks_graph_02]

Let me know on the Nagios Exchange what you think of my plugin by rating it or submitting a review. Please also consider starring the project on GitHub.

Willem

Monitoring Microsoft Windows Updates

Introduction

Monitoring WSUS updates on Microsoft Windows Server is critical to ensure you get alerted when your systems need to be patched. Installing Windows Updates on high-priority servers requires proper planning to avoid post-installation problems. If we could trust Microsoft patches 100%, we would install WSUS updates on a system the moment a maintenance window could be scheduled for it. Unfortunately, in my personal experience, WSUS updates are more often a cause of problems than a solution. That’s why we prefer not to install them too fast, as you might experience major issues with your production systems or with the software running on them. A recent example: a colleague accidentally patched some production SharePoint servers, which prevented the creation of new site collections and caused issues with some icons. The only solution was to restore a backup…

Ideally, the updates would first be tested on QA systems. If the QA servers have been running for some time without issues, the production systems can get patched. This is one of the reasons I spent some time combining the best features from the available Windows Update plugins on the Nagios Exchange.
One of those features is Christian Kaufmann’s idea to cache the list of Windows Updates in a file. This results in a much lower performance impact of the plugin on the servers you are monitoring. If you have any experience with WSUS updates, you will have noticed the ‘TrustedInstaller.exe’ process, which is an MS Windows system process that takes care of querying the WSUS server and installing updates if requested.
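The caching idea can be sketched in a few lines of Python. This is a generic file cache with a maximum age, shown only to illustrate the technique. The function name `get_cached`, the JSON file format and the `fetch` callback are assumptions and do not reflect how the plugin or Christian Kaufmann's script stores its cache.

```python
import json
import os
import tempfile
import time


def get_cached(path, max_age_seconds, fetch, now=None):
    """Return cached data if the cache file is fresh, otherwise refresh it.

    'fetch' is a hypothetical callable that performs the expensive update search.
    """
    now = time.time() if now is None else now
    if os.path.exists(path) and now - os.path.getmtime(path) < max_age_seconds:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)  # cheap path: no update search needed
    data = fetch()  # expensive path: query for updates
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(data, handle)
    return data


# Example usage with a fake search that records how often it is called.
search_calls = []


def fake_update_search():
    search_calls.append(1)
    return ["KB0000001", "KB0000002"]  # placeholder identifiers


cache_file = os.path.join(tempfile.mkdtemp(), "updates.json")
first = get_cached(cache_file, 3600, fake_update_search)
second = get_cached(cache_file, 3600, fake_update_search)
```

The second call is answered from the file, so the expensive search runs only once per cache lifetime instead of on every check interval.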

The plugin will count all available WSUS updates and output the count for every possible state. However, it will only alert when a set number of days has passed since the last successful update was installed. With this method, you can define a policy and agree to patch all systems that have had no updates for a certain time. You could use different policies for QA and PR (production) systems to prevent problems.
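The alerting rule can be sketched in Python as a simple comparison of elapsed days against two thresholds. The defaults of 120 and 150 days are the DaysBeforeWarning and DaysBeforeCritical defaults mentioned in this post. The function name and the choice to alert when the threshold is reached (rather than exceeded) are assumptions.

```python
from datetime import datetime

NAGIOS_OK, NAGIOS_WARNING, NAGIOS_CRITICAL = 0, 1, 2


def update_age_state(last_success, now, days_warning=120, days_critical=150):
    """Return (nagios_state, days_since_last_successful_update)."""
    days = (now - last_success).days
    # Assumption: reaching a threshold already triggers the alert.
    if days >= days_critical:
        return NAGIOS_CRITICAL, days
    if days >= days_warning:
        return NAGIOS_WARNING, days
    return NAGIOS_OK, days


last_update = datetime(2016, 1, 1)
state, age = update_age_state(last_update, datetime(2016, 5, 15))
```

A QA policy and a production policy then differ only in the two threshold values passed to the check.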

[Image: WSUS]

Details

Some things you need to know about Windows Updates. Microsoft saves the date of the ‘last successful update’ in the registry. The location of the String Value is:

This date, however, is saved in Coordinated Universal Time (UTC), also known as Greenwich Mean Time (GMT). My plugin will try to convert this time to local time with the help of a function called Get-LocalTime. This function uses the [System.TimeZoneInfo] .NET class, which is only available with .NET 3.5 or higher. So keep in mind that the ‘Last Successful Update’ date is reported in UTC on servers where .NET 3.5 or higher is not installed.
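The conversion itself is straightforward. The Python sketch below shows the same idea as Get-LocalTime, without claiming to mirror its implementation: parse the stored string as UTC and convert it to a target time zone. The timestamp format, the function name and the sample value are assumptions.

```python
from datetime import datetime, timedelta, timezone


def utc_string_to_local(value, tz=None, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a UTC timestamp string and convert it to 'tz'.

    With tz=None the system's local time zone is used.
    The string format is an assumed example, not taken from the registry.
    """
    utc_time = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    return utc_time.astimezone(tz)


# A fixed UTC+2 zone keeps the example deterministic.
utc_plus_two = timezone(timedelta(hours=2))
local_time = utc_string_to_local("2016-09-14 08:12:33", utc_plus_two)
```

Calling the function without a time zone converts to whatever zone the server itself is configured for, which is the behaviour you would want in a check running on the monitored host.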

The plugin will also check this registry key:

It gives a warning if the system has a required reboot pending.

PSWindowsUpdate

Starting from Windows 10, Microsoft apparently decided to no longer make use of the above registry key. The only way I found to retrieve the last successful update date and time is with the help of the PSWindowsUpdate module. So I added another argument which allows you to select a different method named ‘PSWindowsUpdate’ to retrieve the necessary information. Please note that the default method is still the original method, which I called ‘UpdateSearcher’.

In order for this method to work, you will need to install the PSWindowsUpdate module in this location: C:\Windows\System32\WindowsPowerShell\v1.0\Modules. If you are using PowerShell 5 you can just do:

I’ve included versions 1.5.1.11 and 1.5.2 of the module in the GitHub repository. Alternatively, you can download it from the Microsoft Script Center Repository.

How to monitor your WSUS updates?

  1. Please note that the default DaysBeforeWarning and DaysBeforeCritical parameters are set to 120 and 150. Feel free to adjust them as required or pass them as an argument.
  2. Put the script in the NSClient++ scripts folder, preferably in a subfolder Powershell.
  3. In the nsclient.ini configuration file, define the script like this:
  4. Make a command in Nagios like this:
  5. Configure your service in Nagios, using the command created above. Configure something similar to this as $ARG1$:
    QA servers =>

    PR servers =>

  6. If you want to make use of the new ‘PSWindowsUpdate’ method you will need to have an argument like this:

(Almost) Final words

So why did I create another plugin to check WSUS updates? Because I’m using a system which completely automates Windows Update installation with the help of Nagios XI and Rundeck. The existing plugins did not meet my requirements.

Please note that there are several known issues with WSUS on some operating systems. It’s recommended to always update to the latest ‘Windows Update Client’. Please check the Windows 8.1 and Windows Server 2012 R2 update history for more information. More specifically, when using Windows Server 2012 R2, you will really want the following KBs:

  • KB3172614 => “July 2016 update rollup for Windows 8.1 and Windows Server 2012 R2”
  • KB3179574 => “August 2016 update rollup for Windows 8.1 and Windows Server 2012 R2”
  • KB3185279 => “September 2016 update rollup for Windows 8.1 and Windows Server 2012 R2”

Without these update rollups, checking for updates and updating your Windows Server 2012 R2 systems can be very slow. In our case, an update check could take up to 40 minutes instead of 10 seconds.

Let me know on the Nagios Exchange what you think of my plugin by rating it or submitting a review. Please also consider starring the project on GitHub.

Monitoring MS SharePoint Health

 

 



Introduction

SharePoint is a web application platform in the Microsoft Office server suite. Launched in 2001, SharePoint combines various functions which are traditionally separate applications: intranet, extranet, content management, document management, personal cloud, enterprise social networking, enterprise search, business intelligence, workflow management, web content management, and an enterprise application store.
SharePoint Health Analyzer is a feature in Microsoft SharePoint Foundation 2010 that enables administrators to schedule regular, automatic checks for potential configuration, performance, and usage problems in the server farm. Any errors that SharePoint Health Analyzer finds are identified in status reports that are made available to farm administrators in Central Administration. Status reports explain each issue, list the servers where the problem exists, and outline the steps that an administrator can take to remedy the problem.
SharePoint Health Analyzer monitors the farm by applying a set of health rules. A number of these rules ship with SharePoint Foundation. You can create and deploy additional rules by writing code that uses the SharePoint Foundation object model. When a health rule executes, SharePoint Health Analyzer creates a status report and adds it to the Health Analyzer Reports list in the Monitoring section of Central Administration.
This plugin will create a PSObject for each item in the status report whose severity is not ‘Success’ and return a critical state if any problems are found, together with information about each problem in the status report, such as the failed service and the date modified.
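The evaluation logic described above is simple enough to sketch in Python. This only illustrates the "anything that is not Success is a problem" rule. The report item dictionaries, their key names, the severity strings and the output wording are hypothetical and do not come from the plugin or from the SharePoint object model.

```python
NAGIOS_OK, NAGIOS_CRITICAL = 0, 2


def evaluate_health_reports(items):
    """Return (nagios_state, message) for a list of health report items.

    Each item is a dict with hypothetical keys:
    'title', 'severity', 'service' and 'modified'.
    """
    problems = [item for item in items if item["severity"] != "Success"]
    if not problems:
        return NAGIOS_OK, f"OK: {len(items)} health rules without problems"
    details = "; ".join(
        f"{p['title']} (service: {p['service']}, modified: {p['modified']})"
        for p in problems
    )
    return NAGIOS_CRITICAL, f"CRITICAL: {len(problems)} problem(s) found: {details}"


example_reports = [
    {"title": "Rule A", "severity": "Success", "service": "Timer", "modified": "2016-01-10"},
    {"title": "Rule B", "severity": "Error", "service": "Search", "modified": "2016-01-12"},
]
example_state, example_message = evaluate_health_reports(example_reports)
```

Including the service and modification date in the message means the Nagios alert already tells you where to start looking in Central Administration.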

[Image: SharePoint_2010]

How to use check_ms_sharepoint_health?

  1. Put the script in the NSClient++ scripts folder, preferably in a subfolder Powershell, enabling you to use this Reactor action to update your plugins folder without having to edit the script.
  2. In the nsclient.ini configuration file, define the script like this:
     check_ms_sharepoint_health=cmd /c echo scripts/powershell/check_ms_sharepoint_health.ps1; exit $LastExitCode | powershell.exe -command -
  3. Make a command in Nagios like this:
     check_ms_sharepoint_health => $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -t 60 -c check_ms_sharepoint_health
  4. Configure your service in Nagios, using the command created above.

Monitoring Microsoft Failover Cluster Preferred Node

     

     



Introduction

Clustering is a very important technology to ensure application availability and performance. Accurate monitoring of your clusters is crucial to keep your applications stable. Not monitoring is not an option, as you might never know when a failover has taken place and waste valuable time looking in the wrong places for solutions to your problems. There are different options and levels of monitoring you can choose for monitoring Microsoft Windows failover clusters.

Preferred Node Option

One of the easiest is checking whether each cluster service is still running on its preferred node. The plugin from Nedstars to check if MS cluster services are running on their preferred node only works on Windows Server 2008 R2 or later. Apart from that, there seemed to be a bug in Nedstars’ script: if an MS cluster contained more than one cluster service, it seemed to check only one of them. I did not investigate this further, so forgive me if I’m wrong here. As we are still using multiple Windows 2003 failover clusters, I decided to write a completely new plugin that uses WMI to get information about failover cluster services on Microsoft Windows 2003 failover clusters.

[Screenshot: Windows Server 2008 R2 Failover Cluster Preferred Node (preferred node on 2008)]
[Screenshot: Windows Server 2003 R2 Failover Cluster Preferred Node (check_ms_cluster_preferrede_node_2003R2)]

The plugin starts by using WMI to check the version of the Windows server it is running on. If the version is lower than 6.1 (which is the version number of Windows Server 2008 R2), the script will continue to use WMI to get all cluster services and then check each cluster service individually to see where it is running and compare that with its preferred node. If the OS version is 6.1 or higher, the plugin will import the ‘failoverclusters’ module and use the module’s commands instead of WMI. I did not check whether the script works on a cluster with more than two nodes, as we don’t have clusters with three or more nodes. If anyone could test this for me and let me know…
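The two decisions described above, choosing the data source by OS version and comparing each cluster service's current node with its preferred node, can be sketched in Python. This is an illustration only: the function names and the dictionary keys are hypothetical, and the real plugin gets its data from WMI or the failover cluster module.

```python
def parse_version(version):
    """Turn a version string such as '6.1.7601' into a comparable tuple (6, 1)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def uses_wmi(os_version):
    """Versions below 6.1 (Windows Server 2008 R2) fall back to WMI."""
    # Compare numbers, not strings: as text, '10.0' would sort before '6.1'.
    return parse_version(os_version) < (6, 1)


def services_off_preferred_node(services):
    """Return the names of cluster services not running on their preferred node.

    Each service is a dict with hypothetical keys
    'name', 'owner_node' and 'preferred_node'.
    """
    return [
        s["name"]
        for s in services
        if s["preferred_node"]  # services without a preferred node are skipped
        and s["owner_node"].lower() != s["preferred_node"].lower()
    ]


example_services = [
    {"name": "SQL", "owner_node": "NODE1", "preferred_node": "node1"},
    {"name": "FileShare", "owner_node": "NODE2", "preferred_node": "NODE1"},
    {"name": "Print", "owner_node": "NODE2", "preferred_node": ""},
]
```

Looping over every service, instead of stopping at the first one, is what avoids the single-service behaviour mentioned above. Node names are compared case-insensitively because Windows host names are not case-sensitive.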

How to use check_ms_cluster_preferred_node?

  1. Copy the ‘check_ms_cluster_preferred_node.ps1’ PowerShell script to the NSClient++ scripts folder, preferably in a subfolder ‘Powershell’, on all the failover cluster nodes you wish to monitor.
  2. In the nsclient.ini configuration file, define the external command like this (and restart the NSClient++ service (nscp) afterwards):
  3. Make a command in Nagios like this:
  4. Configure your service in Nagios. Use the command you previously made. A healthy check in Nagios XI would look like this: [Screenshot: check_ms_cluster_preferred_node]

Other Microsoft Failover Cluster monitoring options?

So is monitoring the preferred node the best way to monitor clusters? Definitely not. A cluster might have no preferred node, or multiple preferred nodes, for some reason. If you happen to own some MS clusters for critical applications, you will most likely want to be alerted about more issues than just a cluster service failover. I’ve seen multiple clusters with issues that did not involve a failover. Instead, one or more cluster services just went offline and came back online on the same node. In order to monitor this, I wrote another PowerShell script that checks for certain event IDs in the Windows application event log and alerts when these events contain information about failing cluster services.