Clock Drift Kills…

Well … it doesn’t actually kill, but many systems and applications go crazy when the system clock drifts too far.

In this post we’ll focus on adding a new service check (plugin) to the op5 Monitor software. If you’re using Nagios, this will work for you as well, but you’ll have to make all the changes in the text config files by hand … sorry.

Be sure to read the op5 walk-through of writing your own plugin for more.

Let’s first talk about time. NTP is used by many sites and generally works well. The catch is that if the clock gets WAY out of whack, NTP will not sync it (this is by design). So if your clock jumps a full day off, for example, NTP will never correct it without someone stepping in.
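To make that design decision concrete, here is a rough Python sketch of how the reference ntpd daemon reacts to a clock offset with its default thresholds (a 128 ms step threshold and a 1000 second panic threshold). The function name and the returned labels are made up for illustration, and the model is deliberately simplified: real behaviour depends on your NTP implementation and its configuration.

```python
# Simplified model of ntpd's default reaction to a clock offset.
# The two thresholds are ntpd defaults; everything else here is illustrative.
STEP_THRESHOLD = 0.128    # seconds: beyond this, ntpd steps instead of slewing
PANIC_THRESHOLD = 1000.0  # seconds: beyond this, ntpd refuses to touch the clock


def ntpd_reaction(offset_seconds: float) -> str:
    """Return 'slew', 'step' or 'panic' for a given offset in seconds."""
    magnitude = abs(offset_seconds)
    if magnitude > PANIC_THRESHOLD:
        return "panic"  # someone has to fix the clock by hand
    if magnitude > STEP_THRESHOLD:
        return "step"   # the clock is set in one jump
    return "slew"       # the clock is adjusted gradually


if __name__ == "__main__":
    for offset in (0.05, 30, 86400):
        print(offset, ntpd_reaction(offset))
```

A clock that is a full day off (86400 seconds) lands in the "panic" bucket, which is exactly the case an external drift check is meant to catch.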

One way to combat this and proactively check the drift is to install the check_time plugin. I’ve used Nagios for many years (and love it), but you’ll see below how much easier it is in op5 (a few mouse clicks) to add a new check and automagically graph the perf data!

Be sure to install and run NTP on your op5 server.

There’s a time service called daytime installed (but not usually activated) on *nix boxes. You’ll want to enable this service on the nodes you’re checking; we set ours up on the op5 server (localhost) as an example. Check to see if yours is running by issuing the command:

[cce][[email protected] op5_custom_plugins]# chkconfig --list | grep -i time

time-stream: off
daytime-dgram: off
daytime-stream: off
time-dgram: off[/cc]

You can turn these daytime ones on with the following commands:
[cce][[email protected] op5_custom_plugins]# chkconfig daytime-dgram on
[[email protected] op5_custom_plugins]# chkconfig daytime-stream on[/cc]

Once that is done, you should be able to telnet to localhost on port 13:

[cce][[email protected] op5_custom_plugins]# telnet localhost 13
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
02 DEC 2009 09:50:32 MST
Connection closed by foreign host.
[[email protected] op5_custom_plugins]#[/cc]
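For a feel of what a drift check does with that reply, here is a minimal Python sketch that parses a daytime line and compares it with a local timestamp. It is not how check_time.pl is implemented. The function names are hypothetical, and it assumes the reply looks like the sample above (the daytime protocol does not mandate a format), that month names are in English, and that both machines are in the same time zone.

```python
from datetime import datetime


def parse_daytime(line: str) -> datetime:
    """Parse a reply such as '02 DEC 2009 09:50:32 MST' into a naive datetime."""
    # Keep the first four fields and ignore the trailing zone abbreviation,
    # which strptime cannot reliably interpret.
    day, month, year, clock = line.split()[:4]
    return datetime.strptime(
        f"{day} {month.title()} {year} {clock}", "%d %b %Y %H:%M:%S"
    )


def variance_seconds(daytime_line: str, local_now: datetime) -> float:
    """Absolute difference in seconds between the remote and local clocks."""
    remote = parse_daytime(daytime_line)
    return abs((remote - local_now).total_seconds())


if __name__ == "__main__":
    local = datetime(2009, 12, 2, 9, 50, 30)
    print(variance_seconds("02 DEC 2009 09:50:32 MST", local))  # prints 2.0
```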

Now that we have the time service up and running, let’s move on to the check program.

There are thousands of plugins available to download and use for free. We’ll be using the check_time plugin. Find it and download it onto your op5 Monitor server:

http://www.nagioswiki.org/wiki/Plugin:check_time

Install the Perl script in a safe place. We picked:

[cce]/opt/local/tpt/op5_custom_plugins/check_time.pl[/cc]

As with any program you download from the net … be sure to look it over before running it on your prod boxes!

Run the program by hand to see if you have everything you need:

[cce][[email protected] op5_custom_plugins]# ./check_time.pl -H localhost -Pdaytime -w 5 -c 60 -f
TIME OK - localhost time is Wed Dec  2 10:01:07 2009 (0 seconds variance) | time=1259773267 diff=0[/cc]
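Everything after the pipe character in that output is performance data, which is what the monitoring server graphs. As a rough illustration (not op5’s actual parser), the Python sketch below splits a plugin output line into its status text and its perf data values. The function name is hypothetical, and the sketch only handles simple label=value items without units.

```python
def parse_plugin_output(output: str):
    """Split plugin output into (status_text, perfdata_dict)."""
    # Text before the first '|' is the human-readable status,
    # text after it is space-separated label=value perf data.
    text, _, perf = output.partition("|")
    perfdata = {}
    for item in perf.split():
        label, _, value = item.partition("=")
        # Keep only the number before any ';' threshold fields.
        perfdata[label] = float(value.split(";")[0])
    return text.strip(), perfdata


if __name__ == "__main__":
    sample = (
        "TIME OK - localhost time is Wed Dec  2 10:01:07 2009 "
        "(0 seconds variance) | time=1259773267 diff=0"
    )
    status, perf = parse_plugin_output(sample)
    print(status)
    print(perf)
```

Here diff is the value worth graphing, since it is the measured variance in seconds.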

You might have to install a few Perl modules to get this rolling. In our case, we had to install the TimeDate module:

[cce][[email protected] op5_custom_plugins]# yum install perl-TimeDate[/cc]

As with most (well-written) plugins, there’s useful help text when you run it with -h:

[cce][[email protected] op5_custom_plugins]# ./check_time.pl -h

check_daytime plugin for nagios version 0.2
by William Leibzon – william(at)leibzon.org

Usage: ./check_time.pl -v | -h | -P daytime|daytime-udp|time|time-udp [-p <port>] -H <hostname> [-w <warning variance>] [-c <critical variance>] [-t <timeout seconds>] [-o str|usec] [-f]

<…snipped…>[/cc]

With the check command and time service working, let’s set up op5 now. Click on ‘Configure’ and then ‘Commands’.

op5-plugin-001

Now enter the following information into the correct fields:

  • command_name: check_time (you can pick any name you like here)
  • command_line: /opt/local/tpt/op5_custom_plugins/check_time.pl -H $HOSTADDRESS$ -Pdaytime -w $ARG1$ -c $ARG2$ -f
  • FILE: etc/checkcommand.cfg (the default; don’t change it)

op5-plugin-002

  • $HOSTADDRESS$ gets replaced with the address of the device you are checking.
  • -P specifies the protocol to use, in this case daytime. (Lowercase -p is the optional port.)
  • -w is the number of seconds of drift beyond which we go WARNING (1st passed argument).
  • -c is the number of seconds of drift beyond which we go CRITICAL (2nd passed argument).
  • -f spits out some performance numbers that op5 will auto-graph (nice!).
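The way the -w and -c thresholds turn into a check result can be sketched in a few lines of Python. This is an illustration rather than the plugin’s real code: the function name is hypothetical, and whether check_time.pl uses a strict "greater than" comparison is an assumption. The exit codes 0, 1 and 2 are the standard Nagios plugin return codes for OK, WARNING and CRITICAL.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def evaluate_drift(variance: float, warn: float, crit: float):
    """Map a variance in seconds to (exit_code, status_label).

    Assumes a strict 'greater than' comparison against each threshold.
    """
    if variance > crit:
        return CRITICAL, "CRITICAL"
    if variance > warn:
        return WARNING, "WARNING"
    return OK, "OK"


if __name__ == "__main__":
    # Thresholds matching -w 5 -c 60
    for variance in (0, 12, 86400):
        code, label = evaluate_drift(variance, warn=5, crit=60)
        print(f"TIME {label} - {variance} seconds variance (exit code {code})")
```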

Once you hit ‘Apply Changes’, we can go into a host and add this new check as a new service.

If you’re not already there, click ‘Configure’, select the host, and click ‘Go’. Then click ‘Services for host XYZ’ at the top right.

Here’s where we’ll add the new service.

op5-plugin-003

  • service_description: we picked Check Clock Time (this appears in the Service Details section)
  • check_command: select the new check command we created above
  • check_command_args: the values for $ARG1$ and $ARG2$, separated by an exclamation mark with no spaces. We used 5!60 (warn at 5 seconds, critical at 60 seconds)
  • Once you have this filled in, click ‘Test this service’ to see if you got it right
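To see how the argument string ends up in the command that actually runs, here is a small Python sketch of the macro substitution. It only imitates what Nagios and op5 do internally: the function name is hypothetical, and it handles just $HOSTADDRESS$ and the numbered $ARGn$ macros.

```python
def expand_command(command_line: str, host_address: str, check_command_args: str) -> str:
    """Substitute $HOSTADDRESS$ and $ARG1$, $ARG2$, ... into a command line."""
    expanded = command_line.replace("$HOSTADDRESS$", host_address)
    # Arguments are separated by '!', so '5!60' gives ARG1=5 and ARG2=60.
    for position, value in enumerate(check_command_args.split("!"), start=1):
        expanded = expanded.replace(f"$ARG{position}$", value)
    return expanded


if __name__ == "__main__":
    template = (
        "/opt/local/tpt/op5_custom_plugins/check_time.pl "
        "-H $HOSTADDRESS$ -Pdaytime -w $ARG1$ -c $ARG2$ -f"
    )
    print(expand_command(template, "127.0.0.1", "5!60"))
```

With the values used in this post, the result is the same command we ran by hand earlier, with 127.0.0.1 in place of localhost.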

Click ‘Apply Changes’ when you are finished.

op5-plugin-004