Clock Drift Kills…

Well, not kills exactly, but many systems and applications go crazy when the system clock drifts too far.

In this post we'll focus on adding a new service check (plugin) to the op5 Monitor software.  If you're using Nagios, this will work for you as well, but you'll have to make all the changes in the text config files by hand ... sorry.

Be sure to read the op5 walk-through of writing your own plugin for more.

Let's first talk about time.  NTP is used by many sites and works well.  The catch is that if the clock ends up WAY out of whack, NTP will refuse to correct it (this is by design; ntpd's default panic threshold is 1000 seconds).  So if your clock jumps a day off, for example, NTP will never bring it back in sync without someone stepping in.

One way we can combat this and proactively check the drift is by installing the check_time plugin.  I've used Nagios for many years (and love it), but you'll see below how much easier it is to add a new check (in a few mouse clicks) and automagically graph the perf data!

Be sure to install and run NTP on your op5 server.

There's a time service installed (but not usually activated) on *nix boxes called daytime.  You'll want to install this service on the nodes you're checking; we set ours up on the op5 server (localhost) as an example.  Check to see if yours is running by issuing the command:

[root@op5 op5_custom_plugins]# chkconfig --list | grep -i time

time-stream:    off
daytime-dgram:  off
daytime-stream: off
time-dgram:     off
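
If you'd rather not eyeball the list, a small filter can pull out the daytime entries that are still off.  This is only a sketch: the function name daytime_services_off is made up for this post, and it assumes the two-column "name: state" layout shown above.

```shell
# Hypothetical helper (not part of chkconfig or op5).
# Reads `chkconfig --list` style output on stdin and prints the names of
# daytime services whose state is "off".
daytime_services_off() {
    # $1 is the service name with a trailing colon, $2 is the state.
    awk '$1 ~ /^daytime-/ && $2 == "off" { sub(/:$/, "", $1); print $1 }'
}
```

You would use it as: chkconfig --list | daytime_services_off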

You can turn the daytime services on with the following commands:

[root@op5 op5_custom_plugins]# chkconfig daytime-dgram on
[root@op5 op5_custom_plugins]# chkconfig daytime-stream on

Once that is done, you should be able to telnet to localhost on port 13:

[root@op5 op5_custom_plugins]# telnet localhost 13
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
02 DEC 2009 09:50:32 MST
Connection closed by foreign host.
[root@op5 op5_custom_plugins]#

Now that we have the time service up and running, let's move on to the check program.

There are thousands of plugins available to download and use for free.  We'll be using the check_time plugin.  Download it onto your op5 Monitor server:

http://www.nagioswiki.org/wiki/Plugin:check_time

Install the Perl script in a safe place.  We picked:

/opt/local/tpt/op5_custom_plugins/check_time.pl

As with any program you download from the net ... be sure to look it over before running it on your prod boxes!

Run the program by hand to see if you have everything you need:

[root@op5 op5_custom_plugins]# ./check_time.pl -H localhost -Pdaytime -w 5 -c 60 -f
TIME OK - localhost time is Wed Dec  2 10:01:07 2009 (0 seconds variance) | time=1259773267 diff=0
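
The part after the "|" in that output is the performance data that gets graphed.  As a rough sketch, here's one way to pull a single value (such as diff) out of a line like that in plain shell.  The helper name perf_value is our own invention, not part of the plugin, and it assumes simple label=value pairs like the ones shown above.

```shell
# Hypothetical helper: print the value of one perfdata label.
# Usage: perf_value "<plugin output line>" <label>
perf_value() {
    # Everything after the first "|" is performance data.
    perfdata=${1#*|}
    # If nothing was stripped, the line had no "|" and so no perfdata.
    [ "$perfdata" = "$1" ] && return 1
    # Assumes whitespace-separated label=value pairs without wildcards.
    for pair in $perfdata; do
        case $pair in
            "$2"=*) printf '%s\n' "${pair#*=}"; return 0 ;;
        esac
    done
    return 1
}
```

For example, feeding it the output line above and asking for diff prints 0, and asking for time prints 1259773267.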

You might have to install a few Perl modules to get this rolling.  In our case, we had to install the TimeDate module:

[root@op5 op5_custom_plugins]# yum install perl-TimeDate

As with most (well written) plugins, there's useful help output when you run it with -h:

[root@op5 op5_custom_plugins]# ./check_time.pl -h

check_daytime plugin for nagios version 0.2
by William Leibzon - william(at)leibzon.org

Usage: ./check_time.pl -v | -h | -P daytime|daytime-udp|time|time-udp [-p <port>] -H <hostname> [-w <warning variance>] [-c <critical variance>] [-t <timeout seconds>] [-o str|usec] [-f]

<...snipped...>

With the check command and time service working, let's set up op5 now.  Click on 'Configure' and then 'Commands'.

Now enter the following information into the correct fields:

  • command_name: check_time (you can pick any name here you like)
  • command_line: /opt/local/tpt/op5_custom_plugins/check_time.pl -H $HOSTADDRESS$ -Pdaytime -w $ARG1$ -c $ARG2$ -f
  • FILE: etc/checkcommand.cfg (this is the default; don't change it)

  • $HOSTADDRESS$ will get replaced with the address of the device you are checking.
  • -P specifies the protocol to use.  In this case, daytime.
  • -w is the number of seconds of drift beyond which we WARN (1st passed argument)
  • -c is the number of seconds of drift beyond which we go CRITICAL (2nd passed argument)
  • -f spits out some performance numbers that op5 will auto graph on (nice!)
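
To make the -w and -c thresholds concrete, here's a minimal sketch of the decision they drive.  It is not the plugin's actual code: classify_drift is a hypothetical helper, it assumes whole seconds, and it assumes "beyond" means strictly greater than the threshold (the real plugin may treat the boundary differently).  The 0/1/2 exit codes are the usual Nagios plugin convention for OK, WARNING and CRITICAL.

```shell
# Hypothetical helper, for illustration only.
# Usage: classify_drift <drift_seconds> <warn_seconds> <crit_seconds>
classify_drift() {
    # Drift can be negative (clock behind), so strip a leading minus sign.
    drift=${1#-}
    # Assumption: a threshold is crossed only when drift is strictly greater.
    if [ "$drift" -gt "$3" ]; then
        echo "CRITICAL"
        return 2
    elif [ "$drift" -gt "$2" ]; then
        echo "WARNING"
        return 1
    fi
    echo "OK"
    return 0
}
```

With the 5 and 60 second thresholds we use later, a drift of 0 prints OK, a drift of -12 prints WARNING, and a full day (86400) prints CRITICAL.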

Once you hit 'Apply Changes' we can go into a host and add this new check as a new service.

If you're not already there, click 'Configure', select the host and then click 'Go'.  At the top right, click 'Services for host XYZ'.

Here's where we'll add the new service.

  • service_description: We picked Check Clock Time (this appears in the Service Details section)
  • check_command: select the new check command we created above
  • check_command_args: the values for $ARG1$ and $ARG2$, separated by a ! with no spaces around it.  We used 5!60 (warn at 5 seconds, critical at 60)
  • Once you have this filled in, click 'Test this service' to see if you got it right

Click 'Apply Changes' when you are finished.
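
If you're curious what actually gets run for this service, the macro substitution can be sketched in a few lines of shell.  This only illustrates the idea and is not how op5 or Nagios implement it; expand_command is a hypothetical helper, and it only knows about the three macros used in this post.

```shell
# Hypothetical helper, for illustration only.
# Usage: expand_command <command_line> <host_address> <check_command_args>
expand_command() {
    # Split the "5!60" style argument string on "!" into two values.
    IFS='!' read -r arg1 arg2 <<EOF
$3
EOF
    # Swap each macro for its value.  Assumes the values contain no "|",
    # "&" or backslash characters, which sed would treat specially.
    printf '%s\n' "$1" | sed \
        -e "s|\\\$HOSTADDRESS\\\$|$2|g" \
        -e "s|\\\$ARG1\\\$|$arg1|g" \
        -e "s|\\\$ARG2\\\$|$arg2|g"
}
```

Calling it with our command_line (in single quotes, so the shell leaves the $ signs alone), the address 127.0.0.1 and the args '5!60' prints the same kind of command we ran by hand earlier: /opt/local/tpt/op5_custom_plugins/check_time.pl -H 127.0.0.1 -Pdaytime -w 5 -c 60 -f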
