Monitoring of ISP distribution network speed with CheckMK and iPerf

As end users, we want to know that we are getting the value we were promised from our Internet connection. It’s even more important for Internet Service Providers to know in near real time when a line that feeds multiple customers is not performing at the speed it should. This was the challenge one of our clients brought to us. They had already decided to place Raspberry Pi single-board computers throughout their distribution network and have them run speed tests against a dedicated internal server running iPerf3. They use the recently released version 3.7 so that they can test the connection in both directions. The open question was how to get the results of these tests into CheckMK by Tribe29. iPerf3 logs the test results to a file in a JSON-like format.

We took on the challenge of parsing these logs and getting the results into CheckMK with variable thresholds. The variable thresholds were critical because some of the Raspberry Pi nodes were on fiber-to-the-home connections and others were on DSL connections. The solution had three parts:

  • a local plugin installed with the CheckMK agent on the internal iPerf server
  • a check file that tells CheckMK to look for the agent plugin’s output (inventory) and how to evaluate it (the actual check)
  • a customizable threshold script file

We started by installing the agent on the iPerf server and getting the local agent plugin to parse the iPerf3 server log file correctly.
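To illustrate the parsing step, here is a minimal Python sketch, not the plugin we deployed. It assumes the log is a stream of JSON objects written back to back, one per test run, and that each run carries the client address under start.connected[0].remote_host and the totals under end.sum_sent and end.sum_received, as iPerf3 does in its JSON output. The function names are our own for this example, and a real log may contain extra lines that need additional handling.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON object from a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        try:
            obj, pos = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # Assumption: an undecodable tail is a test that is still being
            # written, so stop instead of guessing at its contents.
            break
        yield obj


def latest_results(text):
    """Map each client address to its latest (sent, received) speed in Mbit/s."""
    results = {}
    for run in iter_json_objects(text):
        try:
            client = run["start"]["connected"][0]["remote_host"]
            sent = run["end"]["sum_sent"]["bits_per_second"] / 1e6
            received = run["end"]["sum_received"]["bits_per_second"] / 1e6
        except (KeyError, IndexError, TypeError):
            continue  # incomplete or failed run: nothing to report
        # Later runs overwrite earlier ones, so only the newest result is kept.
        results[client] = (round(sent, 1), round(received, 1))
    return results
```

Calling latest_results() on the contents of the log file gives one entry per test client, which the agent plugin can then print for CheckMK to pick up.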

From there we built the inventory and check file. When the automatic service discovery runs, or a manual service discovery is done, any new test clients show up as undecided and any that are no longer present move to vanished. We chose not to enable automatic adding and removal of services on this speed test server because of additional requirements this client had.
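The discovery behavior described above comes down to comparing two sets of client names. The sketch below is only an illustration of that comparison in plain Python; CheckMK does this internally, and the function name and return layout are hypothetical.

```python
def classify_services(monitored, discovered):
    """Compare the clients already monitored with those found in the latest discovery."""
    monitored, discovered = set(monitored), set(discovered)
    return {
        # Seen in the log but not yet monitored: waits for a manual decision.
        "undecided": sorted(discovered - monitored),
        # Monitored but no longer present in the log.
        "vanished": sorted(monitored - discovered),
        # Present on both sides: nothing to do.
        "unchanged": sorted(monitored & discovered),
    }
```

Because automatic adding and removal is switched off, the "undecided" and "vanished" entries stay in place until someone reviews them.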

After getting the iPerf3 test clients recognized and monitored, we focused on getting WATO to let us edit the sent and received speed thresholds for each individual test client/service.
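A rough Python sketch of how per-client lower thresholds can be evaluated follows. The addresses, the Mbit/s values, and all names are hypothetical and chosen only to show a fiber node next to a DSL-style default; in the real setup the levels come from the WATO rule, not from a dictionary in the code. The state numbers follow the usual monitoring convention of 0 for OK, 1 for WARN, and 2 for CRIT.

```python
# Hypothetical levels in Mbit/s, written as (warn_below, crit_below).
DEFAULT_LEVELS = {"sent": (8.0, 5.0), "received": (40.0, 25.0)}  # DSL-like default
LEVELS_BY_CLIENT = {
    "10.0.0.21": {"sent": (400.0, 250.0), "received": (800.0, 500.0)},  # fiber node
}


def state_for(value, warn_below, crit_below):
    """Return 0 (OK), 1 (WARN) or 2 (CRIT) for a speed that must not drop too low."""
    if value < crit_below:
        return 2
    if value < warn_below:
        return 1
    return 0


def check_client(client, sent_mbps, received_mbps):
    """Evaluate both directions and report the worse of the two states."""
    levels = LEVELS_BY_CLIENT.get(client, DEFAULT_LEVELS)
    state = max(
        state_for(sent_mbps, *levels["sent"]),
        state_for(received_mbps, *levels["received"]),
    )
    summary = f"sent {sent_mbps:.1f} Mbit/s, received {received_mbps:.1f} Mbit/s"
    return state, summary
```

Taking the worse of the two directions means a single service per test client goes to WARN or CRIT as soon as either the sent or the received speed falls below its level.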

With all of that in place, we created a simple service view in CheckMK that gives the client a single pane of glass for quickly checking how the speeds on their distribution network are doing.

We then worked with the client to create a second, more detailed Grafana dashboard that shows the test speeds, the ping latency to each test host, and a geomap that marks the location of each test client with a circle colored according to its ping latency. (Image zoomed out to show more of what the dashboard looks like.)