[Disclaimer: This particular post is really for the technical readers of this blog. I’m just warning you. It’s a long one, and not for the faint of heart, or for the people who are here for the witty things Ursula and I do. But I figured, this is something I’m doing that almost nobody else is doing, and I should share it for others who might find it useful.]
I work for a company named RadarFind (yes, the website needs love; no, this is NOT part of my job there). You can look at the particulars of what we do on the homepage. In general, what I do is manage customer devices at remote sites. One of these devices is the primary appliance, which is a Linux server. This server talks to a bunch of other devices that handle the actual tracking and data collection.
So I’ve got a TON of remote devices, and at any given time I need to know whether each is up or down. If one goes down, I need to know it. If it comes back up, I need to know it. The original implementer used a shell script on each Linux box to poll the devices and send an alert when a device was up or down. However, that told me NOTHING about the site other than up/down information. What I need is a monitoring solution that can tell me not only the up/down state, but also alert me to changes in it AND gather useful statistics about the site.
I love OpenNMS. It sets up easily, will poll a network range for devices, and starts gathering statistics and data pretty much out of the box. HOWEVER, it’s really designed around the idea that it always has direct access to the things it’s polling. Fortunately, it also includes a remote poller – a package I can run on a remote box to gather data about devices I DON’T have direct access to.
So, I have my main OpenNMS server. And each customer appliance has an OpenNMS Remote Poller that checks out the remote site for me, and tells me if something’s wrong. Here’s how I set it up.
Many thanks to the OpenNMS community. If it weren’t for all the docs, the wiki, and the help on #opennms, it would have taken forever to get this together.
ASSUMPTIONS
- You have direct access from the OpenNMS server to the remote server and vice-versa. I use a Point-to-Point VPN tunnel for this.
- You can use a text editor and have a passing familiarity with XML
- You have root on the remote boxes, *AND* you have remote X11 access to them. This is VERY important.
- You can read Java stack traces and errors. It’s not that hard, but feel free to grab the nearest Java programmer for help if you can’t.
The following Bugs in OpenNMS bugzilla (http://bugzilla.opennms.org) impact the data display of remotely polled nodes. Be aware of these, and add your comments to them!
HOW I DO IT
- After installing OpenNMS, get the OpenNMS Remote Poller package. Transfer that down to the remote site.
- On the OpenNMS server, go to /opt/opennms/etc
- create a directory called remote-include
- open up poller-configuration.xml
- Under the “default” package, add include-range tags for your local network. This is INSANELY important. This makes sure we poll the things on our local network. It’ll look something like this:
<filter> IPADDR != '0.0.0.0'</filter>
<include-range begin="10.1.0.1" end="10.1.0.254"/>
<include-range begin="10.2.0.1" end="10.2.0.254"/>
and so on
- Also add exclude-range tags for the remote networks. Again, this is INSANELY IMPORTANT. This makes sure the server doesn’t try to poll the remote devices with the local machine.
<exclude-range begin="172.16.1.1" end="172.16.1.254"/>
<exclude-range begin="192.168.10.1" end="192.168.10.254"/>
and so on
- scroll down until you get to the “monitor” tags.
- above the monitor tags, we’re going to create a new polling package for each remote site we’re looking to monitor. It will look like this :
<package name="remotesite" remote="true">
<filter>((IPADDR IPLIKE 172.16.1.*) | (IPADDR IPLIKE 192.168.10.*))</filter>
<include-url>file:/opt/opennms/etc/remote-include/remotesite</include-url>
<rrd step="300">
<rra>RRA:AVERAGE:0.5:1:2016</rra>
<rra>RRA:AVERAGE:0.5:12:1488</rra>
<rra>RRA:AVERAGE:0.5:288:366</rra>
<rra>RRA:MAX:0.5:288:366</rra>
<rra>RRA:MIN:0.5:288:366</rra>
</rrd>
<service name="SSH" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="1" />
<parameter key="banner" value="SSH" />
<parameter key="port" value="22" />
<parameter key="timeout" value="3000" />
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response" />
<parameter key="rrd-base-name" value="ssh" />
<parameter key="ds-name" value="ssh" />
</service>
<!-- downtime interval="30000" begin="0" end="300000" / --><!-- 30s, 0, 5m -->
<downtime interval="299999" begin="300000" end="43200000" /><!-- 5m, 5m, 12h -->
<downtime interval="600000" begin="43200000" end="432000000" /><!-- 10m, 12h, 5d -->
<downtime begin="432000000" delete="true" /><!-- anything after 5 days delete -->
</package>
So let me break it down really quick. The filter defines which IPs this package is allowed to poll. The include-url is the path to a file with JUST the IPs to poll. The rrd block defines the graph (RRD) storage for the response-time data we collect. And the service block defines the services to check out.
An important note here. Do NOT put in the ICMP (i.e. Ping) service for remote sites. There’s a good discussion in the wiki about how and why this doesn’t work.
The downtime tags define how aggressively a down device gets re-polled, which in turn drives the up/down alerts. The devices I monitor remotely can generate a lot of 30-second outages (i.e. false positives), so I turn OFF the fast 30-second re-poll here (that’s the commented-out line); otherwise I get a flood of logged alerts about things going up and down that aren’t really going up and down.
- Save the file.
- Open up remote-include/remotesite in a text editor, and put in the actual device IP addresses, one on each line. Save the file.
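The file is nothing but addresses, one per line – these particular ones are invented for illustration:

```
172.16.1.10
172.16.1.11
172.16.1.20
```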
- Run xmllint on poller-configuration.xml to check for errors. This is also important. If there are any errors in this file, OpenNMS will not start.
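Here’s a quick sketch of that check, run against a throwaway sample file so you can see what “clean” looks like (the sample file and /tmp path are just for illustration – the real target is /opt/opennms/etc/poller-configuration.xml):

```shell
# Write a tiny well-formed sample, then validate it the same way you'd
# validate the real poller-configuration.xml. With --noout, xmllint
# prints nothing for good XML and exits 0; broken XML prints the error.
cat > /tmp/poller-check.xml <<'EOF'
<package name="remotesite" remote="true">
<filter>IPADDR != '0.0.0.0'</filter>
</package>
EOF
xmllint --noout /tmp/poller-check.xml && echo "well-formed"
```

Swap in the real path once you’ve edited it; a non-zero exit means OpenNMS will refuse to start.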
- Open up the file monitoring-locations.xml. This file tells OpenNMS what locations to poll. You’ll need to add the remote location to this file, and it’ll look something like this :
<?xml version="1.0" encoding="UTF-8"?>
<location-def location-name="remotesite"
monitoring-area="remotesite"
polling-package-name="remotesite"
geolocation="" coordinates="" priority="50"/>
You’ll need to add a location-def for each remote site you monitor, just as you’ll need to add a polling package for each site. I use the same text for the monitoring-area, location-name, and polling-package-name so there is no confusion about what is talking to what.
- Again, save the file, and run xmllint against it to verify integrity.
- Restart OpenNMS (“service opennms restart” under CentOS/RHEL) to reload the polling configuration and the monitoring location files.
- Open up the OpenNMS admin console with your web browser.
- Go to the Admin pane
- Add a user for the remote poller to connect as. I use “remoting” for this purpose.
- Set up some Surveillance Categories for the remote sites. I have one for each site, as well as one for “Servers” and one for “Devices”.
- Go to Provisioning Groups and add a group named (guess!) “remotesite”
- Click “Edit” for the Requisition of the new group.
- Add a Node. I use the following setup when I’m adding these :
Node : [name].remotesite
ForeignID : [Mac Address of Device]
Site : remotesite
- Save that, and click “Add Interface”. Fill in the IP address of the device in question, and save that.
- Click “Add Service” and from the drop-down, choose “SSH” and Save.
- Click “Add Node Category” and, from the dropdown, choose a category for the node. Save, and repeat for each category the device needs to belong to. In my case it’s typically three: “Production”, “Remote Site”, and either “Devices” or “Servers”.
- After you have finished with this, click “Done”
- Go back to the main Admin screen (note I did NOT say to synchronize – YET).
- Go to “Manage Applications” and create an Application for “Remote Site”
OK, we’ve set up the base in OpenNMS. Now it’s time to set up the remote poller itself. We’ll come back to the server afterwards to import the nodes and test the monitors. A couple of the things I do here work around bugs in the remote poller installer and package (OpenNMS bugs 3937 and 3938). All of these commands NEED to be run as root – actually logged in as root, not under “sudo”.
- Install the Sun JDK on the remote server. Transfer the OpenNMS remote poller to the remote server and install it, if you haven’t done this already. I use CentOS, so I’ve got a custom repository with all the custom or non-standard packages I use in it. So for me it goes something like “yum install jdk opennms-remote-poller”.
- go to /opt/opennms and mkdir etc
- inside of /opt/opennms/etc, create the file java.conf. java.conf contains only the path to your java executable, something like this (adjust it to wherever your JDK actually lives):
/usr/java/default/bin/java
- edit /etc/sysconfig/opennms-remote-poller, and modify it to “talk” to your main OpenNMS server, like so:
URI=http://[your opennms server]/opennms-remoting
- Save that.
- This is the part where you NEED the GUI. The first time you run the remote poller, it MUST be run as the GUI, not the service. So, at a command line, run the following:
/opt/opennms/bin/remote-poller.sh -u http://[your opennms server]/opennms-remoting -l remotesite -g -n remoting -p [password]
- A screen will pop-up, asking for what location to register as. Choose the remote site from the drop-down.
- A blank “form” should show up saying it’s running. If it crashes or doesn’t start, check out ~/.opennms/opennms-remote-poller.log – that will have some useful messages as to WHY. Usually it’s a bad password, a mis-formatted URL, or something similar. Sometimes it’s because the command-line parameters MUST be in the order above. But read the trace and you should be able to figure it out.
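When I’m skimming that log, I usually just grep for the angry lines. A hypothetical sketch, using an invented sample log written to /tmp (on the remote box the real file lives at ~/.opennms/opennms-remote-poller.log):

```shell
# Simulated poller log, just to show the filter; the error line and its
# contents are made up for illustration.
cat > /tmp/opennms-remote-poller.log <<'EOF'
2010-06-01 12:00:01 INFO  poller started
2010-06-01 12:00:02 ERROR org.springframework.remoting.RemoteAccessException: bad credentials
2010-06-01 12:00:02 INFO  shutting down
EOF
# Pull the exception/auth-failure lines out of the noise.
grep -iE 'exception|error|denied' /tmp/opennms-remote-poller.log
```

Point the grep at the real log path and the stack trace’s first cause line usually tells you whether it’s the password, the URL, or something else.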
- Don’t close this screen yet. We’ll want it for testing.
- Go back to the OpenNMS web console. If all has gone well, under Admin->Manage Location Monitors remotesite has appeared, with an ID, a Definition Name, and a Status of “started”
- Go to Admin->Manage Provisioning Groups
- Click “Synchronize” for the remotesite Provisioning Group.
- Go get some coffee
- When you get back, the nodes should be imported into the OpenNMS Node list.
- Go to Admin->Manage Applications, and choose the application you added earlier for the site.
- Click Edit. From the Available Services List on the left, Select the Nodes we just imported, and Add them.
- As the remote poller updates from the OpenNMS server, the services you just added to the Application should start showing up in the GUI. If they don’t, you probably misconfigured either the device IPs or the IP ranges in poller-configuration.xml.
- Once you see the devices showing up in the remote poller, shut it down and run it as a service on the server – that’s “service opennms-remote-poller start” under CentOS/RHEL. Now is also the time to make sure it starts when the server reboots (“chkconfig opennms-remote-poller on” on those platforms).
At this point, you should be able to configure notifications in the OpenNMS Admin console – most notably, “OpenNMS defined remote poller events” – I have mine set for node up and down messages.
If you have any questions, comments, or criticisms, feel free to comment on this post, drop me a note via the contact form on sonney.com or ping me on #opennms (user “alchemist”).