Configuring Your NMS (Essential SNMP)

6.1. HP’s OpenView Network Node Manager

Network Node Manager (NNM) is a licensed software product. The package includes a feature called “Instant-On” that allows you to use the product for a limited time (60 days) while you are waiting for your real license to arrive. During this period, you are restricted to a 250-managed-node license, but the product’s capabilities aren’t limited in any other way. When you install the product, the Instant-On license is enabled by default.

TIP:
Check out the OpenView scripts
located in OpenView’s bin directory
(normally /opt/OV/bin). One particularly
important group of scripts sets environment variables that allow you
to traverse OpenView’s directory structure much more easily.
These scripts are named ov.envvars.csh,
ov.envvars.sh, etc. (that is,
ov.envvars followed by the name of the shell
you’re using). When you run the appropriate script for your
shell, it defines environment variables such as $OV_BIN, $OV_MAN, and
$OV_TMP, which point to the OpenView bin,
man, and tmp directories.
Thus, you can easily go to the directory containing OpenView’s
manual pages with the command cd $OV_MAN. These
environment variables are used throughout this book and in all of
OpenView’s documentation.

6.1.1. Running NNM

To start the OpenView GUI on a Unix machine, define your DISPLAY environment variable and run the command $OV_BIN/ovw. This starts OpenView’s NNM. If your NNM has performed any discovery, the nodes it has found should appear under your Internet (top-level) icon. If you have problems starting NNM, run the command $OV_BIN/ovstatus -c and then $OV_BIN/ovstart or $OV_BIN/ovstop, respectively, to start or stop it. By default, NNM installs the necessary scripts to start its daemons when the machine boots. OpenView will perform all of its functions in the background, even when you aren’t running any maps. This means that you do not have to keep a copy of NNM running on your console at all times and you don’t have to start it explicitly when your machine reboots.

When the GUI
starts, it presents you with a clickable high-level map. This map,
called the Root map, provides a top-level view of your network. The
map gives you the ability to see your network without having to see
every detail at once. If you want more information about any item in
the display, whether it’s a subnet or an individual node, click
on it. You can drill down to see any level of detail you
want — for example, you can look at an interface card on a
particular node. The more detail you want, the more you click. Figure 6-1 shows a typical NNM map.

Figure 6-1. A typical NNM map

The menu bar (see Figure 6-2) allows you to traverse the map with a bit more ease. You have options such as closing NNM (the leftmost button), going straight to the Home map (second from the left), [18] the Root map (third from the left), the parent or previous map (fourth from the left), or the quick navigator. [19] There is also a button that lets you pan through the map or zoom in on a portion of it.

Figure 6-2. OpenView NNM menu bar

TIP:
Before you get sick looking at your newly discovered network, keep in
mind that you can add some quick and easy customizations that will
transform your hodgepodge of names, numbers, and icons into a
coordinated picture of your network.

6.1.3. Configuring Polling Intervals

The SNMP Configuration page is located off of the main screen in “Options → SNMP Configuration.” A window similar to the one in Figure 6-7 should appear. This window has four sections: Specific Nodes, IP Address Wildcards, Default, and the entry area (chopped off for viewing purposes). Each section contains the same general areas: Node or IP Address, Get Community, Set Community, Proxy (if any), Timeout, Retry, Port, and Polling. The Default area, which unfortunately is at the bottom of the screen, sets up the default behavior for SNMP on your network; that is, the behavior (community strings, etc.) for all hosts that aren’t listed as “specific nodes” and don’t match one of the wildcards. The Specific Nodes section allows you to specify exceptions on a per-node basis. IP Address Wildcards allows you to configure properties for a range of addresses. This is especially useful if you have networks that have different get and set community names. [21] All areas allow you to specify a Timeout in seconds and a Retry value. The Port field gives you the option of inserting a different port number (the default port is 161). Polling is the frequency at which you would like to poll your nodes.

Figure 6-7. OpenView’s SNMP Configuration page

It’s important to understand how timeouts and retries work. If
we look at Specific Nodes, we see a Timeout of .9 seconds and a Retry
of 2 for 208.166.230.1. If OpenView doesn’t get a response
within .9 seconds, it tries again (the first retry) and waits 1.8
seconds. If it still doesn’t get anything back, it doubles the
timeout period again to 3.6 seconds (the second retry); if it still
doesn’t get anything back it declares the node unreachable and
paints it red on the NNM’s map. With these Timeout and Retry
values, it takes about 6 seconds to identify an unreachable
node.
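
The arithmetic above can be checked with a short sketch. It assumes, as the description suggests, that the wait starts at the Timeout value and doubles after every unanswered attempt. The function name is ours, chosen for illustration; it models the arithmetic only and is not part of OpenView.

```python
def unreachable_after(timeout, retries):
    """Return (waits, total) for a node that never answers.

    Assumption: the first wait is `timeout` seconds and each retry
    doubles the previous wait, as described in the text.
    """
    # One initial attempt plus `retries` further attempts.
    waits = [timeout * 2 ** attempt for attempt in range(retries + 1)]
    return waits, sum(waits)


waits, total = unreachable_after(0.9, 2)
print(waits, round(total, 1))
```

With the values from the text, unreachable_after(0.9, 2) gives a total of about 6.3 seconds, and unreachable_after(4, 5) gives 252 seconds.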

Imagine what would happen if we had a Timeout of 4 seconds and a
Retry of 5. By the fifth retry we would be waiting 128 seconds, and the
total process would take 252 seconds. That’s over four minutes!
For a mission-critical device, four minutes can be a long time for a
failure to go unnoticed.

This example shows that you must be
very careful about your Timeout and Retry settings — particularly
in the Default area, because these settings apply to most of your
network. Setting your Timeout and Retry too high and your Polling
periods too low will make netmon fall behind; it
will be time to start over before the poller has worked through all
your devices.[22] This is a
frequent problem when you have many nodes, slow networks, small
polling times, and high numbers for Timeout and Retry.[23] Once a system falls behind, it will take a long time to
discover problems with the devices it is currently monitoring, as
well as to discover new devices. In some cases, NNM may not discover
problems with downed devices at all! If your Timeout and Retry values
are set inappropriately, you won’t be able to find problems and
will be unable to respond to outages.
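
A rough way to see when a poller could fall behind is to compare the Polling period with the worst-case time needed to work through the devices that do not answer. The sketch below is a back-of-the-envelope model, not a description of how netmon schedules its work: it assumes devices are polled strictly one at a time, that reachable devices answer instantly, and that each unreachable device consumes the full timeout-and-retry sequence. The function names and the example numbers are hypothetical.

```python
def worst_case_wait(timeout, retries):
    """Seconds spent on one silent node, doubling the wait on each retry."""
    return sum(timeout * 2 ** attempt for attempt in range(retries + 1))


def falls_behind(nodes, down_fraction, timeout, retries, polling_period):
    """Return True if silent nodes alone outlast one polling period.

    Simplifying assumptions (ours, not OpenView's): polls are strictly
    sequential and reachable nodes cost no time at all.
    """
    down_nodes = nodes * down_fraction
    return down_nodes * worst_case_wait(timeout, retries) > polling_period


# Hypothetical network: 2,000 nodes, 2% unreachable, polled every 10 minutes.
print(falls_behind(2000, 0.02, 0.9, 2, 600))  # short Timeout, few retries
print(falls_behind(2000, 0.02, 4, 5, 600))    # long Timeout, many retries
```

Even under these generous assumptions, the second configuration cannot finish one pass before the next is due, which is the situation the text warns about.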

Falling behind can be very frustrating. We recommend starting your
Polling period very high and working your way down until you feel
comfortable. Ten to twenty minutes is a good starting point for the
Polling period. During your initial testing phase, you can always set
a wildcard range for your test servers, etc.

6.1.4. A Few Words About NNM Map Colors

By now discovery should be taking place, and you should be starting to see some new objects appear on your map. You should see a correlation between the colors of these objects and the colors in NNM’s Event Categories (see Chapter 10, “Traps” for more about Event Categories). If a device is reachable via ping, its color will be green. If the device cannot be reached, it will turn red. If something “underneath” the device fails, the device will become off-green, indicating that the device itself is okay, but something underneath it has reached a nonnormal status. For example, a router may be working, but a web server on the LAN behind it may have failed. The status source for an object like this is Compound or Propagated. (The other types of status source are Symbol and Object.) The Compound status source is a great way to see if there is a problem at a lower level while still keeping an eye on the big picture. It alerts you to the problem and allows you to start drilling down until you reach the object that is under duress.

It’s always fun to shut off or unplug a machine and watch its
icon turn red on the map. This can be a great way to demonstrate the
value of the new management system to your boss. You can also learn
how to cheat and make OpenView miss a device, even though it was
unplugged. With a relatively long polling interval, it’s easy
to unplug a device and plug it back in before OpenView has a chance
to notice that the device isn’t there. By the time OpenView
gets around to it, the node is back up and looks fine. Long polling
intervals make it easy to miss such temporary failures. Lower polling
intervals make it less likely that OpenView will miss something, but
more likely that netmon will fall behind, and in
turn miss other failures. Take small steps so as not to crash or
overload netmon or your network.

6.1.5. Using OpenView Filters

Your map may include some devices you don’t need, want, or care about. For example, you may not want to poll or manage users’ PCs, particularly if you have many users and a limited license. It may be worthwhile for you to ignore these user devices to open more slots for managing servers, routers, switches, and other more important devices. netmon has a filtering mechanism that allows you to control precisely which devices you manage. It lets you filter out unwanted devices, cleans up your maps, and can reduce the amount of management traffic on your network.

In this book, we warn you repeatedly that
polling your network the wrong way can generate huge amounts of
management traffic. This happens when people or programs use default
polling intervals that are too fast for the network or the devices on
the network to handle. For example, a management system might poll
every node in your 10.1.0.0 network — conceivably thousands of
them — every two minutes. The poll may consist of SNMP
get or set requests, simple
pings, or both. OpenView’s NNM uses a
combination of these to determine if a node is up and running.
Filtering saves you (and your management) the trouble of having to
pick through a lot of useless nodes and reduces the load on your
network. Using a filter allows you to keep the critical nodes on your
network in view. It allows you to poll the devices you care about and
ignore the devices you don’t care about. The last thing you
want is to receive notification each time a user turns off his PC
when he leaves for the night.

Filters
also help network management by letting you exclude DHCP users from
network discovery and polling. DHCP and BOOTP are used in many
environments to manage large IP address pools. While these protocols
are useful, they can make network management a nightmare, since
it’s often hard to figure out what’s going on when
addresses are being assigned, deallocated, and recycled.

In my environment we use DHCP only for our users. All servers and
printers have hardcoded IP addresses. With our setup, we can specify
all the DHCP clients and then state that we want everything
but these clients in our discovery, maps, etc.
The following example should get most users up and running with some
pretty good filtering. Take some time to review OpenView’s
“A Guide to Scalability and Distribution for Network Node
Manager” manual for more in-depth information on filtering.

The default filter file, which is located in
$OV_CONF/C, is broken up into three sections:

Sets
Filters
FilterExpressions

In addition, lines that begin with // are comments. // comments can appear anywhere; some of the other statements have their own comment fields built in.

Sets allow you to place individual
nodes into a group. This can be useful if you want to separate users
based on their geographic locations, for example. You can then use
these groups or any combination of IP addresses to specify your
Filters, which are also grouped by name. You then can take all of
these groupings and combine them into FilterExpressions. If this
seems a bit confusing, it is! Filters can be very confusing,
especially when you add complex syntax and not so logical logic
(&&, ||, etc.). The basic syntax for defining Sets, Filters,
and FilterExpressions looks like this:

name "comments or description" { contents }

Every definition contains a name, followed by comments that appear in double quotes, and then the command surrounded by braces. Our default filter, [24] named filters, is located in $OV_CONF/C and looks like this:

// lines that begin with // are considered COMMENTS and are ignored!
// Begin of MyCompanyName Filters

Sets {

    dialupusers "DialUp Users" { "dialup100", "dialup101", \
                 "dialup102" }
}

Filters { 

    ALLIPRouters "All IP Routers" { isRouter }

    SinatraUsers "All Users in the Sinatra Plant" { \
        ("IP Address" ~ 199.127.4.50-254) || \
        ("IP Address" ~ 199.127.5.50-254) || \
        ("IP Address" ~ 199.127.6.50-254) }

    MarkelUsers "All Users in the Markel Plant" { \
        ("IP Address" ~ 172.247.63.17-42) }

    DialAccess "All DialAccess Users" { "IP Hostname" in dialupusers }
}

FilterExpressions
{
    ALLUSERS "All Users" { SinatraUsers || MarkelUsers || DialAccess }

    NOUSERS "No Users " { !ALLUSERS }
}

Now let’s break this file down into pieces to see what it does.

6.1.5.1. Sets

First, we defined a Set [25] called dialupusers containing the hostnames (from DNS) that our dial-up users will receive when they dial into our facility. These are perfect examples of things we don’t want to manage or monitor in our OpenView environment.

6.1.5.2. Filters

The Filters section is the only nonoptional section. We defined four filters: ALLIPRouters, SinatraUsers, MarkelUsers, and DialAccess. The first filter says to discover nodes that have field value isRouter. OpenView can set the object attribute for a managed device to values such as isRouter, isHub, isNode, etc. [26] These attributes can be used in Filter expressions to make it easier to filter on groups of managed objects, as opposed to IP address ranges, for example.

The next two filters specify IP address ranges. The SinatraUsers filter is the more complex of the two. In it, we specify three IP address ranges, each separated by logical OR symbols (||). The first range (("IP Address" ~ 199.127.4.50-254)) says that if the IP address is in the range 199.127.4.50-199.127.4.254, then filter it and ignore it. If it’s not in this range, the filter looks at the next range to see if it’s in that one. If it’s not, the filter looks at the final IP range. If the IP address isn’t in any of the three ranges, the filter allows it to be discovered and subsequently managed by NNM. Other logical operators should be familiar to most programmers: && represents a logical AND, and ! represents a logical NOT.

The final filter, DialAccess, allows us to exclude
all systems that have a hostname listed in the
dialupusers set, which was defined at the
beginning of the file.
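
To make the matching logic concrete, here is a small Python model of the SinatraUsers and DialAccess filters. It illustrates only the boolean logic described above; it is not how NNM evaluates filters, and the helper names are hypothetical. It also assumes that a range such as 199.127.4.50-254 applies to the last octet only, as the text describes.

```python
import ipaddress


def in_last_octet_range(address, prefix, low, high):
    """True if `address` is prefix.N with low <= N <= high."""
    ip = ipaddress.IPv4Address(address)
    first = ipaddress.IPv4Address(f"{prefix}.{low}")
    last = ipaddress.IPv4Address(f"{prefix}.{high}")
    return first <= ip <= last


def sinatra_users(address):
    # Three ranges joined by logical OR, mirroring the || operators.
    return (in_last_octet_range(address, "199.127.4", 50, 254)
            or in_last_octet_range(address, "199.127.5", 50, 254)
            or in_last_octet_range(address, "199.127.6", 50, 254))


# Model of the dialupusers Set from the filter file.
DIALUP_USERS = {"dialup100", "dialup101", "dialup102"}


def dial_access(hostname):
    # Set membership, mirroring: "IP Hostname" in dialupusers
    return hostname in DIALUP_USERS
```

In this model, sinatra_users("199.127.5.60") is true because the address falls in the second range, while sinatra_users("199.127.4.49") is false because 49 lies just below the start of the first range.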

6.1.5.3. FilterExpressions

The next section, FilterExpressions, allows us to combine the filters we have previously defined with additional logic. You can use a FilterExpression anywhere you would use a Filter. Think of it like this: you create complex expressions using Filters, which in turn can use Sets in the contents parts of their expressions. You can then use FilterExpressions to create simpler yet more robust expressions. In our case, we take the three user filters from above (SinatraUsers, MarkelUsers, and DialAccess) and place them into a FilterExpression called ALLUSERS. Since we want our NNM map to contain nonuser devices, we then define a group called NOUSERS and tell it to ignore all user-type devices with the command !ALLUSERS. As you can see, FilterExpressions can also aid in making things more readable.

When you have finished setting up your filter file, use the $OV_BIN/ovfiltercheck program to check your new filters’ syntax. If there are any problems, it will let you know so you can fix them.

Now that we have our filters defined, we can apply them by using the
ovtopofix command or the polling configuration
menu shown in Figure 6-3.

If you want to remove nodes from your
map, use $OV_BIN/ovtopofix -f FILTER_NAME.
Let’s say that someone created a new DHCP scope without telling
you and suddenly all the new users are now on the map. You can edit
the filters file, create a new group with the IP address range of the
new DHCP scope, add it to the ALLUSERS
FilterExpression, run ovfiltercheck, and, if there
are no errors, run $OV_BIN/ovtopofix -f NOUSERS to
update the map on the fly. Then stop and restart
netmon — otherwise it will keep discovering
these unwanted nodes using the old filter. I find myself running
ovtopofix every month or so to take out some
random nodes.
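
The update procedure above can be written down as an ordered list of commands. The sketch below only builds and prints that list (a dry run); it does not execute anything. The ovfiltercheck and ovtopofix invocations are taken from the text. Using ovstop netmon and ovstart netmon to restart the poller is our assumption about how you would stop and restart netmon on your installation, and the function name is hypothetical, so check the product documentation before running any of these commands.

```python
import os


def filter_update_plan(filter_name, ov_bin=None):
    """Return the commands, in order, for applying an edited filter file."""
    # Fall back to the customary install path if $OV_BIN is not set.
    ov_bin = ov_bin or os.environ.get("OV_BIN", "/opt/OV/bin")
    return [
        [f"{ov_bin}/ovfiltercheck"],                 # check syntax first
        [f"{ov_bin}/ovtopofix", "-f", filter_name],  # update the map
        [f"{ov_bin}/ovstop", "netmon"],              # assumed restart step
        [f"{ov_bin}/ovstart", "netmon"],             # assumed restart step
    ]


for command in filter_update_plan("NOUSERS"):
    print(" ".join(command))
```

Keeping the plan as data makes it easy to review the order (check, fix the map, restart the poller) before anything touches the live system.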

6.1.6. Loading MIBs into OpenView

Before you continue exploring OpenView’s NNM, take time to load some vendor-specific MIBs. [27] This will help you later on when you start interacting (polling, graphing, etc.) more with SNMP-compatible devices. Go to “Options → Load/Unload MIBs: SNMP.” This presents you with a window in which you can add vendor-specific MIBs to your database. Alternatively, you can run the command $OV_BIN/xnmloadmib and bypass having to go through NNM directly.

That’s the end of our brief tour of OpenView configuration.
It’s impossible to provide a complete introduction to
configuring OpenView in this chapter, so we tried to provide a survey
of the most important aspects of getting it running. There can be no
substitute for the documentation and manual pages that come with the
product itself.
