             Welcome to N.I.S.C.A. v2.3.2 (XX February, 2003)

About
-----

Network Interface Statistics Collection Agent (or N.I.S.C.A.)
is a complete network statistics collector and graph generator
aimed at helping network administrators do their job by
providing functionality that MRTG doesn't offer.


You moved the Changelog???
--------------------------

Yep, sure did. It's now located in the CHANGELOG file instead.
And the To Do list has moved to the TODO file.


Why does NISCA exist?
---------------------

NISCA was born to replace the popular MRTG package
(http://www.ee.ethz.ch/~oetiker/webtools/mrtg/mrtg.html).
Although MRTG is a fine application respected all over
the world, I've always found it lacking some features
that I really wanted in a network statistics analyzer;
things like TrueType fonts, collection of data into a
database, the "time zoom" feature, and the ability to
generate graphs from any period in the past without
losing any detail due to data compression.


NISCA 2.3.2 features
--------------------

- Supports creating graphs which combine the stats of more than one
  interface onto one graph (only supports bytes transferred). This
  differs from MRTG's method, which requires that you have a special
  rule set up to *collect* the stats additively. Nisca allows you
  to just suddenly decide that you want to see the sum of the
  transfer rates on any number of interfaces on any number of
  machines over any time period; all that's required is that
  they have normal statistics already collected for them.

- Graphs can have any watermark image of your choosing embossed into
  them, for things like a corporate logo (tm), a copyright, an ego
  booster, whatever. Just so long as it isn't a GIF (thanks, Unisys).

- Allows for re-averaging statistics in the database over any time
  period, like MRTG does automatically; however, NISCA's re-averaging utility
  must be run manually, and it makes a compressed backup of everything it
  removes so you can re-import it later should you need stats from some time
  period in the distant past that are as accurate as they used to be.
  Compression usually results in an 85-90% decrease in file size, and the
  data is actually compressed *before* it's written to disk, assuming you
  have libz (gzip/bzip) support in your PHP installation.

- Interface names can be collected via several methods to keep them
  unique: "ifDescr", "ifIndex", "ifName", "Catalyst Port Name", or
  "MAC Address". ("ifIndex" is the one MRTG uses by default.)

- Committed Information Rates (CIR) can be defined on any interface
  and graphs for it will clearly show the CIR so you can see if you're
  going over it.

- The "ifSpeed" entry for any interface can be modified; some interfaces
  don't report an ifSpeed value, and others report the wrong one (like
  100BaseT Ethernet interfaces usually claiming to only be 10-megabit
  on my box). ifSpeed is used while generating graphs and is shown
  on the report page for each interface being reported on.

- Human-readable aliases can be defined for all interfaces and hostnames,
  or the ones stored in the machine itself can be used.

- Context-sensitive help buttons available on almost every page.

- Graphs can be PNG or JPG format.

- Uses MySQL to store all the gathered data.

- Simultaneously supports SNMPv1 and SNMPv2 (the latter for 64-bit counters).
  (Read http://www.isthisthingon.org/nisca/SNMPv2.html
   to know *why* you would want 64-bit counters.)

- Graph sizes can be customized. (Default is 700x250 pixels.)

- Date and time formats can be customized to satisfy
  all possible users' locale/nationalization requirements.

- All colors used to generate graphs can be customized,
  both the default set of colors and any single report
  request.

- Uses either TrueType fonts or the built-in libGD fonts.
  (The use of TTFs requires a bigger graph size
  in order to prevent text overlapping. It is suggested
  you use at least 750x200 with TTFs.)

- NISCA collects statistics for transferred bytes,
  transferred packets, transmission errors, and dropped packet
  counts for both incoming and outgoing traffic on each monitored
  interface.

- Collection interval can be customized from one second to
  whatever is needed.
  (ATTENTION! Misuse of this parameter can cause
  a lot of problems. Please read the FINE TUNING section.)

- Ability to collect statistics from the PC where NISCA
  is running without the use of SNMP (though it appears only
  Linux with the proc filesystem supports this method).

- SNMP setup uses actual SNMP data collected from the agent
  to make the list of interfaces you can choose to monitor.

- There is no limit to the number of interfaces that can be
  monitored at the same time, other than disk space.
  (Please read the FINE TUNING section.)

- Interface naming has few restrictions. Names can include
  blanks and/or symbols. The only exceptions are "!" and "*";
  these are used to separate interfaces from hostnames in
  certain places, so they must not appear in either.

- Different SNMP communities can be used on the same host
  for any of its interfaces.

- Reports can be generated for any time period.

- Reports can contain graphs and/or text summaries.

- Graphs can be restricted to incoming data only, outgoing
  data only, or both.

- Text summaries can be averaged/summarized per any number
  of seconds, minutes, hours, days, months, or years, and a
  grand total of all traffic for that period is displayed.

- Uses persistent MySQL connections to cut down on overhead.

- A script is provided to import data from your existing MRTG logs.
  (I'd be grateful if someone could show me how to import RRDTool logs.)

- A script is provided to make sure your collectors stay running.
  You can also run them from crontab if you want to for some reason.

- Report settings can be stored and reloaded at any time to speed
  up viewing the reports you view most often.

- Most configuration options are kept in the database.
  The rest are in two .conf files which can be located
  anywhere on your filesystem you wish.

- A Web Administration GUI is provided to configure NISCA.

- A statistics deletion utility is provided to manage manual
  database entry removal/cleanup.

- The collection script (snmp_collect) will cache collected data if its
  link to the database goes down, then keep collecting and caching data
  until it comes back up, then send it all when it can.
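For the curious, here's roughly what collecting from the local machine via the proc filesystem (mentioned in the list above) involves. This is a hypothetical Python sketch, not NISCA's own PHP collector; it assumes the usual Linux /proc/net/dev layout (two header lines, then one line per interface with eight receive columns followed by eight transmit columns) and pulls out the four kinds of stats NISCA records: bytes, packets, errors, and drops, in both directions. The function and key names are made up for illustration.

```python
# Hypothetical sketch: read per-interface counters from Linux's /proc/net/dev
# without SNMP. Column order follows the standard /proc/net/dev layout.

FIELDS = ("bytes", "packets", "errs", "drop")  # first four columns of each direction
TX_OFFSET = 8  # transmit columns start after the eight receive columns


def parse_proc_net_dev(text):
    """Return {iface: {"in_bytes": ..., "out_drop": ...}} from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)  # "eth0:123" may have no space after the colon
        values = [int(v) for v in data.split()]
        entry = {}
        for i, field in enumerate(FIELDS):
            entry["in_" + field] = values[i]
            entry["out_" + field] = values[TX_OFFSET + i]
        stats[name.strip()] = entry
    return stats
```

On a Linux box you'd feed it the file itself, e.g. `parse_proc_net_dev(open("/proc/net/dev").read())`. These are running totals, so traffic comes from the difference between two successive readings, just like SNMP counters.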


How does one use it?
--------------------

First, install Apache, MySQL, and PHP4 (see the PHP_HINTS file).
Then, install NISCA (see the INSTALL file). Then, configure it
(see the INSTALL file). Then, use it (see below).

The form on the index.php page is fairly self-explanatory,
but here are some things to keep in mind while using it.

     NOTE:
     -----
The list of hosts/interfaces to choose from is generated from
the actual collected stats in the database, NOT the interfaces
you have configured in the administration section! I'm hoping
the reason for this is obvious. :)

Also note that communities are not displayed or used at all on either the
index page or the actual report page for security reasons, so if you
monitor more than one community on the same host, and the same interface
name exists in both communities, you'll only see it on this page *once*;
but a report generated for that interface will merge the stats for every
community it exists under. Try to avoid monitoring the same interface via
two different communities on the same host for this reason.

If using the "fancy" Javascript host/interface selection method,
here's what you do...

1)   Pick the hostname out of the first select box.
     This will set the bottom-left select box to a list
     of the interfaces in every community on that host.

2)   Click on an interface you want a report on.
     This will add it to the bottom-right select box.

3)   Repeat as necessary, changing the hostname as needed
     to get the interfaces you want in the select box.

4)   The "Clear Deselected" button will clear any
     interfaces from the bottom-right select box that
     aren't selected, but it tends to crash Netscape
     4.7 on X Windows, and Mozilla 0.8 has some major
     trouble with multiple select boxes, so don't rely
     on it.

5)   The "Clear All" button erases all interfaces in the
     bottom-right list; it shouldn't crash anything.

6)   Once you have the perfect list of interfaces to report
     on in the bottom-right box, set the other report
     options on the page as desired and submit it.

If not using the fancy selection method, you'll get a list of
every hostname/interface combination to pick from.
This is more convenient if you only have a few interfaces
monitored, but can be annoying if there are hundreds to
choose from. You can turn off the fancy method in the Global
Options config section. If you're using Mozilla, you'll then get
to see the <OPTGROUP> tags in action; they separate each host
to make the list easier to navigate. If you have any other
browser in existence, they probably won't show up. (Even though
OPTGROUP is in the HTML 4.0 standard, Mozilla (and any other
browsers based on that Gecko rendering engine thing) is the only
browser claiming to support HTML4 that supports OPTGROUPs.
That I could find, at least. I know all versions of Netscape
and Internet Explorer (tm) don't support it even though they
claim to be HTML4-compliant. Are you surprised?)

If you have the "Select how much data to view here" drop-down
box set to "A date/time range, set below", it will use the
"from" and "to" dates and times at the bottom of that section
to restrict which data to analyze; otherwise, it will ignore
everything in the "From" and "To" boxes.

If you have one of the options with an "X" in it selected, it
will use whatever is in the "X = ___" field in place of the
"X". So if you select "The past X hours" and put "3" in the
"X = ___" field, you'll get a report covering the past three
hours and nothing more. You might get something *less* though,
since the odds are great that it won't return *exactly* three
hours' worth of data; you're more likely to get two hours and
59 minutes' worth, depending on what you have the "$delay" set
to. NISCA always uses the actual time stamps in the database
rather than trying to force it to precisely match a particular
time frame. You can also put decimals in it; for example,
"1.5" in "X" and "Days" in the dropdown box will give you stats
for the past day and a half.
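Here's a hypothetical sketch of how "The past X <units>" might be turned into a starting timestamp, decimals and all. The function names and the unit table are made up for illustration (NISCA's real report code is PHP and may well differ), and months/years are left out because they vary in length. Rows are then picked by their actual stored time stamps, which is why the span you get back is usually a bit shorter than requested.

```python
# Hypothetical helpers; seconds per unit for fixed-length units only
# (months and years vary in length, so they're left out of this sketch).
UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400, "weeks": 604800}


def window_start(now, x, unit):
    """Earliest Unix timestamp included in "the past X <unit>"; X may be fractional."""
    return now - int(x * UNIT_SECONDS[unit])


def select_stamps(stamps, now, x, unit):
    """Keep only the stored time stamps that fall inside the requested window."""
    start = window_start(now, x, unit)
    return [t for t in stamps if start <= t <= now]
```

For example, `window_start(now, 1.5, "days")` backs up 129,600 seconds. With stamps every 300 seconds, asking for the past 3 hours can easily come back with about 2 hours 55 minutes between the first and last rows, since they rarely sit exactly on the window's edges.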

If you select "The entire contents of the database", that's
exactly what you'll get... so be careful if you've got years
and years of data collected four times a minute in it. NISCA
ain't *that* fast yet... :)

Sometimes, like during fsck-laden reboots or periods during which
you didn't collect data, there will be gaps in the data. In
this case, NISCA will point out the places where it filled in the
intervening space as best it could by putting the From and To
times in red. It detects this condition by adding twice the
requested summary interval (last section of the form) to the
previous timestamp; if that's still less than the current stamp,
it assumes there was a gap and makes it red just to call your
attention to it. This doesn't catch all gaps, though, only the
ones more than *twice* the Summary Interval you specified on the
report form. However, it always calculates
averages using the *actual* time period of each line, so gaps
are always averaged right whether the intervening time
matches the requested summary interval or not.
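The gap test described above boils down to one comparison per pair of consecutive lines. Here's a minimal Python sketch of it (the `find_gaps` name is made up, not something in NISCA). Note that a hole of exactly twice the interval is *not* flagged, because the test is strictly "less than".

```python
def find_gaps(stamps, summary_interval):
    """Return (previous, current) pairs that would get red From/To times.

    A pair is flagged when previous + 2 * summary_interval is still less
    than current, i.e. the hole is more than twice the summary interval.
    """
    return [
        (prev, cur)
        for prev, cur in zip(stamps, stamps[1:])
        if prev + 2 * summary_interval < cur
    ]
```

With a 300-second interval, `[0, 300, 600, 1500, 1800]` flags only the jump from 600 to 1500.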

Each graph contains a red circle around the largest Y-axis values
found on it, so you can quickly find the peaks. Peak values and
times are placed on the top of the graph.

Each graph generated is given a unique filename using a rather large
random number, so every run produces a different image filename.
This works around the (mis)behavior of the caching mechanisms of
almost all browsers. Also, every time you run it, any
graphs older than one minute are deleted, so there shouldn't be any
build-up of them.
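A rough Python sketch of the two tricks above: a large random number in each graph filename, and a sweep that deletes any graph file older than a minute. The directory, the `graph` prefix, and the function names are assumptions for illustration, not NISCA's actual code.

```python
import os
import random
import time


def new_graph_path(graph_dir, ext="png"):
    """Pick a large random filename so browsers never reuse a cached image."""
    return os.path.join(graph_dir, "graph%d.%s" % (random.randrange(10**12), ext))


def purge_old_graphs(graph_dir, max_age=60, now=None):
    """Delete files in graph_dir last modified more than max_age seconds ago.

    Returns the names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for name in os.listdir(graph_dir):
        path = os.path.join(graph_dir, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > max_age:
            os.remove(path)
            removed.append(name)
    return removed
```

Running the purge at the start of each report keeps the directory small without needing a separate cron job.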

Reports can be saved under any name you wish. Once you've set the
options on the form the way you want to save them, enter a name
for the report (near the bottom of the page) and then hit the
"Run It" button. The report options will be saved, then used to
display the requested information. But if a report already exists
with the name you choose, it won't be overwritten; you have to
use the admin section to either delete it and then try to save
it again, or save it under a different name (it will still show
you the results of the options you chose, it just won't *save*
them as that report name).

To recall a saved report, just click on it in the drop-down list
at the top of the index page. If you have Javascript disabled,
you'll have to then click the "Run It" button to view it; if
it's enabled, the report will be displayed as soon as you change
the value of the drop-down list.

Report administration is handled via the administration pages;
that's the only place an existing report can be changed or deleted.

One more thing; if you submit a report and then hit "escape" to stop
loading it before it displays, the servers and scripts will continue
to grind along working on it even though you'll never see the
results. Try to avoid doing it... it can cause slowness. :)

Oh. See the end of the INSTALL file for instructions on using
the fancy new administration section.


Fine Tuning NISCA
-----------------

This will be one of the hardest tasks for a NISCA user,
but all the people involved in developing and contributing
to this project are working hard in order to provide as
much information as possible.

The NISCA user has to take many parameters into account
in order to set up the COLLECTION INTERVAL and the number
of hosts/interfaces monitored with NISCA. The interval works
rather differently from MRTG's, which is driven by crontab and
thus can generate overlapping statistics if collection takes
longer than the crontab interval (300 seconds, usually).
The way the collectors in NISCA work is, they will poll all
your monitored hosts and THEN go to sleep for the delay
interval you have configured; thus, if collection takes
six minutes, and your delay is 5 minutes, the effective
delay time will be eleven minutes. Running the command
"snmp_collect t" will help you determine how long each
collection cycle takes, and you can adjust your interval
time accordingly. (The "t" puts it in debug mode.)
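To see why the effective interval is collection time plus delay, here's a hypothetical Python sketch of a poll-then-sleep loop. `poll_all_hosts` stands in for whatever does the actual SNMP work, and the clock and sleep functions can be swapped out so the timing can be simulated. It illustrates the scheme described above; it is not the real snmp_collect.

```python
import time


def run_collector(poll_all_hosts, delay_seconds, cycles, clock=time.time, sleep=time.sleep):
    """Poll every host, then sleep for the delay; repeat `cycles` times.

    Returns (start_time, collection_duration) for each cycle.
    """
    log = []
    for _ in range(cycles):
        started = clock()
        poll_all_hosts()  # the delay only starts counting after this returns
        log.append((started, clock() - started))
        sleep(delay_seconds)
    return log
```

With a simulated six-minute poll and a 300-second delay, cycles start at 0, 660, and 1320 seconds: eleven minutes apart, not five.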

Another thing about the collection interval. The smaller it
is, the "fuzzier" your graphs will be. Anything less than
15 seconds or so will be just about useless. A 5-minute
delay will probably look best on fast interfaces (and take
up much less database space). :)
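A quick back-of-the-envelope for the database-space remark, assuming one stats row per interface per collection cycle (an assumption; check your own schema). It ignores the time the collection itself takes, so real numbers will be somewhat lower.

```python
SECONDS_PER_DAY = 86400


def rows_per_day(delay_seconds, interfaces=1):
    """Upper bound on stats rows per day, assuming one row per interface per cycle.

    Collection time is ignored, so the real count is somewhat lower.
    """
    return interfaces * SECONDS_PER_DAY // delay_seconds
```

`rows_per_day(300)` gives 288 rows a day per interface, while `rows_per_day(15)` gives 5,760, twenty times as many. At 288 a day, 230 days comes to about 66,000 rows, which lines up with the figure quoted in the Benchmark section.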

People need to evaluate many parameters in order to avoid
overloading the whole system (NISCA, network, monitored hosts,
etc.). Estimating all these parameters is very complex,
especially because various systems react in different ways to
SNMP requests and the network conditions can change from moment
to moment.

DO NOT overestimate your setup's abilities!

One thing about NISCA is that it uses memory, a lot of it,
while it's generating reports for you (and only then).
And the more datapoints being analyzed, the more memory it
takes. This means you can quickly get several httpd processes 
taking up lots and lots of memory. To help fix this, I've
changed the "MaxRequestsPerChild" setting in Apache's
httpd.conf file from its default of "0" (unlimited) to "1".
This will force every child server process to die as soon as
it's done with its request, and thus it won't consume all your
memory. Setting this to "2" or higher doesn't seem to do much
good; the children don't die, and new children are spawned
which will take up just as much space, so if you run four
60-meg reports one after the other you could bring your
machine to a complete halt if it's set higher than "1".
Your mileage, as always, may vary; tune it for your setup. This
seems to be much better behaved with later Apache versions (1.3.27
is what I use now and it plays nice).
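The Apache change described above is a single directive in httpd.conf. A sketch of the relevant line (the rest of your httpd.conf stays as it is):

```apache
# httpd.conf: make each Apache child exit after serving one request,
# so memory grabbed while building a large report is handed back.
MaxRequestsPerChild 1
```

Apache only rereads httpd.conf on a restart or HUP, so do one of those afterwards.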

I've also seen PHP die with an error similar to "Maximum
allowed memory usage exceeded" when viewing large reports. 
If this happens to you a lot, you can edit your "php.ini"
file and change the max memory allowed (it defaults to 8
meg, 8388608). This setting is called "memory_limit". Don't
forget to HUP or restart your web server if you change this.
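The matching php.ini change looks like this. The 32M value is just an example, not a tested recommendation; pick something that fits your largest reports and your RAM.

```ini
; php.ini: raise PHP's per-script memory ceiling (the default is 8M).
; 32M is only an example value.
memory_limit = 32M
```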

In the future, more technical details will be provided, but
for now the user should start with a minimum setup: a Delay
value of 300 seconds at first, then slowly increase the
number of interfaces and decrease the Delay time, to avoid
overloading either the NISCA server or the network(s)
over which the server is polling the monitored hosts. Trial
and error is the best way to see what you can get away with.


Benchmark
---------

The report generator currently generates a graph from 66,000
datapoints (230 days' worth) in about 6 seconds running on a
1.2GHz AMD Thunderbird with 512MB of RAM. A multi-interface report
which adds the transfer averages of 2 interfaces over a one-month
period (some 16,000 entries) takes 45 seconds (it's a much more
intense operation). Your Mileage May Vary. It's the graph generation
that takes so long. Yes, I'm working on ways of speeding it up...
it ain't easy.

Apparently the report generation time isn't entirely cumulative;
getting reports one interface at a time takes longer in total than
one report covering all of those interfaces at once.

As for disk space, the 1,000,000 entries in my database take up 104
meg of disk space in the form of MySQL tables/indices. Since I moved
the hosts, communities, and interfaces to another table and now use
MEDIUMINT IDs to refer to them in the "stats" table, the disk
space usage has been cut in half and response time of just about
everything (except reports) has become instantaneous since it doesn't
have to look through hundreds of thousands of rows to find every
unique host/community/if now. Even graph generation speed has been
doubled just from this one change. Just the opposite of the effect
I thought it would have; live and learn, I always say.


A Detailed Description of the Multiple-Interface Graphing Method
----------------------------------------------------------------

I'm including this just to satisfy people's curiosity. I'm sure
there are other geeks out there who'd love to know how it works.
So here we go... warning; it may get a bit technical.

When I set out to write the multi-IF graphing code, I had no idea
how complicated it was, or how simple the final solution would be.
I had to rewrite it all from scratch four times to get it right,
making all kinds of notes and diagrams and drawings and stuff.
Here's what I finally came up with.

First of all, I didn't want Nisca to do it the MRTG way: require
you to be collecting pre-summed statistics from each interface
desired before you can draw a graph of it. It just seemed silly to
me, especially since it requires that you poll each interface
TWICE... once for the regular single stats, and once again for the
summed-interfaces stats. There had to be a way to take any
existing set of statistics for any combination of interfaces on
any number of hosts over any time period, whether all the
interfaces involved had identical time periods or not, and add
them together in the same time periods. My first attempt was
horrible; I won't bore you with the grisly details of how a one-
month report took half an hour and 500 meg of memory, and then
delivered a graph that looked like something a drunk centipede had
walked all over after wading through a few pools of paint. Let's
just say, I wasn't satisfied.

So after the second rewrite attempt, I'm sitting there staring into
space trying to think of an answer, and I realized I was staring at
a CD storage rack. And my mind whispered to me, "Pigeonholing!"
Just make the entries fall into the right slots, and make the slots
as wide as the collection interval. But even that delivered pretty
shoddy results. And then I realized something else, something that
probably would have occurred instantly to anyone who does statistical
analysis for a living.

Statistics are always measured in pairs. There's a starting point
for both the counter itself (which is a running total) and the
timestamp it was collected, and a corresponding ending point.
You find the amount of traffic transferred by subtracting the
earlier counter from the later counter. If the machine has
rebooted in between them, this *should* result in a negative
number; if that happens, the later counter is used by itself to
determine the change in count; there's no way to know exactly
how much data was transferred between the earlier counter and
the later counter because it got zeroed out *somewhere* in between
them. So in that case, the entire value of the later counter is
used as the amount transferred between the two entries (because it
was *at least* that many bytes, but probably a lot more). So once
you have a value for the number of bytes transferred between the
two points, you figure out the time interval between them; divide
the byte count by the interval, and you have the average rate. But
as it turned out, that's useless.
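Here's the pairwise calculation above as a minimal Python sketch. The function names are made up; like the description, it treats *any* decrease in the counter as a reset and counts only the later value.

```python
def counter_delta(earlier, later):
    """Traffic between two samples of a running counter.

    If the later value is smaller, the counter was reset (e.g. a reboot),
    so only the later value is counted: at least that much was transferred.
    """
    if later < earlier:
        return later
    return later - earlier


def average_rate(earlier_count, earlier_time, later_count, later_time):
    """Average bytes per second between two (counter, timestamp) samples."""
    return counter_delta(earlier_count, later_count) / (later_time - earlier_time)
```

So a counter going from 1000 to 4000 over 300 seconds averages 10 bytes per second, while one that drops from 5000 to 1200 is credited with just 1200 bytes.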

I had divided the report period up into "pigeonholes," or slots,
that were as wide as the requested averaging interval (300 seconds,
or 5 minutes, by default). Sometimes an entry would lie entirely
within one slot; sometimes its starting point was in one slot and
its ending was in the very next slot; and sometimes there was one
or more slots without a datapoint in it in between them. So I
re-re-rewrote it, again, and it worked. Imagine my shock.

It keeps track of the time stamps of the current stat and the
previous one. When start and end are in the same slot, it just adds
that whole counter change to the slot and keeps going. When the end
goes past a slot boundary, it starts doing math. It finds the time
between the start time and the slot boundary and divides it by
the time between the start and end times. This gives it the percentage
of the total time which lies in the earlier slot. It multiplies
that by the whole counter change between them, which tells it how
much of the data belongs to the earlier slot, and it adds it to
the earlier slot's counter array. Then there are two possibilities.

It adds the averaging interval to the slot boundary. That will
either put the boundary *past* the end time of the stat, or it
won't... meaning there are intervening slots without a stat
entry. If so, it repeats the earlier percentage operation, but
this time it divides the averaging interval by the total time of
the entry, multiplies that by the whole counter change, and adds
the result to that slot. It does this until the slot boundary
passes the ending timestamp.

Once the slot boundary is past the ending timestamp, however it
got there, it does the percentage thing again, this time using
the time between the slot's beginning and the stat's end timestamp
to calculate the percentage, which it multiplies by the total
and adds to that slot's counter. And by the way, these percentages
*can* be zero; that just means an entry's start or end lies
exactly on a slot boundary, so 100% goes on one side and 0%
goes on the other. Pretty neat how that worked out.
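Here's a compact Python sketch of the slot-splitting scheme just described. It's a reconstruction from the description, not NISCA's actual PHP; the names `distribute` and `bin_samples`, and the use of a dict keyed by slot number, are illustrative choices. Each entry's counter change is split across slots in proportion to how many of its seconds fall in each one, so an entry inside one slot goes there whole, and the pieces always add back up to the original change.

```python
def distribute(start, end, count, slot_start, width, slots):
    """Spread `count` (traffic between timestamps start..end) over fixed-width slots.

    Slot i covers [slot_start + i*width, slot_start + (i+1)*width); slot_start
    must be at or before `start`. Each slot gets count * overlap / total time.
    """
    total = end - start
    if total <= 0:
        return slots
    i = int((start - slot_start) // width)  # slot holding the starting point
    t = start
    while t < end:
        boundary = slot_start + (i + 1) * width
        piece_end = min(boundary, end)  # stop at the slot edge or the entry's end
        slots[i] = slots.get(i, 0) + count * (piece_end - t) / total
        t = piece_end
        i += 1
    return slots


def bin_samples(samples, slot_start, width):
    """samples: time-ordered (timestamp, counter) pairs for one interface and direction."""
    slots = {}
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        delta = c1 - c0 if c1 >= c0 else c1  # counter reset: count the later value only
        distribute(t0, t1, delta, slot_start, width, slots)
    return slots
```

For example, 400 bytes between seconds 250 and 650 with 300-second slots lands as 50 in slot 0, 300 in slot 1, and 50 in slot 2. An entry ending exactly on a boundary leaves nothing for the next slot, which is the 0% case mentioned above.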

Now, this has one unfortunate side-effect. The very last entry
for an interface won't have an ending point with which to
calculate a transferred count for the last slot. This means
the last slot will almost always fall sharply downwards, since
it will almost always have far less data transferred in it than
all the previous slots. So when viewing these graphs, please
don't panic; it doesn't mean all your interfaces went down
sometime in the past five minutes or anything. :)

Now, that was just for one type of data; incoming bytes, say.
It has to do all that separately for the incoming and outgoing
stats of every entry. And that's why it doesn't support making
multiple-interface graphs of packets, or drops, or errors; not
only is it kinda pointless, it would also mean another hundred
lines of code for each added report type.

Once it's done every entry, it passes the data to the
makegraph() function, which converts them from counts to
average per-second rates, just as it does for regular graphs.
And that's a whole other story in itself. :)
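The README doesn't show makegraph(), so the following is only a guess at the last step, with hypothetical names. Once each interface's traffic has been binned onto the same slot grid (same starting point and slot width), summing interfaces is just adding counts slot by slot, and turning a count into an average per-second rate is dividing by the slot width.

```python
def sum_slots(*per_interface_slots):
    """Add slot counts from several interfaces; all must share the same slot grid."""
    total = {}
    for slots in per_interface_slots:
        for i, count in slots.items():
            total[i] = total.get(i, 0) + count
    return total


def to_rates(slots, width):
    """Convert per-slot counts into (slot, average bytes per second), in slot order."""
    return [(i, slots[i] / width) for i in sorted(slots)]
```

Sharing one grid is what lets interfaces polled at different moments be added together without lining up their raw time stamps first.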


Who has actually contributed?
-----------------------------

Pierfrancesco Caci (pf@gusp.dyndns.org)
Fabio Massimo Di Nitto (fabio.m.d.nitto@ted.ericsson.dk)
  (It wouldn't have been possible without you two... :)
Mark Motley (mmotley@la-mirada.net)
Sean, of Hotlinks Internet Services (www.hotlinks.co.uk)
Jimmy Kaplowitz (jimmy@kaplowitz.org)
Eddy Lai
Tomaso Vasella

New ideas, requests, job offers, and comments are always welcome
from everyone. Contributing to NISCA will only improve its quality.


Oddities
--------

We now have the first confirmed Nisca-related tech support call,
made from Milano to Rome, Italy, at or about 4:00PM (Italy time,
+0200) on Monday, June the 25th, 2001. If anyone knows of an earlier
call, let me know. :)


Mumbo-Jumbo
-----------

This program is released under the GNU General Public License
(see LICENSE). This means you have my permission to do
anything you like with it except printing it out, rolling it up,
and swatting your pet with it. I will not condone cruelty to animals.
And if you make any money off of it, please think of my
poor unemployed self and have pity on me as you count
your millions.

You know that really long boring bit about "as-is"
and "merchantability" and "fitness for a particular purpose"
and all that crap? Insert it here.

Note that I am not affiliated with Team Nisca, who makes ID card
printers; the National Interscholastic Swimming Coaches Association;
the Northern Ireland Society for Computing in Anaesthesia; or
the NISCA protocol, which is used to connect systems in an OpenVMS
cluster. Anyone claiming otherwise will certainly be ridiculed into
an embarassing extinction, because to me NISCA means "the Network
Interface Statistics Collection Agent" and nothing more.

Note that I am affiliated with isthisthingon.org, a very, very
non-profit non-organization of no one in particular.

Any resemblance to actual programs, living or dead, is purely
coincidental. I ask you all to reflect upon how often form
follows function.


Contact Info-Mation
-------------------

Author's email: phee@isthisthingon.org
Official Site: http://nisca.sourceforge.net/
My ICQ #: 13130273