Welcome to N.I.S.C.A. v2.3.2 (XX February, 2003)

About
-----
Network Interface Statistics Collection Agent (or N.I.S.C.A.) is a complete network statistics collector and graph generator aimed at helping network administrators do their job by providing functionality that MRTG doesn't offer.

You moved the Changelog???
--------------------------
Yep, sure did. It's now located in the CHANGELOG file instead. And the To Do list has moved to the TODO file.

Why does NISCA exist?
---------------------
NISCA was born to replace the popular MRTG package (http://www.ee.ethz.ch/~oetiker/webtools/mrtg/mrtg.html). Although MRTG is a fine application respected all over the world, I've always found it lacking some features that I really wanted in a network statistics analyzer: things like TrueType fonts, collection of data into a database, the "time zoom" feature, and the ability to generate graphs from any period in the past without losing any detail to data compression.

NISCA 2.3.2 features
--------------------
- Supports creating graphs that combine the stats of more than one interface onto one graph (bytes transferred only). This differs from MRTG's method, which requires a special rule set up to *collect* the stats additively. NISCA lets you just suddenly decide that you want to see the sum of the transfer rates on any number of interfaces on any number of machines over any time period; all that's required is that normal statistics have already been collected for them.
- Graphs can have any watermark image of your choosing embossed into them, for things like a corporate logo (tm), a copyright, an ego booster, whatever. Just so long as it isn't a GIF (thanks, Unisys).
- Allows re-averaging statistics in the database over any time period, as MRTG does automatically; however, NISCA's re-averaging utility must be run manually, and it makes a compressed backup of everything it removes, so you can re-import it later should you need stats from some period in the distant past that are as accurate as they used to be. Compression usually results in an 85-90% decrease in file size, and the data is compressed *before* it's written to disk, assuming you have libz (gzip/bzip) support in your PHP installation.
- Interface names can be collected via several methods to keep them unique: "ifDescr", "ifIndex", "ifName", "Catalyst Port Name", or "MAC Address". ("ifIndex" is the one MRTG uses by default.)
- Committed Information Rates (CIR) can be defined on any interface, and graphs for it will clearly show the CIR so you can see if you're going over it.
- The "ifSpeed" entry for any interface can be modified; some interfaces don't report an ifSpeed value, and others report the wrong one (like the 100BaseT Ethernet interfaces on my box, which usually claim to be only 10-megabit). ifSpeed is used while generating graphs and is shown on the report page for each interface being reported on.
- Human-readable aliases can be defined for all interfaces and hostnames, or the ones stored in the machine itself can be used.
- Context-sensitive help buttons are available on almost every page.
- Graphs can be in PNG or JPG format.
- Uses MySQL to store all the gathered data.
- Supports SNMPv1 and SNMPv2 simultaneously, the latter for 64-bit counters. (Read http://www.isthisthingon.org/nisca/SNMPv2.html to learn *why* you would want 64-bit counters.)
- Graph sizes can be customized. (Default is 700x250 pixels.)
- Date and time formats can be customized to satisfy all possible users' locale and internationalization requirements.
- All colors used to generate graphs can be customized, both the default set of colors and the colors for any single report request.
- Uses either TrueType fonts or the built-in libGD fonts. (TTFs require a bigger graph size to keep text from overlapping; at least 750x200 is suggested with TTFs.)
- NISCA collects statistics for transferred bytes, transferred packets, transmission errors, and dropped packet counts for both incoming and outgoing traffic on each monitored interface.
- The collection interval can be customized from one second to whatever is needed. (ATTENTION! Misuse of this parameter can cause a lot of problems. Please read the FINE TUNING section.)
- Can collect statistics from the PC where NISCA is running without using SNMP (though it appears only Linux with the proc filesystem supports this method).
- SNMP setup uses actual SNMP data collected from the agent to build the list of interfaces you can choose to monitor.
- There is no limit to the number of interfaces that can be monitored at the same time, other than disk space. (Please read the FINE TUNING section.)
- Interface names have few restrictions. Names can include blanks and/or symbols. The only exceptions are "!" and "*"; these are used to separate interfaces from hostnames in certain places, so they must not appear in either.
- Different SNMP communities can be used on the same host for any of its interfaces.
- Reports can be generated for any time period.
- Reports can contain graphs and/or text summaries.
- Graphs can be restricted to incoming data only, outgoing data only, or both.
- Text summaries can be averaged/summarized per any number of seconds, minutes, hours, days, months, or years, and a grand total of all traffic for that period is displayed.
- Uses persistent MySQL connections to cut down on overhead.
- A script is provided to import data from your existing MRTG logs. (I'd be grateful if someone could show me how to import RRDTool logs.)
- A script is provided to make sure your collectors stay running.
  You can also run them from crontab if you want to for some reason.
- Report settings can be stored and reloaded at any time to speed up viewing the reports you view most often.
- Most configuration options are kept in the database. The rest are in two .conf files, which can be located anywhere on your filesystem you wish.
- A Web Administration GUI is provided to configure NISCA.
- A statistics deletion utility is provided for manual removal/cleanup of database entries.
- The collection script (snmp_collect) will cache collected data if its link to the database goes down, keep collecting and caching data until the link comes back up, and then send it all when it can.

How does one use it?
--------------------
First, install Apache, MySQL, and PHP4 (see the PHP_HINTS file). Then, install NISCA (see the INSTALL file). Then, configure it (see the INSTALL file). Then, use it (see below).

The form on the index.php page is fairly self-explanatory, but here are some things to keep in mind while using it.

NOTE:
-----
The list of hosts/interfaces to choose from is generated from the actual collected stats in the database, NOT from the interfaces you have configured in the administration section! I'm hoping the reason for this is obvious. :)

Also note that, for security reasons, communities are not displayed or used at all on either the index page or the actual report page. So if you monitor more than one community on the same host, and the same interface name exists in both communities, you'll only see it on this page *once*, but a report generated for that interface will merge the stats for every community it exists under. Try to avoid monitoring the same interface via two different communities on the same host for this reason.

If using the "fancy" JavaScript host/interface selection method, here's what you do...

1) Pick the hostname out of the first select box. This will set the bottom-left select box to a list of the interfaces in every community on that host.
2) Click on an interface you want a report on. This will add it to the bottom-right select box.
3) Repeat as necessary, changing the hostname as needed to get the interfaces you want into the select box.
4) The "Clear Deselected" button will clear any interfaces from the bottom-right select box that aren't selected, but it tends to crash Netscape 4.7 under X Window, and Mozilla 0.8 has some major trouble with multiple select boxes, so don't rely on it.
5) The "Clear All" button erases all interfaces in the bottom-right list; it shouldn't crash anything.
6) Once you have the perfect list of interfaces to report on in the bottom-right box, set the other report options on the page as desired and submit it.

If not using the fancy selection method, you'll get a list of every hostname/interface combination to pick from. This is more convenient if you only have a few interfaces monitored, but can be annoying if there are hundreds to choose from. You can turn off the fancy method in the Global Options config section. If you're using Mozilla, you'll then get to see <OPTGROUP> tags in action; they separate each host to make the list easier to navigate. With any other browser in existence, they probably won't show up. (Even though OPTGROUP is in the HTML 4.0 standard, Mozilla (and any other browser based on that Gecko rendering engine thing) is the only browser claiming to support HTML 4 that actually supports OPTGROUPs. That I could find, at least. I know all versions of Netscape and Internet Explorer (tm) don't support it even though they claim to be HTML4-compliant. Are you surprised?)

If you have the "Select how much data to view here" drop-down box set to "A date/time range, set below", it will use the "from" and "to" dates and times at the bottom of that section to restrict which data to analyze; otherwise, it will ignore everything in the "From" and "To" boxes.

If you have one of the options with an "X" in it selected, it will use whatever is in the "X = ___" field in place of the "X". So if you select "The past X hours" and put "3" in the "X = ___" field, you'll get a report covering the past three hours and nothing more. You might get something *less*, though, since the odds are good that it won't return *exactly* three hours' worth of data; you're more likely to get two hours and 59 minutes' worth, depending on what you have "$delay" set to. NISCA always uses the actual timestamps in the database rather than trying to force them to match a particular time frame precisely. You can also put decimals in it; for example, "1.5" in "X" and "Days" in the drop-down box will give you stats for the past day and a half.

If you select "The entire contents of the database", that's exactly what you'll get... so be careful if you've got years and years of data collected four times a minute in it. NISCA ain't *that* fast yet... :)

Sometimes, like during fsck-laden reboots or periods during which you didn't collect data, there will be gaps in the data. In this case, NISCA will point out the places where it filled in the intervening space as best it could by putting the From and To times in red. It detects this condition by adding twice the requested summary interval (last section of the form) to the previous timestamp; if the result is still less than the current timestamp, it assumes there was a gap and marks it in red just to call your attention to it. This doesn't catch all gaps, though, only the ones more than twice as long as the Summary Interval you specified on the report form. However, it always calculates averages using the *actual* time period of each line, so gaps are always averaged correctly whether the intervening time matches the requested summary interval or not.

Each graph contains a red circle around the largest Y-axis values found on it, so you can quickly find the peaks.
Peak values and times are placed at the top of the graph.

Each graph generated is given a unique filename using a rather large random number, so every time you run a report it will give you a different image filename. This is thanks to the (mis)behavior of the caching mechanisms of almost all browsers. Also, every time you run it, any graphs older than one minute are deleted, so there shouldn't be any build-up of them.

Reports can be saved under any name you wish. Once you've set the options on the form the way you want to save them, enter a name for the report (near the bottom of the page) and then hit the "Run It" button. The report options will be saved, then used to display the requested information. But if a report already exists with the name you choose, it won't be overwritten; you have to either use the admin section to delete it and then try to save it again, or save it under a different name. (It will still show you the results of the options you chose; it just won't *save* them under that report name.)

To recall a saved report, just click on it in the drop-down list at the top of the index page. If you have JavaScript disabled, you'll then have to click the "Run It" button to view it; if it's enabled, the report will be displayed as soon as you change the value of the drop-down list. Report administration is handled via the administration pages; that's the only place an existing report can be changed or deleted.

One more thing: if you submit a report and then hit "escape" to stop loading it before it displays, the servers and scripts will continue to grind along working on it even though you'll never see the results. Try to avoid doing it... it can cause slowness. :)

Oh. See the end of the INSTALL file for instructions on using the fancy new administration section.
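
The "past X" window and the red gap marking described earlier in this section can be illustrated with a short Python sketch. NISCA itself is written in PHP; the function names, the unit table (minutes/hours/days only) and the plain list of UNIX timestamps used as input are assumptions for illustration, not NISCA's actual code.

```python
"""Sketch of a "past X <unit>" report window and gap marking (illustrative).

Assumptions: NISCA itself is PHP; these Python names, the unit table and
the list-of-UNIX-timestamps input are hypothetical.
"""

# Assumed length of each drop-down unit, in seconds.
UNIT_SECONDS = {"minutes": 60, "hours": 3600, "days": 86400}


def window_start(now, x, unit):
    """Start of a "The past X <unit>" window; X may be a decimal like "1.5"."""
    return now - float(x) * UNIT_SECONDS[unit]


def select_rows(timestamps, now, x, unit):
    """Keep stored timestamps inside the window, without snapping them.

    Because real timestamps are used as-is, the covered span is usually a
    little shorter than X units.
    """
    start = window_start(now, x, unit)
    return [t for t in timestamps if start <= t <= now]


def find_gaps(timestamps, summary_interval):
    """Return (previous, current) pairs whose times would be shown in red.

    A gap is flagged when previous + 2 * summary_interval is still less than
    the current timestamp, so only gaps longer than twice the interval show up.
    """
    return [
        (prev, cur)
        for prev, cur in zip(timestamps, timestamps[1:])
        if prev + 2 * summary_interval < cur
    ]


stamps = [0, 300, 600, 1500, 1800, 2100, 2400, 3000]
print(window_start(1_000_000, "1.5", "days"))  # 870400.0
print(select_rows(stamps, 2400, 0.5, "hours"))  # [600, 1500, 1800, 2100, 2400]
print(find_gaps(stamps, 300))  # [(600, 1500)]
```

Note the last pair in the example: 2400 to 3000 is exactly twice the 300-second interval, so it is not flagged, which is why gaps shorter than that threshold go unmarked.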

Fine Tuning NISCA
-----------------
This will be one of the hardest tasks for a NISCA user, but everyone involved in developing and contributing to this project is working hard to provide as much information as possible.

The NISCA user has to take many parameters into account in order to set up the COLLECTION INTERVAL and the number of hosts/interfaces monitored with NISCA.

NISCA's interval works quite differently from MRTG's. MRTG runs from crontab, so it can generate overlapping statistics if collection takes longer than the crontab interval (usually 300 seconds). NISCA's collectors poll all your monitored hosts and THEN go to sleep for the delay interval you have configured; thus, if collection takes six minutes and your delay is five minutes, the effective delay time will be eleven minutes. Running the command "snmp_collect t" will help you determine how long each collection cycle takes, and you can adjust your interval time accordingly. (The "t" puts it in debug mode.)

Another thing about the collection interval: the smaller it is, the "fuzzier" your graphs will be. Anything less than 15 seconds or so will be just about useless. A 5-minute delay will probably look best on fast interfaces (and take up much less database space. :)

People need to evaluate many parameters in order to avoid overloading the whole system (NISCA, the network, the monitored hosts, etc.). Estimating all these parameters is very, very complex, especially because various systems react in different ways to SNMP requests, and network conditions can change from moment to moment. DO NOT overestimate your setup's abilities!

One thing about NISCA is that it uses memory, a lot of it, while it's generating reports for you (and only then). And the more datapoints being analyzed, the more memory it takes. This means you can quickly get several httpd processes taking up lots and lots of memory.
To help fix this, I've changed the "MaxRequestsPerChild" setting in Apache's httpd.conf file from its default of "0" (unlimited) to "1". This forces every child server process to die as soon as it's done with its request, so it won't consume all your memory. Setting this to "2" or higher doesn't seem to do much good; the children don't die, and new children are spawned that take up just as much space, so if you run four 60-meg reports one after the other, you could bring your machine to a complete halt if it's set higher than "1". Your mileage, as always, may vary; tune it for your setup. This seems to be much better behaved with later Apache versions (1.3.27 is what I use now, and it plays nice).

I've also seen PHP die with an error similar to "Maximum allowed memory usage exceeded" when viewing large reports. If this happens to you a lot, you can edit your "php.ini" file and raise the maximum memory allowed (it defaults to 8 meg, 8388608 bytes). This setting is called "memory_limit". Don't forget to HUP or restart your web server if you change this.

In the future, more technical details will be provided, but for now the user should start with a minimal setup: a Delay value of 300 seconds at first, then slowly increase the number of interfaces and decrease the Delay time so as not to overload either the NISCA server or the network(s) over which the server polls the monitored hosts. Trial and error is the best way to see what you can get away with.

Benchmark
---------
The report generator currently generates a graph from 66,000 datapoints (230 days' worth) in about 6 seconds on a 1.2GHz AMD Thunderbird with 512M of RAM. A multi-interface report that adds the transfer averages of 2 interfaces over a one-month period (some 16,000 entries) takes 45 seconds (it's a much more intense operation). Your Mileage May Vary. It's the graph generation that takes so long. Yes, I'm working on ways of speeding it up... it ain't easy.

Apparently the report generation time isn't entirely cumulative; getting reports one interface at a time takes more time than one report on many interfaces.

As for disk space, the 1,000,000 entries in my database take up 104 meg of disk space in the form of MySQL tables/indices. Since I moved the hosts, communities, and interfaces to another table and now use medium integers to refer to them in the "stats" table, disk space usage has been cut in half, and the response time of just about everything (except reports) has become instantaneous, since it no longer has to look through hundreds of thousands of rows to find every unique host/community/interface. Even graph generation speed has doubled from this one change. Just the opposite of the effect I thought it would have; live and learn, I always say.

A Detailed Description of the Multiple-Interface Graphing Method
----------------------------------------------------------------
I'm including this just to satisfy people's curiosity. I'm sure there are other geeks out there who'd love to know how it works. So here we go... warning: it may get a bit technical.

When I set out to write the multi-IF graphing code, I had no idea how complicated it was, or how simple the final solution would be. I had to rewrite it all from scratch four times to get it right, making all kinds of notes and diagrams and drawings and stuff. Here's what I finally came up with.

First of all, I didn't want NISCA to do it the MRTG way: requiring you to collect pre-summed statistics from each interface desired before you can draw a graph of it. It just seemed silly to me, especially since it requires that you poll each interface TWICE... once for the regular single stats, and once again for the summed-interfaces stats.

There had to be a way to take any existing set of statistics for any combination of interfaces on any number of hosts over any time period, whether all the interfaces involved had identical time periods or not, and add them together in the same time periods.

My first attempt was horrible; I won't bore you with the grisly details of how a one-month report took half an hour and 500 meg of memory, and then delivered a graph that looked like something a drunk centipede had walked all over after wading through a few pools of paint. Let's just say I wasn't satisfied.

So after the second rewrite attempt, I'm sitting there staring into space trying to think of an answer, and I realize I'm staring at a CD storage rack. And my mind whispers to me, "Pigeonholing!" Just make the entries fall into the right slots, and make the slots as wide as the collection interval. But even that delivered pretty shoddy results.

And then I realized something else, something that probably would have occurred instantly to anyone who does statistical analysis for a living. Statistics are always measured in pairs. There's a starting point for both the counter itself (which is a running total) and the timestamp it was collected at, and a corresponding ending point. You find the amount of traffic transferred by subtracting the earlier counter from the later counter. If the machine has rebooted in between them, this *should* result in a negative number; if that happens, the later counter is used by itself to determine the change in count. There's no way to know exactly how much data was transferred between the earlier counter and the later counter, because it got zeroed out *somewhere* in between them. So in that case, the entire value of the later counter is used as the amount transferred between the two entries (because it was *at least* that many bytes, but probably a lot more).
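
The reset rule just described fits in a few lines. This is a minimal Python sketch, not NISCA's PHP code; `counter_delta` is a hypothetical name, and the two arguments are assumed to be raw running-total counter readings taken at the earlier and later timestamps.

```python
def counter_delta(earlier, later):
    """Traffic counted between two readings of a running-total counter.

    If the later reading is smaller, the counter was zeroed (e.g. by a
    reboot) somewhere in between. The exact amount is unknowable, so the
    later reading alone is used: at least that much was transferred.
    """
    delta = later - earlier
    if delta < 0:
        return later
    return delta


print(counter_delta(1_000, 4_500))  # 3500
print(counter_delta(9_000, 2_000))  # 2000 (counter was reset in between)
```

The fallback deliberately undercounts: it is a lower bound, which matches the "at least that many bytes" reasoning above.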

So once you have a value for the number of bytes transferred between the two points, you figure out the time between them; divide the byte count by the elapsed time, and you have the average rate. But as it turned out, that's useless. I had divided the report period up into "pigeonholes," or slots, that were as wide as the requested averaging interval (300 seconds, or 5 minutes, by default). Sometimes an entry would lie entirely within one slot; sometimes its starting point was in one slot and its ending point was in the very next slot; and sometimes one or more slots without a datapoint lay in between them.

So I re-re-rewrote it, again, and it worked. Imagine my shock.

It keeps track of the timestamps of the current stat and the previous one. When start and end are in the same slot, it just adds that whole counter change to the slot and keeps going. When the end goes past a slot boundary, it starts doing math. It finds the time between the start time and the slot boundary and divides it by the time between the start and end times. This gives it the percentage of the total time that lies in the earlier slot. It multiplies that by the whole counter change between them, which tells it how much of the data belongs to the earlier slot, and adds that to the earlier slot's entry in the counter array.

Then there are two possibilities. It adds the averaging interval to the slot boundary. That will either put the boundary *past* the end time of the stat, or it won't... meaning there are intervening slots without a stat entry. If so, it repeats the earlier percentage operation, but this time it divides the averaging interval by the total time of the entry, multiplies the result by the whole counter change, and adds that to the intervening slot. It does this until the slot boundary passes the ending timestamp.
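
Here is a rough Python sketch of the slot-splitting loop, including the final-slot step and the per-second conversion that the following paragraphs describe. It is an illustration, not NISCA's PHP code: the function names, the (timestamp, counter) input format, and `to_rates()` as a stand-in for makegraph() are all hypothetical. Summing several interfaces is modeled by calling `pigeonhole()` repeatedly on one shared slot list.

```python
"""Sketch of multi-interface "pigeonhole" summing (illustrative only).

Not NISCA's PHP code. Function names, the (timestamp, counter) input
format and to_rates() standing in for makegraph() are hypothetical.
"""


def add_to_slot(slots, index, amount):
    """Add amount to a slot, ignoring data outside the report period."""
    if 0 <= index < len(slots):
        slots[index] += amount


def pigeonhole(samples, report_start, interval, slots):
    """Spread each counter change across slots in proportion to time.

    samples: (timestamp, counter) pairs for one interface and one direction
    (e.g. incoming bytes), sorted by timestamp. Slot i covers
    [report_start + i*interval, report_start + (i+1)*interval).
    Calling this with the same `slots` list for several interfaces sums them.
    """
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        delta = c1 - c0 if c1 >= c0 else c1  # counter reset: use later value
        span = t1 - t0
        if span <= 0:
            continue
        slot = int((t0 - report_start) // interval)
        boundary = report_start + (slot + 1) * interval
        start = t0
        # Each time the sample crosses a slot boundary, give the slot its
        # share: counter change * (time inside the slot / total time).
        # Intervening empty slots get interval / span of the change.
        while boundary < t1:
            add_to_slot(slots, slot, delta * (boundary - start) / span)
            start = boundary
            boundary += interval
            slot += 1
        # Final share: from the slot's start (or t0) to the sample's end.
        # If start and end were in one slot, this is the whole change.
        add_to_slot(slots, slot, delta * (t1 - start) / span)
    return slots


def to_rates(slots, interval):
    """Convert per-slot counts to average per-second rates."""
    return [count / interval for count in slots]


slots = [0.0] * 4  # a 1200-second report in 300-second slots
pigeonhole([(0, 0), (450, 900), (1200, 2400)], 0, 300, slots)  # interface A
pigeonhole([(0, 5000), (600, 600)], 0, 300, slots)  # interface B, with a reset
print(slots)  # [900.0, 900.0, 600.0, 600.0]
print(to_rates(slots, 300))  # [3.0, 3.0, 2.0, 2.0]
```

Interface B's sample ends exactly on the 600-second boundary, so 100% of its change lands in the earlier slots and 0% in the next one, the zero-percentage case mentioned below.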

Once the slot boundary is past the ending timestamp, however it got there, it does the percentage thing again, this time using the time between the slot's beginning and the stat's end timestamp to calculate the percentage, which it multiplies by the total and adds to that slot's counter. And by the way, these percentages *can* be zero; that just means an entry's start or end lies exactly on a slot boundary, so 100% goes on one side and 0% goes on the other. Pretty neat how that worked out.

Now, this has one unfortunate side-effect. The very last entry for an interface won't have an ending point with which to calculate a transferred count for the last slot. This means the last slot will almost always fall sharply downwards, since it will almost always have far less data transferred in it than all the previous slots. So when viewing these graphs, please don't panic; it doesn't mean all your interfaces went down sometime in the past five minutes or anything. :)

Now, that was just for one type of data: incoming bytes, say. It has to do all that separately for the incoming and outgoing stats of every entry. And that's why it doesn't support making multiple-interface graphs of packets, or drops, or errors; not only is it kinda pointless, it would also mean another hundred lines of code for each added report type.

Once it's done every entry, it passes the data to the makegraph() function, which converts the counts to average per-second rates, just as it does for regular graphs. And that's a whole other story in itself. :)

Who has actually contributed?
-----------------------------
Pierfrancesco Caci (pf@gusp.dyndns.org)
Fabio Massimo Di Nitto (fabio.m.d.nitto@ted.ericsson.dk)
(It wouldn't have been possible without you two... :)
Mark Motley (mmotley@la-mirada.net)
Sean, of Hotlinks Internet Services (www.hotlinks.co.uk)
Jimmy Kaplowitz (jimmy@kaplowitz.org)
Eddy Lai
Tomaso Vasella

New ideas, requests, job offers, and comments are always welcome from everyone.
Contributing to NISCA will only improve its quality.

Oddities
--------
We now have the first confirmed NISCA-related tech support call, made from Milano to Rome, Italy, at or about 4:00PM (Italy time, +0200) on Monday, June 25th, 2001. If anyone knows of an earlier call, let me know. :)

Mumbo-Jumbo
-----------
This program is released under the GNU General Public License (see LICENSE). This means you have my permission to do anything you like with it except printing it out, rolling it up, and swatting your pet with it. I will not condone cruelty to animals. And if you make any money off of it, please think of my poor unemployed self and have pity on me as you count your millions.

You know that really long boring bit about "as-is" and "merchantability" and "fitness for a particular purpose" and all that crap? Insert it here.

Note that I am not affiliated with Team Nisca, who makes ID card printers; the National Interscholastic Swimming Coaches Association; the Northern Ireland Society for Computing in Anaesthesia; or the NISCA protocol, which is used to connect systems in an OpenVMS cluster. Anyone claiming otherwise will certainly be ridiculed into an embarrassing extinction, because to me NISCA means "the Network Interface Statistics Collection Agent" and nothing more.

Note that I am affiliated with isthisthingon.org, a very, very non-profit non-organization of no one in particular.

Any resemblance to actual programs, living or dead, is purely coincidental. I ask you all to reflect upon how often form follows function.

Contact Info-Mation
-------------------
Author's email: phee@isthisthingon.org
Official Site: http://nisca.sourceforge.net/
My ICQ #: 13130273