petalert.pl
-----------------------

This is an snmptrapd handler script that alerts when Platform Event Traps
(PET) occur. It was written because traptoemail, distributed with
net-snmp-5.3.2.2, cannot handle multi-line hex strings and is restricted to
email alerts.

This script operates in two modes, traphandle or embperl. In traphandle
mode, it concatenates the quoted hex string into one long line, then builds
structures that resemble embperl mode. Both modes then invoke the helper
decoder, ipmi-pet(8) from FreeIPMI, parse its output, and alert in the given
way: email, Nagios external command, NSCA, etc.


1. REQUIREMENTS

freeipmi-1.1.1 or above is required for the script to function. Both
FreeIPMI and the script assume a Unix-like system, notably GNU/Linux; Windows
is not supported as of this writing, Dec 13, 2011.

Net-SNMP 5.3.2.2 or above is required to make a complete alerting solution.
Only snmptrapd, which acts as the trap receiver, is actually involved.

If you prefer to run it as an embedded perl handler, your version of
Net-SNMP must have embedded perl support enabled; see "Embedded Perl
Support" in snmpd.conf(8) for more information. Usually it is enabled, and
you can verify with the following command:

    # net-snmp-config --configure-options | tr ' ' '\n' | grep perl
    '--enable-embedded-perl'

If you prefer to invoke the handler directly rather than through perl(1),
make sure the script itself has execute permission. Both cases require a
working Perl installation, preferably Perl 5.8.8.

If you prefer the built-in email alerting, make sure Net::SMTP is installed.

If you prefer the Nagios monitoring system, make sure the Nagios process and
snmptrapd are on the same host (for a remote Nagios host, see NSCA in 2.2).
Usually you don't need to worry about write permission on the Nagios
external command file, because the handler is invoked as root by snmptrapd.
If that is not the case, you need to ensure write permission on the command
file.

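If the handler does not run as root, a quick check like the one below may
help confirm that the Nagios external command file is writable. This is only
a sketch: the function name check_cmd_file and the example path are
assumptions for illustration, not part of petalert.pl. It relies on the fact
that Nagios creates its external command file as a named pipe.

```shell
# Hypothetical helper: succeed only if $1 is a named pipe (FIFO)
# that the current user is allowed to write to.
check_cmd_file() {
    if [ ! -p "$1" ]; then
        echo "$1: not a named pipe" >&2
        return 1
    fi
    if [ ! -w "$1" ]; then
        echo "$1: not writable by $(id -un)" >&2
        return 1
    fi
    return 0
}

# Example call; the path is an assumption, take the real one from the
# command_file setting in nagios.cfg:
# check_cmd_file /var/spool/nagios/cmd/nagios.cmd && echo "command file OK"
```

Run the check as the same user that snmptrapd uses to invoke the handler,
otherwise the result says nothing about the handler's own permissions.
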
You might prefer other alerting methods; the bad news is that none are
implemented yet. Please drop me a mail, and I might take the time to add
plugin support.

The paranoid might check that firewall rules allow only traffic from trusted
hosts.


2. CONFIGURATION

(Note: the examples below use backslash-newline only to wrap long lines; put
each of them on a single line in your configuration.)

Put a line like this in your snmptrapd.conf file:

    traphandle .1.3.6.1.4.1.3183.1.1 /usr/bin/perl \
      /usr/share/doc/freeipmi/contrib/pet/petalert.pl --mode=traphandle \
      --alert=email --sdrcache SDRCONF -- -f FROM -s SMTPSERVER ADDRESSES

Or, if you prefer embedded perl,

    perl do "/usr/share/doc/freeipmi/contrib/pet/petalert.pl";
    perl IpmiPET::main(qw(--mode=embperl --trapoid=OID --sdrcache=SDRCONF \
      --alert=email -- -s SMTPSERVER -f FROM ADDRESSES));

where only --mode is required; see "petalert.pl -h".

Make sure execute permission is granted so that handlers can be executed,
for example,

    authCommunity execute COMMUNITY_STRING

See "ACCESS CONTROL" in snmptrapd.conf(8) for more information.

The bad news is that you have to use the numeric OID representation, so in
addition add "-Of -On" to the snmptrapd options.

You have to enable PET on the IPMI nodes as well, including LAN access, PEF
alerting, community, alert policy and destination. You may use bmc-config
and ipmi-pef-config from FreeIPMI for that configuration. See "IPMI NODES".

You might wish to set up PTR records for the IPMI nodes; otherwise snmptrapd
reports <UNKNOWN> to traphandle and the script falls back to the IP address.


2.1 ACKNOWLEDGE

Platform event traps travel over UDP, so you might worry about trap loss.
The IPMI spec allows the trap receiver to acknowledge the trap. Use --ack to
acknowledge the trap before alerting. You may need workarounds for
acknowledgement. See BUGS.

An acknowledge setup might look like this:

    perl do "/usr/share/doc/freeipmi/contrib/pet/petalert.pl";
    perl IpmiPET::main(qw(--mode=embperl --trapoid=OID --sdrcache=SDRCONF \
      --ack -W malformedack \
      --alert=email -- -s SMTPSERVER -f FROM ADDRESSES));


2.2 NAGIOS INTEGRATION

The Nagios monitoring system can be fed by writing passive check results to
its external command file. See ipminodes.cfg and check_rmcping for related
Nagios configuration. Assuming the Nagios process is local, use:

    perl do "/usr/share/doc/freeipmi/contrib/pet/petalert.pl";
    perl IpmiPET::main(qw(--mode=embperl --trapoid=OID --sdrcache=SDRCONF \
      --alert=nagios -- -H short -S PET NAGIOS_COMMAND_FILE));

where "-H short" means that if 10.2.3.4 resolves to foo.example.com, the
Nagios passive check gets foo as the host; use "-H fqdn" to pass
foo.example.com to Nagios. In addition, "-S PET" sets the service
description.

If the Nagios process is on a remote host, you would normally turn to NSCA,
which consists of the NSCA daemon on the Nagios host and the send_nsca
client program. To alert through send_nsca,

    perl do "/usr/share/doc/freeipmi/contrib/pet/petalert.pl";
    perl IpmiPET::main(qw(--mode=embperl --trapoid=OID --sdrcache=SDRCONF \
      --alert=nsca -- --prog /usr/bin/send_nsca -H short -S PET \
      -- -H NAGIOS_HOST -c SEND_NSCA_CONF));

Notice that the unattached -- appears twice in the configuration line,
separating three stages of argument processing: generic args, alert-specific
args, and external helper args.


3. SDR CACHE FILE MAPPING

Notice that the underlying helper program ipmi-pet(8) normally depends on an
sdr cache file, either preinitialized or created on demand. If no credential
is supplied, ipmi-pet(8) simply assumes localhost and creates an sdr cache,
which is usually ~/.freeipmi/sdr-cache/sdr-cache-<hostname>.localhost. You
may wish to supply preinitialized ones; in that case use -c sdrmapping.conf
to associate them with IPMI nodes.

The sdr cache config syntax is as follows: every unindented line starts an
sdr cache file, followed by any number of indented lines of IPMI nodes.
Every IPMI node line may list multiple nodes delimited by whitespace.
Comments are shell-style, trailing whitespace is trimmed, and empty lines
are skipped. For example,

    |/path/to/sdr-cache-file-1
    |  10.2.3.10              # comment
    |
    |/path/to/sdr-cache-file-2
    |  10.2.3.4               # one node
    |  10.2.3.5 10.2.3.6      # two nodes
    |  10.2.3.[7-9]           # three nodes in range form
    |
    ^-- this is the beginning of lines


3.1 SDR CACHE INITIALIZATION

The sdr cache file can be initialized with ipmi-sel(8) and its
--sdr-cache-file option.

    # ipmi-sel -h 10.2.3.4 -u root -P --sdr-cache-file=/path/to/sdr-cache-file-X
    Password:
    Caching SDR repository information: /path/to/sdr-cache-file-X
    Caching SDR record 125 of 125 (current record ID 125)
    ID | Date        | Time     | Name | Type                   | Event
    1  | Dec-12-2011 | 16:41:51 | SEL  | Event Logging Disabled | Log Area Reset/Cleared
    ...


4. IPMI NODES

For PET to be generated, the IPMI nodes have to be configured, including LAN
access, PEF alerting, trap community, alert policy and destination. You may
use bmc-config and ipmi-pef-config from FreeIPMI for that configuration.

However, before configuring the nodes and possibly running into unexpected
firmware issues, you had better verify that the trap receiver end works
well.
Simply modify the following example traphandle input to match your setup,
then feed it to the stdin of petalert.pl like this, assuming you prefer
email alerts:

    # perl petalert.pl -D :all --mode=traphandle --sdrcache SDRCONF \
        --alert=email -- -f FROM -s SMTPSERVER ADDRESSES <<EOF
    pet.example.com
    UDP: [10.2.3.4]:32768
    .1.3.6.1.2.1.1.3.0 60:5:11:46.26
    .1.3.6.1.6.3.1.1.4.1.0 .1.3.6.1.4.1.3183.1.1.0.356096
    .1.3.6.1.4.1.3183.1.1.1 "44 45 4C 4C 50 00 10 59 80 43 B2 C0 4F 33 33 58 
    00 42 19 EE AB 64 FF FF 20 20 00 41 73 18 00 80 
    01 FF 00 00 00 00 00 19 00 00 02 A2 01 00 C1 "
    .1.3.6.1.6.3.18.1.3.0 10.2.3.4
    .1.3.6.1.6.3.18.1.4.0 "public"
    .1.3.6.1.6.3.1.1.4.3.0 .1.3.6.1.4.1.3183.1.1
    EOF

You should then get a notification about the test input. Notice that the
three-line octet string "44 45 ... C1 " includes one whitespace at the end
of each line; that is the way snmptrapd(8) works.


4.1 NODE CONFIGURATION

You had better save your customization, if any, before changing anything. If
you are on factory default settings, feel free to go on. FreeIPMI stands out
for its ability to save your whole configuration with the --checkout option;
there are several config programs, of which only bmc-config(8) and
ipmi-pef-config(8) are needed in this scenario. To save the configuration,

    # bmc-config --checkout -f saved-bmc-config.txt
    # ipmi-pef-config --checkout -f saved-pef-config.txt

Notice that the checked-out configuration is well commented; you can refer
to it for help with the following configuration options.

To observe the differences between the current configuration and the PET
configuration,

    # bmc-config --diff <<EOF
    Section Lan_Channel
        Volatile_Access_Mode                Always_Available
        Volatile_Enable_Pef_Alerting        Yes
        Non_Volatile_Access_Mode            Always_Available
        Non_Volatile_Enable_Pef_Alerting    Yes
    EndSection
    Section Lan_Conf
        IP_Address_Source                   Static
        IP_Address                          10.2.3.4
        Subnet_Mask                         255.255.255.0
        Default_Gateway_IP_Address          10.2.3.1
    EndSection
    Section User2
        Username                            root
        Password                            CHANGE_ME
        Enable_User                         Yes
        Lan_Privilege_Limit                 Administrator
    EndSection
    EOF

    # ipmi-pef-config --diff <<EOF
    Section Lan_Alert_Destination_1
        Alert_Destination_Type              PET_Trap
        Alert_Acknowledge                   Yes
        Alert_Acknowledge_Timeout           5
        Alert_Retries                       2
        Alert_Gateway                       Default
        Alert_IP_Address                    10.2.3.254
        Alert_MAC_Address                   00:00:00:00:00:00
    EndSection
    Section Community_String
        Community_String                    COMMUNITY_STRING
    EndSection
    Section Alert_Policy_1
        Policy_Type                         Always_Send_To_This_Destination
        Policy_Enabled                      Yes
        Policy_Number                       1
        Destination_Selector                1
        Channel_Number                      1
        Alert_String_Set_Selector           0
        Event_Specific_Alert_String         No
    EndSection
    EOF

Notice that ipmi-pef-config(8) can be used out of band once IPMI LAN access
works, which means you no longer need to log on to the host to do further
configuration.

    # ipmi-pef-config -h 10.2.3.4 -u root -P --diff <<EOF
    ...
    EOF
    Password:

bmc-config(8) works out of band as well, with the same prerequisite. In
addition, FreeIPMI can configure many hosts in one run; see "HOSTRANGED
SUPPORT" in ipmi-pef-config(8) and the other FreeIPMI tools.

When you are sure about the changes, use --commit to apply them. Then you
can use bmc-device to produce a software generated platform event:

    # bmc-device --platform-event="41 04 05 73 6f assertion 80 01 ff"

You should get notified about the event a few seconds later; otherwise check
vendor rules.

If the node is a Dell PowerEdge R610 or 1950, you have to make a catch-all
filter for the software generated event,

    # ipmi-pef-config --commit <<EOF
    Section Event_Filter_40
        Filter_Type                                   Software_Configurable
        Enable_Filter                                 Yes
        Event_Filter_Action_Alert                     Yes
        Event_Filter_Action_Power_Off                 No
        Event_Filter_Action_Reset                     No
        Event_Filter_Action_Power_Cycle               No
        Event_Filter_Action_Oem                       No
        Event_Filter_Action_Diagnostic_Interrupt      No
        Event_Filter_Action_Group_Control_Operation   No
        Alert_Policy_Number                           1
        Event_Severity                                Unspecified
        Group_Control_Selector                        0
        Generator_Id_Byte_1                           0xFF
        Generator_Id_Byte_2                           0xFF
        Sensor_Type                                   Any
        Sensor_Number                                 0xFF
        Event_Trigger                                 0xFF
        Event_Data1_Offset_Mask                       0xFFFF
        Event_Data1_AND_Mask                          0x00
        Event_Data1_Compare1                          0xFF
        Event_Data1_Compare2                          0x00
        Event_Data2_AND_Mask                          0x00
        Event_Data2_Compare1                          0xFF
        Event_Data2_Compare2                          0x00
        Event_Data3_AND_Mask                          0x00
        Event_Data3_Compare1                          0xFF
        Event_Data3_Compare2                          0x00
    EndSection
    EOF

If you still have problems, you may wish to read "Figure 17-2 Event Filter,
Alert Policy, and Alert Destination, String Relationships" in the IPMIv2
spec, and/or check your network traffic. You might be the unlucky one who
triggers a firmware bug nobody has spotted yet. See BUGS.


5. LOGGING

petalert.pl supports sophisticated logging. Use "-D token" to enable logging
of specific parts, or "-D :all" to enable very verbose output. Internally it
uses Data::Dumper to represent Perl structures, so you can simply copy them
to files and evaluate them. For example, an acknowledge request invocation
was logged as

    [Sun Dec 11 13:47:51 2011] ack command line
    [
      '/usr/sbin/ipmi-pet',
      [
        '--pet-acknowledge',
        '-h',
        '10.2.3.4',
        '356096',
        '44',
        '45',
        '4C',
        '4C',
        ...
      ]
    ]

The above request invocation could be rebuilt by

    # perl -le '@v=<STDIN>; $x=eval "@v"; print join(" ", $x->[0], @{$x->[1]})."\n"'
    [
      '/usr/sbin/ipmi-pet',
      [
        '--pet-acknowledge',
        '-h',
        '10.2.3.4',
        '356096',
        '44',
        '45',
        '4C',
        '4C',
        ...
      ]
    ]
    Ctrl-D
    /usr/sbin/ipmi-pet --pet-acknowledge -h 10.2.3.4 356096 44 45 4C 4C ...

Then you could simply paste the command into the shell to simulate a manual
acknowledge. It looks like acknowledge requests without a preceding PET are
also accepted and answered as usual.

snmptrapd(8) itself can log traps to syslog, which requires log permission;
see "ACCESS CONTROL" in snmptrapd.conf for more information.

The NSCA daemon logs to syslog; set "debug=1" in nsca.cfg to get detailed
connection handling.

Nagios is also able to log to syslog; set "use_syslog=1" in nagios.cfg to
help debug alerts.


6. PET TRAFFIC

In a no-acknowledge setup, there should usually be only one packet per PET,
sent from the IPMI node to the trap receiver. However, a firmware defect was
spotted that results in additional traffic, see BUGS.

In an acknowledge setup, there should be three packets per event: one PET,
one PET acknowledge request from the trap receiver to the IPMI node, and one
PET acknowledge response in the other direction. More bugs were spotted, see
BUGS.

In either setup, packets can be captured like this

    # tcpdump -i any -nn -vvv -s0 -w pet.pcap 'host 10.2.3.4 and udp'

Then you can browse the interactions with the help of Wireshark.


7. BUGS

It has been observed that the factory default rules of iDRAC Express on Dell
PowerEdge R610 don't match software generated events. You need to make a
catch-all filter rule to report those events. However, hardware generated
events are not subject to this limitation. To verify this situation, open
the case to produce a hardware generated intrusion event.

Dell PowerEdge 1950 with BMC has a similar problem. The difference is that
the 1950 has 31 filter rules, so you don't need to worry about overwriting
an existing one.

It has been observed that Dell PowerEdge 1950 with BMC suffers from clock
drift, notably in SEL timestamps.
bmc-device(8) from FreeIPMI can be used to adjust the SEL time and the SDR
repository time,

    # bmc-device --set-sdr-repository-time=now
    # bmc-device --set-sel-time=now

Notice that 'now' refers to the current timestamp on the host where the
commands are issued. bmc-device(8) works out of band, so simply issue the
commands on a host whose clock is synchronized.

It has been observed that iDRAC Express on Dell PowerEdge R610 generates two
traps per hardware event. Notice that the session ids of the two traps
differ; they are different traps rather than duplicates, even though the
other contents of the payload are identical.

It has been observed that iDRAC Express on Dell PowerEdge R610 produces
malformed PET acknowledge responses. In this case, ipmi-pet exits with the
timeout error "ipmi_cmd_pet_acknowledge: message timeout". You may use
'-W malformedack', which is simply passed through, to instruct the
underlying helper ipmi-pet(8) to disable such detection and to return
immediately.

Timeouts hurt snmptrapd because a slow handler holds up the main loop. To
discover potentially time-consuming cases, use "-D perf" and observe the
log.

It has been observed that some DNS servers return "localhost" for private IP
addresses rather than NXDOMAIN. In this case, snmptrapd(8) passes
"localhost" as the resolved hostname to petalert.pl, which gets confused.
You had better switch to a correctly configured DNS server, or contact the
administrator to solve the problem.


Kaiwang Chen
kaiwang.chen@gmail.com