<!doctype html public "-//W3C//DTD HTML 4.0//EN">
<html>
<head>
<title>Trouble Shooting Apcupsd</title>
<link rel=stylesheet href="apcupsd-styles.css" type="text/css">
</head>
<body>
<h1>Trouble Shooting Apcupsd</h1>
<h2>Testing</h2>
The first step in trouble shooting <b>apcupsd</b> is to read the <a href="testing.html">Testing Apcupsd</a> section of this manual.
<h2>Network Problems with Master/Slave Configurations</h2>
When working with a master/slave configuration (one UPS powering more than one computer), the master and slave communicate via the network. In many configurations, <b>apcupsd</b> is started before the network is initialized. In this case, it is possible that the master will be unable to contact the slave. On <b>apcupsd</b> versions prior to 3.8.0, this could cause <b>apcupsd</b> to exit with an error. To solve this problem, either force <b>apcupsd</b> to be started after the network and the DNS (adjust the symbolic links in /etc/rc.d), put the names of the slave machines in your <b>/etc/hosts</b> file, or, preferably, use IP addresses rather than machine names. On some configurations, you may need to use fully qualified names (host.domain.xxx) rather than simple host names.
<h2>Error Messages from a Master Configuration</h2>
In a master/slave configuration, you can get the following error messages from a master. Each error message is followed by a possible explanation:
<p class=tty>Cannot resolve slave name XXX</p>
To contact the slave, the slave name given in the configuration file must be resolved to an IP address. In this case, <b>apcupsd</b> could not get the IP address. Either the slave name is incorrect, your DNS is not working, or you have started <b>apcupsd</b> during the boot process before the network is operational.
<p class=tty>Got slave shutdown from SSS</p>
This message should not appear, as it is not yet used.
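<p>For name resolution problems such as "Cannot resolve slave name XXX", it can help to confirm outside of <b>apcupsd</b> that the name resolves on the master. The following is only a minimal sketch in Python and is not part of <b>apcupsd</b>; the function name <code>resolve_host</code> is hypothetical, and the check merely asks the operating system's resolver (DNS and /etc/hosts) for the addresses of a name.</p>

```python
import socket


def resolve_host(name):
    """Return the sorted IP addresses that `name` resolves to, or [] on failure.

    Hypothetical helper for illustration only; it is not part of apcupsd.
    """
    try:
        infos = socket.getaddrinfo(name, None)
    except (socket.gaierror, UnicodeError):
        # The name is unknown, or the resolver (DNS, /etc/hosts) is not working.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the address as a string.
    return sorted({info[4][0] for info in infos})
```

<p>Calling <code>resolve_host</code> with the exact slave name from the master's configuration file returns an empty list when the name cannot be resolved. In that case, an entry in <b>/etc/hosts</b> or an explicit IP address is the likely remedy.</p>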
<p class=tty>Cannot write to slave SSS</p>
This message occurs when the master attempts to send a message to the slave SSS and gets an error. It indicates that either the slave machine is not responding (<b>apcupsd</b> died, the system crashed, ...) or that the network is down.
<p class=tty>Cannot read magic from slave SSS</p>
This message indicates that the master attempted to read the code key from the slave SSS and it did not match the expected value. A common cause of this problem is that the master and slave versions of <b>apcupsd</b> are not the same. Please be sure you are running the same version of <b>apcupsd</b> on all your master and slave machines.
<p class=tty>Connect to slave SSS failed</p>
This message is logged when the master attempts to connect to slave SSS and no connection is accepted. The most common cause of this problem is that the slave copy of <b>apcupsd</b> is not yet ready to accept connections or is not running. Generally, <b>apcupsd</b> will retry the connection a bit later. If the problem persists, it can indicate a network problem or that the slave name on the SLAVE directive of the master's configuration file is incorrect.
<p class=tty>Cannot open stream socket</p>
This indicates a fundamental networking problem on your system: either there are insufficient resources or you have not configured TCP/IP operations.
<h2>Error Messages from a Slave Configuration</h2>
In a master/slave configuration, you can get the following error messages from a slave. Each error message is followed by a possible explanation:
<p class=tty>Can't resolve master name MMM</p>
This message is logged when the slave attempts to resolve the name given on the MASTER configuration directive to an IP address. It probably means that the master name MMM is not defined, your DNS is not working properly, or you have started <b>apcupsd</b> in the boot process before the network is initialized.
Check the name MMM, or use an explicit IP address on the MASTER configuration directive in the slave's configuration file.
<p class=tty>Cannot bind local address, probably already in use</p>
This means that the slave attempted to bind the port number so that it can listen for messages from the master, and the bind failed. This can occur if you already have a copy of <b>apcupsd</b> running, or if you have run <b>apcupsd</b> in the past 5 to 10 minutes, because occasionally the operating system will not release a port for 5 to 10 minutes after a program exits. In this case, you can either wait a few minutes for the problem to go away, or use a different port in both your master and slave configuration files.
<p class=tty>Socket accept error</p>
The slave got an error waiting on the accept() system call. This is probably due to a fundamental networking problem.
<p class=tty>Unauthorized attempt from master MMM</p>
The master named MMM (probably an IP address) contacted the slave, but MMM is not the master listed on the MASTER configuration directive in /etc/apcupsd.conf. Consequently, it is not authorized to communicate with the slave. Please check that the MASTER name in your slave configuration file and the SLAVE name in your master configuration file are correct.
<p class=tty>Read failure from socket</p>
The slave got an error reading the socket open to the master. This indicates a fundamental networking problem.
<p class=tty>Bad APC magic from master: MMM</p>
The slave received a code key from the master that does not correspond to the one expected by the slave. The most common cause of this problem is that you are running a different version of <b>apcupsd</b> on the master and the slave. Please ensure that you are running the same version of <b>apcupsd</b> on your master and all your slaves.
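<p>For the "Cannot bind local address, probably already in use" message described above, one way to see whether the port is still held is to try binding it yourself. The following is a minimal sketch in Python under stated assumptions: the helper name <code>port_is_free</code> is hypothetical, the check is not how <b>apcupsd</b> itself tests the port, and the result only reflects the moment of the call.</p>

```python
import socket


def port_is_free(port, host=""):
    """Return True if a TCP socket can be bound to `port` right now.

    Hypothetical helper for illustration only; it is not part of apcupsd.
    An empty `host` means all local IPv4 addresses.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            # SO_REUSEADDR is deliberately left unset, so a port that is
            # still held by another socket makes the bind fail.
            s.bind((host, port))
        except OSError:
            return False
    return True
```

<p>Call <code>port_is_free</code> with the port number used in your master and slave configuration files. If it returns False, either another program (possibly another copy of <b>apcupsd</b>) holds the port or the operating system has not yet released it.</p>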
<p class=tty>Bad user magic from master: MMM</p>
This message indicates that the master and slave have previously communicated, but that the code key transmitted with the most recent message from the master does not correspond to what the slave expects. This problem is probably due to a network error or to some other user or machine contacting the slave on the network port.
<h2>Master/Slave Connection Not Working</h2>
Master/slave problems are usually related to one of the following items:
<ol type="1">
<li>Improper apcupsd.conf files. A good starting point is the set of master/slave example files in the examples subdirectory of the source.</li>
<li>Master or slave IP address or name incorrect. Try pinging each machine from the other using the names or addresses that you have put in the respective apcupsd.conf files.</li>
<li>Make sure no other program is using port number 6666, or change the NETPORT directive in both apcupsd.conf files.</li>
<li>Make sure you are using the same version of apcupsd on both the master and slave machines.</li>
</ol>
<h2>CGI Programs Do Not Work</h2>
Try checking the following:
<ol type="1">
<li>Did you successfully compile and link the cgi programs without errors? If you are not sure, cd to the cgi directory and do a "make clean" followed by a "make".</li>
<li>Did you move or copy all the .cgi programs in the cgi directory to your Web server cgi-bin directory on the SAME machine?</li>
<li>Did you verify that the cgi programs located in the cgi-bin directory all have execute permission?</li>
<li>Have you tried any other cgi programs and proven that they work?</li>
<li>Have you verified that the Network Information Server process of <b>apcupsd</b> is running as described in this manual?</li>
<li>Have you verified that your <b>apcupsd.conf</b> file is properly configured for the Network Information Server and that the port is defined as 7000? I.e.
"NETSERVER on" and "SERVERPORT 7000"</li>
<li>If one or more machines do not show up in the multimon output, it is most likely due to a configuration error in the <b>hosts.conf</b> file in your <b>/etc/apcupsd</b> directory.</li>
</ol>
<h2>Battery Problems</h2>
Please see the <a href="batteries.html">Battery Chapter</a> of this document for more details.
<h2>Cable or Connection Problems</h2>
Frequently during the initial installation, users don't know what cable they have or have problems connecting to the serial port. If this is your case, one way to diagnose the problem is to use the <b>apctest</b> program. To do so, you must first build it with:
<p class="tty">make apctest</p>
Then, you simply execute it with:
<p class="tty">./apctest</p>
and follow the instructions. It will place the output from the session in the file <b>apctest.output</b>. If you are not able to resolve your problem, sometimes we can help if you email us this output file along with your <b>apcupsd.conf</b> file. Please see the <a href="testing.html">Testing Chapter</a> of this document for additional details on how to build and use <b>apctest</b>.
<h2>Bizarre Intermittent Behavior</h2>
In one case, a user reported receiving random incorrect values from the UPS in the status output. It turned out that <b>gpm</b>, the mouse control program for command windows, was using the serial port without using the standard Unix locking mechanism. As a consequence, both <b>apcupsd</b> and <b>gpm</b> were reading the serial port. If you are running <b>gpm</b>, please ensure that it is not configured with a serial port mouse on the same serial port that <b>apcupsd</b> uses.
<hr>
<a href="testing.html" target="_self"><img src="back.gif" border=0 alt="Back"></a>
<a href="shutdown.html" target="_self"><img src="next.gif" border=0 alt="Next"></a>
<a href="index.html"><img src="home.gif" border=0 alt="Home"></a>
</body>
</html>