<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html xmlns:fn="http://www.w3.org/2005/02/xpath-functions"> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> <link rel="stylesheet" href="../otp_doc.css" type="text/css"> <title>Erlang -- Robustness</title> </head> <body bgcolor="white" text="#000000" link="#0000ff" vlink="#ff00ff" alink="#ff0000"><div id="container"> <script id="js" type="text/javascript" language="JavaScript" src="../js/flipmenu/flipmenu.js"></script><script id="js2" type="text/javascript" src="../js/erlresolvelinks.js"></script><script language="JavaScript" type="text/javascript"> <!-- function getWinHeight() { var myHeight = 0; if( typeof( window.innerHeight ) == 'number' ) { //Non-IE myHeight = window.innerHeight; } else if( document.documentElement && ( document.documentElement.clientWidth || document.documentElement.clientHeight ) ) { //IE 6+ in 'standards compliant mode' myHeight = document.documentElement.clientHeight; } else if( document.body && ( document.body.clientWidth || document.body.clientHeight ) ) { //IE 4 compatible myHeight = document.body.clientHeight; } return myHeight; } function setscrollpos() { var objf=document.getElementById('loadscrollpos'); document.getElementById("leftnav").scrollTop = objf.offsetTop - getWinHeight()/2; } function addEvent(obj, evType, fn){ if (obj.addEventListener){ obj.addEventListener(evType, fn, true); return true; } else if (obj.attachEvent){ var r = obj.attachEvent("on"+evType, fn); return r; } else { return false; } } addEvent(window, 'load', setscrollpos); //--></script><div id="leftnav"><div class="innertube"> <img alt="Erlang logo" src="../erlang-logo.png"><br><small><a href="users_guide.html">User's Guide</a><br><a href="../pdf/otp-system-documentation-5.9.3.1.pdf">PDF</a><br><a href="../index.html">Top</a></small><p><strong>Getting Started with Erlang</strong><br><strong>User's Guide</strong><br><small>Version 5.9.3.1</small></p> <br><a 
href="javascript:openAllFlips()">Expand All</a><br><a href="javascript:closeAllFlips()">Contract All</a><p><small><strong>Chapters</strong></small></p> <ul class="flipMenu" imagepath="../js/flipmenu"> <li id="no" title="Introduction" expanded="false">Introduction<ul> <li><a href="intro.html"> Top of chapter </a></li> <li title="Introduction"><a href="intro.html#id62060">Introduction</a></li> <li title="Things Left Out"><a href="intro.html#id60748">Things Left Out</a></li> </ul> </li> <li id="no" title="Sequential Programming" expanded="false">Sequential Programming<ul> <li><a href="seq_prog.html"> Top of chapter </a></li> <li title="The Erlang Shell"><a href="seq_prog.html#id62023">The Erlang Shell</a></li> <li title="Modules and Functions"><a href="seq_prog.html#id57046">Modules and Functions</a></li> <li title="Atoms"><a href="seq_prog.html#id57421">Atoms</a></li> <li title="Tuples"><a href="seq_prog.html#id62855">Tuples</a></li> <li title="Lists"><a href="seq_prog.html#id57425">Lists</a></li> <li title="Standard Modules and Manual Pages"><a href="seq_prog.html#id63464">Standard Modules and Manual Pages</a></li> <li title="Writing Output to a Terminal"><a href="seq_prog.html#id63511">Writing Output to a Terminal</a></li> <li title="A Larger Example"><a href="seq_prog.html#id65303">A Larger Example</a></li> <li title="Matching, Guards and Scope of Variables"><a href="seq_prog.html#id63164">Matching, Guards and Scope of Variables</a></li> <li title="More About Lists"><a href="seq_prog.html#id63431">More About Lists</a></li> <li title="If and Case"><a href="seq_prog.html#id66137">If and Case</a></li> <li title="Built In Functions (BIFs)"><a href="seq_prog.html#id66368">Built In Functions (BIFs)</a></li> <li title="Higher Order Functions (Funs)"><a href="seq_prog.html#id66559">Higher Order Functions (Funs)</a></li> </ul> </li> <li id="no" title="Concurrent Programming" expanded="false">Concurrent Programming<ul> <li><a href="conc_prog.html"> Top of chapter </a></li> 
<li title="Processes"><a href="conc_prog.html#id66869">Processes</a></li> <li title="Message Passing"><a href="conc_prog.html#id67007">Message Passing</a></li> <li title="Registered Process Names"><a href="conc_prog.html#id67348">Registered Process Names</a></li> <li title="Distributed Programming"><a href="conc_prog.html#id67451">Distributed Programming</a></li> <li title="A Larger Example"><a href="conc_prog.html#id67712">A Larger Example</a></li> </ul> </li> <li id="loadscrollpos" title="Robustness" expanded="true">Robustness<ul> <li><a href="robustness.html"> Top of chapter </a></li> <li title="Timeouts"><a href="robustness.html#id68419">Timeouts</a></li> <li title="Error Handling"><a href="robustness.html#id68545">Error Handling</a></li> <li title="The Larger Example with Robustness Added"><a href="robustness.html#id68732">The Larger Example with Robustness Added</a></li> </ul> </li> <li id="no" title="Records and Macros" expanded="false">Records and Macros<ul> <li><a href="record_macros.html"> Top of chapter </a></li> <li title="The Larger Example Divided into Several Files"><a href="record_macros.html#id68928">The Larger Example Divided into Several Files</a></li> <li title="Header Files"><a href="record_macros.html#id69078">Header Files</a></li> <li title="Records"><a href="record_macros.html#id69123">Records</a></li> <li title="Macros"><a href="record_macros.html#id69217">Macros</a></li> </ul> </li> </ul> </div></div> <div id="content"> <div class="innertube"> <h1>4 Robustness</h1> <p>There are several things which are wrong with the <span class="bold_code"><a href="conc_prog.html#ex">messenger example</a></span> from the previous chapter. 
For example, if a node where a user is logged on goes down without logging off, the user remains in the server's <span class="code">User_List</span> although the client has disappeared. This makes it impossible for the user to log on again, because the server thinks the user is already logged on.</p>
<p>Or what happens if the server goes down in the middle of sending a message, leaving the sending client hanging forever in the <span class="code">await_result</span> function?</p>
<h3><a name="id68419">4.1 Timeouts</a></h3>
<p>Before improving the messenger program we will look into some general principles, using the ping pong program as an example. Recall that when "ping" finishes, it tells "pong" that it has done so by sending the atom <span class="code">finished</span> as a message to "pong" so that "pong" can also finish. Another way to let "pong" finish is to make "pong" exit if it does not receive a message from "ping" within a certain time. This can be done by adding a <strong>timeout</strong> to <span class="code">pong</span>, as shown in the following example:</p>
<div class="example"><pre>
-module(tut19).

-export([start_ping/1, start_pong/0, ping/2, pong/0]).

ping(0, Pong_Node) ->
    io:format("ping finished~n", []);

ping(N, Pong_Node) ->
    {pong, Pong_Node} ! {ping, self()},
    receive
        pong ->
            io:format("Ping received pong~n", [])
    end,
    ping(N - 1, Pong_Node).

pong() ->
    receive
        {ping, Ping_PID} ->
            io:format("Pong received ping~n", []),
            Ping_PID ! pong,
            pong()
    after 5000 ->
            io:format("Pong timed out~n", [])
    end.

start_pong() ->
    register(pong, spawn(tut19, pong, [])).

start_ping(Pong_Node) ->
    spawn(tut19, ping, [3, Pong_Node]).</pre></div>
<p>After compiling this and copying the <span class="code">tut19.beam</span> file to the necessary directories, we get the following:</p>
<p>On (pong@kosken):</p>
<div class="example"><pre>
(pong@kosken)1> <span class="bold_code">tut19:start_pong().</span>
true
Pong received ping
Pong received ping
Pong received ping
Pong timed out</pre></div>
<p>On (ping@gollum):</p>
<div class="example"><pre>
(ping@gollum)1> <span class="bold_code">tut19:start_ping(pong@kosken).</span>
&lt;0.36.0&gt;
Ping received pong
Ping received pong
Ping received pong
ping finished</pre></div>
<p>The timeout is set in:</p>
<div class="example"><pre>
pong() ->
    receive
        {ping, Ping_PID} ->
            io:format("Pong received ping~n", []),
            Ping_PID ! pong,
            pong()
    after 5000 ->
            io:format("Pong timed out~n", [])
    end.</pre></div>
<p>We start the timeout (<span class="code">after 5000</span>) when we enter <span class="code">receive</span>. The timeout is canceled if <span class="code">{ping,Ping_PID}</span> is received. If <span class="code">{ping,Ping_PID}</span> is not received, the actions following the timeout are done after 5000 milliseconds. <span class="code">after</span> must be last in the <span class="code">receive</span>, i.e. preceded by all other message reception specifications in the <span class="code">receive</span>. Of course, we could also call a function which returns an integer for the timeout:</p>
<div class="example"><pre>
after pong_timeout() -></pre></div>
<p>In general, there are better ways than using timeouts to supervise parts of a distributed Erlang system. Timeouts are usually appropriate to supervise external events, for example if you expect a message from some external system within a specified time.
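</p>
<p>As a minimal sketch of supervising such an external event, the function below waits for a reply and gives up after a given number of milliseconds. The module name <span class="code">timeout_sketch</span>, the function <span class="code">wait_for_reply/1</span> and the message format <span class="code">{reply, Data}</span> are assumptions made for this illustration only; they are not part of the ping pong or messenger programs.</p>
<div class="example"><pre>
-module(timeout_sketch).

-export([wait_for_reply/1]).

%% Hypothetical helper: wait up to Timeout milliseconds for a
%% message of the assumed form {reply, Data}.
wait_for_reply(Timeout) ->
    receive
        {reply, Data} ->
            {ok, Data}
    after Timeout ->
            timeout
    end.</pre></div>
<p>If no matching message arrives in time, the call returns the atom <span class="code">timeout</span>; otherwise it returns <span class="code">{ok, Data}</span>. Any other messages are left in the message queue.</p>
<p>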
We could, for instance, use a timeout to log a user out of the messenger system if they have not accessed it for ten minutes.</p>
<h3><a name="id68545">4.2 Error Handling</a></h3>
<p>Before we go into details of the supervision and error handling in an Erlang system, we need to see how Erlang processes terminate, or in Erlang terminology, <strong>exit</strong>.</p>
<p>A process which executes <span class="code">exit(normal)</span> or simply runs out of things to do has a <strong>normal</strong> exit.</p>
<p>A process which encounters a runtime error (e.g. divide by zero, bad match, trying to call a function which doesn't exist, etc.) exits with an error, i.e. has an <strong>abnormal</strong> exit. A process which executes <span class="bold_code"><a href="javascript:erlhref('../../','erts','erlang.html#exit-1');">exit(Reason)</a></span> where <span class="code">Reason</span> is any Erlang term except the atom <span class="code">normal</span>, also has an abnormal exit.</p>
<p>An Erlang process can set up links to other Erlang processes. If a process calls <span class="bold_code"><a href="javascript:erlhref('../../','erts','erlang.html#link-1');">link(Other_Pid)</a></span> it sets up a bidirectional link between itself and the process called <span class="code">Other_Pid</span>. When a process terminates, it sends something called a <strong>signal</strong> to all the processes it has links to.</p>
<p>The signal carries information about the pid it was sent from and the exit reason.</p>
<p>The default behaviour of a process which receives a normal exit is to ignore the signal.</p>
<p>The default behaviour in the two other cases (i.e. abnormal exit) above is to bypass all messages to the receiving process, kill it, and propagate the same error signal to the links of the killed process. In this way you can connect all processes in a transaction together using links, and if one of the processes exits abnormally, all the processes in the transaction are killed.
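</p>
<p>The propagation described above can be sketched as follows. The module <span class="code">link_sketch</span>, its function names and the exit reason <span class="code">some_error</span> are hypothetical and chosen only for this illustration.</p>
<div class="example"><pre>
-module(link_sketch).

-export([start/0, parent/0, child/0]).

%% Spawn the parent without a link, so that the caller
%% (for example the shell) is not affected by the exit.
start() ->
    spawn(link_sketch, parent, []).

parent() ->
    Child = spawn(link_sketch, child, []),
    link(Child),
    Child ! crash,
    %% The parent waits here for a message that is never sent.
    %% It is killed by the exit signal from the child instead.
    receive
        never_sent ->
            ok
    end.

child() ->
    receive
        crash ->
            exit(some_error)    % abnormal exit
    end.</pre></div>
<p>After <span class="code">Pid = link_sketch:start()</span>, a call to <span class="code">is_process_alive(Pid)</span> a moment later should return <span class="code">false</span>, because the abnormal exit of the child has also terminated the linked parent.</p>
<p>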
As we often want to create a process and link to it at the same time, there is a special BIF, <span class="bold_code"><a href="javascript:erlhref('../../','erts','erlang.html#spawn_link-1');">spawn_link</a></span>, which does the same as <span class="code">spawn</span>, but also creates a link to the spawned process.</p>
<p>Now an example of the ping pong program using links to terminate "pong":</p>
<div class="example"><pre>
-module(tut20).

-export([start/1, ping/2, pong/0]).

ping(N, Pong_Pid) ->
    link(Pong_Pid),
    ping1(N, Pong_Pid).

ping1(0, _) ->
    exit(ping);

ping1(N, Pong_Pid) ->
    Pong_Pid ! {ping, self()},
    receive
        pong ->
            io:format("Ping received pong~n", [])
    end,
    ping1(N - 1, Pong_Pid).

pong() ->
    receive
        {ping, Ping_PID} ->
            io:format("Pong received ping~n", []),
            Ping_PID ! pong,
            pong()
    end.

start(Ping_Node) ->
    PongPID = spawn(tut20, pong, []),
    spawn(Ping_Node, tut20, ping, [3, PongPID]).</pre></div>
<div class="example"><pre>
(s1@bill)3> <span class="bold_code">tut20:start(s2@kosken).</span>
Pong received ping
&lt;3820.41.0&gt;
Ping received pong
Pong received ping
Ping received pong
Pong received ping
Ping received pong</pre></div>
<p>This is a slight modification of the ping pong program where both processes are spawned from the same <span class="code">start/1</span> function, and the "ping" process can be spawned on a separate node. Note the use of the <span class="code">link</span> BIF. "Ping" calls <span class="code">exit(ping)</span> when it finishes and this causes an exit signal to be sent to "pong", which also terminates.</p>
<p>It is possible to modify the default behaviour of a process so that it does not get killed when it receives abnormal exit signals. Instead, all signals are turned into normal messages of the format <span class="code">{'EXIT',FromPID,Reason}</span> and added to the end of the receiving process's message queue.
This behaviour is set by:</p>
<div class="example"><pre>
process_flag(trap_exit, true)</pre></div>
<p>There are several other process flags, see <span class="bold_code"><a href="javascript:erlhref('../../','erts','erlang.html#process_flag-2');">erlang(3)</a></span>. Changing the default behaviour of a process in this way is usually not done in standard user programs, but is left to the supervisory programs in OTP (but that's another tutorial). However, we will modify the ping pong program to illustrate exit trapping.</p>
<div class="example"><pre>
-module(tut21).

-export([start/1, ping/2, pong/0]).

ping(N, Pong_Pid) ->
    link(Pong_Pid),
    ping1(N, Pong_Pid).

ping1(0, _) ->
    exit(ping);

ping1(N, Pong_Pid) ->
    Pong_Pid ! {ping, self()},
    receive
        pong ->
            io:format("Ping received pong~n", [])
    end,
    ping1(N - 1, Pong_Pid).

pong() ->
    process_flag(trap_exit, true),
    pong1().

pong1() ->
    receive
        {ping, Ping_PID} ->
            io:format("Pong received ping~n", []),
            Ping_PID ! pong,
            pong1();
        {'EXIT', From, Reason} ->
            io:format("pong exiting, got ~p~n", [{'EXIT', From, Reason}])
    end.

start(Ping_Node) ->
    PongPID = spawn(tut21, pong, []),
    spawn(Ping_Node, tut21, ping, [3, PongPID]).</pre></div>
<div class="example"><pre>
(s1@bill)1> <span class="bold_code">tut21:start(s2@gollum).</span>
&lt;3820.39.0&gt;
Pong received ping
Ping received pong
Pong received ping
Ping received pong
Pong received ping
Ping received pong
pong exiting, got {'EXIT',&lt;3820.39.0&gt;,ping}</pre></div>
<h3><a name="id68732">4.3 The Larger Example with Robustness Added</a></h3>
<p>Now we return to the messenger program and add changes which make it more robust:</p>
<div class="example"><pre>
%%% Message passing utility.
%%% User interface:
%%% logon(Name)
%%%     One user at a time can log on from each Erlang node in the
%%%     messenger system, and choose a suitable Name. If the Name
%%%     is already logged on at another node or if someone else is
%%%     already logged on at the same node, logon will be rejected
%%%     with a suitable error message.
%%% logoff()
%%%     Logs off anybody at that node
%%% message(ToName, Message)
%%%     sends Message to ToName. Error messages if the user of this
%%%     function is not logged on or if ToName is not logged on at
%%%     any node.
%%%
%%% One node in the network of Erlang nodes runs a server which maintains
%%% data about the logged on users. The server is registered as "messenger"
%%% Each node where there is a user logged on runs a client process registered
%%% as "mess_client"
%%%
%%% Protocol between the client processes and the server
%%% ----------------------------------------------------
%%%
%%% To server: {ClientPid, logon, UserName}
%%% Reply {messenger, stop, user_exists_at_other_node} stops the client
%%% Reply {messenger, logged_on} logon was successful
%%%
%%% When the client terminates for some reason
%%% To server: {'EXIT', ClientPid, Reason}
%%%
%%% To server: {ClientPid, message_to, ToName, Message} send a message
%%% Reply: {messenger, stop, you_are_not_logged_on} stops the client
%%% Reply: {messenger, receiver_not_found} no user with this name logged on
%%% Reply: {messenger, sent} Message has been sent (but no guarantee)
%%%
%%% To client: {message_from, Name, Message},
%%%
%%% Protocol between the "commands" and the client
%%% ----------------------------------------------
%%%
%%% Started: messenger:client(Server_Node, Name)
%%% To client: logoff
%%% To client: {message_to, ToName, Message}
%%%
%%% Configuration: change the server_node() function to return the
%%% name of the node where the messenger server runs

-module(messenger).
-export([start_server/0, server/0,
         logon/1, logoff/0, message/2, client/2]).

%%% Change the function below to return the name of the node where the
%%% messenger server runs
server_node() ->
    messenger@super.

%%% This is the server process for the "messenger"
%%% the user list has the format [{ClientPid1, Name1},{ClientPid2, Name2},...]
server() ->
    process_flag(trap_exit, true),
    server([]).

server(User_List) ->
    receive
        {From, logon, Name} ->
            New_User_List = server_logon(From, Name, User_List),
            server(New_User_List);
        {'EXIT', From, _} ->
            New_User_List = server_logoff(From, User_List),
            server(New_User_List);
        {From, message_to, To, Message} ->
            server_transfer(From, To, Message, User_List),
            io:format("list is now: ~p~n", [User_List]),
            server(User_List)
    end.

%%% Start the server
start_server() ->
    register(messenger, spawn(messenger, server, [])).

%%% Server adds a new user to the user list
server_logon(From, Name, User_List) ->
    %% check if logged on anywhere else
    case lists:keymember(Name, 2, User_List) of
        true ->
            From ! {messenger, stop, user_exists_at_other_node},  %reject logon
            User_List;
        false ->
            From ! {messenger, logged_on},
            link(From),
            [{From, Name} | User_List]        %add user to the list
    end.

%%% Server deletes a user from the user list
server_logoff(From, User_List) ->
    lists:keydelete(From, 1, User_List).

%%% Server transfers a message between users
server_transfer(From, To, Message, User_List) ->
    %% check that the user is logged on and who he is
    case lists:keysearch(From, 1, User_List) of
        false ->
            From ! {messenger, stop, you_are_not_logged_on};
        {value, {_, Name}} ->
            server_transfer(From, Name, To, Message, User_List)
    end.

%%% If the user exists, send the message
server_transfer(From, Name, To, Message, User_List) ->
    %% Find the receiver and send the message
    case lists:keysearch(To, 2, User_List) of
        false ->
            From ! {messenger, receiver_not_found};
        {value, {ToPid, To}} ->
            ToPid ! {message_from, Name, Message},
            From ! {messenger, sent}
    end.

%%% User Commands
logon(Name) ->
    case whereis(mess_client) of
        undefined ->
            register(mess_client,
                     spawn(messenger, client, [server_node(), Name]));
        _ -> already_logged_on
    end.

logoff() ->
    mess_client ! logoff.

message(ToName, Message) ->
    case whereis(mess_client) of % Test if the client is running
        undefined ->
            not_logged_on;
        _ -> mess_client ! {message_to, ToName, Message},
             ok
    end.

%%% The client process which runs on each user node
client(Server_Node, Name) ->
    {messenger, Server_Node} ! {self(), logon, Name},
    await_result(),
    client(Server_Node).

client(Server_Node) ->
    receive
        logoff ->
            exit(normal);
        {message_to, ToName, Message} ->
            {messenger, Server_Node} ! {self(), message_to, ToName, Message},
            await_result();
        {message_from, FromName, Message} ->
            io:format("Message from ~p: ~p~n", [FromName, Message])
    end,
    client(Server_Node).

%%% wait for a response from the server
await_result() ->
    receive
        {messenger, stop, Why} -> % Stop the client
            io:format("~p~n", [Why]),
            exit(normal);
        {messenger, What} ->  % Normal response
            io:format("~p~n", [What])
    after 5000 ->
            io:format("No response from server~n", []),
            exit(timeout)
    end.</pre></div>
<p>We have added the following changes:</p>
<p>The messenger server traps exits. If it receives an exit signal, <span class="code">{'EXIT',From,Reason}</span>, this means that a client process has terminated or is unreachable for one of the following reasons:</p>
<ul>
<li>the user has logged off (we have removed the "logoff" message),</li>
<li>the network connection to the client is broken,</li>
<li>the node on which the client process resides has gone down, or</li>
<li>the client process has performed some illegal operation.</li>
</ul>
<p>If we receive an exit signal as above, we delete the tuple <span class="code">{From,Name}</span> from the server's <span class="code">User_List</span> using the <span class="code">server_logoff</span> function. If the node on which the server runs goes down, an exit signal (automatically generated by the system) is sent to all of the client processes: <span class="code">{'EXIT',MessengerPID,noconnection}</span>, causing all the client processes to terminate.</p>
<p>We have also introduced a timeout of five seconds in the <span class="code">await_result</span> function. That is, if the server does not reply within five seconds (5000 ms), the client terminates.
This timeout is really only needed in the logon sequence before the client and server are linked.</p>
<p>An interesting case is if the client were to terminate before the server links to it. This is taken care of because linking to a non-existent process causes an exit signal, <span class="code">{'EXIT',From,noproc}</span>, to be automatically generated as if the process terminated immediately after the link operation.</p>
</div> <div class="footer"> <hr> <p>Copyright © 1996-2012 Ericsson AB. All Rights Reserved.</p> </div> </div> </div></body> </html>