<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html xmlns:fn="http://www.w3.org/2005/02/xpath-functions">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<link rel="stylesheet" href="../otp_doc.css" type="text/css">
<title>Erlang -- Profiling</title>
</head>
<body bgcolor="white" text="#000000" link="#0000ff" vlink="#ff00ff" alink="#ff0000"><div id="container">
<script id="js" type="text/javascript" language="JavaScript" src="../js/flipmenu/flipmenu.js"></script><script id="js2" type="text/javascript" src="../js/erlresolvelinks.js"></script><script language="JavaScript" type="text/javascript">
            <!--
              function getWinHeight() {
                var myHeight = 0;
                if( typeof( window.innerHeight ) == 'number' ) {
                  //Non-IE
                  myHeight = window.innerHeight;
                } else if( document.documentElement && ( document.documentElement.clientWidth ||
                                                         document.documentElement.clientHeight ) ) {
                  //IE 6+ in 'standards compliant mode'
                  myHeight = document.documentElement.clientHeight;
                } else if( document.body && ( document.body.clientWidth || document.body.clientHeight ) ) {
                  //IE 4 compatible
                  myHeight = document.body.clientHeight;
                }
                return myHeight;
              }

              function setscrollpos() {
                var objf=document.getElementById('loadscrollpos');
                 document.getElementById("leftnav").scrollTop = objf.offsetTop - getWinHeight()/2;
              }

              function addEvent(obj, evType, fn){
                if (obj.addEventListener){
                obj.addEventListener(evType, fn, true);
                return true;
              } else if (obj.attachEvent){
                var r = obj.attachEvent("on"+evType, fn);
                return r;
              } else {
                return false;
              }
             }

             addEvent(window, 'load', setscrollpos);

             //--></script><div id="leftnav"><div class="innertube">
<img alt="Erlang logo" src="../erlang-logo.png"><br><small><a href="users_guide.html">User's Guide</a><br><a href="../pdf/otp-system-documentation-5.9.3.1.pdf">PDF</a><br><a href="../index.html">Top</a></small><p><strong>Efficiency Guide</strong><br><strong>User's Guide</strong><br><small>Version 5.9.3.1</small></p>
<br><a href="javascript:openAllFlips()">Expand All</a><br><a href="javascript:closeAllFlips()">Contract All</a><p><small><strong>Chapters</strong></small></p>
<ul class="flipMenu" imagepath="../js/flipmenu">
<li id="no" title="Introduction" expanded="false">Introduction<ul>
<li><a href="introduction.html">
              Top of chapter
            </a></li>
<li title="Purpose"><a href="introduction.html#id62954">Purpose</a></li>
<li title="Prerequisites"><a href="introduction.html#id65420">Prerequisites</a></li>
</ul>
</li>
<li id="no" title="The Eight Myths of Erlang Performance" expanded="false">The Eight Myths of Erlang Performance<ul>
<li><a href="myths.html">
              Top of chapter
            </a></li>
<li title="Myth: Funs are slow"><a href="myths.html#id64448">Myth: Funs are slow</a></li>
<li title="Myth: List comprehensions are slow"><a href="myths.html#id61406">Myth: List comprehensions are slow</a></li>
<li title="Myth: Tail-recursive functions are MUCH faster
    than recursive functions"><a href="myths.html#id62291">Myth: Tail-recursive functions are MUCH faster
    than recursive functions</a></li>
<li title="Myth: '++' is always bad"><a href="myths.html#id60058">Myth: '++' is always bad</a></li>
<li title="Myth: Strings are slow"><a href="myths.html#id62534">Myth: Strings are slow</a></li>
<li title="Myth: Repairing a Dets file is very slow"><a href="myths.html#id61135">Myth: Repairing a Dets file is very slow</a></li>
<li title="Myth: BEAM is a stack-based byte-code virtual machine (and therefore slow)"><a href="myths.html#id64006">Myth: BEAM is a stack-based byte-code virtual machine (and therefore slow)</a></li>
<li title="Myth: Use '_' to speed up your program when a variable is not used"><a href="myths.html#id64026">Myth: Use '_' to speed up your program when a variable is not used</a></li>
</ul>
</li>
<li id="no" title="Common Caveats" expanded="false">Common Caveats<ul>
<li><a href="commoncaveats.html">
              Top of chapter
            </a></li>
<li title="The timer module"><a href="commoncaveats.html#id62547">The timer module</a></li>
<li title="list_to_atom/1"><a href="commoncaveats.html#id61885">list_to_atom/1</a></li>
<li title="length/1"><a href="commoncaveats.html#id60806">length/1</a></li>
<li title="setelement/3"><a href="commoncaveats.html#id58300">setelement/3</a></li>
<li title="size/1"><a href="commoncaveats.html#id65402">size/1</a></li>
<li title="split_binary/2"><a href="commoncaveats.html#id60836">split_binary/2</a></li>
<li title="The '--' operator"><a href="commoncaveats.html#id61766">The '--' operator</a></li>
</ul>
</li>
<li id="no" title="Constructing and matching binaries" expanded="false">Constructing and matching binaries<ul>
<li><a href="binaryhandling.html">
              Top of chapter
            </a></li>
<li title="How binaries are implemented"><a href="binaryhandling.html#id62932">How binaries are implemented</a></li>
<li title="Constructing binaries"><a href="binaryhandling.html#id66066">Constructing binaries</a></li>
<li title="Matching binaries"><a href="binaryhandling.html#id63610">Matching binaries</a></li>
</ul>
</li>
<li id="no" title="List handling" expanded="false">List handling<ul>
<li><a href="listHandling.html">
              Top of chapter
            </a></li>
<li title="Creating a list"><a href="listHandling.html#id66709">Creating a list</a></li>
<li title="List comprehensions"><a href="listHandling.html#id66805">List comprehensions</a></li>
<li title="Deep and flat lists"><a href="listHandling.html#id66875">Deep and flat lists</a></li>
<li title="Why you should not worry about recursive lists functions"><a href="listHandling.html#id67017">Why you should not worry about recursive lists functions</a></li>
</ul>
</li>
<li id="no" title="Functions" expanded="false">Functions<ul>
<li><a href="functions.html">
              Top of chapter
            </a></li>
<li title="Pattern matching"><a href="functions.html#id67142">Pattern matching</a></li>
<li title="Function Calls "><a href="functions.html#id67362">Function Calls </a></li>
<li title="Memory usage in recursion"><a href="functions.html#id67506">Memory usage in recursion</a></li>
</ul>
</li>
<li id="no" title="Tables and databases" expanded="false">Tables and databases<ul>
<li><a href="tablesDatabases.html">
              Top of chapter
            </a></li>
<li title="Ets, Dets and Mnesia"><a href="tablesDatabases.html#id67596">Ets, Dets and Mnesia</a></li>
<li title="Ets specific"><a href="tablesDatabases.html#id67983">Ets specific</a></li>
<li title="Mnesia specific"><a href="tablesDatabases.html#id68087">Mnesia specific</a></li>
</ul>
</li>
<li id="no" title="Processes" expanded="false">Processes<ul>
<li><a href="processes.html">
              Top of chapter
            </a></li>
<li title="Creation of an Erlang process"><a href="processes.html#id68191">Creation of an Erlang process</a></li>
<li title="Process messages"><a href="processes.html#id68339">Process messages</a></li>
<li title="The SMP emulator"><a href="processes.html#id68531">The SMP emulator</a></li>
</ul>
</li>
<li id="no" title="Drivers" expanded="false">Drivers<ul>
<li><a href="drivers.html">
              Top of chapter
            </a></li>
<li title="Drivers and concurrency"><a href="drivers.html#id68634">Drivers and concurrency</a></li>
<li title="Avoiding copying of binaries when calling a driver"><a href="drivers.html#id68675">Avoiding copying of binaries when calling a driver</a></li>
<li title="Returning small binaries from a driver"><a href="drivers.html#id68743">Returning small binaries from a driver</a></li>
<li title="Returning big binaries without copying from a driver"><a href="drivers.html#id68777">Returning big binaries without copying from a driver</a></li>
</ul>
</li>
<li id="no" title="Advanced" expanded="false">Advanced<ul>
<li><a href="advanced.html">
              Top of chapter
            </a></li>
<li title="Memory"><a href="advanced.html#id68919">Memory</a></li>
<li title="System limits"><a href="advanced.html#id69276">System limits</a></li>
</ul>
</li>
<li id="loadscrollpos" title="Profiling" expanded="true">Profiling<ul>
<li><a href="profiling.html">
              Top of chapter
            </a></li>
<li title="Do not guess about performance - profile"><a href="profiling.html#id69579">Do not guess about performance - profile</a></li>
<li title="Big systems"><a href="profiling.html#id69661">Big systems</a></li>
<li title="What to look for"><a href="profiling.html#id69681">What to look for</a></li>
<li title="Tools"><a href="profiling.html#id69738">Tools</a></li>
<li title="Benchmarking"><a href="profiling.html#id70245">Benchmarking</a></li>
</ul>
</li>
</ul>
</div></div>
<div id="content">
<div class="innertube">
<h1>11 Profiling</h1>
  

  <h3><a name="id69579">11.1 
        Do not guess about performance - profile</a></h3>
    

    <p>Even experienced software developers often guess wrong about where
    the performance bottlenecks are in their programs.</p>

    <p>Therefore, profile your program to see where the performance
    bottlenecks are and concentrate on optimizing them.</p>

    <p>Erlang/OTP contains several tools to help find bottlenecks.</p>

    <p><span class="code">fprof</span> provides the most detailed information
    about where the time is spent, but it significantly slows down the
    program it profiles.</p>

	<p><span class="code">eprof</span> provides time information for each function used
		in the program. No call graph is produced, but <span class="code">eprof</span> has
		considerably less impact on the profiled program.</p>

    <p>If the program is too big to be profiled by <span class="code">fprof</span> or <span class="code">eprof</span>,
    <span class="code">cover</span> and <span class="code">cprof</span> can be used to locate parts of the
    code that should be profiled more thoroughly using <span class="code">fprof</span> or
    <span class="code">eprof</span>.</p>

    <p><span class="code">cover</span> provides execution counts per line per process,
    with less overhead than <span class="code">fprof</span>. Execution counts can,
    with some caution, be used to locate potential performance bottlenecks.
    The most lightweight tool is <span class="code">cprof</span>, but it only provides execution
    counts on a per-function basis (for all processes, not per process).</p>
  

  <h3><a name="id69661">11.2 
        Big systems</a></h3>
    
    <p>For a big system, it can be useful to start by running profiling
      on a simulated and limited scenario. However, bottlenecks
      tend to appear, or to cause problems, only when many things
      are going on at the same time and many nodes are involved.
      It is therefore desirable to also run
      profiling in a system test plant on a real target system.</p>
    <p>In a big system, you do not want to run the profiling
      tools on the whole system. Instead, concentrate on the processes
      and modules that you know are central and account for a large part of the
      execution.</p>
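<p>As a minimal sketch of restricting profiling to a few central processes, the hypothetical module <span class="code">profile_procs</span> below uses <span class="code">fprof:trace/1</span> with the <span class="code">{procs, Pids}</span> option so that only the given processes are traced for a fixed number of milliseconds, and then writes the analysis to <span class="code">fprof.analysis</span> (a file name chosen for this example). How you obtain the pids of the central processes depends on your system and is assumed here.</p>

```erlang
-module(profile_procs).
-export([profile_for/2]).

%% Hypothetical helper: trace only the processes in Pids for Ms
%% milliseconds, then analyse the collected trace with fprof.
profile_for(Pids, Ms) ->
    %% Start tracing the selected processes into the default
    %% trace file (fprof.trace).
    ok = fprof:trace([start, {procs, Pids}]),
    timer:sleep(Ms),
    ok = fprof:trace(stop),
    %% Turn the raw trace into profile data, then write the result.
    ok = fprof:profile(),
    ok = fprof:analyse([{dest, "fprof.analysis"}]).
```

<p>Tracing only selected processes keeps the trace file, and the slowdown, much smaller than tracing the whole node.</p>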
  

  <h3><a name="id69681">11.3 
        What to look for</a></h3>
    
    <p>When analyzing the result file from the profiling activity,
      look for functions that are called many
      times and have a long "own" execution time (time excluding calls
      to other functions). Functions that are simply called very
      many times can also be interesting, as even small costs can add
      up to quite a bit if they are repeated often. Then ask
      yourself what you can do to reduce this time. Appropriate
      questions to ask yourself are:</p>
    <ul>
      <li>Can I reduce the number of times the function is called?</li>
      <li>Are there tests that can be run less often if I change
       the order of tests?</li>
      <li>Are there redundant tests that can be removed? </li>
      <li>Is some expression calculated repeatedly, giving the same result
       each time?</li>
      <li>Are there other ways of doing this that are equivalent and
       more efficient?</li>
      <li>Can I use another internal data representation to make
       things more efficient? </li>
    </ul>
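<p>As a hypothetical illustration of the question about expressions that give the same result each time, the module <span class="code">hoist</span> below (its name and functions are made up for this example) contains two functions that return the same list. The first evaluates <span class="code">lists:max(Weights)</span> once per element; the second computes that loop-invariant value once, before the list comprehension. Whether such a change matters in a real program is something to confirm by profiling and benchmarking.</p>

```erlang
-module(hoist).
-export([scale_slow/2, scale_fast/2]).

%% Recomputes lists:max(Weights) for every element of Xs,
%% although the result is the same every time.
scale_slow(Xs, Weights) ->
    [X / lists:max(Weights) || X <- Xs].

%% Computes the loop-invariant maximum once, outside the comprehension.
scale_fast(Xs, Weights) ->
    Max = lists:max(Weights),
    [X / Max || X <- Xs].
```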
    <p>These questions are not always trivial to answer. You might
      need to run some benchmarks to back up your theory and to avoid
      making things slower if your theory is wrong. See <span class="bold_code"><a href="#benchmark">benchmarking</a></span>.</p>
  

  <h3><a name="id69738">11.4 
        Tools</a></h3>
    

    <h4>fprof</h4>
      
	  <p>
		 <span class="code">fprof</span> measures the execution time for each function,
        both own time (how much time a function has used for its
        own execution) and accumulated time (including called
        functions). The values are displayed per process. You also get
        to know how many times each function has been
        called. <span class="code">fprof</span> is based on tracing to file in order to
        minimize runtime performance impact. Using <span class="code">fprof</span> is just a
		matter of calling a few library functions; see the
		<span class="bold_code"><a href="javascript:erlhref('../../','tools','fprof.html');">fprof</a></span>
		manual page in the application tools. <span class="code">fprof</span> was introduced in
		version R8 of Erlang/OTP.
	</p>
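<p>A minimal sketch of such calls, assuming a hypothetical module <span class="code">fprof_demo</span> with a made-up workload <span class="code">work/1</span>: <span class="code">fprof:apply/2</span> traces one call into the default trace file, <span class="code">fprof:profile/0</span> turns the trace into profile data, and <span class="code">fprof:analyse/1</span> writes the result to a file (the name <span class="code">fprof.analysis</span> is an arbitrary choice).</p>

```erlang
-module(fprof_demo).
-export([run/0, work/1]).

%% Hypothetical workload: sum of the squares of 1..N.
work(N) ->
    lists:sum([X * X || X <- lists:seq(1, N)]).

%% Profile a single call of work/1 and write the analysis to a file.
run() ->
    %% Trace the call into the default trace file, fprof.trace.
    _Result = fprof:apply(fun() -> work(10000) end, []),
    %% Read the trace file and compute own and accumulated times.
    ok = fprof:profile(),
    %% Write the per-process analysis to a text file.
    ok = fprof:analyse([{dest, "fprof.analysis"}]).
```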
    

	<h4>eprof</h4>
		
		<p>
			<span class="code">eprof</span> is based on the Erlang trace_info BIFs. <span class="code">eprof</span> shows how much time has been used by
			each process, and in which function calls this time has been
			spent. Time is shown both as a percentage of total time and as absolute time.
			See <span class="bold_code"><a href="javascript:erlhref('../../','tools','eprof.html');">eprof</a></span> for
			additional information.
		</p>
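<p>A minimal sketch of an <span class="code">eprof</span> session, assuming a hypothetical module <span class="code">eprof_demo</span> and a made-up workload: <span class="code">eprof:profile/1</span> runs and profiles a fun, <span class="code">eprof:analyze(total)</span> prints the time per function summed over all profiled processes, and <span class="code">eprof:stop/0</span> shuts the profiler down.</p>

```erlang
-module(eprof_demo).
-export([run/1]).

%% Hypothetical workload: sum of the squares of 1..N.
work(N) ->
    lists:sum([X * X || X <- lists:seq(1, N)]).

%% Profile one call of work(N) with eprof, print the analysis, and
%% return the value computed by the workload.
run(N) ->
    %% Start the eprof server (the result is ignored in case it is
    %% already running).
    _ = eprof:start(),
    {ok, Value} = eprof:profile(fun() -> work(N) end),
    %% Print time per function, summed over all profiled processes.
    eprof:analyze(total),
    stopped = eprof:stop(),
    Value.
```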
	

    <h4>cover</h4>
      
			<p>
				<span class="code">cover</span>'s primary use is coverage analysis to verify
        test cases, making sure that all relevant code is covered.
        <span class="code">cover</span> counts how many times each executable line of
        code is executed when a program is run, on a per-module
        basis. This information can also be used to
        determine which code is run very frequently and could therefore
        be a subject for optimization. Using <span class="code">cover</span> is just a matter of
		calling a few library functions; see the
		<span class="bold_code"><a href="javascript:erlhref('../../','tools','cover.html');">cover</a></span>
		manual page in the application tools.</p>
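<p>A minimal sketch of using <span class="code">cover</span> to find frequently executed lines, assuming a hypothetical helper module <span class="code">cover_demo</span> and that the source file of the module to analyse is in the current directory: <span class="code">cover:compile_module/1</span> cover-compiles it, the workload is run, and <span class="code">cover:analyse(Module, calls, line)</span> returns per-line execution counts, which the helper sorts with the most executed lines first.</p>

```erlang
-module(cover_demo).
-export([line_counts/2]).

%% Hypothetical helper: cover-compile Module, run Fun, and return the
%% per-line execution counts, most frequently executed lines first.
line_counts(Module, Fun) ->
    %% Looks for Module.erl in the current directory.
    {ok, Module} = cover:compile_module(Module),
    Fun(),
    {ok, Counts} = cover:analyse(Module, calls, line),
    %% Stop the cover server; this also unloads the cover-compiled code.
    cover:stop(),
    %% Counts is a list of {{Module, Line}, Calls}; sort by Calls, descending.
    lists:reverse(lists:keysort(2, Counts)).
```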
    

    <h4>cprof</h4>
      
      <p>In terms of features, <span class="code">cprof</span> is something between <span class="code">fprof</span> and
        <span class="code">cover</span>. It counts how many times each
        function is called when the program is run, on a per-module
        basis. <span class="code">cprof</span> has a low performance impact (compared with
        <span class="code">fprof</span>) and does not need to recompile
		any modules to profile them (compared with <span class="code">cover</span>).
		See the <span class="bold_code"><a href="javascript:erlhref('../../','tools','cprof.html');">cprof</a></span> manual page for additional
		information.
	</p>
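<p>A minimal sketch of counting calls with <span class="code">cprof</span>, assuming a hypothetical helper module <span class="code">cprof_demo</span>: <span class="code">cprof:start/1</span> sets call-count breakpoints in one module, <span class="code">cprof:pause/1</span> freezes the counters, <span class="code">cprof:analyse/1</span> returns <span class="code">{Module, ModuleCallCount, [{{M, F, A}, Count}]}</span>, and <span class="code">cprof:stop/1</span> removes the breakpoints. No recompilation is involved.</p>

```erlang
-module(cprof_demo).
-export([call_counts/2]).

%% Hypothetical helper: count calls to functions in Module while Fun runs.
call_counts(Module, Fun) ->
    cprof:start(Module),            % set call-count breakpoints in Module
    Fun(),
    cprof:pause(Module),            % freeze the counters
    Result = cprof:analyse(Module),
    cprof:stop(Module),             % remove the breakpoints
    Result.
```

<p>Because the counters are global, calls made by other processes to the same module during the measurement are counted as well.</p>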
    

    <h4>Tool summary</h4>
      
      <table border="1" cellpadding="2" cellspacing="0">
<tr>
          <td align="left" valign="middle">Tool</td>
          <td align="left" valign="middle">Results</td>
          <td align="left" valign="middle">Size of result</td>
          <td align="left" valign="middle">Effects on program execution time</td>
          <td align="left" valign="middle">Records number of calls</td>
          <td align="left" valign="middle">Records execution time</td>
          <td align="left" valign="middle">Records called by</td>
          <td align="left" valign="middle">Records garbage collection</td>
        </tr>
<tr>
          <td align="left" valign="middle"><span class="code">fprof </span></td>
          <td align="left" valign="middle">per process to screen/file </td>
          <td align="left" valign="middle">large </td>
          <td align="left" valign="middle">significant slowdown </td>
          <td align="left" valign="middle">yes  </td>
          <td align="left" valign="middle">total and own</td>
          <td align="left" valign="middle">yes </td>
          <td align="left" valign="middle">yes </td>
        </tr>
<tr>
          <td align="left" valign="middle"><span class="code">eprof </span></td>
          <td align="left" valign="middle">per process/function to screen/file </td>
          <td align="left" valign="middle">medium </td>
          <td align="left" valign="middle">small slowdown </td>
          <td align="left" valign="middle">yes </td>
          <td align="left" valign="middle">only total </td>
          <td align="left" valign="middle">no </td>
          <td align="left" valign="middle">no </td>
        </tr>
<tr>
          <td align="left" valign="middle"><span class="code">cover </span></td>
          <td align="left" valign="middle">per module to screen/file</td>
          <td align="left" valign="middle">small </td>
          <td align="left" valign="middle">moderate slowdown</td>
          <td align="left" valign="middle">yes, per line  </td>
          <td align="left" valign="middle">no </td>
          <td align="left" valign="middle">no </td>
          <td align="left" valign="middle">no </td>
        </tr>
<tr>
          <td align="left" valign="middle"><span class="code">cprof </span></td>
          <td align="left" valign="middle">per module to caller</td>
          <td align="left" valign="middle">small </td>
          <td align="left" valign="middle">small slowdown </td>
          <td align="left" valign="middle">yes </td>
          <td align="left" valign="middle">no </td>
          <td align="left" valign="middle">no </td>
          <td align="left" valign="middle">no </td>
        </tr>
</table>
<em>Table
        11.1:
         
        </em>
    
  

  <h3><a name="id70245">11.5 
        Benchmarking</a></h3>
    <a name="benchmark"></a>
    

    <p>The main purpose of benchmarking is to find out which
    implementation of a given algorithm or function is the fastest.
    Benchmarking is far from an exact science. Today's operating systems
    generally run background tasks that are difficult to turn off.
    Caches and multiple CPU cores do not make it any easier.
    It would be best to run Unix computers in single-user mode when
    benchmarking, but that is inconvenient, to say the least, for casual
    testing.</p>
    
    <p>Benchmarks can measure wall-clock time or CPU time.</p>

    <p><span class="bold_code"><a href="javascript:erlhref('../../','stdlib','timer.html#tc-3');">timer:tc/3</a></span> measures
    wall-clock time. The advantage of wall-clock time is that I/O,
    swapping, and other activities in the operating-system kernel are
    included in the measurements. The disadvantage is that the
    measurements vary widely. Usually it is best to run the
    benchmark several times and note the shortest time, which should
    be the minimum time that is possible to achieve under the best of
    circumstances.</p>
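<p>A minimal sketch of the "run several times and keep the shortest time" approach, assuming a hypothetical module <span class="code">bench_wall</span>: <span class="code">timer:tc/3</span> returns <span class="code">{Time, Value}</span> with <span class="code">Time</span> in microseconds, and the helper keeps the minimum over a given number of runs.</p>

```erlang
-module(bench_wall).
-export([min_wall_time/4]).

%% Hypothetical helper: call M:F with argument list A Runs times and
%% return the shortest wall-clock time in microseconds.
min_wall_time(M, F, A, Runs) when is_integer(Runs), Runs > 0 ->
    Times = [begin
                 {Micros, _Value} = timer:tc(M, F, A),
                 Micros
             end || _ <- lists:seq(1, Runs)],
    lists:min(Times).
```

<p>For example, <span class="code">bench_wall:min_wall_time(lists, seq, [1, 100000], 10)</span> reports the best of ten runs of <span class="code">lists:seq(1, 100000)</span>.</p>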

    <p><span class="bold_code"><a href="javascript:erlhref('../../','erts','erlang.html#statistics-1');">statistics/1</a></span>
    with the argument <span class="code">runtime</span> measures CPU time spent in the Erlang
    virtual machine. The advantage is that the results are more
    consistent from run to run. The disadvantage is that the time
    spent in the operating system kernel (such as swapping and I/O)
    is not included. Therefore, measuring CPU time is misleading if
    any I/O (file or socket) is involved.</p>
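<p>A minimal sketch of a CPU-time measurement, assuming a hypothetical module <span class="code">bench_cpu</span>: <span class="code">statistics(runtime)</span> returns <span class="code">{TotalRunTime, TimeSinceLastCall}</span> in milliseconds, and the helper subtracts two readings of the total so that it does not depend on when <span class="code">statistics(runtime)</span> was last called. The reading covers the whole emulator, so other activity in the same node is included in the result.</p>

```erlang
-module(bench_cpu).
-export([cpu_time/3]).

%% Hypothetical helper: return the CPU time in milliseconds, as
%% reported by statistics(runtime), spent while M:F(A...) runs.
cpu_time(M, F, A) ->
    {Before, _} = statistics(runtime),
    _ = apply(M, F, A),
    {After, _} = statistics(runtime),
    After - Before.
```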

    <p>It is probably a good idea to do both wall-clock measurements and
    CPU time measurements.</p>

    <p>Some additional advice:</p>

    <ul>
    <li>The granularity of both types of measurement can be quite
    coarse, so make sure that each individual measurement
    lasts for at least several seconds.</li>

    <li>To make the test fair, each new test run should run in its own,
    newly created Erlang process. Otherwise, if all tests run in the
    same process, the later tests start out with larger heap sizes
    and therefore probably do fewer garbage collections. You could
    also consider restarting the Erlang emulator between tests.</li>

    <li>Do not assume that the fastest implementation of a given algorithm
    on computer architecture X is also the fastest on computer architecture Y.</li>

    </ul>
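<p>The advice to run each test in a newly created process can be sketched as follows, assuming a hypothetical module <span class="code">bench_fresh</span>: each call spawns and monitors a new process, times the work there with <span class="code">timer:tc/3</span>, and sends the time back to the caller. If the spawned process crashes, the monitor's <span class="code">'DOWN'</span> message turns the crash into an error in the caller.</p>

```erlang
-module(bench_fresh).
-export([run_isolated/3]).

%% Hypothetical helper: time M:F(A...) with timer:tc/3 inside a newly
%% spawned process, so each run starts with a fresh, small heap.
%% Returns the elapsed wall-clock time in microseconds.
run_isolated(M, F, A) ->
    Parent = self(),
    Ref = make_ref(),
    {Pid, MRef} =
        spawn_monitor(fun() ->
                              {T, _Value} = timer:tc(M, F, A),
                              Parent ! {Ref, T}
                      end),
    receive
        {Ref, Time} ->
            %% Remove the monitor and any 'DOWN' message already queued.
            erlang:demonitor(MRef, [flush]),
            Time;
        {'DOWN', MRef, process, Pid, Reason} ->
            erlang:error({benchmark_failed, Reason})
    end.
```

<p>Since the result message is sent before the process exits, it always arrives before the corresponding <span class="code">'DOWN'</span> message, so a successful run is never mistaken for a failure.</p>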
  
</div>
<div class="footer">
<hr>
<p>Copyright © 2001-2012 Ericsson AB. All Rights Reserved.</p>
</div>
</div>
</div></body>
</html>