Sophie

Sophie

distrib > Fedora > 15 > i386 > by-pkgid > d07d7ab417d79053e7e0155c99e1a1c8 > files > 2199

mlton-20100608-3.fc15.i686.rpm

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta name="robots" content="index,nofollow">



<title>CommonArg - MLton Standard ML Compiler (SML Compiler)</title>
<link rel="stylesheet" type="text/css" charset="iso-8859-1" media="all" href="common.css">
<link rel="stylesheet" type="text/css" charset="iso-8859-1" media="screen" href="screen.css">
<link rel="stylesheet" type="text/css" charset="iso-8859-1" media="print" href="print.css">


<link rel="Start" href="Home">


</head>

<body lang="en" dir="ltr">

<script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
</script>
<script type="text/javascript">
_uacct = "UA-833377-1";
urchinTracker();
</script>
<table bgcolor = lightblue cellspacing = 0 style = "border: 0px;" width = 100%>
  <tr>
    <td style = "
		border: 0px;
		color: darkblue; 
		font-size: 150%;
		text-align: left;">
      <a class = mltona href="Home">MLton MLTONWIKIVERSION</a>
    <td style = "
		border: 0px;
		font-size: 150%;
		text-align: center;
		width: 50%;">
      CommonArg
    <td style = "
		border: 0px;
		text-align: right;">
      <table cellspacing = 0 style = "border: 0px">
        <tr style = "vertical-align: middle;">
      </table>
  <tr style = "background-color: white;">
    <td colspan = 3
	style = "
		border: 0px;
		font-size:70%;
		text-align: right;">
      <a href = "Home">Home</a>
      &nbsp;<a href = "TitleIndex">Index</a>
      &nbsp;
</table>
<div id="content" lang="en" dir="ltr">
CommonArg is an optimization pass for the <a href="SSA">SSA</a> <a href="IntermediateLanguage">IntermediateLanguage</a>, invoked from <a href="SSASimplify">SSASimplify</a>. <h2 id="head-55f8ebc805e65b5b71ddafdae390e3be2bcd69af">Description</h2>
<p>
It optimizes instances of <tt>Goto</tt> transfers that pass the same arguments to the same label; e.g. 
<pre>L_1 ()
  ... 
  z1 = ? 
  ... 
  L_3 (x, y, z1)
L_2 ()
  ... 
  z2 = ? 
  ...
  L_3 (x, y, z2)
L_3 (a, b, c)
  ...
</pre> This code can be simplified to:  
<pre>L_1 ()
  ... 
  z1 = ? 
  ... 
  L_3 (z1)
L_2 ()
  ... 
  z2 = ? 
  ...
  L_3 (z2)
L_3 (c)
  a = x
  b = y
</pre>which saves a number of resources: time of setting up the arguments for the jump to <tt>L_3</tt>, space (either stack or pseudo-registers) for the arguments of <tt>L_3</tt>, etc.  It may also expose some other optimizations, if more information is known about <tt>x</tt> or <tt>y</tt>. 
</p>
<h2 id="head-8781d615fd77be9578225c40ac67b9471394cced">Implementation</h2>
<a href = "http://mlton.org/cgi-bin/viewsvn.cgi/mlton/tags/on-MLTONWIKIVERSION-release/mlton/ssa/common-arg.sig?view=markup"><img src="moin-www.png" alt="[WWW]" height="11" width="11">common-arg.sig</a> <a href = "http://mlton.org/cgi-bin/viewsvn.cgi/mlton/tags/on-MLTONWIKIVERSION-release/mlton/ssa/common-arg.fun?view=markup"><img src="moin-www.png" alt="[WWW]" height="11" width="11">common-arg.fun</a> <h2 id="head-35ec00231a68203708e39f0e2cc10b50c6bf62de">Details and Notes</h2>
<p>
Three analyses were originally proposed to drive the optimization transformation.  Only the <em>Dominator Analysis</em> is currently implemented.  (Implementations of the other analyses are available in the Subversion repository.) 
</p>
<h4 id="head-9658a3d16f7d3e0a09ce9f0001035a0a0733c771">Syntactic Analysis</h4>
<p>
The simplest analysis I could think of maintains  
<pre>varInfo: Var.t -&gt; Var.t option list ref
</pre> initialized to <tt>[]</tt>. 
</p>
<p>
For each variable <tt>v</tt> bound in a <tt>Statement.t</tt> or in the <tt>Function.t</tt> args, then <tt>List.push(varInfo&nbsp;v,&nbsp;NONE)</tt>. For each  <tt>L&nbsp;(x1,&nbsp;...,&nbsp;xn)</tt> transfer where <tt>(a1,&nbsp;...,&nbsp;an)</tt> are the formals of <tt>L</tt>, then <tt>List.push(varInfo&nbsp;ai,&nbsp;SOME&nbsp;xi)</tt>. For each block argument a used in an unknown context (e.g., arguments of blocks used as continuations, handlers, arith success, runtime return, or case switch labels), then <tt>List.push(varInfo&nbsp;a,&nbsp;NONE)</tt>. 
</p>
<p>
Now, any block argument <tt>a</tt> such that <tt>varInfo&nbsp;a&nbsp;=&nbsp;xs</tt>, where all of the elements of <tt>xs</tt> are equal to <tt>SOME&nbsp;x</tt>, can be optimized by setting  <tt>a&nbsp;=&nbsp;x</tt> at the beginning of the block and dropping the argument from <tt>Goto</tt> transfers. 
</p>
<p>
That takes care of the example above.  We can clearly do slightly better, by changing the transformation criteria to the following: any block argument a such that <tt>varInfo&nbsp;a&nbsp;=&nbsp;xs</tt>, where all of the elements of <tt>xs</tt> are equal to <tt>SOME&nbsp;x</tt> <em>or</em> are equal to <tt>SOME&nbsp;a</tt>, can be optimized by setting  <tt>a&nbsp;=&nbsp;x</tt>  at the beginning of the block and dropping the argument from <tt>Goto</tt> transfers.  This optimizes a case like:  
<pre>L_1 () 
  ... z1 = ? ... 
  L_3 (x, y, z1)
L_2 () 
  ... z2 = ? ...
  L_3(x, y, z2)
L_3 (a, b, c)
  ... w = ? ...
  case w of 
    true =&gt; L_4 | false =&gt; L_5
L_4 ()
   ...
   L_3 (a, b, w)
L_5 () 
   ...
</pre>where a common argument is passed to a loop (and is invariant through the loop).  Of course, the <a href="LoopInvariant">LoopInvariant</a> optimization pass would normally introduce a local loop and essentially reduce this to the first example, but I have seen this in practice, which suggests that some optimizations after <a href="LoopInvariant">LoopInvariant</a> do enough simplifications to introduce (new) loop invariant arguments. 
</p>
<h4 id="head-391c452724ec2ff821c0c2798ad26914f131f018">Fixpoint Analysis</h4>
<p>
However, the above analysis and transformation doesn't cover the cases where eliminating one common argument exposes the opportunity to eliminate other common arguments.  For example:  
<pre>L_1 () 
  ...
  L_3 (x)
L_2 ()
  ...
  L_3 (x)
L_3 (a) 
  ...
  L_5 (a)
L_4 () 
  ...
  L_5 (x)
L_5 (b)
  ...
</pre>One pass of analysis and transformation would eliminate the argument to <tt>L_3</tt> and rewrite the <tt>L_5(a)</tt> transfer to <tt>L_5&nbsp;(x)</tt>, thereby exposing the opportunity to eliminate the common argument to <tt>L_5</tt>. 
</p>
<p>
The interdependency the arguments to <tt>L_3</tt> and <tt>L_5</tt> suggest performing some sort of fixed-point analysis.  This analysis is relatively simple; maintain  
<pre>varInfo: Var.t -&gt; VarLattice.t
</pre> where  
<pre>VarLattice.t ~=~ Bot | Point of Var.t | Top
</pre> (but is implemented by the <a class="nonexistent" href="FlatLattice">FlatLattice</a> functor with a <tt>lessThan</tt> list and <tt>value&nbsp;ref</tt> under the hood), initialized to <tt>Bot</tt>. 
</p>
<p>
For each variable <tt>v</tt> bound in a <tt>Statement.t</tt> or in the <tt>Function.t</tt> args, then <tt>VarLattice.&lt;=&nbsp;(Point&nbsp;v,&nbsp;varInfo&nbsp;v)</tt> For each  <tt>L&nbsp;(x1,&nbsp;...,&nbsp;xn)</tt> transfer where <tt>(a1,&nbsp;...,&nbsp;an)</tt> are the formals of <tt>L</tt>}, then <tt>VarLattice.&lt;=&nbsp;(varInfo&nbsp;xi,&nbsp;varInfo&nbsp;ai)</tt>. For each block argument a used in an unknown context, then <tt>VarLattice.&lt;=&nbsp;(Point&nbsp;a,&nbsp;varInfo&nbsp;a)</tt>. 
</p>
<p>
Now, any block argument a such that <tt>varInfo&nbsp;a&nbsp;=&nbsp;Point&nbsp;x</tt> can be optimized by setting <tt>a&nbsp;=&nbsp;x</tt> at the beginning of the block and dropping the argument from <tt>Goto</tt> transfers. 
</p>
<p>
Now, with the last example, we introduce the ordering constraints:  
</p>

    <ul>

 
<pre>varInfo x &lt;= varInfo a
varInfo a &lt;= varInfo b
varInfo x &lt;= varInfo b
</pre>
    </ul>


<p>
Assuming that <tt>varInfo&nbsp;x&nbsp;=&nbsp;Point&nbsp;x</tt>, then we get <tt>varInfo&nbsp;a&nbsp;=&nbsp;Point&nbsp;x</tt> and <tt>varInfo&nbsp;b&nbsp;=&nbsp;Point&nbsp;x</tt>, and we optimize the example as desired. 
</p>
<p>
But, that is a rather weak assumption.  It's quite possible for <tt>varInfo&nbsp;x&nbsp;=&nbsp;Top</tt>.  For example, consider: 
<pre>G_1 () 
  ... n = 1 ...
  L_0 (n)
G_2 () 
  ... m = 2 ...
  L_0 (m)
L_0 (x) 
  ...
L_1 () 
  ...
  L_3 (x)
L_2 ()
  ...
  L_3 (x)
L_3 (a)
  ...
  L_5(a)
L_4 ()
  ...
  L_5(x)
L_5 (b) 
   ...
</pre>Now <tt>varInfo&nbsp;x&nbsp;=&nbsp;varInfo&nbsp;a&nbsp;=&nbsp;varInfo&nbsp;b&nbsp;=&nbsp;Top</tt>.  What went wrong here? When <tt>varInfo&nbsp;x</tt> went to <tt>Top</tt>, it got propagated all the way through to <tt>a</tt> and <tt>b</tt>, and prevented the elimination of any common arguments.  What we'd like to do instead is when <tt>varInfo&nbsp;x</tt> goes to <tt>Top</tt>, propagate on <tt>Point&nbsp;x</tt> -- we have no hope of eliminating <tt>x</tt>, but if we hold <tt>x</tt> constant, then we have a chance of eliminating arguments for which <tt>x</tt> is passed as an actual. 
</p>
<h4 id="head-31d2f75a6c210512fa42401e00d0b609716ec3b5">Dominator Analysis</h4>
<p>
Does anyone see where this is going yet?  Pausing for a little thought, <a href="MatthewFluet">MatthewFluet</a> realized that he had once before tried proposing this kind of "fix" to a fixed-point analysis -- when we were first investigating the <a href="Contify">Contify</a> optimization in light of John Reppy's CWS paper.  Of course, that "fix" failed because it defined a non-monotonic function and one couldn't take the fixed point.  But, <a href="StephenWeeks">StephenWeeks</a> suggested a dominator based approach, and we were able to show that, indeed, the dominator analysis subsumed both the previous call based analysis and the cont based analysis.  And, a moment's reflection reveals further parallels:  when <tt>varInfo:&nbsp;Var.t&nbsp;-&gt;&nbsp;Var.t&nbsp;option&nbsp;list&nbsp;ref</tt>, we have something analogous to the call analysis, and when <tt>varInfo:&nbsp;Var.t&nbsp;-&gt;&nbsp;VarLattice.t</tt>, we have something analogous to the cont analysis.  Maybe there is something analogous to the dominator approach (and therefore superior to the previous analyses). 
</p>
<p>
And this turns out to be the case.  Construct the graph <tt>G</tt> as follows:  
</p>

    <ul>

 
<pre>nodes(G) = {Root} U Var.t
edges(G) = {Root -&gt; v | v bound in a Statement.t or
                                in the Function.t args} U
           {xi -&gt; ai | L(x1, ..., xn) transfer where (a1, ..., an) 
                                      are the formals of L} U
           {Root -&gt; a | a is a block argument used in an unknown context}
</pre>
    </ul>


<p>
Let <tt>idom(x)</tt> be the immediate dominator of <tt>x</tt> in <tt>G</tt> with root <tt>Root</tt>. Now, any block argument a such that <tt>idom(a)&nbsp;=&nbsp;x&nbsp;&lt;&gt;&nbsp;Root</tt> can be optimized by setting <tt>a&nbsp;=&nbsp;x</tt> at the beginning of the block and dropping the argument from <tt>Goto</tt> transfers. 
</p>
<p>
Furthermore, experimental evidence suggests (and we are confident that a formal presentation could prove) that the dominator analysis subsumes the "syntactic" and "fixpoint" based analyses in this context as well and that the dominator analysis gets "everything" in one go. 
</p>
<h4 id="head-e163a6e8bdc0e8e8e2cb41c825525e3f18b2eec0">Final Thoughts</h4>
<p>
I must admit, I was rather surprised at this progression and final result.  At the outset, I never would have thought of a connection between <a href="Contify">Contify</a> and CommonArg optimizations.  They would seem to be two completely different optimizations.  Although, this may not really be the case.  As one of the reviewers of the ICFP paper said: 
</p>

        <ul>

  I understand that such a form of CPS might be convenient in some   cases, but when we're talking about analyzing code to detect that   some continuation is constant, I think it makes a lot more sense to   make all the continuation arguments completely explicit. <p>
  I believe that making all the continuation arguments explicit will   show that the optimization can be generalized to eliminating   constant arguments, whether continuations or not. 
</p>

        </ul>


<p>
What I think the common argument optimization shows is that the dominator analysis does slightly better than the reviewer puts it: we find more than just constant continuations, we find common continuations.  And I think this is further justified by the fact that I have observed common argument eliminate some <tt>env_X</tt> arguments which would appear to correspond to determining that while the closure being executed isn't constant it is at least the same as the closure being passed elsewhere. 
</p>
<p>
At first, I was curious whether or not we had missed a bigger picture with the dominator analysis.  When we wrote the contification paper, I assumed that the dominator analysis was a specialized solution to a specialized problem; we never suggested that it was a technique suited to a larger class of analyses.  After initially finding a connection between <a href="Contify">Contify</a> and CommonArg (and thinking that the only connection was the technique), I wondered if the dominator technique really was applicable to a larger class of analyses.  That is still a question, but after writing up the above, I'm suspecting that the "real story" is that the dominator analysis is a solution to the common argument optimization, and that the <a href="Contify">Contify</a> optimization is specializing CommonArg to the case of continuation arguments (with a different transformation at the end).  (Note, a whole-program, inter-procedural common argument analysis doesn't really make sense (in our <a href="SSA">SSA</a> <a href="IntermediateLanguage">IntermediateLanguage</a>), because the only way of passing values between functions is as arguments.  (Unless of course in the case that the common argument is also a constant argument, in which case <a href="ConstantPropagation">ConstantPropagation</a> could lift it to a global.)  The inter-procedural <a href="Contify">Contify</a> optimization works out because there we move the function to the argument.) 
</p>
<p>
Anyways, it's still unclear to me whether or not the dominator based approach solves other kinds of problems.   
</p>
<h4 id="head-281b6a33f8767214a7090e6db6c27c9e979c50b9">Phase Ordering</h4>
<p>
On the downside, the optimization doesn't have a huge impact on runtime, although it does predictably saved some code size.  I stuck it in the optimization sequence after <a href="Flatten">Flatten</a> and (the third round of) <a href="LocalFlatten">LocalFlatten</a>, since it seems to me that we could have cases where some components of a tuple used as an argument are common, but the whole tuple isn't.  I think it makes sense to add it after <a href="IntroduceLoops">IntroduceLoops</a> and <a href="LoopInvariant">LoopInvariant</a> (even though CommonArg get some things that <a href="LoopInvariant">LoopInvariant</a> gets, it doesn't get all of them).  I also think that it makes sense to add it before <a href="CommonSubexp">CommonSubexp</a>, since identifying variables could expose more common subexpressions.  I would think a similar thought applies to <a href="RedundantTests">RedundantTests</a>. 
</p>
</div>



<p>
<hr>
Last edited on 2007-08-15 22:05:26 by <span title="fenrir.uchicago.edu"><a href="MatthewFluet">MatthewFluet</a></span>.
</body></html>