<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html> <head> <meta name="robots" content="index,nofollow"> <title>CommonArg - MLton Standard ML Compiler (SML Compiler)</title> <link rel="stylesheet" type="text/css" charset="iso-8859-1" media="all" href="common.css"> <link rel="stylesheet" type="text/css" charset="iso-8859-1" media="screen" href="screen.css"> <link rel="stylesheet" type="text/css" charset="iso-8859-1" media="print" href="print.css"> <link rel="Start" href="Home"> </head> <body lang="en" dir="ltr"> <script src="http://www.google-analytics.com/urchin.js" type="text/javascript"> </script> <script type="text/javascript"> _uacct = "UA-833377-1"; urchinTracker(); </script> <table bgcolor = lightblue cellspacing = 0 style = "border: 0px;" width = 100%> <tr> <td style = " border: 0px; color: darkblue; font-size: 150%; text-align: left;"> <a class = mltona href="Home">MLton MLTONWIKIVERSION</a> <td style = " border: 0px; font-size: 150%; text-align: center; width: 50%;"> CommonArg <td style = " border: 0px; text-align: right;"> <table cellspacing = 0 style = "border: 0px"> <tr style = "vertical-align: middle;"> </table> <tr style = "background-color: white;"> <td colspan = 3 style = " border: 0px; font-size:70%; text-align: right;"> <a href = "Home">Home</a> <a href = "TitleIndex">Index</a> </table> <div id="content" lang="en" dir="ltr"> CommonArg is an optimization pass for the <a href="SSA">SSA</a> <a href="IntermediateLanguage">IntermediateLanguage</a>, invoked from <a href="SSASimplify">SSASimplify</a>. <h2 id="head-55f8ebc805e65b5b71ddafdae390e3be2bcd69af">Description</h2> <p> It optimizes instances of <tt>Goto</tt> transfers that pass the same arguments to the same label; e.g. <pre>L_1 () ... z1 = ? ... L_3 (x, y, z1) L_2 () ... z2 = ? ... L_3 (x, y, z2) L_3 (a, b, c) ... </pre> This code can be simplified to: <pre>L_1 () ... z1 = ? ... L_3 (z1) L_2 () ... z2 = ? ... L_3 (z2) L_3 (c) a = x b = y </pre>which saves a number of resources: time of setting up the arguments for the jump to <tt>L_3</tt>, space (either stack or pseudo-registers) for the arguments of <tt>L_3</tt>, etc. It may also expose some other optimizations, if more information is known about <tt>x</tt> or <tt>y</tt>. </p> <h2 id="head-8781d615fd77be9578225c40ac67b9471394cced">Implementation</h2> <a href = "http://mlton.org/cgi-bin/viewsvn.cgi/mlton/tags/on-MLTONWIKIVERSION-release/mlton/ssa/common-arg.sig?view=markup"><img src="moin-www.png" alt="[WWW]" height="11" width="11">common-arg.sig</a> <a href = "http://mlton.org/cgi-bin/viewsvn.cgi/mlton/tags/on-MLTONWIKIVERSION-release/mlton/ssa/common-arg.fun?view=markup"><img src="moin-www.png" alt="[WWW]" height="11" width="11">common-arg.fun</a> <h2 id="head-35ec00231a68203708e39f0e2cc10b50c6bf62de">Details and Notes</h2> <p> Three analyses were originally proposed to drive the optimization transformation. Only the <em>Dominator Analysis</em> is currently implemented. (Implementations of the other analyses are available in the Subversion repository.) </p> <h4 id="head-9658a3d16f7d3e0a09ce9f0001035a0a0733c771">Syntactic Analysis</h4> <p> The simplest analysis I could think of maintains <pre>varInfo: Var.t -> Var.t option list ref </pre> initialized to <tt>[]</tt>. </p> <p> For each variable <tt>v</tt> bound in a <tt>Statement.t</tt> or in the <tt>Function.t</tt> args, then <tt>List.push(varInfo v, NONE)</tt>. For each <tt>L (x1, ..., xn)</tt> transfer where <tt>(a1, ..., an)</tt> are the formals of <tt>L</tt>, then <tt>List.push(varInfo ai, SOME xi)</tt>. For each block argument a used in an unknown context (e.g., arguments of blocks used as continuations, handlers, arith success, runtime return, or case switch labels), then <tt>List.push(varInfo a, NONE)</tt>. </p> <p> Now, any block argument <tt>a</tt> such that <tt>varInfo a = xs</tt>, where all of the elements of <tt>xs</tt> are equal to <tt>SOME x</tt>, can be optimized by setting <tt>a = x</tt> at the beginning of the block and dropping the argument from <tt>Goto</tt> transfers. </p> <p> That takes care of the example above. We can clearly do slightly better, by changing the transformation criteria to the following: any block argument a such that <tt>varInfo a = xs</tt>, where all of the elements of <tt>xs</tt> are equal to <tt>SOME x</tt> <em>or</em> are equal to <tt>SOME a</tt>, can be optimized by setting <tt>a = x</tt> at the beginning of the block and dropping the argument from <tt>Goto</tt> transfers. This optimizes a case like: <pre>L_1 () ... z1 = ? ... L_3 (x, y, z1) L_2 () ... z2 = ? ... L_3(x, y, z2) L_3 (a, b, c) ... w = ? ... case w of true => L_4 | false => L_5 L_4 () ... L_3 (a, b, w) L_5 () ... </pre>where a common argument is passed to a loop (and is invariant through the loop). Of course, the <a href="LoopInvariant">LoopInvariant</a> optimization pass would normally introduce a local loop and essentially reduce this to the first example, but I have seen this in practice, which suggests that some optimizations after <a href="LoopInvariant">LoopInvariant</a> do enough simplifications to introduce (new) loop invariant arguments. </p> <h4 id="head-391c452724ec2ff821c0c2798ad26914f131f018">Fixpoint Analysis</h4> <p> However, the above analysis and transformation doesn't cover the cases where eliminating one common argument exposes the opportunity to eliminate other common arguments. For example: <pre>L_1 () ... L_3 (x) L_2 () ... L_3 (x) L_3 (a) ... L_5 (a) L_4 () ... L_5 (x) L_5 (b) ... </pre>One pass of analysis and transformation would eliminate the argument to <tt>L_3</tt> and rewrite the <tt>L_5(a)</tt> transfer to <tt>L_5 (x)</tt>, thereby exposing the opportunity to eliminate the common argument to <tt>L_5</tt>. </p> <p> The interdependency the arguments to <tt>L_3</tt> and <tt>L_5</tt> suggest performing some sort of fixed-point analysis. This analysis is relatively simple; maintain <pre>varInfo: Var.t -> VarLattice.t </pre> where <pre>VarLattice.t ~=~ Bot | Point of Var.t | Top </pre> (but is implemented by the <a class="nonexistent" href="FlatLattice">FlatLattice</a> functor with a <tt>lessThan</tt> list and <tt>value ref</tt> under the hood), initialized to <tt>Bot</tt>. </p> <p> For each variable <tt>v</tt> bound in a <tt>Statement.t</tt> or in the <tt>Function.t</tt> args, then <tt>VarLattice.<= (Point v, varInfo v)</tt> For each <tt>L (x1, ..., xn)</tt> transfer where <tt>(a1, ..., an)</tt> are the formals of <tt>L</tt>}, then <tt>VarLattice.<= (varInfo xi, varInfo ai)</tt>. For each block argument a used in an unknown context, then <tt>VarLattice.<= (Point a, varInfo a)</tt>. </p> <p> Now, any block argument a such that <tt>varInfo a = Point x</tt> can be optimized by setting <tt>a = x</tt> at the beginning of the block and dropping the argument from <tt>Goto</tt> transfers. </p> <p> Now, with the last example, we introduce the ordering constraints: </p> <ul> <pre>varInfo x <= varInfo a varInfo a <= varInfo b varInfo x <= varInfo b </pre> </ul> <p> Assuming that <tt>varInfo x = Point x</tt>, then we get <tt>varInfo a = Point x</tt> and <tt>varInfo b = Point x</tt>, and we optimize the example as desired. </p> <p> But, that is a rather weak assumption. It's quite possible for <tt>varInfo x = Top</tt>. For example, consider: <pre>G_1 () ... n = 1 ... L_0 (n) G_2 () ... m = 2 ... L_0 (m) L_0 (x) ... L_1 () ... L_3 (x) L_2 () ... L_3 (x) L_3 (a) ... L_5(a) L_4 () ... L_5(x) L_5 (b) ... </pre>Now <tt>varInfo x = varInfo a = varInfo b = Top</tt>. What went wrong here? When <tt>varInfo x</tt> went to <tt>Top</tt>, it got propagated all the way through to <tt>a</tt> and <tt>b</tt>, and prevented the elimination of any common arguments. What we'd like to do instead is when <tt>varInfo x</tt> goes to <tt>Top</tt>, propagate on <tt>Point x</tt> -- we have no hope of eliminating <tt>x</tt>, but if we hold <tt>x</tt> constant, then we have a chance of eliminating arguments for which <tt>x</tt> is passed as an actual. </p> <h4 id="head-31d2f75a6c210512fa42401e00d0b609716ec3b5">Dominator Analysis</h4> <p> Does anyone see where this is going yet? Pausing for a little thought, <a href="MatthewFluet">MatthewFluet</a> realized that he had once before tried proposing this kind of "fix" to a fixed-point analysis -- when we were first investigating the <a href="Contify">Contify</a> optimization in light of John Reppy's CWS paper. Of course, that "fix" failed because it defined a non-monotonic function and one couldn't take the fixed point. But, <a href="StephenWeeks">StephenWeeks</a> suggested a dominator based approach, and we were able to show that, indeed, the dominator analysis subsumed both the previous call based analysis and the cont based analysis. And, a moment's reflection reveals further parallels: when <tt>varInfo: Var.t -> Var.t option list ref</tt>, we have something analogous to the call analysis, and when <tt>varInfo: Var.t -> VarLattice.t</tt>, we have something analogous to the cont analysis. Maybe there is something analogous to the dominator approach (and therefore superior to the previous analyses). </p> <p> And this turns out to be the case. Construct the graph <tt>G</tt> as follows: </p> <ul> <pre>nodes(G) = {Root} U Var.t edges(G) = {Root -> v | v bound in a Statement.t or in the Function.t args} U {xi -> ai | L(x1, ..., xn) transfer where (a1, ..., an) are the formals of L} U {Root -> a | a is a block argument used in an unknown context} </pre> </ul> <p> Let <tt>idom(x)</tt> be the immediate dominator of <tt>x</tt> in <tt>G</tt> with root <tt>Root</tt>. Now, any block argument a such that <tt>idom(a) = x <> Root</tt> can be optimized by setting <tt>a = x</tt> at the beginning of the block and dropping the argument from <tt>Goto</tt> transfers. </p> <p> Furthermore, experimental evidence suggests (and we are confident that a formal presentation could prove) that the dominator analysis subsumes the "syntactic" and "fixpoint" based analyses in this context as well and that the dominator analysis gets "everything" in one go. </p> <h4 id="head-e163a6e8bdc0e8e8e2cb41c825525e3f18b2eec0">Final Thoughts</h4> <p> I must admit, I was rather surprised at this progression and final result. At the outset, I never would have thought of a connection between <a href="Contify">Contify</a> and CommonArg optimizations. They would seem to be two completely different optimizations. Although, this may not really be the case. As one of the reviewers of the ICFP paper said: </p> <ul> I understand that such a form of CPS might be convenient in some cases, but when we're talking about analyzing code to detect that some continuation is constant, I think it makes a lot more sense to make all the continuation arguments completely explicit. <p> I believe that making all the continuation arguments explicit will show that the optimization can be generalized to eliminating constant arguments, whether continuations or not. </p> </ul> <p> What I think the common argument optimization shows is that the dominator analysis does slightly better than the reviewer puts it: we find more than just constant continuations, we find common continuations. And I think this is further justified by the fact that I have observed common argument eliminate some <tt>env_X</tt> arguments which would appear to correspond to determining that while the closure being executed isn't constant it is at least the same as the closure being passed elsewhere. </p> <p> At first, I was curious whether or not we had missed a bigger picture with the dominator analysis. When we wrote the contification paper, I assumed that the dominator analysis was a specialized solution to a specialized problem; we never suggested that it was a technique suited to a larger class of analyses. After initially finding a connection between <a href="Contify">Contify</a> and CommonArg (and thinking that the only connection was the technique), I wondered if the dominator technique really was applicable to a larger class of analyses. That is still a question, but after writing up the above, I'm suspecting that the "real story" is that the dominator analysis is a solution to the common argument optimization, and that the <a href="Contify">Contify</a> optimization is specializing CommonArg to the case of continuation arguments (with a different transformation at the end). (Note, a whole-program, inter-procedural common argument analysis doesn't really make sense (in our <a href="SSA">SSA</a> <a href="IntermediateLanguage">IntermediateLanguage</a>), because the only way of passing values between functions is as arguments. (Unless of course in the case that the common argument is also a constant argument, in which case <a href="ConstantPropagation">ConstantPropagation</a> could lift it to a global.) The inter-procedural <a href="Contify">Contify</a> optimization works out because there we move the function to the argument.) </p> <p> Anyways, it's still unclear to me whether or not the dominator based approach solves other kinds of problems. </p> <h4 id="head-281b6a33f8767214a7090e6db6c27c9e979c50b9">Phase Ordering</h4> <p> On the downside, the optimization doesn't have a huge impact on runtime, although it does predictably saved some code size. I stuck it in the optimization sequence after <a href="Flatten">Flatten</a> and (the third round of) <a href="LocalFlatten">LocalFlatten</a>, since it seems to me that we could have cases where some components of a tuple used as an argument are common, but the whole tuple isn't. I think it makes sense to add it after <a href="IntroduceLoops">IntroduceLoops</a> and <a href="LoopInvariant">LoopInvariant</a> (even though CommonArg get some things that <a href="LoopInvariant">LoopInvariant</a> gets, it doesn't get all of them). I also think that it makes sense to add it before <a href="CommonSubexp">CommonSubexp</a>, since identifying variables could expose more common subexpressions. I would think a similar thought applies to <a href="RedundantTests">RedundantTests</a>. </p> </div> <p> <hr> Last edited on 2007-08-15 22:05:26 by <span title="fenrir.uchicago.edu"><a href="MatthewFluet">MatthewFluet</a></span>. </body></html>