<?xml version="1.0" encoding="iso-8859-1"?>
<?xml-stylesheet href="../../make-menu.xsl" type="text/xsl"?><html>
   <head>
      <this-is section="changes" page="s90" subpage="optimization90"/>
      <!--
           Generated at 2011-12-09T20:47:22.916Z--><title>Saxonica: XSLT and XQuery Processing: Optimization</title>
      <meta name="coverage" content="Worldwide"/>
      <meta name="copyright" content="Copyright Saxonica Ltd"/>
      <meta name="title" content="Saxonica: XSLT and XQuery Processing: Optimization"/>
      <meta name="robots" content="noindex,nofollow"/>
      <link rel="stylesheet" href="../../saxondocs.css" type="text/css"/>
   </head>
   <body class="main">
      <h1>Optimization</h1>
      <p>When the descendant axis is used in a schema-aware query or stylesheet, and the type of the context node
is statically known, the step that uses the descendant axis is now replaced by a sequence of steps using the
child axis (and the descendant or descendant-or-self axes, if necessary) that restricts the search to the parts
of the tree where the required element can actually be found. If it is not possible for the descendant element
to exist within the subtree, a compile-time warning is produced (in the same way as previous releases do for the
child and attribute axes).</p>
      <p>Expressions using the axis step <code>child::*</code> now have their static type inferred if the schema
only allows one possible element type in this context. This may lead to warnings being produced when a path
expression using such a construct cannot select anything.</p>
      <p>In previous releases of the Saxon-SA join optimizer, 
document-level indexes (keys) were not used to index expressions that required
sorting into document order, for example <code>doc('abc.xml')//b/c[@d=$x]</code>. This restriction has
been removed.</p>
      <p>Saxon-SA is now better at detecting when there is an indexable term in a predicate masked by other terms
that are not indexable, for example <code>doc('abc.xml')/a/b/c[@d gt 5 and @e=$x]</code>, which will now be indexed
on the value of <code>@e</code>.</p>
      <p>Saxon has always tried to ensure that the result of a path expression is not sorted at run-time if
the path is <i>naturally sorted</i>, that is, if the nested-loop evaluation of the path expression will deliver nodes
in document order anyway. One situation where this is not possible is a path of the form <code>$v/a/b/c/d</code>, in the case
where Saxon cannot determine statically that <code>$v</code> will be a singleton. In this situation Saxon
was effectively generating the expression <code>sort($v/a/b/c/d)</code>. In Saxon-SA, when the tail
of the path expression (<code>a/b/c/d</code>) is naturally sorted, Saxon now generates a conditional sort expression, which performs
the sort only if the condition <code>exists($v[2])</code> is true. (Note: it is not possible to rewrite the expression
as <code>sort($v)/a/b/c/d</code>, because this can result in duplicates if <code>$v</code> is not a peer node-set, that is, if it
contains one node that is an ancestor of another.)</p>
      <p>This optimization (which is available only in Saxon-SA) benefits many queries of the form:</p>
      <div class="codeblock"
           style="border: solid thin; background-color: #B1CCC7; padding: 2px">
         <pre>
            <code>
let $x := doc('abc.xml')//item[@code='12345']
return ($x/price, $x/value, $x/size)
</code>
         </pre>
      </div>
      <p>Where the expression <code>EXP1</code> in <code>for $i in EXP1 return EXP2</code> is known to be a singleton,
the expression is rewritten as <code>let $i := EXP1 return EXP2</code>. This creates the opportunity for further simplifications.</p>
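      <p>As a minimal illustration of this rewrite (the variable <code>$items</code> and its values are invented for the example), the two
branches below should be interchangeable, because <code>count()</code> always returns exactly one item:</p>
      <div class="codeblock"
           style="border: solid thin; background-color: #B1CCC7; padding: 2px">
         <pre>
            <code>
let $items := (10, 20, 30)
return (
  (: as written: a "for" over an expression that is always a singleton :)
  for $i in count($items) return $i * $i,
  (: the equivalent "let" form :)
  let $i := count($items) return $i * $i
)
</code>
         </pre>
      </div>
      <p>Both branches evaluate to 9.</p>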
      <p>Local variables are now inlined if they are bound to a constant value. Previously, variables were inlined only in cases where
there was just one reference to the variable. This creates the opportunity for further static evaluation of constant subexpressions.</p>
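      <p>A small sketch of the kind of expression this affects (the variable name <code>$rate</code> and the numbers are assumptions made up
for illustration). The variable is bound to a constant but referenced twice; once it is inlined, the whole arithmetic expression
consists of constants and becomes a candidate for compile-time evaluation:</p>
      <div class="codeblock"
           style="border: solid thin; background-color: #B1CCC7; padding: 2px">
         <pre>
            <code>
(
  (: as written: two references to a variable bound to a constant :)
  let $rate := 0.2
  return 100 * $rate + 50 * $rate,
  (: what remains after inlining: only constants, so it can be folded :)
  100 * 0.2 + 50 * 0.2
)
</code>
         </pre>
      </div>
      <p>Both expressions evaluate to 30.</p>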
      <p>In Saxon-SA, function calls are now inlined, provided certain conditions are met. These conditions are currently rather
conservative. The function that is inlined must not call any user-defined functions,
and it must not exceed a certain size (currently set, rather arbitrarily, to 15 nodes in the expression tree).
It must also not contain certain constructs: for example, various XSLT instructions such as <code>xsl:number</code> and <code>xsl:apply-templates</code>,
any instruction that performs sorting, or a "for" expression with an "at" variable.</p>
      <p><i>There are two reasons to inline function calls. Firstly, with very simple functions, the cost of doing a function call can be
noticeable. Secondly, and more importantly, combining the calling and called functions into a single expression enables further optimizations:
for example, it often becomes possible to extract subexpressions from the function body out of a loop. Where a function call has
constants as its arguments, this might even include evaluating the result of the function at compile-time.</i></p>
      <p>When a function call is inlined, the original function remains available even if there are no further calls to it. This is because
there are interfaces in Saxon that allow functions in a query module to be located and invoked dynamically.</p>
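      <p>The following sketch shows the general idea of inlining, using a hypothetical function <code>local:double</code> that is small and
calls no user-defined functions. Whether Saxon-SA actually inlines this particular call is not confirmed here; the second branch
simply shows what the loop body amounts to once the function body is substituted for the call:</p>
      <div class="codeblock"
           style="border: solid thin; background-color: #B1CCC7; padding: 2px">
         <pre>
            <code>
declare function local:double($n as xs:integer) as xs:integer {
  $n * 2
};
(
  (: a call to a very small function inside a loop :)
  for $i in 1 to 3 return local:double($i),
  (: the same loop with the function body substituted for the call :)
  for $i in 1 to 3 return $i * 2
)
</code>
         </pre>
      </div>
      <p>Each branch yields the sequence 2, 4, 6.</p>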
      <p>If a subexpression within a function or template body does not depend on the parameters to the function, does not create
new nodes, and is not a constant, then it is now extracted from the function body and evaluated as a global variable.
This might apply to an expression that depends on other global variables or parameters, or to an expression such
as doc('abc.xml') that is never evaluated at compile time. This optimization applies only to Saxon-SA. However,
during testing of this optimization a considerable number of cases were found where Saxon was not taking the opportunity
to do "constant folding" (compile-time evaluation of expressions), and these have been fixed, benefiting both Saxon-B and Saxon-SA.</p>
      <p>Static type checking when applied to a conditional expression is now distributed to the branches of the conditional.
("Static type checking" here means checking that the static type of an expression is compatible with the required type,
and generating run-time type checking code where this proves necessary.) This means that no run-time checking code is now
generated or executed for those branches of the conditional that are statically type-safe. This in turn means that if one branch
of the conditional is a recursive tail call, tail call optimization is no longer inhibited by the unnecessary run-time
type check on the value returned by the recursive call. Another effect of the change is that a static type error may now
be reported if any branch of the conditional has a static type that is incompatible with the required type; previously this error
would have been reported only when this branch was actually executed. This change affects XPath if/then/else, XSLT's 
<code>xsl:if</code> and <code>xsl:choose</code>, and XQuery typeswitch.</p>
      <p>Tail-call optimization on <code>xsl:call-template</code> has also been improved. In the past this optimization was
never applied if the named template declared a return type. This restriction is removed. To enable this, the static type
inferencing on xsl:call-template has been improved. (Note however that declaring a return type on a match template will
still generally inhibit tail call optimization, because calls on <code>xsl:apply-templates</code> cannot be statically
analyzed.)</p>
      <p>Saxon-SA now optimizes certain multi-branch conditional expressions into an efficient switch expression. The
expressions that qualify are XSLT <code>xsl:choose</code> instructions or multi-way XPath/XQuery
<code>if () then ... else if () then ...</code> expressions where all the conditions take the form
of singleton comparisons of the same expression against different literal constants, for example
<code>@a = 3</code>, <code>@a = 7</code>, <code>@a = 8</code>. The expression on the left must
be identical in each case, and the constants on the right must all be of the same type. The expression
is optimized by putting the constant values (or collation keys derived from them) in a hash table
and doing a direct lookup on the value of the expression.</p>
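      <p>As an illustration, the conditional below has the shape described above: every condition compares the same expression with a
different literal of the same type. The variable <code>$code</code> and the result strings are invented for the example, and it is an
assumption that this exact expression would be selected for the optimization:</p>
      <div class="codeblock"
           style="border: solid thin; background-color: #B1CCC7; padding: 2px">
         <pre>
            <code>
for $code in (3, 7, 8, 5)
return
  (: same left-hand expression, integer literals on the right :)
  if ($code = 3) then "low"
  else if ($code = 7) then "medium"
  else if ($code = 8) then "high"
  else "unknown"
</code>
         </pre>
      </div>
      <p>The query returns "low", "medium", "high", "unknown"; with a hash lookup, the cost of choosing a branch no longer grows with the
number of branches.</p>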
      <p>There has been some tuning applied to the DOM interface, specifically the wrapper code which implements the
<code>NodeInfo</code> interface on top of <code>org.w3c.dom.Node</code>. The frequently-used iterator for the child
axis was creating nodes for all the children in a list, to ensure that adjacent text nodes were properly 
concatenated. This has changed so that the creation of nodes is now incremental.</p>
      <p>Saxon now tries more aggressively to precompile regular expressions that are known at compile time, where these
are used in the XPath functions <code>matches()</code>, <code>tokenize()</code>, and <code>replace()</code>, or in the
<code>xsl:analyze-string</code> instruction. Previously, this was only done (in general) when the regex was written as a string
literal or a constant subexpression. It is now done also when the regex can be reduced to a string literal during earlier stages
of optimization. In particular, it now handles the case where the expression is written in the content of an XSLT variable,
a popular coding idiom that avoids problems with escaping curly braces and other special characters.</p>
      <p>There are some improvements in the optimization of expressions used in a context where the effective boolean
value is required. These now all use the same logic (implemented as a static method in class <code>BooleanFn</code>). 
Expressions known to return nodes are now wrapped in a call of <code>exists()</code>, which takes advantage of the ability
of some iterators to report whether any nodes exist without materializing the nodes. Expressions of the form
<code>A | B</code> appearing in a boolean context are rewritten as <code>exists(A) or exists(B)</code>,
which eliminates the costs of sorting into document order and checking for duplicates. Calls to normalize-space()
in a boolean context are optimized to simply test whether the string contains any non-whitespace characters.</p>
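      <p>The equivalences described above can be checked with a small query. The element names <code>r</code>, <code>a</code>,
<code>b</code> and <code>c</code> are made up for this sketch, and computed constructors are used only to keep the example self-contained:</p>
      <div class="codeblock"
           style="border: solid thin; background-color: #B1CCC7; padding: 2px">
         <pre>
            <code>
let $r := element r { element a { "x" }, element c { "  " } }
return (
  (: a union in a boolean context ... :)
  if ($r/a | $r/b) then "found" else "none",
  (: ... and the form it is rewritten to, which needs no sorting or de-duplication :)
  if (exists($r/a) or exists($r/b)) then "found" else "none",
  (: normalize-space() in a boolean context: true only if there is a non-whitespace character :)
  if (normalize-space($r/c)) then "non-blank" else "blank"
)
</code>
         </pre>
      </div>
      <p>The result is "found", "found", "blank".</p>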
      <p>There has been a complete redesign of the optimization of expressions such as <code>SEQ[position() gt 5]</code>,
that is, filter expressions that perform an explicit test on the <code>position()</code> function. These are generally
rewritten into a call of <code>subsequence()</code>, <code>remove()</code>, or the new internal function
<code>saxon:item-at()</code>, or (typically for <code>S[position() != 1]</code>) into a
 <code>TailExpression</code>. Where necessary, conditional logic is added to the call to handle the case
where the expression being compared to <code>position()</code> is not guaranteed to be an integer, or might
be an empty sequence. This redesign eliminates the need for the expression types <code>PositionRange</code>
and <code>SliceExpression</code>.</p>
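      <p>The sketch below pairs some positional filters with equivalent calls to the standard functions <code>subsequence()</code> and
<code>remove()</code>. The sequence <code>$seq</code> is invented for the example, and the pairings show only that the two forms give
the same result; they are not a statement of which rewrite Saxon chooses in each case:</p>
      <div class="codeblock"
           style="border: solid thin; background-color: #B1CCC7; padding: 2px">
         <pre>
            <code>
let $seq := ("a", "b", "c", "d", "e", "f", "g")
return (
  string-join($seq[position() gt 5], ","),   (: f,g :)
  string-join(subsequence($seq, 6), ","),    (: f,g :)
  string-join($seq[position() != 3], ","),   (: a,b,d,e,f,g :)
  string-join(remove($seq, 3), ","),         (: a,b,d,e,f,g :)
  string-join($seq[position() != 1], ","),   (: b,c,d,e,f,g :)
  string-join(subsequence($seq, 2), ",")     (: b,c,d,e,f,g :)
)
</code>
         </pre>
      </div>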
      <p>Compile-time performance has been improved for expressions containing long lists of subexpressions
separated by the comma operator. Lists longer than 5000 or so items were overflowing the Java stack, and the
compile time was also quadratic in the number of subexpressions.</p>
      <p class="subhead">Document Projection</p>
      <p>Document Projection is a mechanism that analyzes a query to determine what parts of a document it
can potentially access, and then while building a tree to represent the document, leaves out those parts
of the tree that cannot make any difference to the result of the query.</p>
      <p>In this release document projection is an option on the XQuery command line interface. Currently
it is only used if requested.</p>
      <p>Internally, the class <code>PathMap</code> computes (and represents) the set of paths within a document that
are followed by an expression, that is, for each document accessed by an expression, the set of nodes
that are reachable by the expression. A PathMap can be set on an <code>AugmentedSource</code> supplied
to the <code>Configuration.buildDocument()</code> method to request filtering of the document while
constructing the tree. If <code>-explain</code> is specified, the output includes feedback on the use
of document projection.</p>
      <table width="100%">
         <tr>
            <td>
               <p align="right"><a class="nav" href="diagnostics90.xml">Next</a></p>
            </td>
         </tr>
      </table>
   </body>
</html>