<?xml version="1.0" encoding="iso-8859-1"?>
<?xml-stylesheet href="../make-menu.xsl" type="text/xsl"?><html>
   <head>
      <this-is section="sourcedocs" page="streaming" subpage=""/>
      <!--
           Generated at 2011-12-09T20:47:22.916Z--><title>Saxonica: XSLT and XQuery Processing: Streaming of Large Documents</title>
      <meta name="coverage" content="Worldwide"/>
      <meta name="copyright" content="Copyright Saxonica Ltd"/>
      <meta name="title"
            content="Saxonica: XSLT and XQuery Processing: Streaming of Large Documents"/>
      <meta name="robots" content="noindex,nofollow"/>
      <link rel="stylesheet" href="../saxondocs.css" type="text/css"/>
   </head>
   <body class="main">
      <h1>Streaming of Large Documents</h1>
      <p>Sometimes source documents are too large to hold in memory. Saxon-EE provides a range of facilities for
      processing such documents in <i>streaming mode</i>: that is, processing data as it is read by the XML parser, without
      building a complete tree representation of the document in memory.</p>
      <p>Some of these facilities implement new features in the draft XSLT 3.0 standard (also known as XSLT 2.1). Others
      are specific to Saxon, and a few are also available in XQuery.</p>
      <p>Inevitably there are things that cannot be done in streaming mode - sorting is an obvious example. Sometimes,
      achieving a streaming transformation means rethinking its design - for example, splitting it into
      multiple phases. So streaming is rarely a matter of simply taking your existing code and setting a switch
      to request a streamed implementation.</p>
      <p>There are two main ways of doing streaming in Saxon:</p>
      <ul>
         <li>
            <p><a class="bodylink" href="../sourcedocs/streaming/burst-mode-streaming.xml">Burst-mode streaming</a>: with this approach, the transformation of a large file is broken
            up into a sequence of transformations of small pieces of the file. Each piece in turn is read from the input,
            turned into a small tree in memory, transformed, and written to the output file.</p>
            <p>This approach works well for files that are fairly flat in structure, for example a log file holding
            millions of log records, where the processing of each log record is independent of the ones that went before.</p>
            <p>A variant of this technique uses the new XSLT 3.0 <code>xsl:iterate</code> instruction to iterate
            over the records, in place of <code>xsl:for-each</code>. This allows working data to be maintained as
            the records are processed, which makes it possible, for example, to output totals or averages at the end of the run,
            or to make the processing of one record depend on what came before it in the file. The <code>xsl:iterate</code>
            instruction also allows early exit from the loop, so a transformation can process data
            from the beginning of a large file without reading the whole file.</p>
            <p>Burst-mode streaming is available in both XSLT and XQuery, but there is no equivalent in XQuery to the
            <code>xsl:iterate</code> construct.</p>
         </li>
         <li>
            <p><a class="bodylink" href="../sourcedocs/streaming/streaming-templates.xml">Streaming templates</a>: this approach follows the traditional XSLT processing pattern of performing
                  a recursive descent of the input XML hierarchy by matching template rules to the nodes at each level, but
                  does so one element at a time, without building the tree in memory.</p>
            <p>Every template belongs to a <code>mode</code> (perhaps the default, unnamed mode), and streaming is a 
                  property of the mode that can be specified using the new <code>xsl:mode</code> declaration. If the mode
                  is declared to be streamable, then every template rule within that mode must obey the rules for
                  streamable processing.</p>
            <p>The rules for what is allowed in streamed processing are quite complicated, but the essential principle
                  is that the template rule for a given node can only read the descendants of that node once, in order.
                  There are further rules imposed by limitations in the current Saxon implementation: for example, although
                  grouping using <code>&lt;xsl:for-each-group group-adjacent="xxx"&gt;</code> is theoretically consistent
                  with a streamed implementation, it is not currently implemented in Saxon.</p>
            <p>The streamed template mechanism applies to XSLT only.</p>
         </li>
      </ul>
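      <p>As a rough illustration of the burst-mode variant that uses <code>xsl:iterate</code>, the sketch below keeps a
      running balance while iterating over the records of a large file. It is an assumption-laden example rather than a
      definitive recipe: the file name <code>transactions.xml</code>, the <code>account/transaction</code> structure, the
      <code>value</code> and <code>date</code> attributes, and the template name <code>main</code> are all hypothetical, and
      the way <code>saxon:stream()</code> is called here should be checked against the burst-mode streaming pages for the
      release you are using.</p>
      <pre>
&lt;xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="xs saxon"&gt;

  &lt;!-- Hypothetical entry point, invoked as the initial template --&gt;
  &lt;xsl:template name="main"&gt;
    &lt;balances&gt;
      &lt;!-- Assumed usage: each transaction is delivered as a small tree, one at a time --&gt;
      &lt;xsl:iterate select="saxon:stream(doc('transactions.xml')/account/transaction)"&gt;
        &lt;!-- Working data carried from one record to the next --&gt;
        &lt;xsl:param name="balance" as="xs:decimal" select="0"/&gt;
        &lt;xsl:variable name="newBalance" as="xs:decimal"
                      select="$balance + xs:decimal(@value)"/&gt;
        &lt;balance date="{@date}" value="{$newBalance}"/&gt;
        &lt;!-- Pass the updated balance into the next iteration --&gt;
        &lt;xsl:next-iteration&gt;
          &lt;xsl:with-param name="balance" select="$newBalance"/&gt;
        &lt;/xsl:next-iteration&gt;
      &lt;/xsl:iterate&gt;
    &lt;/balances&gt;
  &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;
      </pre>
      <p>Because the balance is passed forward as a parameter, the output for each record can depend on the records
      before it, which a plain <code>xsl:for-each</code> over the same records could not express.</p>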
      <p>Both of these facilities are available in Saxon-EE only. Streaming templates also require XSLT 3.0 to be enabled by setting
      the relevant configuration parameters or command line options.</p>
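      <p>A minimal sketch of the streaming-templates approach might look like the following. It declares the default
      (unnamed) mode streamable with <code>xsl:mode</code> and copies only selected records to the output. The element
      names <code>log</code> and <code>record</code>, the <code>level</code> attribute and the value <code>ERROR</code>
      are hypothetical, and whether a given Saxon release accepts every construct shown here as streamable is an assumption
      that should be verified against the streaming templates page.</p>
      <pre>
&lt;xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;

  &lt;!-- Every template rule in the default mode must now be streamable --&gt;
  &lt;xsl:mode streamable="yes"/&gt;

  &lt;!-- One downward pass over the children of the (hypothetical) log element --&gt;
  &lt;xsl:template match="log"&gt;
    &lt;errors&gt;
      &lt;xsl:apply-templates select="record"/&gt;
    &lt;/errors&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Records of interest are copied; each is read once, in order --&gt;
  &lt;xsl:template match="record[@level = 'ERROR']"&gt;
    &lt;xsl:copy-of select="."/&gt;
  &lt;/xsl:template&gt;

  &lt;!-- All other records are skipped --&gt;
  &lt;xsl:template match="record"/&gt;

&lt;/xsl:stylesheet&gt;
      </pre>
      <p>Each template rule here makes a single pass over the descendants of the node it matches, in keeping with the
      essential principle described above.</p>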
      <ul>
         <li>
            <p><a class="bodylink" href="streaming/burst-mode-streaming.xml">Burst-mode streaming</a></p>
         </li>
         <li>
            <p><a class="bodylink" href="streaming/furtherprocessing.xml">Processing the nodes returned by saxon:stream()</a></p>
         </li>
         <li>
            <p><a class="bodylink" href="streaming/partialreading.xml">Reading source documents partially</a></p>
         </li>
         <li>
            <p><a class="bodylink" href="streaming/streamable-xpath.xml">Streamable path expressions</a></p>
         </li>
         <li>
            <p><a class="bodylink" href="streaming/burst-mode-implementation.xml">How burst-mode streaming works</a></p>
         </li>
         <li>
            <p><a class="bodylink" href="streaming/streamwithiterate.xml">Using saxon:stream() with saxon:iterate</a></p>
         </li>
         <li>
            <p><a class="bodylink" href="streaming/streaming-templates.xml">Streaming Templates</a></p>
         </li>
      </ul>
      <table width="100%">
         <tr>
            <td>
               <p align="right"><a class="nav" href="streaming/burst-mode-streaming.xml">Next</a></p>
            </td>
         </tr>
      </table>
   </body>
</html>