saxon-manual-9.4.0.9-2.mga7.noarch.rpm

<?xml version="1.0" encoding="iso-8859-1"?>
<?xml-stylesheet href="../make-menu.xsl" type="text/xsl"?><html>
   <head>
      <this-is section="sourcedocs" page="collections" subpage=""/>
      <!--
           Generated at 2011-12-09T20:47:22.916Z--><title>Saxonica: XSLT and XQuery Processing: Collections</title>
      <meta name="coverage" content="Worldwide"/>
      <meta name="copyright" content="Copyright Saxonica Ltd"/>
      <meta name="title" content="Saxonica: XSLT and XQuery Processing: Collections"/>
      <meta name="robots" content="noindex,nofollow"/>
      <link rel="stylesheet" href="../saxondocs.css" type="text/css"/>
   </head>
   <body class="main">
      <h1>Collections</h1>
      <p>Saxon implements the <code>collection()</code> function by passing the given URI
(or null, if the default collection is requested) to a user-provided <a class="bodylink" href="../javadoc/net/sf/saxon/lib/CollectionURIResolver.html"><code>CollectionURIResolver</code></a>.
This section describes how the standard collection resolver behaves when no user-written collection
resolver is supplied.</p>
      <p>The default collection resolver returns the empty sequence as the default collection. The only
way of specifying a default collection is to provide your own <code>CollectionURIResolver</code>.</p>
      <p>If a collection URI is provided, Saxon attempts to dereference it. What happens next depends on whether
the URI identifies a file or a directory.</p>
      <p class="subhead">Using catalog files</p>
      <p>If the collection URI identifies a file, Saxon treats this as a catalog file. This is a file in XML format that lists the
documents comprising the collection. Here is an example of such a catalog file:</p>
      <div class="codeblock"
           style="border: solid thin; background-color: #B1CCC7; padding: 2px">
         <pre>
            <code>
&lt;collection stable="true"&gt;
  &lt;doc href="dir/chap1.xml"/&gt;
  &lt;doc href="dir/chap2.xml"/&gt;
  &lt;doc href="dir/chap3.xml"/&gt;
  &lt;doc href="dir/chap4.xml"/&gt;
&lt;/collection&gt;</code>
         </pre>
      </div>
      <p>The <code>stable</code> attribute indicates whether the collection is stable or not. The default value is <code>true</code>.
If a collection is stable, then the URIs listed in the <code>doc</code> elements are treated like URIs passed
to the <code>doc()</code> function. Each URI is first looked up in the document pool to see if it is already loaded;
if it is, then the document node is returned. Otherwise the URI is passed to the registered <code>URIResolver</code>,
and the resulting document is added to the document pool. This has two effects. First, two calls on
the <code>collection()</code> function passing the same collection URI return the same nodes each time. Second,
these results are consistent with those of the <code>doc()</code> function: if the <code>document-uri()</code> of
a node returned by the <code>collection()</code> function is passed to the <code>doc()</code> function, the original node
is returned. If <code>stable="false"</code> is specified, however, the URI is dereferenced directly and the document
is not added to the document pool, so a subsequent retrieval of the same document will not return the same node.</p>
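      <p>As a rough illustration of the catalog format just described, the following Java sketch reads a catalog with the JDK's DOM parser and reports the <code>stable</code> setting and the listed <code>href</code> values. It is not Saxon's implementation: the class <code>CatalogSketch</code> and its methods are hypothetical names, and it assumes that only the literal value <code>false</code> marks a collection as unstable.</p>

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative only: a reader for the catalog format, not Saxon's own code.
final class CatalogSketch {

    // Parses the catalog text and returns its outermost (collection) element.
    static Element parse(String catalogXml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(catalogXml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed catalog", e);
        }
    }

    // The default is true; this sketch assumes only the literal "false" means unstable.
    static boolean isStable(Element collection) {
        return !"false".equals(collection.getAttribute("stable"));
    }

    // Returns the href attribute of every doc element, in document order.
    static String[] docHrefs(Element collection) {
        NodeList docs = collection.getElementsByTagName("doc");
        String[] hrefs = new String[docs.getLength()];
        for (int i = 0; i != hrefs.length; i++) {
            hrefs[i] = ((Element) docs.item(i)).getAttribute("href");
        }
        return hrefs;
    }
}
```

      <p>Applied to the example catalog above, <code>isStable</code> would return <code>true</code> and <code>docHrefs</code> would return the four <code>dir/chap*.xml</code> references in the order listed.</p>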
      <p class="subhead">Processing directories</p>
      <p>If the URI passed to the <code>collection()</code> function (still assuming a default <code>CollectionURIResolver</code>)
identifies a directory, then the contents of the directory are returned. Such a URI may have a number of query parameters, written
in the form <code>file:///a/b/c/d?keyword=value;keyword=value;...</code>. The recognized keywords and their values are as follows:</p>
      <table>
         <tr>
            <td content="para">
               <p>
               <b>keyword</b>
            </p>
            </td>
            <td content="para">
               <p>
               <b>values</b>
            </p>
            </td>
            <td content="para">
               <p>
               <b>effect</b>
            </p>
            </td>
         </tr>
         <tr>
            <td content="para">
               <p>
               <b>recurse</b>
            </p>
            </td>
            <td content="para">
               <p>
               <b>yes | no (default no)</b>
            </p>
            </td>
            <td content="para">
               <p>
               <b>determines whether subdirectories are searched recursively</b>
            </p>
            </td>
         </tr>
         <tr>
            <td content="para">
               <p>
               <b>strip-space</b>
            </p>
            </td>
            <td content="para">
               <p>
               <b>yes | ignorable | no</b>
            </p>
            </td>
            <td content="para">
               <p>
               <b>determines whether whitespace text nodes are to be stripped.
The default depends on the Configuration settings.</b>
            </p>
            </td>
         </tr>
         <tr>
            <td content="para">
               <p>
               <b>validation</b>
            </p>
            </td>
            <td content="para">
               <p>
               <b>strip | preserve | lax | strict</b>
            </p>
            </td>
            <td content="para">
               <p>
               <b>determines whether and how schema validation is applied to each document.
The default depends on the Configuration settings.</b>
            </p>
            </td>
         </tr>
         <tr>
            <td content="para">
               <p>
               <b>select</b>
            </p>
            </td>
            <td content="para">
               <p>
               <b>file name pattern</b>
            </p>
            </td>
            <td content="para">
               <p>
               <b>determines which files are selected (see below)</b>
            </p>
            </td>
         </tr>
         <tr>
            <td content="para">
               <p>
               <b>on-error</b>
            </p>
            </td>
            <td content="para">
               <p>
               <b>fail | warning | ignore</b>
            </p>
            </td>
            <td content="para">
               <p>
               <b>determines the action to be taken if one of the documents cannot be successfully parsed</b>
            </p>
            </td>
         </tr>
         <tr>
            <td content="para">
               <p>
               <b>parser</b>
            </p>
            </td>
            <td content="para">
               <p>
               <b>Java class name</b>
            </p>
            </td>
            <td content="para">
               <p>
               <b>class name of the Java <code>XMLReader</code> to be used. For example, John Cowan's TagSoup parser may be selected by specifying
<code>parser=org.ccil.cowan.tagsoup.Parser</code> (this parses arbitrary ill-formed HTML and presents it
to Saxon as well-formed XML).</b>
            </p>
            </td>
         </tr>
         <tr>
            <td content="para">
               <p>
               <b>xinclude</b>
            </p>
            </td>
            <td content="para">
               <p>
               <b>yes | no</b>
            </p>
            </td>
            <td content="para">
               <p>
               <b>determines whether XInclude processing should be applied to the selected documents. This overrides
any setting in the <a class="bodylink" href="../javadoc/net/sf/saxon/Configuration.html"><code>Configuration</code></a> (or any command line option).</b>
            </p>
            </td>
         </tr>
         <tr>
            <td content="para">
               <p>
               <b>unparsed</b>
            </p>
            </td>
            <td content="para">
               <p>
               <b>yes | no (default no)</b>
            </p>
            </td>
            <td content="para">
               <p>
               <b>determines whether the files contain unparsed text. If <code>unparsed=yes</code> is specified, the files are
read as text using the platform default encoding. An error occurs if they contain characters that are not legal in XML.
Parameters that are specific to XML, such as <code>strip-space</code>, <code>parser</code>, and <code>validation</code>, are ignored. The function
returns a document node representing each file; the document node holds a single text node containing the
file contents, and the <code>document-uri()</code> function returns the URI of the corresponding file.</b>
            </p>
            </td>
         </tr>
      </table>
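      <p>The query-parameter syntax shown above can be assembled mechanically. The sketch below is a minimal, hypothetical helper (<code>CollectionUriSketch.build</code> is not a Saxon API) that appends <code>keyword=value</code> pairs to a directory URI, separated by semicolons; it performs no escaping and does not check that the keywords are among those listed in the table.</p>

```java
// Illustrative only: builds a directory collection URI of the form
// file:///a/b/c/d?keyword=value;keyword=value
final class CollectionUriSketch {

    // Appends each "keyword=value" pair, using "?" before the first and ";" before the rest.
    static String build(String directoryUri, String... keywordValuePairs) {
        StringBuilder uri = new StringBuilder(directoryUri);
        String separator = "?";
        for (String pair : keywordValuePairs) {
            uri.append(separator).append(pair);
            separator = ";";
        }
        return uri.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("file:///a/b/c/d", "recurse=yes", "on-error=warning"));
    }
}
```

      <p>For instance, <code>build("file:///a/b/c/d", "recurse=yes", "on-error=warning")</code> yields <code>file:///a/b/c/d?recurse=yes;on-error=warning</code>, which could then be passed to the <code>collection()</code> function.</p>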
      <p>The pattern used in the <code>select</code> parameter can take the conventional form, for example <code>*.xml</code> selects
all files with extension "xml". More generally, the pattern is converted to a regular expression by 
prepending "^", appending "$", replacing "." by "\.", and replacing "*" by ".*",
and it is then used to match the file names appearing in the directory using
the Java regular expression rules. So, for example, the pattern <code>*.(xml|xhtml)</code> matches files
with either of these two file extensions. Note, however, that characters with a special meaning in regular
expressions may need to be escaped using the %HH convention when they are written in the URL. For example,
a vertical bar needs to be written as <code>%7C</code>, so this pattern is supplied as <code>?select=*.(xml%7Cxhtml)</code>.
This escaping can be achieved using the <code>iri-to-uri()</code> function.</p>
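      <p>The conversion from a <code>select</code> pattern to a regular expression, as described above, can be sketched in a few lines of Java. This is an illustration of the stated rules only, not Saxon's source code; the class and method names are hypothetical.</p>

```java
import java.util.regex.Pattern;

// Illustrative only: mirrors the documented pattern-to-regex conversion.
final class SelectPatternSketch {

    // Converts a select pattern such as "*.xml" into a Java regular expression.
    static String toRegex(String selectPattern) {
        // Escape dots first, so the dots introduced by the star expansion stay unescaped.
        String escapedDots = selectPattern.replace(".", "\\.");
        String expandedStars = escapedDots.replace("*", ".*");
        return "^" + expandedStars + "$";
    }

    // Tests a file name against the converted pattern using Java regex rules.
    static boolean matches(String selectPattern, String fileName) {
        return Pattern.compile(toRegex(selectPattern)).matcher(fileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(toRegex("*.(xml|xhtml)"));
        System.out.println(matches("*.(xml|xhtml)", "chap1.xhtml"));
        System.out.println(matches("*.xml", "notes.txt"));
    }
}
```

      <p>Here <code>toRegex("*.(xml|xhtml)")</code> returns <code>^.*\.(xml|xhtml)$</code>, which matches <code>chap1.xhtml</code> but not <code>notes.txt</code>.</p>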
      <p><i>A collection read in this way is not stable. Calling the <code>collection()</code> function again with the same URI will
reprocess the directory, and return a different set of document nodes, even if the contents of the directory have
not changed.</i></p>
      <p class="subhead">Registered Collections</p>
      <p>On the .NET product there is a third way to use a collection URI (provided that you use the API rather than
the command line): you can register a collection using the <code>Processor.RegisterCollection</code>
method on the <code>Saxon.Api.Processor</code> class.</p>
      <table width="100%">
         <tr>
            <td>
               <p align="right"><a class="nav" href="builder-api.xml">Next</a></p>
            </td>
         </tr>
      </table>
   </body>
</html>