<?xml version="1.0" encoding="iso-8859-1"?>
<?xml-stylesheet href="../make-menu.xsl" type="text/xsl"?><html>
   <head>
      <this-is section="sourcedocs" page="controlling-parsing" subpage=""/>
      <!--
           Generated at 2011-12-09T20:47:22.916Z--><title>Saxonica: XSLT and XQuery Processing: Controlling Parsing of Source Documents</title>
      <meta name="coverage" content="Worldwide"/>
      <meta name="copyright" content="Copyright Saxonica Ltd"/>
      <meta name="title"
            content="Saxonica: XSLT and XQuery Processing: Controlling Parsing of Source Documents"/>
      <meta name="robots" content="noindex,nofollow"/>
      <link rel="stylesheet" href="../saxondocs.css" type="text/css"/>
   </head>
   <body class="main">
      <h1>Controlling Parsing of Source Documents</h1>
      <p>Saxon does not include its own XML parser. By default:</p>
      <ul>
         <li content="para">
            <p>On the Java platform, the default SAX parser provided as part of the JDK is used.
            With the Sun/Oracle JDK, this is a variant of the Apache Xerces parser customized by Sun.</p>
         </li>
         <li content="para">
            <p>On the .NET platform, Saxon includes a copy of the Apache Xerces parser cross-compiled
            to run on .NET.</p>
         </li>
      </ul>
      <p>An error reported by the XML parser is generally fatal. It is not possible to process ill-formed XML.</p>
      <p>There are several ways you can cause a different XML parser to be used:</p>
      <ul>
         <li content="para">
            <p>The -x and -y options on the command line can be used to specify the class name of a
     SAX parser, which Saxon will load in preference to the default SAX parser. The -x option is used for source XML documents,
     the -y option for schemas and stylesheets. The equivalent options can
     be set programmatically or by using the configuration file.</p>
         </li>
         <li content="para">
            <p>By default Saxon uses the <code>SAXParserFactory</code> mechanism to load a parser.
         This can be configured by setting the system property <code>javax.xml.parsers.SAXParserFactory</code>,
         by means of the file <code>lib/jaxp.properties</code> in the JRE directory, or by adding another parser
         to the <code>lib/endorsed</code> directory.</p>
         </li>
         <li content="para">
            <p>The source for parsing can be supplied in the form of a <code>SAXSource</code> object,
         which has an <code>XMLReader</code> property containing the parser instance to be used.</p>
         </li>
         <li content="para">
            <p>On .NET, the configuration option <code>PREFER_JAXP_PARSER</code> can be set to false,
         in which case Saxon will use the Microsoft XML parser instead of the Apache parser. (This parser is not used
         by default because it does not report ID attributes to the application, which means the XPath <code>id()</code>
         and <code>idref()</code> functions do not work.)</p>
         </li>
      </ul>
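      <p>The <code>SAXSource</code> route can be sketched as follows. This is a minimal illustration that uses only
         the standard JAXP and SAX interfaces. The class name <code>SaxSourceDemo</code> and its helper methods are invented
         for the example, and <code>TransformerFactory.newInstance()</code> returns whichever JAXP processor is configured
         on the classpath, which is not necessarily Saxon.</p>
      <pre><![CDATA[
```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class SaxSourceDemo {

    // Obtain a namespace-aware XMLReader through the SAXParserFactory lookup.
    // The factory implementation is chosen by the usual JAXP rules
    // (system property, jaxp.properties, or the platform default).
    public static XMLReader newReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    // Run an identity transformation over the input, parsing it with the
    // XMLReader carried by the SAXSource rather than a processor-chosen parser.
    public static String copy(String xml, XMLReader reader) throws Exception {
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(xml)));
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(copy("<doc><item>one</item></doc>", newReader()));
    }
}
```
]]></pre>
      <p>Because the caller creates the reader in this arrangement, any features or handlers the reader needs
         should be set on it before it is wrapped in the <code>SAXSource</code>.</p>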
      <p>Saxonica recommends use of the Xerces parser from Apache in preference to the version bundled in the JDK, which
        is known to have some serious bugs.</p>
      <p>By default, Saxon invokes the parser in non-validating mode (that is, without requesting DTD validation). Note, however,
        that the parser still needs to read the DTD if one is present, because it may contain entity definitions that need to 
        be expanded. DTD validation can be requested using <code>-dtd:on</code> on the command line, or equivalent API or
        configuration options.</p>
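      <p>The difference between the two modes can be seen with a plain SAX parser, independently of Saxon. The sketch
         below is an illustration only: the class <code>DtdModeDemo</code>, its <code>Outcome</code> holder and the sample
         document are invented for the example. It parses the same document twice, first without and then with DTD validation.
         In both cases the entity declared in the DTD is expanded; only the validating parse reports the undeclared element.</p>
      <pre><![CDATA[
```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class DtdModeDemo {

    // Sample document: the DTD declares an entity, and the body contains
    // an element (extra) that the DTD does not allow.
    public static final String SAMPLE =
        "<!DOCTYPE doc [\n"
        + "  <!ELEMENT doc (#PCDATA)>\n"
        + "  <!ENTITY product \"Saxon\">\n"
        + "]>\n"
        + "<doc>&product;<extra/></doc>";

    // Character data seen during the parse, plus any recoverable errors reported.
    public static final class Outcome {
        public final StringBuilder text = new StringBuilder();
        public final List<String> validityErrors = new ArrayList<>();
    }

    public static Outcome parse(String xml, boolean validate) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(validate); // true requests DTD validation
        XMLReader reader = factory.newSAXParser().getXMLReader();

        Outcome outcome = new Outcome();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                outcome.text.append(ch, start, length);
            }

            @Override
            public void error(SAXParseException e) {
                // Validity errors arrive here as recoverable errors.
                outcome.validityErrors.add(e.getMessage());
            }
        };
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return outcome;
    }

    public static void main(String[] args) throws Exception {
        Outcome plain = parse(SAMPLE, false);
        Outcome checked = parse(SAMPLE, true);
        System.out.println("non-validating text: " + plain.text
            + ", errors: " + plain.validityErrors.size());
        System.out.println("validating text: " + checked.text
            + ", errors: " + checked.validityErrors.size());
    }
}
```
]]></pre>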
      <p>Saxon is issued with local copies of commonly used W3C DTDs such as the XHTML, SVG, and MathML DTDs. When Saxon itself
        instantiates the XML parser, it will use an EntityResolver that causes these local copies of DTDs to be used rather
        than fetching public copies from the web (the W3C servers are increasingly failing to serve these requests because the
        volume of traffic is too high). It is possible to override this using the configuration setting <code>ENTITY_RESOLVER_CLASS</code>,
        which can be set to the name of a user-supplied EntityResolver, or to the empty string to indicate that no EntityResolver should
        be used. Saxon will not add this EntityResolver in cases where the XML parser instance is supplied by the caller as part of
        a <code>SAXSource</code> object. It will add it to a parser obtained as an instance of the class
        specified using the -x and -y command line options, unless either the use of the EntityResolver is suppressed using the 
        <code>ENTITY_RESOLVER_CLASS</code> configuration option, or the instantiated parser already has an EntityResolver registered.</p>
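      <p>A user-supplied EntityResolver of the kind described above might look like the following sketch. It is not
         Saxon's own resolver: the class <code>LocalDtdResolver</code>, the public identifier
         <code>-//EXAMPLE//DTD Demo//EN</code> and the <code>example.invalid</code> URL are hypothetical, and the local
         copies are held in memory rather than in files. The resolver answers requests for known public identifiers from its
         local store and returns <code>null</code> for anything else, which leaves the parser to fetch the system identifier itself.</p>
      <pre><![CDATA[
```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class LocalDtdResolverDemo {

    // Sample document whose DTD lives at a URL that should never be fetched.
    public static final String SAMPLE =
        "<!DOCTYPE doc PUBLIC \"-//EXAMPLE//DTD Demo//EN\" \"http://example.invalid/demo.dtd\">"
        + "<doc>&greeting;</doc>";

    // Hypothetical resolver: maps public identifiers to locally held DTD text.
    public static final class LocalDtdResolver implements EntityResolver {
        private final Map<String, String> localCopies = new HashMap<>();
        public int hits = 0; // number of requests served from the local store

        public void register(String publicId, String dtdText) {
            localCopies.put(publicId, dtdText);
        }

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            String dtd = (publicId == null) ? null : localCopies.get(publicId);
            if (dtd == null) {
                return null; // unknown entity: let the parser open systemId itself
            }
            hits++;
            InputSource local = new InputSource(new StringReader(dtd));
            local.setPublicId(publicId);
            local.setSystemId(systemId); // keep the original identifier for error messages
            return local;
        }
    }

    // Parse the document with the resolver installed and return its character data.
    public static String parseText(String xml, EntityResolver resolver) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);

        StringBuilder text = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        LocalDtdResolver resolver = new LocalDtdResolver();
        resolver.register("-//EXAMPLE//DTD Demo//EN",
            "<!ELEMENT doc (#PCDATA)>\n<!ENTITY greeting \"hello\">");
        System.out.println(parseText(SAMPLE, resolver));
        System.out.println("served locally: " + resolver.hits);
    }
}
```
]]></pre>
      <p>A resolver like this takes effect only on a parser it has been registered with. When the parser is supplied
         inside a <code>SAXSource</code>, the caller is responsible for calling <code>setEntityResolver</code> on it.</p>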
      <p>Saxon never asks the XML parser to perform schema validation. If schema validation is required it should be requested using
        the command line options <code>-val:strict</code> or <code>-val:lax</code>, or their API equivalents. Saxon will then use its
        own schema processor to validate the document as it emerges from the XML parser. Schema processing is done in parallel with parsing,
        by use of a SAX-like pipeline.</p>
      <table width="100%">
         <tr>
            <td>
               <p align="right"><a class="nav" href="xml11.xml">Next</a></p>
            </td>
         </tr>
      </table>
   </body>
</html>