<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <link rel="stylesheet" href="style.css" type="text/css"> <meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type"> <link rel="Start" href="index.html"> <link rel="previous" href="Pxp_dtd.html"> <link rel="next" href="Pxp_core_types.html"> <link rel="Up" href="index.html"> <link title="Index of types" rel=Appendix href="index_types.html"> <link title="Index of exceptions" rel=Appendix href="index_exceptions.html"> <link title="Index of values" rel=Appendix href="index_values.html"> <link title="Index of class methods" rel=Appendix href="index_methods.html"> <link title="Index of classes" rel=Appendix href="index_classes.html"> <link title="Index of class types" rel=Appendix href="index_class_types.html"> <link title="Index of modules" rel=Appendix href="index_modules.html"> <link title="Index of module types" rel=Appendix href="index_module_types.html"> <link title="Pxp_types" rel="Chapter" href="Pxp_types.html"> <link title="Pxp_document" rel="Chapter" href="Pxp_document.html"> <link title="Pxp_dtd" rel="Chapter" href="Pxp_dtd.html"> <link title="Pxp_tree_parser" rel="Chapter" href="Pxp_tree_parser.html"> <link title="Pxp_core_types" rel="Chapter" href="Pxp_core_types.html"> <link title="Pxp_ev_parser" rel="Chapter" href="Pxp_ev_parser.html"> <link title="Pxp_event" rel="Chapter" href="Pxp_event.html"> <link title="Pxp_dtd_parser" rel="Chapter" href="Pxp_dtd_parser.html"> <link title="Pxp_codewriter" rel="Chapter" href="Pxp_codewriter.html"> <link title="Pxp_marshal" rel="Chapter" href="Pxp_marshal.html"> <link title="Pxp_yacc" rel="Chapter" href="Pxp_yacc.html"> <link title="Pxp_reader" rel="Chapter" href="Pxp_reader.html"> <link title="Intro_trees" rel="Chapter" href="Intro_trees.html"> <link title="Intro_extensions" rel="Chapter" href="Intro_extensions.html"> <link title="Intro_namespaces" rel="Chapter" href="Intro_namespaces.html"> <link title="Intro_events" 
rel="Chapter" href="Intro_events.html"> <link title="Intro_resolution" rel="Chapter" href="Intro_resolution.html"> <link title="Intro_getting_started" rel="Chapter" href="Intro_getting_started.html"> <link title="Intro_advanced" rel="Chapter" href="Intro_advanced.html"> <link title="Intro_preprocessor" rel="Chapter" href="Intro_preprocessor.html"> <link title="Example_readme" rel="Chapter" href="Example_readme.html"><link title="ID indices" rel="Section" href="#1_IDindices"> <link title="Parsing functions" rel="Section" href="#1_Parsingfunctions"> <link title="Helpers" rel="Section" href="#1_Helpers"> <title>PXP Reference : Pxp_tree_parser</title> </head> <body> <div class="navbar"><a href="Pxp_dtd.html">Previous</a> <a href="index.html">Up</a> <a href="Pxp_core_types.html">Next</a> </div> <center><h1>Module <a href="type_Pxp_tree_parser.html">Pxp_tree_parser</a></h1></center> <br> <pre><span class="keyword">module</span> Pxp_tree_parser: <code class="code"><span class="keyword">sig</span></code> <a href="Pxp_tree_parser.html">..</a> <code class="code"><span class="keyword">end</span></code></pre>Calling the parser in tree mode<br> <hr width="100%"> <br> The following functions return the parsed XML text as a tree, i.e. as <a href="Pxp_document.node.html"><code class="code"><span class="constructor">Pxp_document</span>.node</code></a> or <a href="Pxp_document.document.html"><code class="code"><span class="constructor">Pxp_document</span>.document</code></a>.<br> <br> <a name="1_IDindices"></a> <h1>ID indices</h1><br> <br> These indices are used to check the uniqueness of elements declared as <code class="code"><span class="constructor">ID</span></code>. 
Of course, the indices can also be used to quickly look up such elements.<br> <pre><span class="keyword">exception</span> <a name="EXCEPTIONID_not_unique"></a>ID_not_unique</pre> <div class="info"> Used inside <a href="Pxp_tree_parser.index.html"><code class="code"><span class="constructor">Pxp_tree_parser</span>.index</code></a> to indicate that the same ID is attached to several nodes<br> </div> <pre><span class="keyword">class type</span> <a name="TYPEindex"></a><code class="type">['a <a href="Pxp_document.node.html">Pxp_document.node</a> #<a href="Pxp_document.extension.html">Pxp_document.extension</a> as 'a]</code> <a href="Pxp_tree_parser.index.html">index</a> = <code class="code"><span class="keyword">object</span></code> <a href="Pxp_tree_parser.index.html">..</a> <code class="code"><span class="keyword">end</span></code></pre><div class="info"> The type of indexes over the ID attributes of the elements. </div> <pre><span class="keyword">class</span> <a name="TYPEhash_index"></a><code class="type">['a <a href="Pxp_document.node.html">Pxp_document.node</a> #<a href="Pxp_document.extension.html">Pxp_document.extension</a> as 'a]</code> <a href="Pxp_tree_parser.hash_index.html">hash_index</a> : <code class="type"></code><code class="code"><span class="keyword">object</span></code> <a href="Pxp_tree_parser.hash_index.html">..</a> <code class="code"><span class="keyword">end</span></code></pre><div class="info"> This is a simple implementation of <a href="Pxp_tree_parser.index.html"><code class="code"><span class="constructor">Pxp_tree_parser</span>.index</code></a> using a hash table. </div> <br> <a name="1_Parsingfunctions"></a> <h1>Parsing functions</h1><br> <br> There are two types of XML texts one can parse:<ul> <li>Closed XML documents</li> <li>External XML entities</li> </ul> Usually, the functions for closed XML documents are the right ones. The exact difference between both types is subtle, as many texts are parseable in both ways. 
The idea, however, is that an external XML entity is text from a different file that is included by reference into a closed document. Some XML features are only meaningful for the whole document, and are not available when only an external entity is parsed. These include:<ul> <li>The DOCTYPE and the DTD declarations</li> <li>The standalone declaration</li> </ul> It is a syntax error to use these features in an external XML entity. <p> An external entity is a file referenced by another XML text. For example, this document includes "file.xml" as an external entity: <p> <pre></pre><code class="code"> &lt;?xml version=<span class="string">"1.0"</span><span class="keywordsign">?&gt;</span><br> &lt;!<span class="constructor">DOCTYPE</span> root [<br> &lt;!<span class="constructor">ENTITY</span> extref <span class="constructor">SYSTEM</span> <span class="string">"file.xml"</span>&gt;<br> ]&gt;<br> &lt;root&gt;<br> <span class="keywordsign">&amp;</span>extref;<br> &lt;/root&gt;<br> </code><pre></pre> <p> (In contrast to this, an internal entity would give the definition text immediately, e.g. <code class="code">&lt;!<span class="constructor">ENTITY</span> intref <span class="string">"This is the entity text"</span>&gt;</code>.) Of course, it does not make sense for the external entity to have its own DOCTYPE definition, and hence it is forbidden to use this feature in "file.xml". <p> There is no function that parses a file like "file.xml" exactly as if it were included in a bigger document. The closest behavior is provided by <a href="Pxp_tree_parser.html#VALparse_content_entity"><code class="code"><span class="constructor">Pxp_tree_parser</span>.parse_content_entity</code></a> and <a href="Pxp_tree_parser.html#VALparse_wfcontent_entity"><code class="code"><span class="constructor">Pxp_tree_parser</span>.parse_wfcontent_entity</code></a>. They implement the additional constraint that the file has to have a single top-most element. <p> The following functions also distinguish between validating and well-formedness mode. 
In the latter mode, many formal document constraints are not enforced. For instance, elements and attributes need not be declared. <p> There are, unfortunately, a number of myths about well-formed XML documents. One says that the declarations are completely ignored. This is of course not true. For example, the example shown above includes the external XML entity "file.xml" by reference. The <code class="code">&lt;!<span class="constructor">ENTITY</span>&gt;</code> declaration is respected no matter in which mode the parser is run. Also, it is not true that the presence of <code class="code"><span class="constructor">DOCTYPE</span></code> indicates validating mode and its absence well-formedness mode. The presence of <code class="code"><span class="constructor">DOCTYPE</span></code> is perfectly compatible with well-formedness mode; the declarations are merely interpreted in a different way. <p> If a document is parsed in validating mode but the <code class="code"><span class="constructor">DOCTYPE</span></code> is missing, this parser will fail when the root element is parsed, because its declaration is missing. This conforms to the XML standard, and also follows the logic that the program calling the parser is written in the expectation that the parsed file is validated. 
If this validation is missing, the program can run into failed assertions (or worse).<br> <pre><span class="keyword">val</span> <a name="VALparse_document_entity"></a>parse_document_entity : <code class="type">?transform_dtd:(<a href="Pxp_dtd.dtd.html">Pxp_dtd.dtd</a> -> <a href="Pxp_dtd.dtd.html">Pxp_dtd.dtd</a>) -><br> ?id_index:('a <a href="Pxp_document.node.html">Pxp_document.node</a> #<a href="Pxp_document.extension.html">Pxp_document.extension</a> as 'a)<br> <a href="Pxp_tree_parser.index.html">index</a> -><br> <a href="Pxp_types.html#TYPEconfig">Pxp_types.config</a> -><br> <a href="Pxp_types.html#TYPEsource">Pxp_types.source</a> -> 'a <a href="Pxp_document.html#TYPEspec">Pxp_document.spec</a> -> 'a <a href="Pxp_document.document.html">Pxp_document.document</a></code></pre><div class="info"> Parse a closed document, and validate the contents of the document against the DTD contained and/or referenced in the document. <p> If the optional argument <code class="code">transform_dtd</code> is passed, the following modification applies: After the DTD (both the internal and external subsets) has been read, the function <code class="code">transform_dtd</code> is called, and the resulting DTD is actually used to validate the document. This makes it possible<ul> <li>to check which DTD is used (e.g. by comparing <a href="Pxp_dtd.dtd.html#METHODid"><code class="code"><span class="constructor">Pxp_dtd</span>.dtd.id</code></a> with a list of allowed IDs)</li> <li>to apply modifications to the DTD before content parsing is started</li> <li>even to switch to a built-in DTD, and to drop all user-defined declarations.</li> </ul> If the optional argument <code class="code">transform_dtd</code> is missing, the parser behaves in the same way as if the identity were passed as <code class="code">transform_dtd</code>, i.e. the DTD is left unmodified. <p> If the optional argument <code class="code">id_index</code> is present, the parser adds any ID attribute to the passed index. 
An index is required to detect violations of the uniqueness of IDs.<br> </div> <pre><span class="keyword">val</span> <a name="VALparse_wfdocument_entity"></a>parse_wfdocument_entity : <code class="type">?transform_dtd:(<a href="Pxp_dtd.dtd.html">Pxp_dtd.dtd</a> -> <a href="Pxp_dtd.dtd.html">Pxp_dtd.dtd</a>) -><br> <a href="Pxp_types.html#TYPEconfig">Pxp_types.config</a> -><br> <a href="Pxp_types.html#TYPEsource">Pxp_types.source</a> -><br> ('a <a href="Pxp_document.node.html">Pxp_document.node</a> #<a href="Pxp_document.extension.html">Pxp_document.extension</a> as 'a) <a href="Pxp_document.html#TYPEspec">Pxp_document.spec</a> -><br> 'a <a href="Pxp_document.document.html">Pxp_document.document</a></code></pre><div class="info"> Parse a closed document, but do not validate it. Only checks on well-formedness are performed. <p> The option <code class="code">transform_dtd</code> works as for <code class="code">parse_document_entity</code>, but the resulting DTD is not used for validation. It is merely included in the returned document (useful, for example, to obtain entity declarations).<br> </div> <pre><span class="keyword">val</span> <a name="VALparse_content_entity"></a>parse_content_entity : <code class="type">?id_index:('a <a href="Pxp_document.node.html">Pxp_document.node</a> #<a href="Pxp_document.extension.html">Pxp_document.extension</a> as 'a)<br> <a href="Pxp_tree_parser.index.html">index</a> -><br> <a href="Pxp_types.html#TYPEconfig">Pxp_types.config</a> -><br> <a href="Pxp_types.html#TYPEsource">Pxp_types.source</a> -><br> <a href="Pxp_dtd.dtd.html">Pxp_dtd.dtd</a> -> 'a <a href="Pxp_document.html#TYPEspec">Pxp_document.spec</a> -> 'a <a href="Pxp_document.node.html">Pxp_document.node</a></code></pre><div class="info"> Parse a file representing a well-formed fragment of a document. The fragment must be a single element (i.e. something like <code class="code">&lt;a&gt;...&lt;/a&gt;</code>; not a sequence like <code class="code">&lt;a&gt;...&lt;/a&gt;&lt;b&gt;...&lt;/b&gt;</code>). 
The element is validated against the passed DTD, but it is not checked whether the element is the root element specified in the DTD. <b>This function is almost always the wrong one to call. Consider <a href="Pxp_tree_parser.html#VALparse_document_entity"><code class="code"><span class="constructor">Pxp_tree_parser</span>.parse_document_entity</code></a> instead.</b> <p> Despite its name, this function <b>cannot</b> parse the <code class="code">content</code> production defined in the XML specification! This is a misnomer I'm sorry about. The <code class="code">content</code> production would allow parsing a list of elements and other node kinds. Also, this function corresponds to the event entry point <code class="code"><span class="keywordsign">`</span><span class="constructor">Entry_element_content</span></code> and not <code class="code"><span class="keywordsign">`</span><span class="constructor">Entry_content</span></code>. <p> If the optional argument <code class="code">id_index</code> is present, the parser adds any ID attribute to the passed index. An index is required to detect violations of the uniqueness of IDs.<br> </div> <pre><span class="keyword">val</span> <a name="VALparse_wfcontent_entity"></a>parse_wfcontent_entity : <code class="type"><a href="Pxp_types.html#TYPEconfig">Pxp_types.config</a> -><br> <a href="Pxp_types.html#TYPEsource">Pxp_types.source</a> -><br> ('a <a href="Pxp_document.node.html">Pxp_document.node</a> #<a href="Pxp_document.extension.html">Pxp_document.extension</a> as 'a) <a href="Pxp_document.html#TYPEspec">Pxp_document.spec</a> -><br> 'a <a href="Pxp_document.node.html">Pxp_document.node</a></code></pre><div class="info"> Parse a file representing a well-formed fragment of a document. The fragment is not validated, only checked for well-formedness. 
See also the notes for <a href="Pxp_tree_parser.html#VALparse_content_entity"><code class="code"><span class="constructor">Pxp_tree_parser</span>.parse_content_entity</code></a>.<br> </div> <br> <a name="1_Helpers"></a> <h1>Helpers</h1><br> <pre><span class="keyword">val</span> <a name="VALdefault_extension"></a>default_extension : <code class="type">'a <a href="Pxp_document.node.html">Pxp_document.node</a> <a href="Pxp_document.extension.html">Pxp_document.extension</a> as 'a</code></pre><div class="info"> A "null" extension; an extension that does not extend the functionality<br> </div> <pre><span class="keyword">val</span> <a name="VALdefault_spec"></a>default_spec : <code class="type">('a <a href="Pxp_document.node.html">Pxp_document.node</a> <a href="Pxp_document.extension.html">Pxp_document.extension</a> as 'a) <a href="Pxp_document.html#TYPEspec">Pxp_document.spec</a></code></pre><div class="info"> Specifies that you do not want to use extensions.<br> </div> <pre><span class="keyword">val</span> <a name="VALdefault_namespace_spec"></a>default_namespace_spec : <code class="type">('a <a href="Pxp_document.node.html">Pxp_document.node</a> <a href="Pxp_document.extension.html">Pxp_document.extension</a> as 'a) <a href="Pxp_document.html#TYPEspec">Pxp_document.spec</a></code></pre><div class="info"> Specifies that you want to use namespaces, but not extensions<br> </div> </body></html>