<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <link rel="stylesheet" href="style.css" type="text/css"> <meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type"> <link rel="Start" href="index.html"> <link rel="previous" href="Intro_extensions.html"> <link rel="next" href="Intro_events.html"> <link rel="Up" href="index.html"> <link title="Index of types" rel=Appendix href="index_types.html"> <link title="Index of exceptions" rel=Appendix href="index_exceptions.html"> <link title="Index of values" rel=Appendix href="index_values.html"> <link title="Index of class methods" rel=Appendix href="index_methods.html"> <link title="Index of classes" rel=Appendix href="index_classes.html"> <link title="Index of class types" rel=Appendix href="index_class_types.html"> <link title="Index of modules" rel=Appendix href="index_modules.html"> <link title="Index of module types" rel=Appendix href="index_module_types.html"> <link title="Pxp_types" rel="Chapter" href="Pxp_types.html"> <link title="Pxp_document" rel="Chapter" href="Pxp_document.html"> <link title="Pxp_dtd" rel="Chapter" href="Pxp_dtd.html"> <link title="Pxp_tree_parser" rel="Chapter" href="Pxp_tree_parser.html"> <link title="Pxp_core_types" rel="Chapter" href="Pxp_core_types.html"> <link title="Pxp_ev_parser" rel="Chapter" href="Pxp_ev_parser.html"> <link title="Pxp_event" rel="Chapter" href="Pxp_event.html"> <link title="Pxp_dtd_parser" rel="Chapter" href="Pxp_dtd_parser.html"> <link title="Pxp_codewriter" rel="Chapter" href="Pxp_codewriter.html"> <link title="Pxp_marshal" rel="Chapter" href="Pxp_marshal.html"> <link title="Pxp_yacc" rel="Chapter" href="Pxp_yacc.html"> <link title="Pxp_reader" rel="Chapter" href="Pxp_reader.html"> <link title="Intro_trees" rel="Chapter" href="Intro_trees.html"> <link title="Intro_extensions" rel="Chapter" href="Intro_extensions.html"> <link title="Intro_namespaces" rel="Chapter" href="Intro_namespaces.html"> <link title="Intro_events" 
rel="Chapter" href="Intro_events.html"> <link title="Intro_resolution" rel="Chapter" href="Intro_resolution.html"> <link title="Intro_getting_started" rel="Chapter" href="Intro_getting_started.html"> <link title="Intro_advanced" rel="Chapter" href="Intro_advanced.html"> <link title="Intro_preprocessor" rel="Chapter" href="Intro_preprocessor.html"> <link title="Example_readme" rel="Chapter" href="Example_readme.html"><link title="Namespaces" rel="Section" href="#1_Namespaces"> <link title="Namespace URI's and prefixes" rel="Subsection" href="#2_NamespaceURIsandprefixes"> <link title="Example for prefix normalization" rel="Subsection" href="#2_Exampleforprefixnormalization"> <link title="Getting more details of namespaces" rel="Subsection" href="#2_Gettingmoredetailsofnamespaces"> <title>PXP Reference : Intro_namespaces</title> </head> <body> <div class="navbar"><a href="Intro_extensions.html">Previous</a> <a href="index.html">Up</a> <a href="Intro_events.html">Next</a> </div> <center><h1>Intro_namespaces</h1></center> <br> <br> This text explains how PXP deals with the optional namespace declarations in XML text. <p> <a name="1_Namespaces"></a> <h1>Namespaces</h1> <p> PXP supports namespaces (but they have to be explicitly enabled). In order to simplify the handling of namespace-aware documents PXP applies a transformation to the document which is called "prefix normalization". This transformation ensures that every namespace prefix uniquely identifies a namespace throughout the whole document. 
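<p> The idea behind prefix normalization can be sketched without PXP. The following is a minimal illustration in plain OCaml and not PXP's actual implementation: the type <code class="code">manager</code> and the function <code class="code">normprefix</code> are hypothetical names, and the policy of appending a counter to a prefix that is already taken is an assumption made for this sketch. It assigns exactly one norm prefix to each namespace URI, so that a prefix never stands for two different namespaces. <p>
<pre><code class="code">(* Illustration only: a toy normalizer, not PXP's implementation. *)
type manager = {
  by_uri : (string, string) Hashtbl.t;     (* namespace URI -> norm prefix *)
  by_prefix : (string, string) Hashtbl.t;  (* norm prefix -> namespace URI *)
}

let create () = { by_uri = Hashtbl.create 7; by_prefix = Hashtbl.create 7 }

(* Return the norm prefix for [uri]. A URI seen before keeps its norm prefix.
   For a new URI, the prefix found in the text is tried first; if it already
   stands for another URI, a counter is appended until the prefix is free. *)
let normprefix m ~display_prefix ~uri =
  match Hashtbl.find_opt m.by_uri uri with
  | Some p -> p
  | None ->
      let rec fresh n =
        let candidate =
          if n = 0 then display_prefix
          else display_prefix ^ string_of_int n in
        if Hashtbl.mem m.by_prefix candidate then fresh (n + 1) else candidate in
      let p = fresh 0 in
      Hashtbl.replace m.by_uri uri p;
      Hashtbl.replace m.by_prefix p uri;
      p

let () =
  let m = create () in
  let a = normprefix m ~display_prefix:"x" ~uri:"http://addresses.org" in
  let b = normprefix m ~display_prefix:"x" ~uri:"http://names.org" in
  let c = normprefix m ~display_prefix:"y" ~uri:"http://addresses.org" in
  Printf.printf "%s %s %s\n" a b c  (* prints "x x1 x" *)
</code></pre>
<p> In this sketch, two URIs that are both written with the prefix "x" receive distinct norm prefixes, and a URI that was seen before keeps its norm prefix regardless of the prefix used in the text.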
<p> <a name="3_Linkstootherdocumentation"></a> <h3>Links to other documentation</h3> <p> <ul> <li><a href="Intro_getting_started.html#namespaces"><i>Namespaces</i></a></li> <li><a href="Pxp_dtd.namespace_manager.html"><code class="code"><span class="constructor">Pxp_dtd</span>.namespace_manager</code></a></li> <li><a href="Pxp_dtd.html#VALcreate_namespace_manager"><code class="code"><span class="constructor">Pxp_dtd</span>.create_namespace_manager</code></a></li> <li><a href="Pxp_dtd.namespace_scope.html"><code class="code"><span class="constructor">Pxp_dtd</span>.namespace_scope</code></a></li> <li><a href="Pxp_dtd.html#VALcreate_namespace_scope"><code class="code"><span class="constructor">Pxp_dtd</span>.create_namespace_scope</code></a></li> <li>Trees and namespaces: <a href="Intro_trees.html#access"><i>Access methods</i></a>, see the namespace subsection</li> <li><a href="Intro_advanced.html#irrnodes"><i>Irregular nodes: namespace nodes and attribute nodes</i></a></li> <li><a href="Intro_events.html#namespaces"><i>Events and namespaces</i></a></li> </ul> <a name="2_NamespaceURIsandprefixes"></a> <h2>Namespace URI's and prefixes</h2> <p> A namespace is identified by a namespace URI (e.g. something like "http://company.org/namespaces/project1" - note that this URI is simply processed as a string, and is never looked up via HTTP). For brevity, one defines a so-called namespace prefix for such a URI. For example: <p> <pre></pre><code class="code"> <x:q xmlns:x=<span class="string">"http://company.org/namespaces/project1"</span>>...</x:q> </code><pre></pre> <p> The "xmlns:x" attribute is special, and declares that for this subtree the prefix "x" is to be used as replacement for the long URI. Here, "x:q" denotes the element "q" in the namespace bound to the prefix "x". <p> The problem is that the URI, not the prefix, identifies the namespace. In another subtree you may want to use the prefix "y" for the same namespace. 
This has always made it difficult to deal with namespaces in XML-processing software. <p> PXP, however, performs prefix normalization before it returns the tree. This means that every prefix is replaced by the norm prefix of its namespace. The norm prefix can be the first prefix used for the namespace, a prefix declared with a PXP extension, or a prefix bound to the namespace programmatically. <p> In order to use the PXP implementation of namespaces, one has to set <code class="code">enable_namespace_processing</code> in the parser configuration and use namespace-aware node implementations. If you don't use extended node trees, this means using <a href="Pxp_tree_parser.html#VALdefault_namespace_spec"><code class="code"><span class="constructor">Pxp_tree_parser</span>.default_namespace_spec</code></a> instead of <a href="Pxp_tree_parser.html#VALdefault_spec"><code class="code"><span class="constructor">Pxp_tree_parser</span>.default_spec</code></a>. A good starting point that enables all of this: <p> <pre></pre><code class="code"> <span class="keyword">let</span> nsmng = <span class="constructor">Pxp_dtd</span>.create_namespace_manager()<br> <span class="keyword">let</span> config = <br> { <span class="constructor">Pxp_types</span>.default_config <span class="keyword">with</span><br> enable_namespace_processing = <span class="constructor">Some</span> nsmng<br> }<br> <span class="keyword">let</span> source = ...<br> <span class="keyword">let</span> spec = <span class="constructor">Pxp_tree_parser</span>.default_namespace_spec<br> <span class="keyword">let</span> doc = <span class="constructor">Pxp_tree_parser</span>.parse_document_entity config source spec<br> <span class="keyword">let</span> root = doc<span class="keywordsign">#</span>root<br> </code><pre></pre> <p> The namespace-aware implementations of the <code class="code">node</code> class type define additional namespace methods like <code class="code">namespace_uri</code> (see <a 
href="Pxp_document.node.html#METHODnamespace_uri"><code class="code"><span class="constructor">Pxp_document</span>.node.namespace_uri</code></a>). (Although you could also direct the parser to create non-namespace-aware nodes, this does not make much sense, as you would not get these special access methods.) <p> The method <code class="code">namespace_scope</code> (see <a href="Pxp_document.node.html#METHODnamespace_scope"><code class="code"><span class="constructor">Pxp_document</span>.node.namespace_scope</code></a>) provides more information about what happened during prefix normalization. In particular, it is possible to find out the original prefix in the XML text (also called the <b>display prefix</b>), before it was mapped to the normalized prefix. The <code class="code">namespace_scope</code> method returns a <a href="Pxp_dtd.namespace_scope.html"><code class="code"><span class="constructor">Pxp_dtd</span>.namespace_scope</code></a> object with additional lookup methods. <p> <a name="2_Exampleforprefixnormalization"></a> <h2>Example for prefix normalization</h2> <p> In the following XML snippet the prefix "h" is declared as a shorthand for the XHTML namespace: <p> <pre></pre><code class="code"><h:html xmlns:h=<span class="string">"http://www.w3.org/1999/xhtml"</span>> <br> <h:head><br> <h:title><span class="constructor">Virtual</span> <span class="constructor">Library</span></h:title> <br> </h:head> <br> <h:body> <br> <h:p><span class="constructor">Moved</span> <span class="keyword">to</span> <h:a href=<span class="string">"http://vlib.org/"</span>>vlib.org</h:a>.</h:p> <br> </h:body> <br> </h:html><br> </code><pre></pre> <p> In this example, normalization changes nothing, because the prefix "h" has the same meaning throughout the whole document. However, keep in mind that every author of XHTML documents can freely choose the prefix to use. 
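<p> Because the prefix is the author's choice, a program should identify an element by its namespace URI and local name rather than by the prefix as written. The following plain OCaml sketch illustrates this with a scope modeled as a list of bindings from display prefix to URI. It is an illustration only and does not use PXP: the type <code class="code">scope</code> and the functions <code class="code">split_name</code> and <code class="code">expand</code> are hypothetical names, the prefix "html" in the second scope is an assumed alternative spelling, and default namespaces are not handled. <p>
<pre><code class="code">(* Illustration only: a scope as a list of (display prefix, namespace URI)
   bindings, innermost binding first. Not PXP's namespace_scope class. *)
type scope = (string * string) list

(* Split "h:title" into ("h", "title"); a name without colon gets prefix "". *)
let split_name name =
  match String.index_opt name ':' with
  | Some i ->
      (String.sub name 0 i,
       String.sub name (i + 1) (String.length name - i - 1))
  | None -> ("", name)

(* Resolve a name as written in the XML text to (namespace URI, local name).
   Returns None if the prefix is not declared in the scope. *)
let expand (scope : scope) name =
  let prefix, local = split_name name in
  match List.assoc_opt prefix scope with
  | Some uri -> Some (uri, local)
  | None -> None

let xhtml = "http://www.w3.org/1999/xhtml"

let () =
  let doc1 = [ ("h", xhtml) ] in      (* one author writes h:title *)
  let doc2 = [ ("html", xhtml) ] in   (* another author writes html:title *)
  (* Both spellings denote the same element once the prefix is resolved. *)
  assert (expand doc1 "h:title" = expand doc2 "html:title");
  assert (expand doc1 "h:title" = Some (xhtml, "title"))
</code></pre>
<p> Prefix normalization spares programs this lookup: after normalization, the prefix alone is enough to identify the namespace.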
<p> The XML standard gives the author of the document even the freedom to change the meaning of a prefix at any time. For example, here the prefix "x" is changed in the inner node: <p> <pre></pre><code class="code"><x:address xmlns:x=<span class="string">"http://addresses.org"</span>><br> <x:name xmlns:x=<span class="string">"http://names.org"</span>><br> <span class="constructor">Gerd</span> <span class="constructor">Stolpmann</span><br> </x:name><br> </x:address><br> </code><pre></pre> <p> In the outer node the prefix "x" is connected with the "http://addresses.org" namespace, but in the inner node it is connected with "http://names.org". <p> After normalization, the prefixes would look as follows: <p> <pre></pre><code class="code"><x:address xmlns:x=<span class="string">"http://addresses.org"</span>><br> <x1:name xmlns:x1=<span class="string">"http://names.org"</span>><br> <span class="constructor">Gerd</span> <span class="constructor">Stolpmann</span><br> </x1:name><br> </x:address><br> </code><pre></pre> <p> In order to avoid overridden prefixes, the prefix in the inner node was changed to "x1" (for type theorists: think of alpha conversion). <p> The idea of prefix normalization is to simplify how programs can match against element and attribute names. It is possible to configure the normalizer so that certain prefixes are used for certain URI's. 
In this example, we could direct the normalizer to use the prefixes "addr" and "nm" instead of the quite arbitrary strings "x" and "x1": <p> <pre></pre><code class="code">dtd <span class="keywordsign">#</span> namespace_manager <span class="keywordsign">#</span> add_namespace <span class="string">"addr"</span> <span class="string">"http://addresses.org"</span>;<br> dtd <span class="keywordsign">#</span> namespace_manager <span class="keywordsign">#</span> add_namespace <span class="string">"nm"</span> <span class="string">"http://names.org"</span>;<br> </code><pre></pre> <p> For this to work you need access to the <code class="code">dtd</code> object before the parser actually starts its work. The parsing functions in <a href="Pxp_tree_parser.html"><code class="code"><span class="constructor">Pxp_tree_parser</span></code></a> have the special hook <code class="code">transform_dtd</code>, which is called at the right moment and allows the program to enter such special configurations into the DTD object. 
The resulting program could then look like this: <p> <pre></pre><code class="code"> <span class="keyword">let</span> nsmng = <span class="constructor">Pxp_dtd</span>.create_namespace_manager()<br> <span class="keyword">let</span> config = <br> { <span class="constructor">Pxp_types</span>.default_config <span class="keyword">with</span><br> enable_namespace_processing = <span class="constructor">Some</span> nsmng<br> }<br> <span class="keyword">let</span> source = ...<br> <span class="keyword">let</span> spec = <span class="constructor">Pxp_tree_parser</span>.default_namespace_spec<br> <span class="keyword">let</span> transform_dtd dtd =<br> dtd <span class="keywordsign">#</span> namespace_manager <span class="keywordsign">#</span> add_namespace <span class="string">"addr"</span> <span class="string">"http://addresses.org"</span>;<br> dtd <span class="keywordsign">#</span> namespace_manager <span class="keywordsign">#</span> add_namespace <span class="string">"nm"</span> <span class="string">"http://names.org"</span>;<br> dtd<br> <span class="keyword">let</span> doc = <br> <span class="constructor">Pxp_tree_parser</span>.parse_document_entity ~transform_dtd config source spec<br> <span class="keyword">let</span> root = doc<span class="keywordsign">#</span>root<br> </code><pre></pre> <p> Alternatively, it is possible to put special processing instructions into the DTD: <p> <pre></pre><code class="code"><?pxp:dtd namespace prefix=<span class="string">"addr"</span> uri=<span class="string">"http://addresses.org"</span><span class="keywordsign">?></span><br> <?pxp:dtd namespace prefix=<span class="string">"nm"</span> uri=<span class="string">"http://names.org"</span><span class="keywordsign">?></span><br> </code><pre></pre> <p> The advantage of configuring specific normprefixes is that one can now use them directly in programs, e.g. 
for matching: <p> <pre></pre><code class="code"> <span class="keyword">match</span> node<span class="keywordsign">#</span>node_type <span class="keyword">with</span><br> <span class="keywordsign">|</span> <span class="constructor">T_element</span> <span class="string">"addr:address"</span> <span class="keywordsign">-></span> ...<br> <span class="keywordsign">|</span> <span class="constructor">T_element</span> <span class="string">"nm:name"</span> <span class="keywordsign">-></span> ...<br> </code><pre></pre> <p> <a name="2_Gettingmoredetailsofnamespaces"></a> <h2>Getting more details of namespaces</h2> <p> Two additional objects are relevant. First, there is a namespace manager for the whole tree. This object collects all namespace URIs that occur in the XML text, and decides which normprefixes are associated with them: <a href="Pxp_dtd.namespace_manager.html"><code class="code"><span class="constructor">Pxp_dtd</span>.namespace_manager</code></a>. <p> Second, there is the namespace scope. An XML tree may have many such objects. A new scope object is created whenever new namespaces are introduced, i.e. when there are "xmlns" declarations. The scope object has a pointer to the scope object for the surrounding XML text. Scope objects are documented here: <a href="Pxp_dtd.namespace_scope.html"><code class="code"><span class="constructor">Pxp_dtd</span>.namespace_scope</code></a>. <p> Some examples (when <code class="code">n</code> is a node): <p> <ul> <li>To find out which normprefix is used for a namespace URI, use <pre></pre><code class="code"> n <span class="keywordsign">#</span> namespace_manager <span class="keywordsign">#</span> get_normprefix uri </code><pre></pre> </li> <li>To find out the reverse, i.e. 
which URI is represented by a certain normprefix, use <pre></pre><code class="code"> n <span class="keywordsign">#</span> namespace_manager <span class="keywordsign">#</span> get_primary_uri prefix </code><pre></pre> </li> <li>To find out which namespace URI is meant by a display prefix, i.e. the prefix as it occurs literally in the XML text: <pre></pre><code class="code"> n <span class="keywordsign">#</span> namespace_scope <span class="keywordsign">#</span> uri_of_display_prefix prefix </code><pre></pre> </li> </ul> <br> </body></html>