<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
 <head profile="http://internetalchemy.org/2003/02/profile">
  <link rel="foaf" type="application/rdf+xml" title="FOAF" href="http://www.openlinksw.com/dataspace/uda/about.rdf" />
  <link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" />
  <meta name="dc.title" content="13. XML Support" />
  <meta name="dc.subject" content="13. XML Support" />
  <meta name="dc.creator" content="OpenLink Software Documentation Team" />
  <meta name="dc.copyright" content="OpenLink Software, 1999 - 2009" />
  <link rel="top" href="index.html" title="OpenLink Virtuoso Universal Server: Documentation" />
  <link rel="search" href="/doc/adv_search.vspx" title="Search OpenLink Virtuoso Universal Server: Documentation" />
  <link rel="parent" href="webandxml.html" title="Chapter Contents" />
  <link rel="prev" href="xmlservices.html" title="Virtuoso XML Services" />
  <link rel="next" href="updategrams.html" title="Using UpdateGrams to Modify Data" />
  <link rel="shortcut icon" href="../images/misc/favicon.ico" type="image/x-icon" />
  <link rel="stylesheet" type="text/css" href="doc.css" />
  <link rel="stylesheet" type="text/css" href="/doc/translation.css" />
  <title>13. XML Support</title>
  <meta http-equiv="Content-Type" content="text/xhtml; charset=UTF-8" />
  <meta name="author" content="OpenLink Software Documentation Team" />
  <meta name="copyright" content="OpenLink Software, 1999 - 2009" />
  <meta name="keywords" content="" />
  <meta name="GENERATOR" content="OpenLink XSLT Team" />
 </head>
 <body>
  <div id="header">
    <a name="queryingxmldata" />
    <img src="../images/misc/logo.jpg" alt="" />
    <h1>13. XML Support</h1>
  </div>
  <div id="navbartop">
   <div>
      <a class="link" href="webandxml.html">Chapter Contents</a> | <a class="link" href="xmlservices.html" title="Virtuoso XML Services">Prev</a> | <a class="link" href="updategrams.html" title="Using UpdateGrams to Modify Data">Next</a>
   </div>
  </div>
  <div id="currenttoc">
   <form method="post" action="/doc/adv_search.vspx">
    <div class="search">Keyword Search: <br />
        <input type="text" name="q" /> <input type="submit" name="go" value="Go" />
    </div>
   </form>
   <div>
      <a href="http://www.openlinksw.com/">www.openlinksw.com</a>
   </div>
   <div>
      <a href="http://docs.openlinksw.com/">docs.openlinksw.com</a>
   </div>
    <br />
   <div>
      <a href="index.html">Book Home</a>
   </div>
    <br />
   <div>
      <a href="contents.html">Contents</a>
   </div>
   <div>
      <a href="preface.html">Preface</a>
   </div>
    <br />
   <div class="selected">
      <a href="webandxml.html">XML Support</a>
   </div>
    <br />
   <div>
      <a href="forxmlforsql.html">Rendering SQL Queries as XML (FOR XML Clause)</a>
   </div>
   <div>
      <a href="composingxmlinsql.html">XML Composing Functions in SQL Statements (SQLX)</a>
   </div>
   <div>
      <a href="xmlservices.html">Virtuoso XML Services</a>
   </div>
   <div class="selected">
      <a href="queryingxmldata.html">Querying Stored XML Data</a>
    <div>
        <a href="#xpathcontainsSQLPred" title="XPATH_CONTAINS SQL Predicate">XPATH_CONTAINS SQL Predicate</a>
        <a href="#qryusingxpath_eval" title="Using xpath_eval()">Using xpath_eval()</a>
        <a href="#wxmlextentrefinxml" title="External Entity References in Stored XML">External Entity References in Stored XML</a>
        <a href="#wamlschmdtdfuncs" title="XML Schema &amp; DTD Functions">XML Schema &amp; DTD Functions</a>
        <a href="#usingxmlfreetext" title="Using XML and Free Text">Using XML and Free Text</a>
        <a href="#xcontainspredicate" title="XCONTAINS predicate">XCONTAINS predicate</a>
        <a href="#textcontainsxpath" title="text-contains XPath Predicate">text-contains XPath Predicate</a>
        <a href="#xmlfreetextrules" title="XML Free Text Indexing Rules">XML Free Text Indexing Rules</a>
        <a href="#xmlencoding" title="XML Processing &amp; Free Text Encoding Issues">XML Processing &amp; Free Text Encoding Issues</a>
    </div>
   </div>
   <div>
      <a href="updategrams.html">Using UpdateGrams to Modify Data</a>
   </div>
   <div>
      <a href="xmltemplates.html">XML Templates</a>
   </div>
   <div>
      <a href="xmlschema.html">XML DTD and XML Schemas</a>
   </div>
   <div>
      <a href="xq.html">XQuery 1.0 Support</a>
   </div>
   <div>
      <a href="xslttrans.html">XSLT Transformation</a>
   </div>
   <div>
      <a href="xmltype.html">XMLType</a>
   </div>
   <div>
      <a href="xmldom.html">Changing XML entities in DOM style</a>
   </div>
    <br />
  </div>
  <div id="text">
    <a name="queryingxmldata" />
    <h2>13.4. Querying Stored XML Data</h2>
	
		<a name="xpathcontainsSQLPred" />
    <h3>13.4.1. XPATH_CONTAINS SQL Predicate</h3>
		<p>
XPath expressions can be used in SQL statements to decompose and match
XML data stored in columns.  The <span class="computeroutput">xpath_contains</span>
SQL predicate can be used either to test for an XML value matching a
path expression or to extract one or more entities from the XML value.
These values can then be used later in the query as contexts for other
XPath expressions.
	</p>
<div>
      <pre class="programlisting">
xpath_contains (xml_column, xp_expression[, query_variable]);
</pre>
    </div>
		<p>
The first argument, <span class="computeroutput">xml_column</span>, is the name of the column on which to perform the XPath search.
The second argument, <span class="computeroutput">xp_expression</span>, is the XPath
expression to evaluate.
</p>
		<p>
The third argument is an optional query variable that is bound to
each result entity of the XPath expression.  If this variable is
omitted, the xpath_contains predicate simply qualifies the query by
returning true for matches, so the result contains at most
one row per matching row of the base table.  If the variable is present, the result set can
contain multiple rows per row of the base table, one row for
each match.
</p>
		<p>
Consider the example:
	</p>
		<div>
      <pre class="programlisting">
select xt_file, t from xml_text
  where xpath_contains (xt_text, &#39;//chapter/title[position () = 1]&#39;, t);
</pre>
    </div>
		<p>
This SQL statement will select the first title child of any chapter
entities in the XML documents in the xt_text column of the table
<span class="computeroutput">xml_text</span>.  There can be several matching
entities per row of xml_text.  The result set will contain a row for
each matching entity.
</p>
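		<p>
For comparison, the following sketch omits the query variable, so the predicate only acts as a filter.
It reuses the <span class="computeroutput">xml_text</span> table from the example above and
should return each <span class="computeroutput">xt_file</span> at most once, however many chapter titles the document contains.
	</p>
		<div>
      <pre class="programlisting">
-- No third argument: xpath_contains only tests whether the path matches.
select xt_file from xml_text
  where xpath_contains (xt_text, &#39;//chapter/title&#39;);
</pre>
    </div>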

		<p>
In XPath terms the path expression of
<span class="computeroutput">xpath_contains</span> is evaluated with the context
node set to the root node of the XML tree represented by the value of
the column that is the first argument of xpath_contains.  This node is
the only element of the context node set.
	</p>
<div class="note">
      <div class="notetitle">Note:</div>
		<p>
The &#39;t&#39; variable in the above example gets bound to XML entities, not
to their string values or other representations. One can thus use
these values as context nodes for other expressions.
	</p>
</div>
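<p>
A minimal sketch of using the bound entity as a context node follows. It assumes that
<span class="computeroutput">xpath_eval()</span> accepts the entity bound to &#39;t&#39; as its second argument,
in the same way it accepts the result of <span class="computeroutput">xml_tree_doc()</span>; the
<span class="computeroutput">title_text</span> alias is hypothetical.
	</p>
<div>
      <pre class="programlisting">
-- t is an XML entity, so a further XPath expression can be evaluated against it.
-- Here string (.) converts each matched title element to its string value.
select xt_file, xpath_eval (&#39;string (.)&#39;, t, 1) as title_text from xml_text
  where xpath_contains (xt_text, &#39;//chapter/title[position () = 1]&#39;, t);
</pre>
    </div>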
<p>
The XPath expression can begin with a list of options.
The list is enclosed in square brackets,
and the options in it are delimited by spaces.
The most commonly used option is <span class="computeroutput">__quiet</span>, which makes it possible to
process a set of rows even if not all stored documents are valid XML:
if the XML parser signals an error while preparing a document for the XPath expression
and the expression contains <span class="computeroutput">__quiet</span>, the error is suppressed and
the row is silently ignored, as if the XPath expression had found nothing.
The DTD validator of the parser can be configured by placing its <a href="xmlschema.html#dtd_config">configuration parameters</a> in
the list of XPath options.
</p>
<p>
The following example is almost identical to the previous one, but it works even if
not all values of <span class="computeroutput">xt_text</span> are valid XML documents, and the resulting values of the
&#39;t&#39; variable are standalone entities even if the source documents in xt_text contain external general entities.
	</p>
<div>
      <pre class="programlisting">
select xt_file, t from xml_text
  where xpath_contains (xt_text, &#39;[__quiet BuildStandalone=ENABLE]//chapter/title[position () = 1]&#39;, t);
</pre>
    </div>

<br />

<a name="qryusingxpath_eval" />
    <h3>13.4.2. Using xpath_eval()</h3>

	<p>The <span class="computeroutput">xpath_eval()</span> function extracts
	the parts of an XML fragment that match a given XPath expression.  It
	can return multiple-node answers to queries, since it is often
	the case that more than one node matches.
	Consider the following statements, which create a table with XML stored inside.</p>

<div>
      <pre class="programlisting">
CREATE TABLE t_articles (
	article_id int NOT NULL,
	article_title varchar(255) NOT NULL,
	article_xml long varchar
	);

insert into t_articles (article_id, article_title) values (1, &#39;a&#39;);
insert into t_articles (article_id, article_title) values (2, &#39;b&#39;);

UPDATE t_articles SET article_xml = &#39;
&lt;beatles id = &quot;b1&quot;&gt;
&lt;beatle instrument = &quot;guitar&quot; alive = &quot;no&quot;&gt;john lennon&lt;/beatle&gt;
&lt;beatle instrument = &quot;guitar&quot; alive = &quot;no&quot;&gt;george harrison&lt;/beatle&gt;
&lt;/beatles&gt;&#39;
WHERE article_id = 1;

UPDATE t_articles SET article_xml = &#39;
&lt;beatles id = &quot;b2&quot;&gt;
&lt;beatle instrument = &quot;bass&quot; alive = &quot;yes&quot;&gt;paul mccartney&lt;/beatle&gt;
&lt;beatle instrument = &quot;drums&quot; alive = &quot;yes&quot;&gt;ringo starr&lt;/beatle&gt;
&lt;/beatles&gt;&#39;
WHERE article_id = 2;
</pre>
    </div>

<p>Now we make a query that returns a vector of results, with each
vector element corresponding to one matching node.</p>

<div>
      <pre class="programlisting">
SELECT xpath_eval(&#39;//beatle/@instrument&#39;, xml_tree_doc (article_xml), 0)
	AS beatle_instrument FROM t_articles WHERE article_id = 2;
</pre>
    </div>

<p>The repeating nodes are returned as elements of a vector because the third argument
to <span class="computeroutput">xpath_eval()</span> is set to 0, which means that all matching nodes are returned.</p>

<p>Alternatively, we can select only the first matching node by supplying 1 as the third
parameter to <span class="computeroutput">xpath_eval()</span>: </p>

<div>
      <pre class="programlisting">
SELECT xpath_eval(&#39;//beatle/@instrument&#39;, xml_tree_doc (article_xml), 1)
	AS beatle_instrument FROM t_articles WHERE article_id = 2;
</pre>
    </div>
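
<p>The XPath expression itself can narrow the result further. The sketch below
reuses the <span class="computeroutput">t_articles</span> table created above and adds an attribute
test, so that only <span class="computeroutput">beatle</span> elements with a given instrument are collected;
the <span class="computeroutput">guitarists</span> alias is hypothetical.</p>

<div>
      <pre class="programlisting">
-- Third argument 0: return every matching beatle element as a vector.
SELECT xpath_eval(&#39;//beatle[@instrument = &quot;guitar&quot;]&#39;, xml_tree_doc (article_xml), 0)
	AS guitarists FROM t_articles WHERE article_id = 1;
</pre>
    </div>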


<div class="tip">
      <div class="tiptitle">See Also:</div>
  <p>
        <a href="fn_xpath_eval.html">xpath_eval()</a>
      </p>
  <p>
        <a href="fn_xquery_eval.html">xquery_eval()</a>
      </p>
  <p>
        <a href="">xmlupdate()</a>
      </p>
</div>

<br />


<a name="wxmlextentrefinxml" />
    <h3>13.4.3. External Entity References in Stored XML</h3>

	<p>
When an XML document is stored as either text or in persistent XML
format it can contain references to external parsed entities with the
&lt;!ENTITY ...&gt; declaration and the &amp;xx; syntax.  These are
stored as references and not expanded at storage time if the entity is
external.  Such references are transparently followed by XPath and
XSLT.  A run-time error occurs if the referenced resource cannot be
accessed when needed.  The reference is only followed if the actual
subtree is selected by XPath or XSLT.  The resource is retrieved at
most once for each XPath or XSLT operation referencing it, regardless
of the number of times the link is traversed.  This is transparent, so
that the document node of the referenced entity appears as if it were
in the place of the reference.
</p>
	<p>
External entity references have an associated URI, which is either
absolute, with protocol identifier and full path, or relative.
Virtuoso resolves relative references with respect to the base URI of
the referencing document.  If the document is stored as a column value
in a table it does not have a natural base URI; therefore, the
application must supply one if relative references are to be
supported.  This is done by specifying an extra column of the same
table to contain a path, in the form of collections delimited by
slashes, like the path of a DAV resource or a Unix file system path.
This base URI is associated with an XML column with the IDENTIFIED BY
declaration:
</p>

	<div>
      <pre class="programlisting">
create table XML_TEXT (
	XT_ID integer,
	XT_FILE varchar,
	XT_TEXT long varchar identified by xt_file,
		primary key (XT_ID)
	);

create index XT_FILE on XML_TEXT (XT_FILE);
</pre>
    </div>

	<p>
Thus, each time the value of <span class="computeroutput">xt_text</span> is
retrieved for XML processing by <span class="computeroutput">xpath_contains()</span>
or <span class="computeroutput">xcontains()</span> the base URI is taken from
<span class="computeroutput">xt_file</span>.  The complete URI for the
<span class="computeroutput">xt_text</span> value of a row of the sample table would
be:
</p>

	<div>
      <pre class="programlisting">
virt://&lt;qualified table name&gt;.&lt;uri column&gt;.&lt;text column&gt;:&lt;uri column value&gt;
</pre>
    </div>

	<p>
An example would be:
</p>

	<div>
      <pre class="programlisting">
&quot;virt://DB.DBA.XML_TEXT.XT_FILE.XT_TEXT:sqlreference.xml&quot;
</pre>
    </div>

	<p>
The &#39;..&#39; and &#39;.&#39; in relative paths are treated like file names when
combining relative references to base URIs. A relative reference
without a path just replaces the last part of the path in the base
URI.
</p>
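<p>
As an illustration of these rules, the sketch below stores a document whose external entity is
referenced by a bare file name. The row values, the entity name
<span class="computeroutput">funcs</span> and both file names are hypothetical; only the
<span class="computeroutput">XML_TEXT</span> table comes from the example above. The comments show
how the relative reference would be expected to resolve when the last part of the base path is replaced.
</p>
<div>
      <pre class="programlisting">
-- Hypothetical row: XT_FILE supplies the base URI of the document in XT_TEXT.
insert into XML_TEXT (XT_ID, XT_FILE, XT_TEXT) values (
  1, &#39;docs/sqlreference.xml&#39;,
  &#39;&lt;!DOCTYPE book [ &lt;!ENTITY funcs SYSTEM &quot;functions.xml&quot;&gt; ]&gt;
&lt;book&gt;&amp;funcs;&lt;/book&gt;&#39;);

-- Base URI:            virt://DB.DBA.XML_TEXT.XT_FILE.XT_TEXT:docs/sqlreference.xml
-- Relative reference:  functions.xml
-- Resolved reference:  virt://DB.DBA.XML_TEXT.XT_FILE.XT_TEXT:docs/functions.xml
</pre>
    </div>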
<div class="tip">
      <div class="tiptitle">See Also:</div>
	<p>
<a href="fn_xml_uri_get.html">xml_uri_get() and
xml_uri_merge()</a> for more details.
</p>
</div>
<br />


<a name="wamlschmdtdfuncs" />
    <h3>13.4.4. XML Schema &amp; DTD Functions</h3>

<p>The following functions can be used to generate XML Schema or
DTD information about a given SQL query:</p>

  <ul>
    <li>
        <a href="fn_xml_auto_schema.html">xml_auto_schema()</a>
    </li>
    <li>
        <a href="fn_xml_auto_dtd.html">xml_auto_dtd()</a>
    </li>
    </ul>

<a name="ex_webandxmlautoschdtd" />
    <div class="example">
      <div class="exampletitle">Generating XML Schema and DTD Data</div>
<p>This example shows trivial use of the two functions
<span class="computeroutput">xml_auto_schema()</span> and <span class="computeroutput">xml_auto_dtd()</span>.
</p>
<div>
        <pre class="programlisting">
SQL&gt; select xml_auto_schema(&#39;select u_name from sys_users&#39;, &#39;root&#39;);
callret
VARCHAR
_______________________________________________________________________________

&lt;xsd:schema xmlns:xsd=&quot;http://www.w3.org/2001/XMLSchema&quot;&gt;

 &lt;xsd:annotation&gt;
  &lt;xsd:documentation&gt;
   Schema for output of the following SQL statement:

   &lt;![CDATA[select u_name from sys_users]]&gt;

  &lt;/xsd:documentation&gt;
 &lt;/xsd:annotation&gt;

 &lt;xsd:element name=&quot;root&quot; type=&quot;root__Type&quot;/&gt;

 &lt;xsd:complexType name=&quot;root__Type&quot;&gt;
  &lt;xsd:sequence&gt;
   &lt;xsd:element name=&quot;SYS_USERS&quot; type=&quot;SYS_USERS_Type&quot; minOccurs=&quot;0&quot; maxOccurs=&quot;unbounded&quot;/&gt;
  &lt;/xsd:sequence&gt;
 &lt;/xsd:complexType&gt;

 &lt;xsd:complexType name=&quot;SYS_USERS_Type&quot;&gt;
  &lt;xsd:attribute name=&quot;U_NAME&quot; type=&quot;xsd:string&quot;/&gt;
 &lt;/xsd:complexType&gt;

&lt;/xsd:schema&gt;

1 Rows. -- 1843 msec.
SQL&gt; select xml_auto_dtd(&#39;select u_name from sys_users&#39;, &#39;root&#39;);
callret
VARCHAR
_______________________________________________________________________________

&lt;!-- dtd for output of the following SQL statement:
select u_name from sys_users
--&gt;
&lt;!ELEMENT root (#PCDATA | SYS_USERS)* &gt;
&lt;!ELEMENT SYS_USERS (#PCDATA)* &gt;
&lt;!ATTLIST SYS_USERS
        U_NAME  CDATA   #IMPLIED        &gt;

1 Rows. -- 411 msec.
</pre>
      </div>
</div>

<br />


<a name="usingxmlfreetext" />
    <h3>13.4.5. Using XML and Free Text</h3>

	<p>
Virtuoso integrates classic free text retrieval and XML
semi-structured query features to offer a smart, scalable XML
repository.  When a column is declared as indexed XML with the CREATE
TEXT XML INDEX statement the text is checked for well-formedness at
time of storage.  The specific XML structure of the text is also
considered when making the free text index entries.  This XML-aware
free text index is used for processing XPath queries in the
<span class="computeroutput">xcontains</span> SQL predicate.  This predicate is
only applicable to columns for which there is an XML free text index.
</p>
	<p>
Arbitrary free text criteria can appear inside the XPath expression of
<span class="computeroutput">xcontains</span>.  These are introduced by the XPath
extension function <span class="computeroutput">text-contains()</span>, which may
only be used within <span class="computeroutput">xcontains</span> as it relies on
the underlying free text index.
</p>
<div class="note">
      <div class="notetitle">Note</div>
<p>
        <span class="computeroutput">xpath_contains()</span> does not require the
existence of a free text index and can thus apply to any well-formed
XML content.</p>
</div>
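<p>
A minimal sketch of declaring such an index follows. The column layout of
<span class="computeroutput">xml_text2</span>, the table queried in the
<span class="computeroutput">xcontains</span> examples of this chapter, is an assumption
modelled on the <span class="computeroutput">XML_TEXT</span> table shown earlier.
</p>
<div>
      <pre class="programlisting">
-- Assumed layout; only the table and column names used in the queries are known.
create table xml_text2 (
	xt_id integer,
	xt_file varchar,
	xt_text long varchar,
		primary key (xt_id)
	);

-- Declare xt_text as indexed XML so that xcontains can be applied to it.
create text xml index on xml_text2 (xt_text);
</pre>
    </div>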
<br />

	
	<a name="xcontainspredicate" />
    <h3>13.4.6. XCONTAINS predicate</h3>

	<p>This predicate is used in a SQL statement.  It returns &quot;true&quot; if a free text
  indexed column with XML content matches an XPath expression.  Optionally, it
  produces the matching node set as a result set.</p>

	<p>
Syntax
</p>

	<div>
      <pre class="programlisting">
xcontains_pred:
	xcontains (column, expr [, result_var [, opt_or_value ...]])

opt_or_value:
	  DESCENDING
	  | START_ID &#39;,&#39; scalar_exp
	  | END_ID &#39;,&#39; scalar_exp
	  | SCORE_LIMIT &#39;,&#39; scalar_exp
	  | OFFBAND column

result_var:
	  IDENTIFIER
	  | NULL
</pre>
    </div>

	<p>
The <strong>column</strong> must refer to a column for which there exists a free text index.
</p>
<p>
The <strong>expr</strong> must be a narrow or wide string expression whose syntax matches the
rules in &#39;XPATH Query Syntax&#39;.
</p>
<p>
The <strong>result_var</strong> variable is a query variable which, if present, will be
successively bound to each element of the node set selected by the XPath
expression.  If the value is not a node set and is true, the variable will
be bound once to this value.  The scope of the variable is the containing
select and its value is a scalar or an XML entity.
Instead of an identifier, <strong>result_var</strong> can be the NULL keyword, which explicitly indicates that no query variable is required.
</p>
	<p>
The <strong>START_ID</strong> is the first allowed document ID to be selected by the
expression in its traversal order, i.e. the lowest allowed ID for ascending order and
the highest allowed ID for descending order.
</p>
	<p>
<strong>END_ID</strong> is the last allowed id in the traversal order.  For descending order
the START_ID must be &gt;= END_ID for hits to be able to exist.  For ascending
order the START_ID must be &lt;= END_ID for hits to be able to exist.
</p>
	<p>
<strong>DESCENDING</strong> specifies that the search will produce the
hit with the greatest ID first, as defined by integer or composite collation.  This
has nothing to do with a possible ORDER BY of the
enclosing statement.  Even if there is an ORDER BY in the enclosing
statement, the DESCENDING keyword of xcontains still affects the
interpretation of the START_ID and END_ID xcontains options.
</p>
	<p>
<strong>RANGES</strong> specifies that the query variable following the RANGES keyword will
be bound to the word position ranges of the hits of the expression inside
the document.  The variable is in scope inside the enclosing SELECT
statement.
</p>
	<p>
<strong>SCORE_LIMIT</strong> specifies a minimum score that hits must reach or exceed to be
considered matches of the predicate.
</p>
	<p>
<strong>OFFBAND</strong> specifies that the following column will be retrieved from the free
text index instead of the actual table.  For this to be possible the column
must have been declared as offband with the CLUSTERED WITH option of the
<a href="creatingtxtidxs.html#createtxtidxstmt">CREATE TEXT INDEX</a> statement.
</p>

	<p>
If the select statement containing
the xcontains predicate does not specify an exact match of the primary key of
the table having the xcontains predicate, then the xcontains predicate will be
the &#39;driving&#39; condition, meaning that rows come in ascending or descending
order of the free text document ID.  If there is a full equality match of the primary
key of the table, this will be the driving predicate and xcontains will only be used
to check if the text expression matches the single row identified by the full match
of the primary key.
</p>
	<p>
The xcontains predicate may not appear outside of a select statement and may
only reference a column for which a free text index has been declared.  The
first argument must be a column for which there is such an index.  The text
expression may be variable and computed, although it must be constant during
the evaluation of the select statement containing it.</p>
<p>
The xcontains predicate must be a part of the top level AND of the WHERE
clause of the containing select.  It may not for example be a term of an OR
predicate in the select but can be AND&#39;ed with an OR expression.
</p>
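<p>
As a sketch of this rule, the query below keeps <span class="computeroutput">xcontains</span>
as a term of the top-level AND and combines it with a parenthesized OR expression.
The <span class="computeroutput">xt_id</span> column and the ID ranges are assumptions made for illustration.
</p>
<div>
      <pre class="programlisting">
-- Allowed: xcontains is a term of the top-level AND; the OR is a separate term.
select xt_file from xml_text2
  where xcontains (xt_text, &#39;//title = &quot;Key&quot;&#39;)
    and (xt_id &lt; 100 or xt_id &gt; 200);
</pre>
    </div>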

<a name="" />
    <div class="example">
<div class="exampletitle">Selecting Title Elements Called &#39;Key&#39;</div>
<div>
        <pre class="programlisting">
select xt_file from xml_text2 where
 xcontains (xt_text, &#39;//title = &quot;Key&quot;&#39;);
</pre>
      </div>
<p>
The query retrieves the <span class="computeroutput">xt_file</span> for rows
whose <span class="computeroutput">xt_text</span> is an XML document containing
&#39;Key&#39; as the text value of a title element.
</p>
<p>If not all values in <span class="computeroutput">xt_text</span> are valid XML documents, the
&#39;__quiet&#39; option can be used to disable error signalling. It is unusual to find
incorrect XML stored in a column that has a free text XML index, because
the text is parsed by a free text indexing routine on both insert and update.
An error is still possible if, for example, a non-standalone document is stored and
an important external entity was available at indexing time but disappeared later.
A modified example may therefore be better for a column with non-standalone documents:
<span class="computeroutput">
select xt_file from xml_text2 where
 xcontains (xt_text, &#39;[__quiet] //title = &quot;Key&quot;&#39;);
</span>
</p>
<div class="exampletitle">Selecting Title Element that Contains a Specified Text</div>
<div>
        <pre class="programlisting">
select n from xml_text2 where
 xcontains(xt_text,
 &#39;//title[. = &quot;AS Declaration - Column Aliasing&quot;]&#39;,0,n);
</pre>
      </div>
<p>
The query retrieves each title element from each row of
<span class="computeroutput">xml_text2</span> where the
<span class="computeroutput">xt_text</span> contains title elements with the text
value &quot;AS Declaration - Column Aliasing.&quot;
</p>
<div class="note">
        <div class="notetitle">Note</div> <p>The equality test is case- and
whitespace-sensitive, as normal in XPath.  The free text index is used
for the search but the final test is done according to XPath
rules.</p>
</div>
</div>

<div class="tip">
      <div class="tiptitle">See Also:</div>
<p>The <a href="queryingftcols.html#containspredicate">CONTAINS</a> Predicate.</p>
    </div>
<br />


<a name="textcontainsxpath" />
    <h3>13.4.7. text-contains XPath Predicate</h3>

<div>
      <pre class="programlisting">
text-contains (node-set, text-expression)
</pre>
    </div>
	<p>
This XPath predicate is true if any of the nodes in
<span class="computeroutput">node-set</span> have text values matching the
<span class="computeroutput">text-expression</span>.  The
<span class="computeroutput">text-expression</span> should be a constant string
whose syntax corresponds to the top production of the free text syntax
for patterns in <span class="computeroutput">contains()</span>. The string also may
not consist exclusively of spaces or noise words.
</p>
<div class="tip">
      <div class="tiptitle">See Also:</div>
<p>&quot;Noise Words&quot; in the <a href="htmlconductorbar.html#freetext">Free Text Search
chapter</a>.</p>
    </div>

<a name="" />
    <div class="example">
<div class="exampletitle">Selecting All Titles About Aliasing</div>
<div>
        <pre class="programlisting">
select n from xml_text2 where
  xcontains (xt_text,
  &#39;//title[text-contains (., &quot;Aliasing&quot;)]&#39;, 0, n);
</pre>
      </div>
</div>

	<p>
This selects all title elements that contain the word
&quot;Aliasing&quot; using free text match rules: case insensitive and
whole word.
</p>

<a name="" />
    <div class="example">
<div class="exampletitle">Select All Trees with Elements Containing &quot;sql reference&quot;</div>
<div>
        <pre class="programlisting">
select n from xml_text2 where
  xcontains (xt_text,
  &#39;//*[text-contains (., &#39;&#39;&quot;sql reference&quot;&#39;&#39;)]&#39;,
  0, n);
</pre>
      </div>
</div>

	<p>
This selects all elements whose text value contains the phrase
&quot;sql reference&quot;.  Free text matching rules apply.  This
produces all nodes in document order for all documents that contain
the phrase, starting with the document node and following downward
including all paths to the innermost element(s) whose text contains
the phrase.
</p>
<br />


<a name="xmlfreetextrules" />
    <h3>13.4.8. XML Free Text Indexing Rules</h3>

	<p>
XML documents are inserted into the free text index as follows:
</p>
<ul>
      <li>The process works on the parsed XML tree; therefore character
and local entity references are expanded.</li>
      <li>Whole words of text content, bounded by delimiters used for
free text, are each assigned an ordinal number.  Noise words defined
in the noise.txt file used by free text indexing are not
counted.</li>
      <li>Attribute names and values are not indexed.</li>
      <li>Element start and end tags are indexed using the expanded
names - that is, prefixed with the namespace URI + &#39;:&#39;.</li>
      <li>An element start tag&#39;s ordinal number is one less than
the ordinal number of the first whole word in the text value.</li>
      <li>A close tag&#39;s ordinal number is one greater than that of
the last word in the text value.</li>
    </ul>

	<p>
It follows from these rules that:
</p>

<div>
      <pre class="programlisting">
&lt;html&gt;
  &lt;body&gt;
   &lt;title&gt;Title of Document&lt;/title&gt;
   &lt;p&gt;Some &lt;b&gt;bold&lt;/b&gt; text &lt;/p&gt;
  &lt;/body&gt;
&lt;/html&gt;
</pre>
    </div>

	<p>
will be indexed as follows:
</p>

<div>
      <pre class="programlisting">
&lt;html&gt;       0
&lt;body&gt;       0
&lt;title&gt;      0
Title        1
of           - no number, noise word
Document     2
&lt;/title&gt;     3
&lt;p&gt;          3
Some         4
&lt;b&gt;          4
bold         5
&lt;/b&gt;         6
text         6
&lt;/p&gt;         6
&lt;/body&gt;      6
&lt;/html&gt;      6
</pre>
    </div>

	<p>
As a result, the phrase &quot;some bold text&quot; is the string value
of the &lt;p&gt; element and will match the free text expression
&quot;some bold text&quot; even though there is mark-up in it.
Conversely, the phrase &quot;Document some bold&quot; does not match.
Words will not be considered adjacent if there is a mix of opening and
closing tags between them.  They will only be considered adjacent if there are
solely opening tags or solely closing tags between them.  This
can be circumvented by using the <span class="computeroutput">NEAR</span> connective
instead of the phrase construct.
</p>
	<p>
A free text condition will only be true of an element if all the words
needed to satisfy the condition are part of the element&#39;s string
value.  This string value includes text children of
descendants.</p>
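	<p>
As a sketch of the <span class="computeroutput">NEAR</span> workaround, the query below looks for elements
whose string value contains both words close to each other, which the phrase
&quot;Document some&quot; cannot match in the sample document above.
It reuses the <span class="computeroutput">xml_text2</span> table and the call form of the earlier
<span class="computeroutput">xcontains</span> examples; the proximity window applied by
<span class="computeroutput">NEAR</span> is not specified in this section.
</p>
<div>
      <pre class="programlisting">
-- Only elements whose string value includes both words can qualify,
-- for example an element enclosing both the title and the paragraph.
select n from xml_text2 where
  xcontains (xt_text,
  &#39;//*[text-contains (., &quot;Document NEAR some&quot;)]&#39;,
  0, n);
</pre>
    </div>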
<br />

	
		<a name="xmlencoding" />
    <h3>13.4.9. XML Processing &amp; Free Text Encoding Issues</h3>
		<p>
XML documents may be written in a variety of encodings, and reading a document
with an incorrect encoding may cause errors.  The most common errors
can easily be eliminated by writing proper XML prologs in documents, but this
is not always possible, e.g. if documents are composed by third-party
applications.  Virtuoso provides various tools to support different types of
encodings and to specify the encoding to use if a given document has no XML prolog.
</p>
		
			<a name="encodingsvscharsets" />
    <h4>13.4.9.1. Encodings: The Difference Between Encodings &amp; Character Sets</h4>
			<p>
Not all documents may be converted to Unicode by using simple character sets.
Some of them are stored in so-called &quot;multibyte&quot; encodings.
It means that every letter (or ideograph) is represented as a sequence of
one or more bytes, not by exactly one byte.  The conversion from such representation
to Unicode and back is usually significantly slower than simple transformation
via character sets, so these representations are supported by data import operations only, but not by internal RDBMS
routines.</p>
			<p>
The Virtuoso Server &quot;knows&quot; some number of built-in encodings, such as UTF-8,
UTF-16BE and UTF-16LE.  It can load additional encoding descriptions from
a &quot;UCM&quot; file, and can automatically create a new encoding from a known charset with
the same name.  See the <a href="queryingxmldata.html#ucmencodings">UCM Encodings</a> section
for more details.</p>
			<p>
An encoding may be used in the following places:
</p>
<ul>
      <li>The XML/HTML parser to convert source text to Unicode.</li>
      <li>The free-text indexing engine to convert plain-text or XML documents
  to Unicode during the indexing.</li>
      <li>It may be used by the compiler of free-text search expressions
  to convert string constants of the expression to Unicode.</li>
      <li>It may be used to convert string constants of XPath/XQuery expressions.</li>
    </ul>
			<p>Only character sets, not encodings, can be used as an ODBC connection
character set, as a character set attribute of a column of a database table,
or as an output encoding of the built-in XSLT processor (encoding support there is intended for future versions).
UTF-8 is an exception: it is supported in many places where other encodings are not.</p>

  <div class="tip">
      <div class="tiptitle">Security Note:</div>
<p>Two strings converted to Unicode may be identical, but this does not
guarantee that their source strings were equal byte-by-byte due to the
nature of some encodings.  For this reason you should avoid processing
authorization data that are neither in Unicode nor in one of the standard
character sets (single-byte encodings).  Multibyte encodings and user-defined
character sets may be unsafe for such purposes.
  </p>
    </div>
			
				<a name="ucmencodings" />
    <h5>13.4.9.1.1. UCM Encodings</h5>
				<p>
The description of a multibyte encoding is much longer than the description
of a character set.  It is inconvenient to keep such amounts of data inside
the executable.  Virtuoso can load descriptions of required encodings
from external files in UCM format.  Every UCM file describes one encoding.</p>
				<p>
Virtuoso loads UCM files at system initialization.  The list of UCM files
is kept in the <a href="databaseadmsrv.html#VIRTINI">Virtuoso INI file</a> under a
section called [Ucms].  This section should contain a UcmPath parameter and
one or more parameters with names Ucm1, Ucm2, Ucm3 and so on (up to Ucm99).</p>
				<p>
The UcmPath parameter specifies the directory where UCM files are located.
Every UcmNN parameter specifies the name of a UCM file to load and
a list of names by which the encoding can be identified in the
&lt;?xml ... encoding=&quot;...&quot; ?&gt; XML preamble.  A vertical bar character is
used to delimit names in the list.</p>

<a name="ucmdefininifile" />
    <div class="example">
<div class="exampletitle">Sample [Ucms] Section</div>
<div>
        <pre class="programlisting">
[Ucms]
UcmPath = /usr/local/javalib/ucm
Ucm1 = java-Cp933-1.3-P.ucm,Cp933
Ucm2 = java-Cp949-1.3-P.ucm,Cp949|Korean
</pre>
      </div>

<p>This section describes two UCM files located in the /usr/local/javalib/ucm
directory: data from java-Cp933-1.3-P.ucm will be used for documents in the
&#39;Cp933&#39; encoding; data from java-Cp949-1.3-P.ucm will be used for documents
in the &#39;Cp949&#39; encoding and for documents in the &#39;Korean&#39; encoding
(because these two names refer to the same encoding).  </p>
</div>
<div class="note">
      <div class="notetitle">Note:</div>
  <p>The encoding name specified inside the UCM file itself is not used.</p>
  </div>
				<p>
The Virtuoso server logs the result of processing each UCM file specified
in the Virtuoso INI file.  If a specified UCM file is not found or contains
syntax errors, the error is logged; otherwise only the type and name(s) of
the encoding are logged.</p>
<div class="note">
      <div class="notetitle">Note:</div>
<p>If virtuoso.ini contains a misspelled parameter or section name,
the parameter (or the whole section) is ignored without being reported as an error.
It is always wise to verify that the log contains a record for each encoding
you load.</p>
    </div>

<div class="tip">
      <div class="tiptitle">See Also:</div>
<p>UCM files are freely available from sites related to the &quot;International Components
for Unicode&quot; project, such as the <a href="http://www-124.ibm.com/icu">IBM ICU Homepage</a>
or the <a href="http://oss.software.ibm.com/cvs/icu/charset/data/ucm/">IBM UCM files directory</a>.</p>
<p>The <a href="cinterface.html">C Interface</a> chapter contains
further information regarding user customizable support for new encodings and
languages.  For almost all tasks, it is enough to define a new charset or to
load an additional UCM file, but some special tasks may require writing
additional C code.</p>
</div>
<br />
  <br />
	<a name="encodingattr" />
    <h4>13.4.9.2. The Encoding Attribute</h4>

<p>If an XML document contains the <span class="computeroutput">encoding</span>
parameter in its</p>
<div>
      <pre class="programlisting">&lt;?xml ... ?&gt;</pre>
    </div>
<p>prolog declaration, the document will be decoded accordingly and converted into
UTF-8, so the application code is free from encoding problems.  If the
value of this attribute is the name of a pre-set or user-defined character
set, that character set will be used. Virtuoso will recognize names such as
<span class="computeroutput">UTF-8</span> and <span class="computeroutput">UTF8</span> as multi-character
or special encodings.  Virtuoso recognizes both official names and aliases.</p>

<p>If an encoding is not specified in an XML prolog, or if the document
contains no prolog, the default encoding will be used to read the
document.  If a built-in SQL function invokes the XML parser, it will
have an optional argument <span class="computeroutput">parser_mode</span> to
specify whether source text should be parsed as strict XML or as HTML.
If the source text is 8-bit, then UTF-8 will be used as the default
encoding for &quot;XML mode&quot;, and ISO-8859-1 (Latin-1) will be the
default for &quot;HTML mode&quot;.  If the source text is of some
wide-character type, Unicode is the default.  To make another encoding
the default, you may specify its official or alias name as the
<span class="computeroutput">content_encoding</span> argument of a built-in
function you call.</p>
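<p>As a rough sketch of overriding the default, the Virtuoso/PL wrapper below
parses 8-bit text that carries no encoding declaration as Windows-1251.
It assumes the <span class="computeroutput">xtree_doc()</span> built-in takes its
arguments in the order document, <span class="computeroutput">parser_mode</span>,
base URI, <span class="computeroutput">content_encoding</span>, and that a
<span class="computeroutput">parser_mode</span> of 0 selects strict XML.
The procedure name ENC_PARSE_CP1251 is hypothetical.</p>
<div>
      <pre class="programlisting">
-- Hypothetical helper for illustration; not part of Virtuoso.
create procedure ENC_PARSE_CP1251 (in src varchar)
{
  -- Assumed argument order: document, parser_mode, base URI, content_encoding.
  -- parser_mode 0 is assumed to mean strict XML; the empty string is a dummy base URI.
  -- The content_encoding value only matters when src has no encoding in its prolog.
  return xtree_doc (src, 0, &#39;&#39;, &#39;WINDOWS-1251&#39;);
}
;
</pre>
    </div>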
  <br />
		
			<a name="encodingxpathexp" />
    <h4>13.4.9.3. Encoding in XPath Expressions</h4>
			<p>
Sometimes applications should perform XPath queries using the encoding specified
by a client.  For example, a search engine may ask a user to specify a pattern to
search and use the browser&#39;s current encoding as a hint to parse the pattern
properly.  In such cases you may wish to use the <span class="computeroutput">__enc</span>
XPath option to specify the encoding used for the rest of the XPath string:</p>

  <a name="" />
    <div class="example">
      <div class="exampletitle">Specifying Search Encodings in XPath</div>
    <p>Create a sample table and store an XML document with non-Latin-1 characters:</p>
    <div>
        <pre class="programlisting">
create table ENC_XML_SAMPLE (
  ID integer,
  XPER long varchar,
  primary key (ID)
);

insert into ENC_XML_SAMPLE (ID, XPER)
values (
  1,
  xml_persistent (&#39;&lt;?xml version=&quot;1.0&quot; encoding=&quot;WINDOWS-1251&quot; ?&gt;
    &lt;book&gt;&lt;cit&gt;Îí äîáàâèë
      êàðòîøêè,
    ïîñîëèë è
    ïîñòàâèë
    àêâàðèóì íà
    îãîíü
    (Ì.Æâàíåöêèé
    )&lt;/cit&gt;&lt;/book&gt;&#39;
  )
);
...
  </pre>
      </div>
<p>Find the IDs of all XML documents whose texts contain a specified
phrase.  Note that there are pairs of single quotes (not double
quotes) around <span class="computeroutput">KOI8-R</span>.  The encoding name should
be in single quotes, but because it is inside a string constant the
quotes must be duplicated.
 </p>
 <div>
        <pre class="programlisting">
select ID from ENC_XML_SAMPLE where
  xcontains (XPER, &#39;[__enc &#39;&#39;KOI8-R&#39;&#39;] //cit[text-contains(.,
  &quot;&#39;&#39;ÐÏÓÔÁ×ÉÌ
    ÁË×ÁÒÉÕÍ
    ÎÁ ÏÇÏÎØ&#39;&#39;&quot;)]&#39;);

</pre>
      </div>
</div>
		<br />
		
			<a name="encodinginfttsp" />
    <h4>13.4.9.4. Encoding in Free Text Search Indexes &amp; Patterns</h4>
			<p>Like XML applications, free text searching may have encoding problems,
and Virtuoso offers a similar solution for them.</p>
			<p>Both the CREATE TEXT INDEX statement and the vt_create_text_index()
      Virtuoso/PL procedure
have an optional argument to specify the encoding of the indexed data.
The specified encoding is applied to all source text documents
(if a TEXT INDEX was created), or to all XML documents that have no encoding
attribute of the form &lt;?xml ... encoding=&quot;...&quot; ?&gt;
(if a TEXT XML INDEX was created).</p>
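<p>A minimal sketch of the statement form follows. It assumes the optional argument
is written as an ENCODING clause at the end of CREATE TEXT INDEX, after WITH KEY.
The table ENC_DOCS_DEMO and its columns are hypothetical and stand for any table
whose text column holds 8-bit data in a known encoding.</p>
<div>
      <pre class="programlisting">
-- Hypothetical table holding 8-bit text stored as Windows-1251 bytes.
create table ENC_DOCS_DEMO (
  ID integer,
  BODY long varchar,
  primary key (ID)
);

-- Assumed clause order: WITH KEY first, then ENCODING.
-- The encoding tells the indexer how to convert BODY to Unicode.
create text index on ENC_DOCS_DEMO (BODY)
  with key ID
  encoding &#39;WINDOWS-1251&#39;;
</pre>
    </div>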
			<p>
The option <span class="computeroutput">__enc</span> may be specified at the
beginning of free text search pattern, even if the pattern is inside
an XPath statement:</p>
  <a name="" />
    <div class="example">
<div class="exampletitle">Specifying an Encoding for Free Text Searching</div>
   <p>
Create a sample table and store a sample of text with non-Latin-1 characters
(assuming that client encoding is Windows-1251)
</p>
  <div>
        <pre class="programlisting">
create table ENC_TEXT_SAMPLE (
  ID integer,
  TEXT long nvarchar,
  primary key (ID)
);

insert into ENC_TEXT_SAMPLE (ID, TEXT)
values (
  1,
  &#39;&lt;?xml version=&quot;1.0&quot; encoding=&quot;WINDOWS-1251&quot; ?&gt;
Îí äîáàâèë
    êàðòîøêè,
    ïîñîëèë è
    ïîñòàâèë
    àêâàðèóì
    íà îãîíü
    (Ì.Æâàíåöêèé&#39;)
);
...
</pre>
      </div>
  <p>Find the IDs of all text documents whose texts contain a specified phrase.
</p>

  <div>
        <pre class="programlisting">
select ID from ENC_TEXT_SAMPLE where
  contains (TEXT, &#39;[__enc &#39;&#39;KOI8-R&#39;&#39;]
    &quot;ÐÏÓÔÁ×ÉÌ
    ÁË×ÁÒÉÕÍ
    ÎÁ ÏÇÏÎØ&quot;&#39;
  );
</pre>
      </div>
  <p>Encoding may be applied locally to an argument of the text-search predicate.
It may be used if the document contains citations in different encodings or if
the XML document contains non-ASCII characters in names of tags or attributes,
or if the encoding affects character codes of ASCII symbols such as &#39;/&#39; or &#39;[&#39;.
</p>
  <div>
        <pre class="programlisting">
select ID from ENC_XML_SAMPLE where
  xcontains (XPER, &#39;//cit[text-contains(., &quot;[__enc &#39;&#39;KOI8-R&#39;&#39;]
    &#39;&#39;ÐÏÓÔÁ×ÉÌ
    ÁË×ÁÒÉÕÍ ÎÁ
    ÏÇÏÎØ&#39;&#39;&quot;)]&#39;
  );
</pre>
      </div>
</div>
<div class="note">
      <div class="notetitle">Note:</div>
      <p>
A free-text expression may be written as a literal constant,
e.g. when the argument of the text-contains XPath function is a literal constant.
Be careful not to declare __enc twice, once at the beginning of the whole
XPath expression and again at the beginning of the free-text expression
constant, because the words of the text expression would then be converted twice.</p>
    </div>
  <br />
		<br />
  </div>
  <div id="footer">
    <div>Copyright© 1999 - 2009 OpenLink Software All rights reserved.</div>
  </div>
 </body>
</html>