<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
<title>uriparser: Main Page</title>
<link href="tabs.css" rel="stylesheet" type="text/css">
<link href="doxygen.css" rel="stylesheet" type="text/css">
</head><body>
<!-- Generated by Doxygen 1.5.9 -->
<div class="navigation" id="top">
  <div class="tabs">
    <ul>
      <li class="current"><a href="index.html"><span>Main&nbsp;Page</span></a></li>
      <li><a href="annotated.html"><span>Data&nbsp;Structures</span></a></li>
      <li><a href="files.html"><span>Files</span></a></li>
    </ul>
  </div>
</div>
<div class="contents">
<h1>uriparser Documentation</h1>
<p>
<h3 align="center">0.7.5 </h3><h2><a class="anchor" name="SEC_TOC">
Table of Contents</a></h2>
<ul>
<li><a href="#intro">Introduction</a></li><li>Algorithms and Examples<ul>
<li><a href="#parsing">Parsing URIs</a> (from string to object)</li><li><a href="#recomposition">Recomposing URIs</a> (from object back to string)</li><li><a href="#resolution">Resolving References</a></li><li><a href="#shortening">Creating References</a></li><li><a href="#filenames">Filenames and URIs</a></li><li><a href="#normalization">Normalizing URIs</a></li><li><a href="#querystrings">Working with query strings</a></li></ul>
</li><li><a href="#chartypes">Ansi and Unicode</a></li><li><a href="#autoconf">Autoconf Check</a></li></ul>
<h2><a class="anchor" name="intro">
Introduction</a></h2>
Welcome to the short uriparser integration tutorial. It is intended to answer common questions and to shed light where function prototypes alone are not enough. Please drop me a line if you need further assistance and I will see what I can do for you. Good luck with uriparser!<h3><a class="anchor" name="parsing">
Parsing URIs (from string to object)</a></h3>
Parsing a URI with uriparser looks like this:<p>
<div class="fragment"><pre class="fragment">        <a class="code" href="structUriParserStateStructA.html">UriParserStateA</a> state;
        <a class="code" href="structUriUriStructA.html">UriUriA</a> uri;

        state.<a class="code" href="structUriParserStateStructA.html#9a0cd66f9e53ac5d0ac2b2b8b1dd01d4">uri</a> = &amp;uri;
        <span class="keywordflow">if</span> (<a class="code" href="Uri_8h.html#b16a86cc9956c40da5a97007c0fd4802">uriParseUriA</a>(&amp;state, <span class="stringliteral">"file:///home/user/song.mp3"</span>) != URI_SUCCESS) {
                <span class="comment">/* Failure */</span>
                <a class="code" href="Uri_8h.html#a5f2edbe34fcde379ccca4fe3fadcd48">uriFreeUriMembersA</a>(&amp;uri);
                ...
        }
        ...
        <a class="code" href="Uri_8h.html#a5f2edbe34fcde379ccca4fe3fadcd48">uriFreeUriMembersA</a>(&amp;uri);
</pre></div><p>
While the URI object (<a class="el" href="Uri_8h.html#924841de923bfc02670dfee96cd25e62">UriUriA</a>) holds information about the recognized parts of the given URI string, the parser state object (<a class="el" href="Uri_8h.html#eca8bf10221333215c74cfba8cb9f07e">UriParserStateA</a>) keeps the error code and error position. This information does not belong to the URI itself, which is why there are two separate objects.<p>
You can reuse parser state objects for parsing several URIs like this:<p>
<div class="fragment"><pre class="fragment">        <a class="code" href="structUriParserStateStructA.html">UriParserStateA</a> state;
        <a class="code" href="structUriUriStructA.html">UriUriA</a> uriOne;
        <a class="code" href="structUriUriStructA.html">UriUriA</a> uriTwo;

        state.<a class="code" href="structUriParserStateStructA.html#9a0cd66f9e53ac5d0ac2b2b8b1dd01d4">uri</a> = &amp;uriOne;
        <span class="keywordflow">if</span> (<a class="code" href="Uri_8h.html#b16a86cc9956c40da5a97007c0fd4802">uriParseUriA</a>(&amp;state, <span class="stringliteral">"file:///home/user/one"</span>) != URI_SUCCESS) {
                <span class="comment">/* Failure */</span>
                <a class="code" href="Uri_8h.html#a5f2edbe34fcde379ccca4fe3fadcd48">uriFreeUriMembersA</a>(&amp;uriOne);
                ...
        }
        ...
        state.<a class="code" href="structUriParserStateStructA.html#9a0cd66f9e53ac5d0ac2b2b8b1dd01d4">uri</a> = &amp;uriTwo;
        <span class="keywordflow">if</span> (<a class="code" href="Uri_8h.html#b16a86cc9956c40da5a97007c0fd4802">uriParseUriA</a>(&amp;state, <span class="stringliteral">"file:///home/user/two"</span>) != URI_SUCCESS) {
                <span class="comment">/* Failure */</span>
                <a class="code" href="Uri_8h.html#a5f2edbe34fcde379ccca4fe3fadcd48">uriFreeUriMembersA</a>(&amp;uriOne);
                <a class="code" href="Uri_8h.html#a5f2edbe34fcde379ccca4fe3fadcd48">uriFreeUriMembersA</a>(&amp;uriTwo);
                ...
        }
        ...
        <a class="code" href="Uri_8h.html#a5f2edbe34fcde379ccca4fe3fadcd48">uriFreeUriMembersA</a>(&amp;uriOne);
        <a class="code" href="Uri_8h.html#a5f2edbe34fcde379ccca4fe3fadcd48">uriFreeUriMembersA</a>(&amp;uriTwo);
</pre></div><h3><a class="anchor" name="recomposition">
Recomposing URIs (from object back to string)</a></h3>
According to <a href="http://tools.ietf.org/html/rfc3986#section-5.3" target="_blank">RFC 3986</a>, gluing the parts of a URI together to form a string is called recomposition. Before we can recompose a URI object, we have to know how much space the resulting string will take:<p>
<div class="fragment"><pre class="fragment">        <a class="code" href="structUriUriStructA.html">UriUriA</a> uri;
        <span class="keywordtype">char</span> * uriString;
        <span class="keywordtype">int</span> charsRequired;
        ...
        <span class="keywordflow">if</span> (<a class="code" href="Uri_8h.html#e57c60e8166c44163af6d0434329fe73">uriToStringCharsRequiredA</a>(&amp;uri, &amp;charsRequired) != URI_SUCCESS) {
                <span class="comment">/* Failure */</span>
                ...
        }
        charsRequired++;
</pre></div><p>
Now we can tell <a class="el" href="Uri_8h.html#4a9539cfdd85866a1ef6916c631f8bc2">uriToStringA()</a> to write the string to a given buffer:<p>
<div class="fragment"><pre class="fragment">        uriString = malloc(charsRequired * <span class="keyword">sizeof</span>(<span class="keywordtype">char</span>));
        <span class="keywordflow">if</span> (uriString == NULL) {
                <span class="comment">/* Failure */</span>
                ...
        }
        <span class="keywordflow">if</span> (<a class="code" href="Uri_8h.html#4a9539cfdd85866a1ef6916c631f8bc2">uriToStringA</a>(uriString, &amp;uri, charsRequired, NULL) != URI_SUCCESS) {
                <span class="comment">/* Failure */</span>
                ...
        }
</pre></div><p>
<dl class="remark" compact><dt><b>Remarks:</b></dt><dd>Incrementing <code>charsRequired</code> by 1 is required since <a class="el" href="Uri_8h.html#e57c60e8166c44163af6d0434329fe73">uriToStringCharsRequiredA()</a> returns the length of the string as strlen() does, but <a class="el" href="Uri_8h.html#4a9539cfdd85866a1ef6916c631f8bc2">uriToStringA()</a> works with the maximum number of characters to be written <b>including</b> the zero-terminator.</dd></dl>
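The same "measure first, then add one for the terminator" convention exists in the standard C library, which allows a library-independent illustration. The sketch below does not call uriparser at all: it uses <code>snprintf()</code>, whose return value excludes the zero-terminator while its size argument includes it. The helper name <code>formatHostPort</code> is made up for this example.<p>

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not part of uriparser: builds "host:port" in a
 * freshly allocated buffer using the measure-then-write pattern. */
char * formatHostPort(const char * host, int port) {
        /* Step 1: measure. The result excludes the zero-terminator,
         * just like strlen() would. */
        const int charsRequired = snprintf(NULL, 0, "%s:%d", host, port);
        char * buffer;
        if (charsRequired < 0) {
                /* Failure */
                return NULL;
        }

        /* Step 2: allocate one extra character for the zero-terminator. */
        buffer = malloc(((size_t)charsRequired + 1) * sizeof(char));
        if (buffer == NULL) {
                /* Failure */
                return NULL;
        }

        /* Step 3: write. The size passed here includes the terminator. */
        snprintf(buffer, (size_t)charsRequired + 1, "%s:%d", host, port);
        return buffer;
}
```

The caller owns the returned buffer and has to release it with <code>free()</code>.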
<h3><a class="anchor" name="resolution">
Resolving References</a></h3>
<a href="http://tools.ietf.org/html/rfc3986#section-5" target="_blank">Reference Resolution</a> is the process of turning a (relative) URI reference into an absolute URI by applying a base URI to it. In code it looks like this:<p>
<div class="fragment"><pre class="fragment">        <a class="code" href="structUriUriStructA.html">UriUriA</a> absoluteDest;
        <a class="code" href="structUriUriStructA.html">UriUriA</a> relativeSource;
        <a class="code" href="structUriUriStructA.html">UriUriA</a> absoluteBase;
        ...
        <span class="comment">/* relativeSource holds "../TWO" now */</span>
        <span class="comment">/* absoluteBase holds "file:///one/two/three" now */</span>
        <span class="keywordflow">if</span> (<a class="code" href="Uri_8h.html#4bcf2a1fa28bb86443f2b61561e692a7">uriAddBaseUriA</a>(&amp;absoluteDest, &amp;relativeSource, &amp;absoluteBase) != URI_SUCCESS) {
                <span class="comment">/* Failure */</span>
                <a class="code" href="Uri_8h.html#a5f2edbe34fcde379ccca4fe3fadcd48">uriFreeUriMembersA</a>(&amp;absoluteDest);
                ...
        }
        <span class="comment">/* absoluteDest holds "file:///one/TWO" now */</span>
        ...
        <a class="code" href="Uri_8h.html#a5f2edbe34fcde379ccca4fe3fadcd48">uriFreeUriMembersA</a>(&amp;absoluteDest);
</pre></div><p>
<dl class="remark" compact><dt><b>Remarks:</b></dt><dd><a class="el" href="Uri_8h.html#4bcf2a1fa28bb86443f2b61561e692a7">uriAddBaseUriA()</a> does not normalize the resulting URI. You will usually want to pass it through <a class="el" href="Uri_8h.html#50176d4c0c0fb7d40e0e4990d0c7d7bf">uriNormalizeSyntaxA()</a> afterwards.</dd></dl>
<h3><a class="anchor" name="shortening">
Creating References</a></h3>
Reference Creation is the inverse process of Reference Resolution: a common base URI is "subtracted" from an absolute URI to make a (relative) reference. If the two URIs do not share a common base, the resulting URI will still be absolute, i.e. it will carry a scheme.<p>
<div class="fragment"><pre class="fragment">        <a class="code" href="structUriUriStructA.html">UriUriA</a> dest;
        <a class="code" href="structUriUriStructA.html">UriUriA</a> absoluteSource;
        <a class="code" href="structUriUriStructA.html">UriUriA</a> absoluteBase;
        ...
        <span class="comment">/* absoluteSource holds "file:///one/TWO" now */</span>
        <span class="comment">/* absoluteBase holds "file:///one/two/three" now */</span>
        <span class="keywordflow">if</span> (<a class="code" href="Uri_8h.html#4a14ec47aeeadd9ff7a5e381a10234a0">uriRemoveBaseUriA</a>(&amp;dest, &amp;absoluteSource, &amp;absoluteBase, URI_FALSE) != URI_SUCCESS) {
                <span class="comment">/* Failure */</span>
                <a class="code" href="Uri_8h.html#a5f2edbe34fcde379ccca4fe3fadcd48">uriFreeUriMembersA</a>(&amp;dest);
                ...
        }
        <span class="comment">/* dest holds "../TWO" now */</span>
        ...
        <a class="code" href="Uri_8h.html#a5f2edbe34fcde379ccca4fe3fadcd48">uriFreeUriMembersA</a>(&amp;dest);
</pre></div><p>
The fourth parameter is the domain root mode. With <code>URI_FALSE</code> as above this will produce URIs relative to the base URI. With <code>URI_TRUE</code> the resulting URI will be relative to the domain root instead, e.g. "/one/TWO" in this case.<h3><a class="anchor" name="filenames">
Filenames and URIs</a></h3>
Converting filenames to and from URIs works on strings directly, i.e. without creating a URI object.<p>
<div class="fragment"><pre class="fragment">        <span class="keyword">const</span> <span class="keywordtype">char</span> * <span class="keyword">const</span> absFilename = <span class="stringliteral">"E:\\Documents and Settings"</span>;
        <span class="keyword">const</span> <span class="keywordtype">int</span> bytesNeeded = 8 + 3 * strlen(absFilename) + 1;
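        <span class="keywordtype">char</span> * absUri = malloc(bytesNeeded * <span class="keyword">sizeof</span>(<span class="keywordtype">char</span>));
        <span class="keywordflow">if</span> (absUri == NULL) {
                <span class="comment">/* Failure */</span>
                ...
        }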
        <span class="keywordflow">if</span> (<a class="code" href="Uri_8h.html#4a071c1c4867b49b122dc4fcb1d3021a">uriWindowsFilenameToUriStringA</a>(absFilename, absUri) != URI_SUCCESS) {
                <span class="comment">/* Failure */</span>
                free(absUri);
                ...
        }
        <span class="comment">/* absUri is "file:///E:/Documents%20and%20Settings" now */</span>
        ...
        free(absUri);
</pre></div><p>
Conversion works:<ul>
<li>for relative or absolute values,</li><li>in both directions (filenames &lt;--&gt; URIs) and</li><li>with Unix and Windows filenames.</li></ul>
<p>
All you have to do is choose the right function for the task and allocate the required space (in characters) for the target buffer. Here is an overview:<p>
<ul>
<li>Filename --&gt; URI<ul>
<li><a class="el" href="Uri_8h.html#b394fe8e5e9b6863e4dfd2ae6b464960">uriUnixFilenameToUriStringA()</a><br>
 Space required: [<b>7</b> +] 3 * len(filename) + 1</li><li><a class="el" href="Uri_8h.html#4a071c1c4867b49b122dc4fcb1d3021a">uriWindowsFilenameToUriStringA()</a><br>
 Space required: [<b>8</b> +] 3 * len(filename) + 1</li></ul>
</li><li>URI --&gt; filename<ul>
<li><a class="el" href="Uri_8h.html#26920c57a9ad92041bc797c29b7bdb92">uriUriStringToUnixFilenameA()</a><br>
 Space required: len(uriString) + 1 [- <b>7</b>]</li><li><a class="el" href="Uri_8h.html#b9cd18296649e2443495e26f661e1313">uriUriStringToWindowsFilenameA()</a><br>
 Space required: len(uriString) + 1 [- <b>8</b>]</li></ul>
</li></ul>
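The space formulas above translate directly into code. The sketch below is not part of uriparser and its helper names are invented for illustration. It assumes that the bracketed term (7 or 8) only matters for absolute filenames, where a "file://" or "file:///" prefix comes into play, and therefore always computes the larger, safe size.<p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers, not part of uriparser. Each returns the number of
 * characters (including the zero-terminator) to allocate for the target
 * buffer, always assuming the worst case. */

/* Filename --> URI, Unix: every character may expand to "%XX" (3 chars),
 * plus 7 for "file://", plus 1 for the zero-terminator. */
size_t unixFilenameToUriChars(const char * filename) {
        return 7 + 3 * strlen(filename) + 1;
}

/* Filename --> URI, Windows: same, but 8 for "file:///". */
size_t windowsFilenameToUriChars(const char * filename) {
        return 8 + 3 * strlen(filename) + 1;
}

/* URI --> filename: unescaping never makes the text longer, so the length
 * of the URI string plus the zero-terminator is always enough. */
size_t uriToFilenameChars(const char * uriString) {
        return strlen(uriString) + 1;
}
```

For the 25-character filename "E:\Documents and Settings" from the example above, the Windows helper yields 8 + 3 * 25 + 1 = 84 characters.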
<h3><a class="anchor" name="normalization">
Normalizing URIs</a></h3>
Sometimes we come across unnecessarily long URIs like "http://example.org/one/two/../../one". The algorithm we can use to shorten this URI down to "http://example.org/one" is called <a href="http://tools.ietf.org/html/rfc3986#section-6.2.2" target="_blank">Syntax-Based Normalization</a>. Note that normalizing a URI does more than just "stripping dot segments". Please have a look at <a href="http://tools.ietf.org/html/rfc3986#section-6.2.2" target="_blank">Section 6.2.2 of RFC 3986</a> for the full description.<p>
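To give a feeling for the dot-segment part alone, the sketch below implements the <code>remove_dot_segments</code> routine described in Section 5.2.4 of RFC 3986 for a bare path string. It is not uriparser's implementation and the function name is made up. It deliberately ignores the other normalization steps (such as case and percent-encoding normalization), so it is no substitute for the library calls shown in this section.<p>

```c
#include <string.h>

/* Hypothetical helper, not part of uriparser: removes "." and ".."
 * segments from a writable path string in place, following steps A to E
 * of remove_dot_segments in RFC 3986, section 5.2.4. */
void removeDotSegments(char * path) {
        char * in = path;   /* read position (the RFC's input buffer) */
        char * out = path;  /* write position (end of the output buffer) */

        while (*in != '\0') {
                if (strncmp(in, "../", 3) == 0) {
                        in += 3;                /* A: drop leading "../" */
                } else if (strncmp(in, "./", 2) == 0) {
                        in += 2;                /* A: drop leading "./" */
                } else if (strncmp(in, "/./", 3) == 0) {
                        in += 2;                /* B: "/./" becomes "/" */
                } else if (strcmp(in, "/.") == 0) {
                        in[1] = '/';            /* B: trailing "/." becomes "/" */
                        in += 1;
                } else if (strncmp(in, "/../", 4) == 0 || strcmp(in, "/..") == 0) {
                        /* C: the prefix becomes "/" ... */
                        if (in[3] == '\0') {
                                in[2] = '/';
                                in += 2;
                        } else {
                                in += 3;
                        }
                        /* ... and the last output segment is removed,
                         * together with its preceding "/" if there is one */
                        while (out > path && *--out != '/') {
                                /* keep stepping back */
                        }
                } else if (strcmp(in, ".") == 0 || strcmp(in, "..") == 0) {
                        in += strlen(in);       /* D: drop a lone "." or ".." */
                } else {
                        /* E: move one segment, including its leading "/" */
                        do {
                                *out++ = *in++;
                        } while (*in != '\0' && *in != '/');
                }
        }
        *out = '\0';
}
```

Applied to the path "/one/two/../../one" of the URI above, this leaves "/one" in the buffer.<p>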
Just as we asked <a class="el" href="Uri_8h.html#e57c60e8166c44163af6d0434329fe73">uriToStringCharsRequiredA()</a> for the required space when converting a URI object back to a string, we can ask <a class="el" href="Uri_8h.html#8de8c90c3655e547cddcd50c663587fb">uriNormalizeSyntaxMaskRequiredA()</a> for the parts of a URI that require normalization and then pass this normalization mask to <a class="el" href="Uri_8h.html#bfcb66bec6bb0066fb086174692d5710">uriNormalizeSyntaxExA()</a>:<p>
<div class="fragment"><pre class="fragment">        <span class="keyword">const</span> <span class="keywordtype">unsigned</span> <span class="keywordtype">int</span> dirtyParts = <a class="code" href="Uri_8h.html#8de8c90c3655e547cddcd50c663587fb">uriNormalizeSyntaxMaskRequiredA</a>(&amp;uri);
        <span class="keywordflow">if</span> (<a class="code" href="Uri_8h.html#bfcb66bec6bb0066fb086174692d5710">uriNormalizeSyntaxExA</a>(&amp;uri, dirtyParts) != URI_SUCCESS) {
                <span class="comment">/* Failure */</span>
                ...
        }
</pre></div><p>
If you don't want to normalize all parts of the URI you can pass a custom mask as well:<p>
<div class="fragment"><pre class="fragment">        <span class="keyword">const</span> <span class="keywordtype">unsigned</span> <span class="keywordtype">int</span> normMask = <a class="code" href="UriBase_8h.html#c0a876ae3fbf22bdfa8d3e4a24838400ed80a4777751564b865f26940169fc23">URI_NORMALIZE_SCHEME</a> | <a class="code" href="UriBase_8h.html#c0a876ae3fbf22bdfa8d3e4a24838400b5268c99bba09a7624fc98f0780dc618">URI_NORMALIZE_USER_INFO</a>;
        <span class="keywordflow">if</span> (<a class="code" href="Uri_8h.html#bfcb66bec6bb0066fb086174692d5710">uriNormalizeSyntaxExA</a>(&amp;uri, normMask) != URI_SUCCESS) {
                <span class="comment">/* Failure */</span>
                ...
        }
</pre></div><p>
Please see <a class="el" href="UriBase_8h.html#c0a876ae3fbf22bdfa8d3e4a24838400">UriNormalizationMaskEnum</a> for the complete set of flags.<p>
On the other hand, calling plain <a class="el" href="Uri_8h.html#50176d4c0c0fb7d40e0e4990d0c7d7bf">uriNormalizeSyntaxA()</a> (without the "Ex") saves you from thinking about individual parts, as it queries <a class="el" href="Uri_8h.html#8de8c90c3655e547cddcd50c663587fb">uriNormalizeSyntaxMaskRequiredA()</a> internally:<p>
<div class="fragment"><pre class="fragment">        <span class="keywordflow">if</span> (<a class="code" href="Uri_8h.html#50176d4c0c0fb7d40e0e4990d0c7d7bf">uriNormalizeSyntaxA</a>(&amp;uri) != URI_SUCCESS) {
                <span class="comment">/* Failure */</span>
                ...
        }
</pre></div><h3><a class="anchor" name="querystrings">
Working with query strings</a></h3>
<a href="http://tools.ietf.org/html/rfc3986" target="_blank">RFC 3986</a> itself does not interpret the query part of a URI as a list of key/value pairs. HTML 2.0 does, however, and defines the media type <em>application/x-www-form-urlencoded</em> in <a href="http://tools.ietf.org/html/rfc1866#section-8.2.1" target="_blank">section 8.2.1</a> of <a href="http://tools.ietf.org/html/rfc1866" target="_blank">RFC 1866</a>. uriparser allows you to dissect (or parse) a query string into unescaped key/value pairs and to compose such pairs back into a query string.<p>
To dissect the query part of a just-parsed URI you could write code like this:<p>
<div class="fragment"><pre class="fragment">        <a class="code" href="structUriUriStructA.html">UriUriA</a> uri;
        <a class="code" href="structUriQueryListStructA.html">UriQueryListA</a> * queryList;
        <span class="keywordtype">int</span> itemCount;
        ...
        <span class="keywordflow">if</span> (<a class="code" href="Uri_8h.html#395240e558a980019bf02cc7518ee524">uriDissectQueryMallocA</a>(&amp;queryList, &amp;itemCount, uri.<a class="code" href="structUriUriStructA.html#cb555eb399898672c7b069da0f444157">query</a>.<a class="code" href="structUriTextRangeStructA.html#6f1e1048b5e74fe6c7a680ff99138f68">first</a>,
                        uri.<a class="code" href="structUriUriStructA.html#cb555eb399898672c7b069da0f444157">query</a>.<a class="code" href="structUriTextRangeStructA.html#86aea0aab8d5ee912c3b260bdd9af67e">afterLast</a>) != URI_SUCCESS) {
                <span class="comment">/* Failure */</span>
                ...
        }
        ...
        <a class="code" href="Uri_8h.html#692a7f1c4e37180050257025473b2f7a">uriFreeQueryListA</a>(queryList);
</pre></div><p>
<dl class="remark" compact><dt><b>Remarks:</b></dt><dd><ul>
<li><code>NULL</code> in the <code>value</code> member means there was <b>no</b> '=' in the item text, as with "?abc&amp;def".</li><li>An empty string in the <code>value</code> member means there was an '=' in the item text with nothing following it, as with "?abc=&amp;def".</li></ul>
</dd></dl>
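The difference between a missing value and an empty value can be mimicked in a few lines of plain C. The sketch below is only an illustration of that convention: the <code>QueryItem</code> type and both functions are hypothetical stand-ins, not uriparser's own structures or code, and no unescaping is performed.<p>

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one query list entry; uriparser's own
 * UriQueryListA is not used here. */
typedef struct {
        char * key;
        char * value;  /* NULL if the item text had no '=' */
} QueryItem;

/* Copies len characters of text into a new zero-terminated string. */
static char * copyRange(const char * text, size_t len) {
        char * copy = malloc((len + 1) * sizeof(char));
        if (copy != NULL) {
                memcpy(copy, text, len);
                copy[len] = '\0';
        }
        return copy;
}

/* Splits a single item such as "abc", "abc=" or "abc=1" at the first '='.
 * Returns 0 on success and -1 if memory ran out. */
int splitQueryItem(const char * item, QueryItem * dest) {
        const char * const equals = strchr(item, '=');
        if (equals == NULL) {
                /* No '=' at all: the value is missing, not empty */
                dest->key = copyRange(item, strlen(item));
                dest->value = NULL;
                return (dest->key != NULL) ? 0 : -1;
        }

        /* '=' present: the value may be an empty string, but never NULL */
        dest->key = copyRange(item, (size_t)(equals - item));
        dest->value = copyRange(equals + 1, strlen(equals + 1));
        if ((dest->key == NULL) || (dest->value == NULL)) {
                free(dest->key);
                free(dest->value);
                dest->key = NULL;
                dest->value = NULL;
                return -1;
        }
        return 0;
}
```

Code that walks a dissected query list should therefore check <code>value</code> against <code>NULL</code> before treating it as a string.<p>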
To compose a query string from a query list you could write code like this:<p>
<div class="fragment"><pre class="fragment">        <span class="keywordtype">int</span> charsRequired;
        <span class="keywordtype">int</span> charsWritten;
        <span class="keywordtype">char</span> * queryString;
        ...
        <span class="keywordflow">if</span> (<a class="code" href="Uri_8h.html#7fd4395b984b9d9519ee2d2aba186613">uriComposeQueryCharsRequiredA</a>(queryList, &amp;charsRequired) != URI_SUCCESS) {
                <span class="comment">/* Failure */</span>
                ...
        }
        queryString = malloc((charsRequired + 1) * <span class="keyword">sizeof</span>(<span class="keywordtype">char</span>));
        <span class="keywordflow">if</span> (queryString == NULL) {
                <span class="comment">/* Failure */</span>
                ...
        }
        <span class="keywordflow">if</span> (<a class="code" href="Uri_8h.html#5d79be075b94fd3292844feb107e0b75">uriComposeQueryA</a>(queryString, queryList, charsRequired + 1, &amp;charsWritten) != URI_SUCCESS) {
                <span class="comment">/* Failure */</span>
                ...
        }
        ...
        free(queryString);
</pre></div><h2><a class="anchor" name="chartypes">
Ansi and Unicode</a></h2>
uriparser comes with two versions of every structure and function: one handling Ansi text (char *) and one working with Unicode text (wchar_t *), for instance<ul>
<li><a class="el" href="Uri_8h.html#b16a86cc9956c40da5a97007c0fd4802">uriParseUriA()</a> for Ansi and</li><li>uriParseUriW() for Unicode.</li></ul>
<p>
This tutorial only shows the usage of the Ansi editions but their Unicode counterparts work in the very same way.<h2><a class="anchor" name="autoconf">
Autoconf Check</a></h2>
You can use the code below to make <code>./configure</code> test for the presence of uriparser 0.6.4 or later.<p>
<div class="fragment"><pre class="fragment">URIPARSER_MISSING=<span class="stringliteral">"Please install uriparser 0.6.4 or later.
   On a Debian-based system enter 'sudo apt-get install liburiparser-dev'."</span>
AC_CHECK_LIB(uriparser, uriParseUriA,, AC_MSG_ERROR(${URIPARSER_MISSING}))
AC_CHECK_HEADER(<a class="el" href="Uri_8h.html">uriparser/Uri.h</a>,, AC_MSG_ERROR(${URIPARSER_MISSING}))
<b></b>
URIPARSER_TOO_OLD=<span class="stringliteral">"uriparser 0.6.4 or later is required, your copy is too old."</span>
AC_COMPILE_IFELSE([
<span class="preprocessor">#include &lt;<a class="el" href="Uri_8h.html">uriparser/Uri.h</a>&gt;
#if (defined(URI_VER_MAJOR) &amp;&amp; defined(URI_VER_MINOR) &amp;&amp; defined(URI_VER_RELEASE) \
&amp;&amp; ((URI_VER_MAJOR &gt; 0) \
|| ((URI_VER_MAJOR == 0) &amp;&amp; (URI_VER_MINOR &gt; 6)) \
|| ((URI_VER_MAJOR == 0) &amp;&amp; (URI_VER_MINOR == 6) &amp;&amp; (URI_VER_RELEASE &gt;= 4)) \
))</span>
<span class="comment">/* FINE */</span>
<span class="preprocessor">#else
# error uriparser not recent enough
#endif</span>
],,AC_MSG_ERROR(${URIPARSER_TOO_OLD}))</pre></div> </div>
<hr size="1"><address style="text-align: right;"><small>Generated on Tue Jul 28 22:03:18 2009 for uriparser by&nbsp;
<a href="http://www.doxygen.org/index.html">
<img src="doxygen.png" alt="doxygen" align="middle" border="0"></a> 1.5.9 </small></address>
</body>
</html>