<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!--Rendered using the Haskell Html Library v0.2-->
<HTML
><HEAD
><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8"
><TITLE
>Text.XML.HXT.Arrow.ReadDocument</TITLE
><LINK HREF="haddock.css" REL="stylesheet" TYPE="text/css"
><SCRIPT SRC="haddock.js" TYPE="text/javascript"
></SCRIPT
></HEAD
><BODY
><TABLE CLASS="vanilla" CELLSPACING="0" CELLPADDING="0"
><TR
><TD CLASS="topbar"
><TABLE CLASS="vanilla" CELLSPACING="0" CELLPADDING="0"
><TR
><TD
><IMG SRC="haskell_icon.gif" WIDTH="16" HEIGHT="16" ALT=" "
></TD
><TD CLASS="title"
>hxt-7.1: </TD
><TD CLASS="topbut"
><A HREF="index.html"
>Contents</A
></TD
><TD CLASS="topbut"
><A HREF="doc-index.html"
>Index</A
></TD
></TR
></TABLE
></TD
></TR
><TR
><TD CLASS="modulebar"
><TABLE CLASS="vanilla" CELLSPACING="0" CELLPADDING="0"
><TR
><TD
><FONT SIZE="6"
>Text.XML.HXT.Arrow.ReadDocument</FONT
></TD
><TD ALIGN="right"
><TABLE CLASS="narrow" CELLSPACING="0" CELLPADDING="0"
><TR
><TD CLASS="infohead"
>Portability</TD
><TD CLASS="infoval"
>portable</TD
></TR
><TR
><TD CLASS="infohead"
>Stability</TD
><TD CLASS="infoval"
>experimental</TD
></TR
><TR
><TD CLASS="infohead"
>Maintainer</TD
><TD CLASS="infoval"
>Uwe Schmidt (uwe@fh-wedel.de)</TD
></TR
></TABLE
></TD
></TR
></TABLE
></TD
></TR
><TR
><TD CLASS="s15"
></TD
></TR
><TR
><TD CLASS="section1"
>Description</TD
></TR
><TR
><TD CLASS="doc"
><P
>Version    : $Id: ReadDocument.hs,v 1.10 2006/11/24 07:41:37 hxml Exp $
</P
><P
>Compound arrows for reading an XML/HTML document or an XML/HTML string
</P
></TD
></TR
><TR
><TD CLASS="s15"
></TD
></TR
><TR
><TD CLASS="section1"
>Synopsis</TD
></TR
><TR
><TD CLASS="s15"
></TD
></TR
><TR
><TD CLASS="body"
><TABLE CLASS="vanilla" CELLSPACING="0" CELLPADDING="0"
><TR
><TD CLASS="decl"
><A HREF="#v%3AreadDocument"
>readDocument</A
> :: <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AAttributes"
>Attributes</A
> -&gt; String -&gt; <A HREF="Text-XML-HXT-Arrow-XmlIOStateArrow.html#t%3AIOStateArrow"
>IOStateArrow</A
> s b <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree"
>XmlTree</A
></TD
></TR
><TR
><TD CLASS="s8"
></TD
></TR
><TR
><TD CLASS="decl"
><A HREF="#v%3AreadFromDocument"
>readFromDocument</A
> :: <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AAttributes"
>Attributes</A
> -&gt; <A HREF="Text-XML-HXT-Arrow-XmlIOStateArrow.html#t%3AIOStateArrow"
>IOStateArrow</A
> s String <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree"
>XmlTree</A
></TD
></TR
><TR
><TD CLASS="s8"
></TD
></TR
><TR
><TD CLASS="decl"
><A HREF="#v%3AreadString"
>readString</A
> :: <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AAttributes"
>Attributes</A
> -&gt; String -&gt; <A HREF="Text-XML-HXT-Arrow-XmlIOStateArrow.html#t%3AIOStateArrow"
>IOStateArrow</A
> s b <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree"
>XmlTree</A
></TD
></TR
><TR
><TD CLASS="s8"
></TD
></TR
><TR
><TD CLASS="decl"
><A HREF="#v%3AreadFromString"
>readFromString</A
> :: <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AAttributes"
>Attributes</A
> -&gt; <A HREF="Text-XML-HXT-Arrow-XmlIOStateArrow.html#t%3AIOStateArrow"
>IOStateArrow</A
> s String <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree"
>XmlTree</A
></TD
></TR
><TR
><TD CLASS="s8"
></TD
></TR
><TR
><TD CLASS="decl"
><A HREF="#v%3Ahread"
>hread</A
> :: <A HREF="Text-XML-HXT-Arrow-XmlArrow.html#t%3AArrowXml"
>ArrowXml</A
> a =&gt; a String <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree"
>XmlTree</A
></TD
></TR
><TR
><TD CLASS="s8"
></TD
></TR
><TR
><TD CLASS="decl"
><A HREF="#v%3Axread"
>xread</A
> :: <A HREF="Text-XML-HXT-Arrow-XmlArrow.html#t%3AArrowXml"
>ArrowXml</A
> a =&gt; a String <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree"
>XmlTree</A
></TD
></TR
></TABLE
></TD
></TR
><TR
><TD CLASS="s15"
></TD
></TR
><TR
><TD CLASS="section1"
>Documentation</TD
></TR
><TR
><TD CLASS="s15"
></TD
></TR
><TR
><TD CLASS="decl"
><A NAME="v%3AreadDocument"
></A
><B
>readDocument</B
> :: <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AAttributes"
>Attributes</A
> -&gt; String -&gt; <A HREF="Text-XML-HXT-Arrow-XmlIOStateArrow.html#t%3AIOStateArrow"
>IOStateArrow</A
> s b <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree"
>XmlTree</A
></TD
></TR
><TR
><TD CLASS="doc"
><P
>the main document input filter
</P
><P
>this filter can be configured by an option list, a value of type <A HREF="Attributes.html"
>Attributes</A
>
</P
><P
>available options:
</P
><UL
><LI
> <TT
><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_parse_html"
>a_parse_html</A
></TT
>: use HTML parser, else use XML parser (default)
</LI
><LI
> <TT
><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_validate"
>a_validate</A
></TT
> : validate document against DTD (default), else skip validation
</LI
><LI
> <TT
><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_relax_schema"
>a_relax_schema</A
></TT
> : validate document with Relax NG; the option's value is the schema URI.
                     This implies using the XML parser, no validation against a DTD, and canonicalization
</LI
><LI
> <TT
><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_check_namespaces"
>a_check_namespaces</A
></TT
> : check namespaces, else skip namespace processing (default)
</LI
><LI
> <TT
><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_canonicalize"
>a_canonicalize</A
></TT
> : canonicalize document (default), else skip canonicalization
</LI
><LI
> <TT
><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_preserve_comment"
>a_preserve_comment</A
></TT
> : preserve comments during canonicalization, else remove comments (default)
</LI
><LI
> <TT
><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_remove_whitespace"
>a_remove_whitespace</A
></TT
> : remove all whitespace that serves only document indentation, else skip this step (default)
</LI
><LI
> <TT
><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_indent"
>a_indent</A
></TT
> : indent document by inserting whitespace, else skip this step (default)
</LI
><LI
> <TT
><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_issue_warnings"
>a_issue_warnings</A
></TT
> : issue warnings, when parsing HTML (default), else ignore HTML parser warnings
</LI
><LI
> <TT
><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_issue_errors"
>a_issue_errors</A
></TT
> : issue all error messages on stderr (default), else ignore all error messages
</LI
><LI
> <TT
><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_trace"
>a_trace</A
></TT
> : trace level: values: 0 - 4
</LI
><LI
> <TT
><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_proxy"
>a_proxy</A
></TT
> : proxy for http access, e.g. www-cache:3128
</LI
><LI
> <TT
><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_use_curl"
>a_use_curl</A
></TT
> : use the external program curl for HTTP access; the default is native HTTP access
</LI
><LI
> <TT
><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_options_curl"
>a_options_curl</A
></TT
> : more options for external program curl
</LI
><LI
> <TT
><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_encoding"
>a_encoding</A
></TT
> : default document encoding (<TT
><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Autf8"
>utf8</A
></TT
>, <TT
><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3AisoLatin1"
>isoLatin1</A
></TT
>, <TT
><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3AusAscii"
>usAscii</A
></TT
>, ...)
</LI
></UL
><P
>All attributes not evaluated by readDocument are stored in the created document root node for easy access of the various
options in e.g. the input/output modules
</P
><P
>If the document name is the empty string or a URI of the form &quot;stdin:&quot;, the document is read from standard input.
</P
><P
>examples:
</P
><PRE
> readDocument [ ] &quot;test.xml&quot;
</PRE
><P
>reads and validates a document &quot;test.xml&quot;, no namespace propagation, only canonicalization is performed 
</P
><PRE
> readDocument [ (a_validate, &quot;0&quot;)
              , (a_encoding, isoLatin1)
              ] &quot;test.xml&quot;
</PRE
><P
>reads document &quot;test.xml&quot; without validation, default encoding <TT
><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3AisoLatin1"
>isoLatin1</A
></TT
>.
</P
><PRE
> readDocument [ (a_parse_html, &quot;1&quot;)
              , (a_encoding, isoLatin1)
              ] &quot;&quot;
</PRE
><P
>reads an HTML document from standard input; no validation is done when parsing HTML, and the default encoding is <TT
><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3AisoLatin1"
>isoLatin1</A
></TT
>
</P
><PRE
> readDocument [ (a_parse_html,     &quot;1&quot;)
              , (a_proxy,          &quot;www-cache:3128&quot;)
              , (a_use_curl,       &quot;1&quot;)
              , (a_issue_warnings, &quot;0&quot;)
              ] &quot;http://www.haskell.org/&quot;
</PRE
><P
>reads the Haskell homepage with the HTML parser, ignoring any warnings, with HTTP access via the external program curl and the proxy &quot;www-cache&quot; at port 3128
</P
><PRE
> readDocument [ (a_validate,          &quot;1&quot;)
              , (a_check_namespaces,  &quot;1&quot;)
              , (a_remove_whitespace, &quot;1&quot;)
              , (a_trace,             &quot;2&quot;)
              ] &quot;http://www.w3c.org/&quot;
</PRE
><P
>reads the W3C home page (XHTML), validates it and checks namespaces, removes whitespace between tags, and traces activities at level 2
</P
><P
>for minimal complete examples see <TT
><A HREF="Text-XML-HXT-Arrow-WriteDocument.html#v%3AwriteDocument"
>writeDocument</A
></TT
> and <TT
><A HREF="Text-XML-HXT-Arrow-XmlIOStateArrow.html#v%3ArunX"
>runX</A
></TT
>, the main starting point for running an XML arrow.
</P
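><P
>A rough sketch of such a minimal complete program follows. It assumes that the umbrella module Text.XML.HXT.Arrow of this release re-exports readDocument, writeDocument and runX, and that writeDocument takes an option list and a target name in the same way readDocument does; the file names &quot;test.xml&quot; and &quot;out.xml&quot; are placeholders.
</P
><PRE
> module Main where

 import Control.Arrow ((&gt;&gt;&gt;))
 import Text.XML.HXT.Arrow   -- assumption: umbrella module of hxt-7.x

 main :: IO ()
 main = do
   -- read &quot;test.xml&quot; without DTD validation and write the tree to &quot;out.xml&quot;
   _ &lt;- runX ( readDocument [ (a_validate, &quot;0&quot;) ] &quot;test.xml&quot;
               &gt;&gt;&gt;
               writeDocument [] &quot;out.xml&quot;
             )
   return ()
</PRE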
></TD
></TR
><TR
><TD CLASS="s15"
></TD
></TR
><TR
><TD CLASS="decl"
><A NAME="v%3AreadFromDocument"
></A
><B
>readFromDocument</B
> :: <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AAttributes"
>Attributes</A
> -&gt; <A HREF="Text-XML-HXT-Arrow-XmlIOStateArrow.html#t%3AIOStateArrow"
>IOStateArrow</A
> s String <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree"
>XmlTree</A
></TD
></TR
><TR
><TD CLASS="doc"
>the arrow version of <TT
><A HREF="Text-XML-HXT-Arrow-ReadDocument.html#v%3AreadDocument"
>readDocument</A
></TT
>, the arrow input is the source URI
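<P
>A small sketch of how the source URI might be fed in as arrow input, here with constA. It assumes the umbrella module Text.XML.HXT.Arrow and a writeDocument that takes an option list and a target name; both file names are placeholders.
</P
><PRE
> module Main where

 import Control.Arrow ((&gt;&gt;&gt;))
 import Text.XML.HXT.Arrow   -- assumption: umbrella module of hxt-7.x

 main :: IO ()
 main = do
   -- the document name travels through the arrow instead of being a parameter
   _ &lt;- runX ( constA &quot;test.xml&quot;
               &gt;&gt;&gt;
               readFromDocument [ (a_validate, &quot;0&quot;) ]
               &gt;&gt;&gt;
               writeDocument [] &quot;copy.xml&quot;
             )
   return ()
</PRE
>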
</TD
></TR
><TR
><TD CLASS="s15"
></TD
></TR
><TR
><TD CLASS="decl"
><A NAME="v%3AreadString"
></A
><B
>readString</B
> :: <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AAttributes"
>Attributes</A
> -&gt; String -&gt; <A HREF="Text-XML-HXT-Arrow-XmlIOStateArrow.html#t%3AIOStateArrow"
>IOStateArrow</A
> s b <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree"
>XmlTree</A
></TD
></TR
><TR
><TD CLASS="doc"
><P
>read a document that is stored in a normal Haskell String
</P
><P
>works like readDocument, but the String parameter holds the document content itself instead of a document name.
 All options available for <TT
><A HREF="Text-XML-HXT-Arrow-ReadDocument.html#v%3AreadDocument"
>readDocument</A
></TT
> are applicable for readString.
</P
><P
>Default encoding: none. No decoding is performed; the String argument is taken as a Unicode string
</P
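><P
>As an illustration, the following sketch parses a literal string and prints its text nodes. It assumes the umbrella module Text.XML.HXT.Arrow, which is expected to provide the selection arrows deep, isText and getText; the sample document is made up, and validation is switched off because it carries no DTD.
</P
><PRE
> module Main where

 import Control.Arrow ((&gt;&gt;&gt;))
 import Text.XML.HXT.Arrow   -- assumption: umbrella module of hxt-7.x

 main :: IO ()
 main = do
   -- the second argument is the document text, not a file name
   texts &lt;- runX ( readString [ (a_validate, &quot;0&quot;) ]
                              &quot;&lt;doc&gt;&lt;item&gt;one&lt;/item&gt;&lt;item&gt;two&lt;/item&gt;&lt;/doc&gt;&quot;
                   &gt;&gt;&gt;
                   deep isText &gt;&gt;&gt; getText
                 )
   mapM_ putStrLn texts
</PRE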
></TD
></TR
><TR
><TD CLASS="s15"
></TD
></TR
><TR
><TD CLASS="decl"
><A NAME="v%3AreadFromString"
></A
><B
>readFromString</B
> :: <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AAttributes"
>Attributes</A
> -&gt; <A HREF="Text-XML-HXT-Arrow-XmlIOStateArrow.html#t%3AIOStateArrow"
>IOStateArrow</A
> s String <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree"
>XmlTree</A
></TD
></TR
><TR
><TD CLASS="doc"
>the arrow version of <TT
><A HREF="Text-XML-HXT-Arrow-ReadDocument.html#v%3AreadString"
>readString</A
></TT
>, the arrow input is the document content as a String
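<P
>A sketch of the arrow form in use: since the document text arrives as arrow input, several strings can be parsed in one run. The umbrella module Text.XML.HXT.Arrow and the helper arrows constL, deep, isText and getText are assumed to be available; the sample documents are made up.
</P
><PRE
> module Main where

 import Control.Arrow ((&gt;&gt;&gt;))
 import Text.XML.HXT.Arrow   -- assumption: umbrella module of hxt-7.x

 main :: IO ()
 main = do
   -- constL emits each string in turn; each one is parsed as its own document
   texts &lt;- runX ( constL [ &quot;&lt;doc&gt;one&lt;/doc&gt;&quot;, &quot;&lt;doc&gt;two&lt;/doc&gt;&quot; ]
                   &gt;&gt;&gt;
                   readFromString [ (a_validate, &quot;0&quot;) ]
                   &gt;&gt;&gt;
                   deep isText &gt;&gt;&gt; getText
                 )
   mapM_ putStrLn texts
</PRE
>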
</TD
></TR
><TR
><TD CLASS="s15"
></TD
></TR
><TR
><TD CLASS="decl"
><A NAME="v%3Ahread"
></A
><B
>hread</B
> :: <A HREF="Text-XML-HXT-Arrow-XmlArrow.html#t%3AArrowXml"
>ArrowXml</A
> a =&gt; a String <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree"
>XmlTree</A
></TD
></TR
><TR
><TD CLASS="doc"
><P
>parse a string as HTML content, substitute all HTML entity refs and canonicalize tree
 (substitute char refs, ...). Errors are ignored.
</P
><P
>A simpler version of <TT
><A HREF="Text-XML-HXT-Arrow-ReadDocument.html#v%3AreadFromString"
>readFromString</A
></TT
> but with less functionality.
 Does not run in the IO monad
</P
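><P
>Because hread needs no IO, it can be run as a pure list arrow. The sketch below assumes that the list arrow type LA is an instance of ArrowXml and that runLA, hasName and getAttrValue are exported by the umbrella module Text.XML.HXT.Arrow; the function name links and the HTML snippet are hypothetical.
</P
><PRE
> module Main where

 import Control.Arrow ((&gt;&gt;&gt;))
 import Text.XML.HXT.Arrow   -- assumption: umbrella module of hxt-7.x

 -- collect the href values of all anchor elements in an HTML fragment
 links :: String -&gt; [String]
 links = runLA ( hread
                 &gt;&gt;&gt;
                 deep (isElem &gt;&gt;&gt; hasName &quot;a&quot;)
                 &gt;&gt;&gt;
                 getAttrValue &quot;href&quot;
               )

 main :: IO ()
 main = mapM_ putStrLn
          (links &quot;&lt;p&gt;see &lt;a href=\&quot;index.html\&quot;&gt;contents&lt;/a&gt;&lt;/p&gt;&quot;)
</PRE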
></TD
></TR
><TR
><TD CLASS="s15"
></TD
></TR
><TR
><TD CLASS="decl"
><A NAME="v%3Axread"
></A
><B
>xread</B
> :: <A HREF="Text-XML-HXT-Arrow-XmlArrow.html#t%3AArrowXml"
>ArrowXml</A
> a =&gt; a String <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree"
>XmlTree</A
></TD
></TR
><TR
><TD CLASS="doc"
>parse a string as XML content, substitute all predefined XML entity refs and canonicalize tree
 (substitute char refs, ...)
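<P
>xread can be used in the same pure style. This sketch lists the names of the child elements of a small XML fragment; it assumes that LA is an instance of ArrowXml and that runLA, getChildren, isElem and getName come from the umbrella module Text.XML.HXT.Arrow. The function name childNames and the sample fragment are hypothetical.
</P
><PRE
> module Main where

 import Control.Arrow ((&gt;&gt;&gt;))
 import Text.XML.HXT.Arrow   -- assumption: umbrella module of hxt-7.x

 -- names of the element children of the fragment's top-level element
 childNames :: String -&gt; [String]
 childNames = runLA ( xread &gt;&gt;&gt; getChildren &gt;&gt;&gt; isElem &gt;&gt;&gt; getName )

 main :: IO ()
 main = mapM_ putStrLn (childNames &quot;&lt;a&gt;&lt;b/&gt;&lt;c/&gt;&lt;/a&gt;&quot;)
</PRE
>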
</TD
></TR
><TR
><TD CLASS="s15"
></TD
></TR
><TR
><TD CLASS="botbar"
>Produced by <A HREF="http://www.haskell.org/haddock/"
>Haddock</A
> version 0.8</TD
></TR
></TABLE
></BODY
></HTML
>