<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <!--Rendered using the Haskell Html Library v0.2--> <HTML ><HEAD ><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8" ><TITLE >Text.XML.HXT.Arrow.ReadDocument</TITLE ><LINK HREF="haddock.css" REL="stylesheet" TYPE="text/css" ><SCRIPT SRC="haddock.js" TYPE="text/javascript" ></SCRIPT ></HEAD ><BODY ><TABLE CLASS="vanilla" CELLSPACING="0" CELLPADDING="0" ><TR ><TD CLASS="topbar" ><TABLE CLASS="vanilla" CELLSPACING="0" CELLPADDING="0" ><TR ><TD ><IMG SRC="haskell_icon.gif" WIDTH="16" HEIGHT="16" ALT=" " ></TD ><TD CLASS="title" >hxt-7.1: </TD ><TD CLASS="topbut" ><A HREF="index.html" >Contents</A ></TD ><TD CLASS="topbut" ><A HREF="doc-index.html" >Index</A ></TD ></TR ></TABLE ></TD ></TR ><TR ><TD CLASS="modulebar" ><TABLE CLASS="vanilla" CELLSPACING="0" CELLPADDING="0" ><TR ><TD ><FONT SIZE="6" >Text.XML.HXT.Arrow.ReadDocument</FONT ></TD ><TD ALIGN="right" ><TABLE CLASS="narrow" CELLSPACING="0" CELLPADDING="0" ><TR ><TD CLASS="infohead" >Portability</TD ><TD CLASS="infoval" >portable</TD ></TR ><TR ><TD CLASS="infohead" >Stability</TD ><TD CLASS="infoval" >experimental</TD ></TR ><TR ><TD CLASS="infohead" >Maintainer</TD ><TD CLASS="infoval" >Uwe Schmidt (uwe@fh-wedel.de)</TD ></TR ></TABLE ></TD ></TR ></TABLE ></TD ></TR ><TR ><TD CLASS="s15" ></TD ></TR ><TR ><TD CLASS="section1" >Description</TD ></TR ><TR ><TD CLASS="doc" ><P >Version : $Id: ReadDocument.hs,v 1.10 2006/11/24 07:41:37 hxml Exp $ </P ><P >Compound arrows for reading an XML/HTML document or an XML/HTML string </P ></TD ></TR ><TR ><TD CLASS="s15" ></TD ></TR ><TR ><TD CLASS="section1" >Synopsis</TD ></TR ><TR ><TD CLASS="s15" ></TD ></TR ><TR ><TD CLASS="body" ><TABLE CLASS="vanilla" CELLSPACING="0" CELLPADDING="0" ><TR ><TD CLASS="decl" ><A HREF="#v%3AreadDocument" >readDocument</A > :: <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AAttributes" >Attributes</A > -> String 
-> <A HREF="Text-XML-HXT-Arrow-XmlIOStateArrow.html#t%3AIOStateArrow" >IOStateArrow</A > s b <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree" >XmlTree</A ></TD ></TR ><TR ><TD CLASS="s8" ></TD ></TR ><TR ><TD CLASS="decl" ><A HREF="#v%3AreadFromDocument" >readFromDocument</A > :: <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AAttributes" >Attributes</A > -> <A HREF="Text-XML-HXT-Arrow-XmlIOStateArrow.html#t%3AIOStateArrow" >IOStateArrow</A > s String <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree" >XmlTree</A ></TD ></TR ><TR ><TD CLASS="s8" ></TD ></TR ><TR ><TD CLASS="decl" ><A HREF="#v%3AreadString" >readString</A > :: <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AAttributes" >Attributes</A > -> String -> <A HREF="Text-XML-HXT-Arrow-XmlIOStateArrow.html#t%3AIOStateArrow" >IOStateArrow</A > s b <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree" >XmlTree</A ></TD ></TR ><TR ><TD CLASS="s8" ></TD ></TR ><TR ><TD CLASS="decl" ><A HREF="#v%3AreadFromString" >readFromString</A > :: <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AAttributes" >Attributes</A > -> <A HREF="Text-XML-HXT-Arrow-XmlIOStateArrow.html#t%3AIOStateArrow" >IOStateArrow</A > s String <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree" >XmlTree</A ></TD ></TR ><TR ><TD CLASS="s8" ></TD ></TR ><TR ><TD CLASS="decl" ><A HREF="#v%3Ahread" >hread</A > :: <A HREF="Text-XML-HXT-Arrow-XmlArrow.html#t%3AArrowXml" >ArrowXml</A > a => a String <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree" >XmlTree</A ></TD ></TR ><TR ><TD CLASS="s8" ></TD ></TR ><TR ><TD CLASS="decl" ><A HREF="#v%3Axread" >xread</A > :: <A HREF="Text-XML-HXT-Arrow-XmlArrow.html#t%3AArrowXml" >ArrowXml</A > a => a String <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree" >XmlTree</A ></TD ></TR ></TABLE ></TD ></TR ><TR ><TD CLASS="s15" ></TD ></TR ><TR ><TD CLASS="section1" >Documentation</TD ></TR ><TR ><TD CLASS="s15" ></TD ></TR ><TR ><TD CLASS="decl" ><A NAME="v%3AreadDocument" ></A ><B >readDocument</B > :: <A 
HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AAttributes" >Attributes</A > -> String -> <A HREF="Text-XML-HXT-Arrow-XmlIOStateArrow.html#t%3AIOStateArrow" >IOStateArrow</A > s b <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree" >XmlTree</A ></TD ></TR ><TR ><TD CLASS="doc" ><P >the main document input filter </P ><P >this filter can be configured by an option list, a value of type <A HREF="Attributes.html" >Attributes</A > </P ><P >available options: </P ><UL ><LI > <TT ><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_parse_html" >a_parse_html</A ></TT >: use HTML parser, else use XML parser (default) </LI ><LI > <TT ><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_validate" >a_validate</A ></TT > : validate document against the DTD (default), else skip validation </LI ><LI > <TT ><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_relax_schema" >a_relax_schema</A ></TT > : validate document with Relax NG; the option's value is the schema URI. This implies using the XML parser, no validation against the DTD, and canonicalization </LI ><LI > <TT ><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_check_namespaces" >a_check_namespaces</A ></TT > : check namespaces, else skip namespace processing (default) </LI ><LI > <TT ><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_canonicalize" >a_canonicalize</A ></TT > : canonicalize document (default), else skip canonicalization </LI ><LI > <TT ><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_preserve_comment" >a_preserve_comment</A ></TT > : preserve comments during canonicalization, else remove comments (default) </LI ><LI > <TT ><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_remove_whitespace" >a_remove_whitespace</A ></TT > : remove all whitespace used for document indentation, else skip this step (default) </LI ><LI > <TT ><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_indent" >a_indent</A ></TT > : indent document by inserting whitespace, else skip this step (default) </LI ><LI > <TT ><A 
HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_issue_warnings" >a_issue_warnings</A ></TT > : issue warnings when parsing HTML (default), else ignore HTML parser warnings </LI ><LI > <TT ><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_issue_errors" >a_issue_errors</A ></TT > : issue all error messages on stderr (default), else ignore all error messages </LI ><LI > <TT ><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_trace" >a_trace</A ></TT > : trace level, values: 0 - 4 </LI ><LI > <TT ><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_proxy" >a_proxy</A ></TT > : proxy for HTTP access, e.g. www-cache:3128 </LI ><LI > <TT ><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_use_curl" >a_use_curl</A ></TT > : HTTP access via the external program curl, default is native HTTP access </LI ><LI > <TT ><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_options_curl" >a_options_curl</A ></TT > : more options for the external program curl </LI ><LI > <TT ><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Aa_encoding" >a_encoding</A ></TT > : default document encoding (<TT ><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3Autf8" >utf8</A ></TT >, <TT ><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3AisoLatin1" >isoLatin1</A ></TT >, <TT ><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3AusAscii" >usAscii</A ></TT >, ...) </LI ></UL ><P >All attributes not evaluated by readDocument are stored in the created document root node, so that other modules, e.g. the input/output modules, can easily access the various options </P ><P >If the document name is the empty string or a URI of the form "stdin:", the document is read from standard input. 
</P ><P >examples: </P ><PRE > readDocument [ ] "test.xml" </PRE ><P >reads and validates a document "test.xml", no namespace propagation, only canonicalization is performed </P ><PRE > readDocument [ (a_validate, "0") , (a_encoding, isoLatin1) ] "test.xml" </PRE ><P >reads document "test.xml" without validation, default encoding <TT ><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3AisoLatin1" >isoLatin1</A ></TT >. </P ><PRE > readDocument [ (a_parse_html, "1") , (a_encoding, isoLatin1) ] "" </PRE ><P >reads an HTML document from standard input, no validation is done when parsing HTML, default encoding is <TT ><A HREF="Text-XML-HXT-DOM-XmlKeywords.html#v%3AisoLatin1" >isoLatin1</A ></TT > </P ><PRE > readDocument [ (a_parse_html, "1") , (a_proxy, "www-cache:3128") , (a_use_curl, "1") , (a_issue_warnings, "0") ] "http://www.haskell.org/" </PRE ><P >reads the Haskell homepage with the HTML parser, ignoring any warnings, with HTTP access via the external program curl and proxy "www-cache" at port 3128 </P ><PRE > readDocument [ (a_validate, "1") , (a_check_namespaces, "1") , (a_remove_whitespace, "1") , (a_trace, "2") ] "http://www.w3c.org/" </PRE ><P >reads the W3C home page (XHTML), validates it and checks namespaces, removes whitespace between tags, and traces activities with level 2 </P ><P >for minimal complete examples see <TT ><A HREF="Text-XML-HXT-Arrow-WriteDocument.html#v%3AwriteDocument" >writeDocument</A ></TT > and <TT ><A HREF="Text-XML-HXT-Arrow-XmlIOStateArrow.html#v%3ArunX" >runX</A ></TT >, the main starting point for running an XML arrow. 
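</P ><P >A minimal sketch of such a complete program. It assumes that the umbrella module Text.XML.HXT.Arrow re-exports readDocument, writeDocument and runX, and that writeDocument treats the document name "-" as standard output; "test.xml" is the file name used in the examples above. </P ><PRE >
module Main where

import Text.XML.HXT.Arrow   -- assumed to re-export the arrows used below

main :: IO ()
main
    = runX ( -- read "test.xml" with the XML parser, skipping DTD validation
             readDocument  [ (a_validate, "0") ] "test.xml"
             >>>
             -- write the indented tree; "-" is assumed to mean stdout
             writeDocument [ (a_indent, "1") ] "-"
           )
      >> return ()          -- the result list of runX is not needed here
</PRE ><P >Validation is switched off in this sketch so that it also runs on documents without a DTD; the result of runX is discarded because only the output side effect is of interest. 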
</P ></TD ></TR ><TR ><TD CLASS="s15" ></TD ></TR ><TR ><TD CLASS="decl" ><A NAME="v%3AreadFromDocument" ></A ><B >readFromDocument</B > :: <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AAttributes" >Attributes</A > -> <A HREF="Text-XML-HXT-Arrow-XmlIOStateArrow.html#t%3AIOStateArrow" >IOStateArrow</A > s String <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree" >XmlTree</A ></TD ></TR ><TR ><TD CLASS="doc" >the arrow version of <TT ><A HREF="Text-XML-HXT-Arrow-ReadDocument.html#v%3AreadDocument" >readDocument</A ></TT >, the arrow input is the source URI </TD ></TR ><TR ><TD CLASS="s15" ></TD ></TR ><TR ><TD CLASS="decl" ><A NAME="v%3AreadString" ></A ><B >readString</B > :: <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AAttributes" >Attributes</A > -> String -> <A HREF="Text-XML-HXT-Arrow-XmlIOStateArrow.html#t%3AIOStateArrow" >IOStateArrow</A > s b <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree" >XmlTree</A ></TD ></TR ><TR ><TD CLASS="doc" ><P >read a document that is stored in a normal Haskell String </P ><P >the same function as readDocument, but the String parameter holds the document content itself instead of a document name. All options available for <TT ><A HREF="Text-XML-HXT-Arrow-ReadDocument.html#v%3AreadDocument" >readDocument</A ></TT > are applicable for readString. 
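</P ><P >A minimal sketch of readString, under the assumption that Text.XML.HXT.Arrow re-exports readString, runX, deep, isText and getText; the document string doc is a hypothetical example and not part of the library. </P ><PRE >
module Main where

import Text.XML.HXT.Arrow   -- assumed to re-export the arrows used below

-- hypothetical in-memory document, used only for illustration
doc :: String
doc = "&lt;greeting&gt;hello&lt;/greeting&gt;"

main :: IO ()
main
    = runX ( -- parse the string itself; no DTD is present, so skip validation
             readString [ (a_validate, "0") ] doc
             >>>
             deep isText     -- select the text nodes of the parsed tree
             >>>
             getText         -- extract their string content
           )
      >>= print              -- print the list of collected strings
</PRE ><P >Validation is disabled because the sample string carries no DTD; deep isText >>> getText collects the text content of the parsed tree. 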
</P ><P >Default encoding: no decoding is done, the String argument is taken as a Unicode string </P ></TD ></TR ><TR ><TD CLASS="s15" ></TD ></TR ><TR ><TD CLASS="decl" ><A NAME="v%3AreadFromString" ></A ><B >readFromString</B > :: <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AAttributes" >Attributes</A > -> <A HREF="Text-XML-HXT-Arrow-XmlIOStateArrow.html#t%3AIOStateArrow" >IOStateArrow</A > s String <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree" >XmlTree</A ></TD ></TR ><TR ><TD CLASS="doc" >the arrow version of <TT ><A HREF="Text-XML-HXT-Arrow-ReadDocument.html#v%3AreadString" >readString</A ></TT >, the arrow input is the string to be parsed </TD ></TR ><TR ><TD CLASS="s15" ></TD ></TR ><TR ><TD CLASS="decl" ><A NAME="v%3Ahread" ></A ><B >hread</B > :: <A HREF="Text-XML-HXT-Arrow-XmlArrow.html#t%3AArrowXml" >ArrowXml</A > a => a String <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree" >XmlTree</A ></TD ></TR ><TR ><TD CLASS="doc" ><P >parse a string as HTML content, substitute all HTML entity refs and canonicalize tree (substitute char refs, ...). Errors are ignored. </P ><P >A simpler version of <TT ><A HREF="Text-XML-HXT-Arrow-ReadDocument.html#v%3AreadFromString" >readFromString</A ></TT > with less functionality. Does not run in the IO monad. </P ></TD ></TR ><TR ><TD CLASS="s15" ></TD ></TR ><TR ><TD CLASS="decl" ><A NAME="v%3Axread" ></A ><B >xread</B > :: <A HREF="Text-XML-HXT-Arrow-XmlArrow.html#t%3AArrowXml" >ArrowXml</A > a => a String <A HREF="Text-XML-HXT-DOM-TypeDefs.html#t%3AXmlTree" >XmlTree</A ></TD ></TR ><TR ><TD CLASS="doc" >parse a string as XML content, substitute all predefined XML entity refs and canonicalize tree (substitute char refs, ...) </TD ></TR ><TR ><TD CLASS="s15" ></TD ></TR ><TR ><TD CLASS="botbar" >Produced by <A HREF="http://www.haskell.org/haddock/" >Haddock</A > version 0.8</TD ></TR ></TABLE ></BODY ></HTML >