.\" Automatically generated by Pod::Man 2.22 (Pod::Simple 3.07)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.ie \nF \{\
.    de IX
.    tm Index:\\$1\t\\n%\t"\\$2"
..
.    nr % 0
.    rr F
.\}
.el \{\
.    de IX
..
.\}
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear. Run. Save yourself. No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "PXPVALIDATE 1"
.TH PXPVALIDATE 1 "2012-01-11" "PXP 1.1 tools" "www.ocaml-programming.de"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
pxpvalidate \- validate XML documents
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
\&\fBpxpvalidate\fR [ \fI\s-1OPTION\s0\fR ... ] [ \fI\s-1URL\s0\fR ... ]
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
This command uses the \s-1XML\s0 parser \s-1PXP\s0 to validate the \s-1XML\s0 documents
specified on the command line.
The program writes warning and error messages to stderr. It stops at the
first error and exits with a non-zero code. If all the \s-1XML\s0 documents are
valid, the program exits with a code of 0.
.SS "URLs"
.IX Subsection "URLs"
The documents are named by their \fIURLs\fR. By default, only \f(CW\*(C`file\*(C'\fR URLs
are allowed, but you can optionally configure helper applications to
process other \s-1URL\s0 schemes such as \f(CW\*(C`http\*(C'\fR or \f(CW\*(C`ftp\*(C'\fR (see the \fB\-helper\fR
option below). For example, to validate the file stored in
\&\f(CW\*(C`/directory/data.xml\*(C'\fR, you can call \fBpxpvalidate\fR as follows:
.PP
.Vb 1
\&  pxpvalidate file:///directory/data.xml
.Ve
.PP
Note that the usual conventions for \s-1URL\s0 notation apply: metacharacters such as
\&\*(L"#\*(R" or \*(L"?\*(R" are reserved and must be written using a \*(L"%hex\*(R" encoding,
for instance \*(L"%23\*(R" instead of \*(L"#\*(R", or \*(L"%3f\*(R" instead of \*(L"?\*(R".
.PP
If you do not pass an absolute \s-1URL\s0 to \fBpxpvalidate\fR, the \s-1URL\s0 is
interpreted relative to the current directory. So
.PP
.Vb 2
\&  cd /directory
\&  pxpvalidate data.xml
.Ve
.PP
works as well.
.SS "External entities"
.IX Subsection "External entities"
The parser reads not only the documents passed on the command line but
also every other document that is referred to as an external entity.
For example, if the \s-1XML\s0 document is
.PP
.Vb 3
\&  <!DOCTYPE sample SYSTEM "sample.dtd" [
\&  <!ENTITY text SYSTEM "text.xml">
\&  ]>
\&
\&  <sample>
\&  &text;
\&  </sample>
.Ve
.PP
the parser also reads the files \*(L"sample.dtd\*(R" and \*(L"text.xml\*(R", as well as
any further files that these files refer to through entity references.
.PP
The parser indicates an error if it cannot resolve a reference to an
external file.
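.PP
Because errors of this kind lead to a non-zero exit code, a shell script can
react to the outcome of a validation run. The following is only a sketch:
the function name \f(CW\*(C`check_xml\*(C'\fR and the message texts are made up for this
example and are not part of \fBpxpvalidate\fR.
.PP
.Vb 11
\&  # check_xml URL: hypothetical wrapper around pxpvalidate.
\&  # Prints a short verdict and returns the exit code of pxpvalidate.
\&  check_xml() {
\&      if pxpvalidate "$1"; then
\&          echo "valid: $1"
\&      else
\&          status=$?
\&          echo "not valid or not readable: $1 (exit code $status)" >&2
\&          return "$status"
\&      fi
\&  }
.Ve
.PP
A call such as \f(CW\*(C`check_xml file:///directory/data.xml\*(C'\fR then reports on the
document together with all external entities it refers to, since
\&\fBpxpvalidate\fR reads those as part of the same run.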
.SS "Extent of the validation checks"
.IX Subsection "Extent of the validation checks"
The parser checks all well-formedness and validity constraints
specified in the \s-1XML\s0 1.0 standard. This includes:
.IP "\(bu" 4
Whether the documents are well-formed (syntactically correct)
.IP "\(bu" 4
Whether the elements and attributes are used as declared, i.e. meet the
\&\f(CW\*(C`ELEMENT\*(C'\fR and \f(CW\*(C`ATTLIST\*(C'\fR declarations
.IP "\(bu" 4
Whether the document is standalone if flagged as standalone
.IP "\(bu" 4
Whether the \s-1ID\s0 attributes are unique
.IP "\(bu" 4
Whether fixed attributes have the declared value
.IP "\(bu" 4
Whether entities exist
.IP "\(bu" 4
Whether notations exist
.PP
This list is not complete; see the full specification of \s-1XML\s0 1.0 for details.
.SH "OPTIONS"
.IX Header "OPTIONS"
.IP "\fB\-wf\fR" 4
.IX Item "-wf"
The parser checks only the well-formedness of the documents and omits
any validation.
.IP "\fB\-iso\-8859\-1\fR" 4
.IX Item "-iso-8859-1"
The parser represents the documents internally as \s-1ISO\-8859\-1\s0 encoded
strings instead of \s-1UTF\-8\s0 strings, which are the default. This results in
faster processing if the documents are encoded in \s-1ISO\-8859\-1\s0, but it may
cause problems if documents contain characters outside the range of
the \s-1ISO\-8859\-1\s0 character set.
.IP "\fB\-namespaces\fR" 4
.IX Item "-namespaces"
Enables namespace support. If the \fB\-wf\fR option is also given, this
only means that element and attribute names must not contain more than
one colon character. Unlike other parsers, \s-1PXP\s0 can validate documents
using namespaces. See the section \s-1VALIDATION\s0 \s-1AND\s0 \s-1NAMESPACES\s0 below for a
discussion of this issue.
.IP "\fB\-pubid\fR \fIid\fR=\fIfile\fR" 4
.IX Item "-pubid id=file"
If the parser finds a \s-1PUBLIC\s0 identifier \fIid\fR, it will read the
specified \fIfile\fR. (This is really a file name, and not a \s-1URL\s0.) This
option overrides the system identifier found in the document for this
public identifier.
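.Sp
As an illustration, assume that a local copy of the \s-1XHTML\s0 1.0 Strict \s-1DTD\s0 is
stored under the hypothetical path \f(CW\*(C`/usr/local/share/dtd/xhtml1\-strict.dtd\*(C'\fR.
An option of the following form should then make the parser read that copy
whenever a document refers to the public identifier
\&\f(CW\*(C`\-//W3C//DTD XHTML 1.0 Strict//EN\*(C'\fR. The quotes are for the shell only and are
needed because the identifier contains spaces:
.Sp
.Vb 1
\&  \-pubid \*(Aq\-//W3C//DTD XHTML 1.0 Strict//EN=/usr/local/share/dtd/xhtml1\-strict.dtd\*(Aq
.Ve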
.Sp
This option can be specified several times.
.IP "\fB\-helper\fR \fIscheme\fR=\fIcommand\fR" 4
.IX Item "-helper scheme=command"
Configures a helper command that retrieves the contents of a \s-1URL\s0 for the
given \fIscheme\fR. For example, to use \fBwget\fR as helper application for
ftp URLs, add the option
.Sp
.Vb 1
\&  \-helper \*(Aqftp=wget \-O \- \-nv\*(Aq
.Ve
.Sp
The command is expected to output the contents of the file to stdout.
.Sp
This option can be specified several times.
.IP "\fB\-helper\-mh\fR \fIscheme\fR=\fIcommand\fR" 4
.IX Item "-helper-mh scheme=command"
Configures a helper command that retrieves the contents of a \s-1URL\s0,
preceded by a \s-1MIME\s0 header, for the given \fIscheme\fR. For example, to use
\&\fBwget\fR as helper application for http URLs, add the option
.Sp
.Vb 1
\&  \-helper\-mh \*(Aqhttp=wget \-O \- \-nv \-s\*(Aq
.Ve
.Sp
The command is expected to output first a \s-1MIME\s0 header and then,
separated by a blank line, the contents of the file to stdout.
.Sp
Using \fB\-helper\-mh\fR is preferred if \s-1MIME\s0 headers are available, because
the parser extracts the character encoding of the file from the \s-1MIME\s0 header.
.Sp
This option can be specified several times.
.SH "ENCODINGS"
.IX Header "ENCODINGS"
.SS "The character encoding of URLs"
.IX Subsection "The character encoding of URLs"
URLs are interpreted as \*(L"%\-encoded \s-1UTF\-8\s0 strings\*(R", as suggested by the
\&\s-1XML\s0 standard.
.SS "The character encoding of filenames"
.IX Subsection "The character encoding of filenames"
The parser assumes that the file system stores filenames as \s-1UTF\-8\s0
strings. (It is currently not possible to change this.)
.SS "The encodings of entities"
.IX Subsection "The encodings of entities"
Every external entity can be encoded in a different character set, so a
document can refer to entities whose encoding differs from its own. The
parser supports \s-1UTF\-8\s0, \s-1UTF\-16\s0, \s-1UTF\-32\s0, all \s-1ISO\-8859\s0 encodings, and a number
of other 8\-bit character sets.
.SH "VALIDATION AND NAMESPACES"
.IX Header "VALIDATION AND NAMESPACES"
\&\s-1PXP\s0 can validate documents that use namespaces. However, it is necessary
to add processing instructions to the \s-1DTD\s0 because the \s-1XML\s0 standard does
not specify how to refer to namespaces in DTDs.
.PP
It is quite simple. The processing instruction
.PP
<?pxp:dtd namespace prefix="\fIprefix\fR" uri="\fIuri\fR"?>
.PP
declares that the namespace \fIuri\fR can be referred to from the \s-1DTD\s0 by
prefixing element and attribute names with the string \fIprefix\fR, followed
by a colon character. For example, to define the prefix
\&\f(CW\*(C`xh\*(C'\fR for the \s-1XHTML\s0 \s-1URI\s0 \f(CW\*(C`http://www.w3.org/1999/xhtml\*(C'\fR, just add the line
.PP
.Vb 1
\&  <?pxp:dtd namespace prefix="xh" uri="http://www.w3.org/1999/xhtml"?>
.Ve
.PP
to the \s-1DTD\s0, and prepend the prefix \*(L"xh:\*(R" to the names of all elements in
the \s-1DTD\s0 (e.g. \*(L"xh:body\*(R" instead of \*(L"body\*(R").
.PP
If the author of the \s-1XHTML\s0 document prefers another prefix, \s-1PXP\s0
automatically converts the other prefix to the declared prefix \*(L"xh\*(R". For
instance, the prefix \*(L"html\*(R" in the following document is mapped to \*(L"xh\*(R":
.PP
.Vb 3
\&  <html:body xmlns:html="http://www.w3.org/1999/xhtml">
\&  ...
\&  </html:body>
.Ve
.PP
Note that if you do not declare the namespaces in the \s-1DTD\s0, the parser
uses the first prefix it encounters for a namespace as the reference
prefix, and it uses this prefix to match the element names in the
document against the element names in the \s-1DTD\s0. This may cause problems,
because the result then depends on which prefix the document happens to
use first.
.SH "AUTHOR"
.IX Header "AUTHOR"
The parser \s-1PXP\s0 and the frontend \fBpxpvalidate\fR were written by
Gerd Stolpmann (gerd@gerd\-stolpmann.de).
.SH "WEB SITE"
.IX Header "WEB SITE"
The sources are published on the web site
http://www.ocaml\-programming.de. There you will also find material about
the programming language Objective Caml (in which \s-1PXP\s0 is written).