\documentstyle[11pt]{article}

\setlength{\parskip}{3ex}
\raggedright

\title{SGMLS.pm: a perl5 class library for handling output from the
SGMLS and NSGMLS parsers (version 1.03)}
\author{David Megginson \\
  Department of English, \\
  University of Ottawa, \\
  Email: {\tt dmeggins@aix1.uottawa.ca} \\
}


\begin{document}
\maketitle


Welcome to {\sc SGMLS.pm}, an extensible {\sc perl5} class library for
processing the output from the {\sc sgmls} and {\sc nsgmls} parsers.
{\sc SGMLS.pm} is free, copyrighted software available by anonymous ftp in
the directory ftp://aix1.uottawa.ca/pub/dmeggins/.
You might also want to look at the documentation for {\sc sgmlspl},
a simple sample script which uses {\sc SGMLS.pm} to convert documents from
{\sc SGML} to other formats.


{\em\section{Terms}
\label{TERMS}


This program, along with its documentation, is free software;
you can redistribute it and/or modify it under the terms of the GNU
General Public License as published by the Free Software Foundation;
either version 2 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.


}



\section{What is {\sc SGMLS.pm}?}
\label{DEFINITION}


{\sc SGMLS.pm} is an extensible {\sc perl5}
class library for parsing the output from James Clark's popular
{\sc sgmls} and {\sc nsgmls} parsers, available on the Internet at {\tt ftp://jclark.com}.
This is {\em not\/} a complete system for translating
documents written in the {\em Standard Generalised Markup
Language\/} ({\sc SGML}) into other formats, but it can easily
form the basis of such a system (for a simple example, see the {\sc sgmlspl}
program included in this package).

The library recognises four basic types of {\sc SGML} objects: the
{\em element\/}, the
{\em attribute\/},
the {\em notation\/}, and the
{\em entity\/}; each
of these is a fully-developed class with methods for accessing
important information.




\section{How do I produce {\sc SGML} documents?}
\label{SGML}


I am presuming here that you are already experienced with {\sc SGML}
and the {\sc sgmls} or {\sc nsgmls} parser.  For help with the parsers see the
manual pages accompanying each one; for help with {\sc SGML} see Robin
Cover's SGML Web Page at {\tt http://www.sil.org/sgml/sgml.html}
on the Internet.




\section{How do I program in {\sc perl5}?}
\label{PERL5}


If you have to ask this question, you probably should not be
trying to use this library right now, since it is intended only for
experienced {\sc perl5} programmers.  That said, however, you can find the
{\sc perl5} documentation with the {\sc perl5} source distribution or on the
World-Wide Web at {\tt http://www.metronet.com/0/perlinfo/perl5/manual/perl.html}.

{\em Please\/} do not write to me for help on using
{\sc perl5}.




\section{How do I use {\sc SGMLS.pm}?}
\label{SGMLS}


First, you need to copy the file {\sc SGMLS.pm} to a directory where
perl can find it (on a Unix system, it might be
{\tt /usr/lib/perl5} or
{\tt /usr/local/lib/perl5}, or whatever the environment
variable {\tt PERL5LIB} is set to) and make certain that it
is world-readable.

Next, near the top of your {\sc perl5} program, type the following
line:

{\footnotesize\begin{verbatim}
use SGMLS;
\end{verbatim}}

You must then open up a file handle from which {\sc SGMLS.pm} can read the
data from an {\sc sgmls} or {\sc nsgmls} process, unless you are reading from
a standard handle like {\tt STDIN} {---} for example,
if you are piping the output from {\sc sgmls} to a {\sc perl5} script, using
something like

{\footnotesize\begin{verbatim}
sgmls foo.sgml | perl myscript.pl
\end{verbatim}}

then the predefined filehandle {\tt STDIN} will be
sufficient.  In DOS, you might want to dump the sgmls output to a file
and use it as standard input (or open it explicitly in perl), and in
Unix, you might actually want to open a pipe or socket for the input.
{\sc SGMLS.pm} doesn't need to seek, so any input stream should
work.

To parse the {\sc sgmls} or {\sc nsgmls} output from the handle, create
a new object instance of the {\tt SGMLS} class with
the handle as an argument, i.e.

{\footnotesize\begin{verbatim}
$parse = new SGMLS(STDIN);
\end{verbatim}}

(You may create more than one {\tt SGMLS}
object at once, but each object {\em must\/} have a
unique handle pointing to a unique stream, or
{\em chaos\/} will result.)  Now, you can retrieve and
process events using the {\tt next\_event} method:

{\footnotesize\begin{verbatim}
while ($event = $parse->next_event) {
    #do something with each event
}
\end{verbatim}}
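
If you would rather start the parser from inside your script than
pipe its output in from the shell, a minimal sketch might look like
the following.  It assumes that {\sc sgmls} is on your path and that a
bareword filehandle you open yourself can be passed to the constructor
in the same way as {\tt STDIN}; the handle name {\tt SGMLSOUT} is
arbitrary, and {\tt foo.sgml} stands in for your own document.

{\footnotesize\begin{verbatim}
use SGMLS;

# Start sgmls and read its output through a pipe
# (foo.sgml is a placeholder for your own document).
open(SGMLSOUT, "sgmls foo.sgml |")
    or die "Cannot run sgmls: $!\n";

$parse = new SGMLS(SGMLSOUT);

$count = 0;
while ($event = $parse->next_event) {
    $count++;                   # just count the events
}
close(SGMLSOUT);

print "Read $count events\n";
\end{verbatim}}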




\section{So what do I do with an event?}
\label{SGMLSEVENT}


The {\tt next\_event} method for the {\tt SGMLS} class returns an
object belonging to the class {\tt SGMLS\_Event}.
This class has several methods available, as listed in table \ref{TABLE.CLASS.SGMLS}.

\begin{table}[htbp]
\footnotesize
\caption{The {\tt SGMLS\_Event} class}
\label{TABLE.CLASS.SGMLS}
\vspace{2ex}\begin{tabular}{l|l|l}
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Method\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return Type\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Description\vspace{4pt}}	\\ \hline\hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt type}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return the type of the event.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt data}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string, {\tt SGMLS\_Element}, or
{\tt SGMLS\_Entity}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return any data associated with the event.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt file}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return the name of the {\sc SGML} source file which generated the
event, if available.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt line}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return the line number of the {\sc SGML} source file which
generated the event, if available.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt element}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Element}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return the element in force when the  event was
generated.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt parse}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return the {\tt SGMLS} object for the current
parse.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt entity({\tt\sl ename\/})}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Entity}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Look up an entity from those currently known to the parse:
an alias for {\tt ->parse->entity(\$ename)}.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt notation({\tt\sl nname\/})}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Notation}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Look up a notation from those currently known to the parse:
an alias for {\tt ->parse->notation(\$nname)}.\vspace{4pt}}	\\ \hline
\end{tabular}\end{table}

The {\tt file} and {\tt line} methods
will return useful information only if you called {\sc sgmls} or {\sc nsgmls}
with the {\tt\sl -l\/} flag to include file and
line-number information.
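
As a rough illustration, the following sketch prefixes each event
type with the location reported by {\tt file} and {\tt line}.  The
script name {\tt where.pl} is made up, and the test for an empty or
undefined file name is only a guess at what the methods return when
the parser was run without the {\tt\sl -l\/} flag.

{\footnotesize\begin{verbatim}
use SGMLS;

# Run as:  sgmls -l foo.sgml | perl where.pl
$parse = new SGMLS(STDIN);

while ($event = $parse->next_event) {
    my $file = $event->file;
    my $line = $event->line;
    if (defined($file) && $file ne '') {
        print "$file:$line: ", $event->type, "\n";
    } else {
        print "(no location): ", $event->type, "\n";
    }
}
\end{verbatim}}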




\section{What are the different event types and data?}
\label{EVENTS}


Table \ref{TABLE.CLASS.SGMLS.EVENT} lists the ten
different event types returned by the {\tt next\_event}
method of an {\tt SGMLS}
object and the different types of data associated with each of these
(note that these do {\em not\/} correspond to the
standard {\sc ESIS} events).

\begin{table}[htbp]
\footnotesize
\caption{The {\tt SGMLS\_Event} types}
\label{TABLE.CLASS.SGMLS.EVENT}
\vspace{2ex}\begin{tabular}{l|l|l}
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Event Type\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Event Data\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Description\vspace{4pt}}	\\ \hline\hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt 'start\_element'}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Element}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The beginning of an element.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt 'end\_element'}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Element}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The end of an element.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt 'cdata'}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Regular character data.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt 'sdata'}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Special system data.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt 're'}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} [none]\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} A record-end (i.e., a newline).\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt 'pi'}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} A processing instruction\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt 'entity'}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Entity}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} A non-SGML external entity.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt 'start\_subdoc'}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Entity}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The beginning of an SGML subdocument.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt 'end\_subdoc'}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Entity}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The end of an SGML subdocument.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt 'conforming'}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} [none]\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The document was valid.\vspace{4pt}}	\\ \hline
\end{tabular}\end{table}

For example, if {\tt \$event->type} returns
{\tt 'start\_element'}, then
{\tt \$event->data} will return an object belonging to the
{\tt SGMLS\_Element}
class (which will contain a list of attributes, etc. {---} see
below), {\tt \$event->file} and
{\tt \$event->line} will return the file and line-number
in which the element appeared (if you called {\sc sgmls} or {\sc nsgmls} with
the {\tt\sl -l\/} flag), and
{\tt \$event->element} will return the element currently
in force (in this case, the same as
{\tt \$event->data}).
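
For events other than {\tt 'start\_element'} and
{\tt 'end\_element'}, the {\tt element} method is the way to find
out where you are.  The sketch below labels each run of character data
with the name of the element in force; the fallback label for a
missing element is only a precaution and may never be needed.

{\footnotesize\begin{verbatim}
use SGMLS;

$parse = new SGMLS(STDIN);

while ($event = $parse->next_event) {
    next unless $event->type eq 'cdata';
    my $element = $event->element;      # element in force
    my $where = $element ? $element->name : '(no element)';
    print "$where: ", $event->data, "\n";
}
\end{verbatim}}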




\section{What do I do with an {\tt SGMLS\_Element}?}
\label{SGMLSELEMENT}


Altogether, there are six classes in {\sc SGMLS.pm}, each with its
own methods: in addition to {\tt SGMLS} (for the parse) and
{\tt SGMLS\_Event}
(for a specific event), the classes are
{\tt SGMLS\_Element}, {\tt SGMLS\_Attribute},
{\tt SGMLS\_Entity},
and {\tt SGMLS\_Notation}.
Like all of these, {\tt SGMLS\_Element} has a number
of methods available for obtaining different types of information.
For example, if you were to use

{\footnotesize\begin{verbatim}
my $element = $event->data;
\end{verbatim}}

to retrieve the data for a {\tt 'start\_element'} or
{\tt 'end\_element'} event, then you could use the methods
listed in table \ref{TABLE.CLASS.SGMLS.ELEMENT} to find more
information about the element.

\begin{table}[htbp]
\footnotesize
\caption{The {\tt SGMLS\_Element} class}
\label{TABLE.CLASS.SGMLS.ELEMENT}
\vspace{2ex}\begin{tabular}{l|l|l}
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Method\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return Type\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Description\vspace{4pt}}	\\ \hline\hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt name}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The name (or GI), in upper-case.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt parent}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Element}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The parent element, or {\tt ''} if this is the top
element.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt attributes}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} HASH\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return a reference to a hash table of
{\tt SGMLS\_Attribute} objects, keyed by the attribute
names (in upper-case).\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt attribute\_names}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} ARRAY\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} A list of all attribute names for the current element (in
upper-case).\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt attribute({\tt\sl aname\/})}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Attribute}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return the attribute named ANAME.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt set\_attribute({\tt\sl attribute\/})}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} [none]\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The {\tt\sl attribute\/} argument should be an
object belonging to the {\tt SGMLS\_Attribute}
class.  Add it to the element, replacing any previous attribute with
the same name.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt in({\tt\sl name\/})}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Element}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} If the current element's parent is named
{\tt\sl name\/}, return the parent; otherwise, return
{\tt ''}.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt within({\tt\sl name\/})}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Element}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} If any ancestor of the current element is named
{\tt\sl name\/}, return it; otherwise, return
{\tt ''}.\vspace{4pt}}	\\ \hline
\end{tabular}\end{table}
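
The following sketch exercises several of these methods on each
{\tt 'start\_element'} event.  The element names {\tt TITLE},
{\tt SECTION}, and {\tt APPENDIX} are hypothetical and would have to
exist in your own DTD.

{\footnotesize\begin{verbatim}
use SGMLS;

$parse = new SGMLS(STDIN);

while ($event = $parse->next_event) {
    next unless $event->type eq 'start_element';
    my $element = $event->data;

    # parent returns '' for the top element.
    my $parent = $element->parent;
    print $element->name, " (parent: ",
        ($parent ? $parent->name : "none"), ")\n";

    # attributes returns a reference to a hash keyed by name.
    foreach my $aname (sort keys %{$element->attributes}) {
        print "  attribute $aname\n";
    }

    # TITLE, SECTION and APPENDIX are hypothetical names.
    if ($element->name eq 'TITLE') {
        print "  section title\n"
            if $element->in('SECTION');
        print "  inside an appendix\n"
            if $element->within('APPENDIX');
    }
}
\end{verbatim}}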




\section{What do I do with an
{\tt SGMLS\_Attribute}?}
\label{SGMLSATTRIBUTE}


Note that objects of the {\tt SGMLS\_Attribute}
class do not have events in their own right, and are available only
through the {\tt attributes} or
{\tt attribute({\tt\sl aname\/})} methods for
{\tt SGMLS\_Element}
objects.  An object belonging to the
{\tt SGMLS\_Attribute} class will recognise the
methods listed in table \ref{TABLE.CLASS.SGMLS.ATTRIBUTE}.

\begin{table}[htbp]
\footnotesize
\caption{The {\tt SGMLS\_Attribute} class}
\label{TABLE.CLASS.SGMLS.ATTRIBUTE}
\vspace{2ex}\begin{tabular}{l|l|l}
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Method\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return Type\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Description\vspace{4pt}}	\\ \hline\hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt name}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The name of the attribute (in upper-case).\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt type}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The type of the attribute: {\tt 'IMPLIED'},
{\tt 'CDATA'}, {\tt 'NOTATION'},
{\tt 'ENTITY'}, or {\tt 'TOKEN'}.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt value}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string, {\tt SGMLS\_Entity}, or
{\tt SGMLS\_Notation}.\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The value of the attribute.  If the type is
{\tt 'CDATA'} or {\tt 'TOKEN'}, it will be a
simple string; if it is {\tt 'NOTATION'} it will be an
object belonging to the {\tt SGMLS\_Notation} class,
and if it is {\tt 'ENTITY'} it will be an object
belonging to the {\tt SGMLS\_Entity} class.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt is\_implied}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} boolean\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return true if the value of the attribute is implied, or false if
it has an explicit value.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt set\_type({\tt\sl type\/})}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} [none]\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Provide a new type for the current attribute -- no sanity
checking will be performed, so be careful.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt set\_value({\tt\sl value\/})}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} [none]\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Provide a new value for the current attribute -- no sanity
checking will be performed, so be careful.\vspace{4pt}}	\\ \hline
\end{tabular}\end{table}

Note that the type {\tt 'TOKEN'} includes both
individual tokens and lists of tokens (i.e., {\tt 'TOKENS'},
{\tt 'IDS'}, or {\tt 'IDREFS'} in the
original {\sc SGML} document), so you might need to use the perl function
{\tt split} to break the value string into a list.
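
One way to cope with the different value types is a small helper
like the one sketched below.  The name {\tt describe\_value} is
invented for this illustration and is not part of {\sc SGMLS.pm}; the
helper relies only on the methods listed in the table above.

{\footnotesize\begin{verbatim}
# Return a printable form of an SGMLS_Attribute's value.
sub describe_value {
    my ($attribute) = @_;
    my $type = $attribute->type;

    return '(implied)' if $attribute->is_implied;
    return $attribute->value if $type eq 'CDATA';
    return join(', ', split(' ', $attribute->value))
        if $type eq 'TOKEN';            # one token or several
    # NOTATION and ENTITY values are objects with a name.
    return $type . ' ' . $attribute->value->name;
}
\end{verbatim}}

You could then call, for example,
{\tt describe\_value(\$element->attribute('ID'))}, where {\tt ID}
is a hypothetical attribute name.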




\section{What do I do with an {\tt SGMLS\_Entity}?}
\label{SGMLSENTITY}


An {\tt SGMLS\_Entity} object can come in an
{\tt 'entity'} event (in
which case it is always external), in a
{\tt 'start\_subdoc'} or {\tt 'end\_subdoc'}
event (in which case it always has the type
{\tt 'SUBDOC'}), or as the value of an attribute (in
which case it may be internal or external).  An object belonging to
the {\tt SGMLS\_Entity} class may use the methods
listed in table \ref{TABLE.CLASS.SGMLS.ENTITY}.

\begin{table}[htbp]
\footnotesize
\caption{The {\tt SGMLS\_Entity} class}
\label{TABLE.CLASS.SGMLS.ENTITY}
\vspace{2ex}\begin{tabular}{l|l|l}
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Method\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return Type\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Description\vspace{4pt}}	\\ \hline\hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt name}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The entity name.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt type}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The entity type: {\tt 'CDATA'},
{\tt 'SDATA'}, {\tt 'NDATA'}, or
{\tt 'SUBDOC'}.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt value}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The entity replacement text (internal entities
only).\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt sysid}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The system identifier (external entities only).\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt pubid}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The public identifier (external entities only).\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt filenames}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} ARRAY\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} A list of file names generated from the sysid and pubid
(external entities only).\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt notation}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Notation}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The associated notation (external data entities only).\vspace{4pt}}	\\ \hline
\end{tabular}\end{table}

An entity of type {\tt 'SUBDOC'} will have a sysid
and pubid, an external data entity will have a sysid, pubid,
filenames, and a notation, and an internal data entity will have a
value.
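
The sketch below reports the identifiers of each entity that arrives
in an event of its own.  It assumes that an identifier which was not
supplied in the document simply prints as an empty string.

{\footnotesize\begin{verbatim}
use SGMLS;

$parse = new SGMLS(STDIN);

while ($event = $parse->next_event) {
    my $type = $event->type;
    if ($type eq 'entity') {
        # Always external: sysid, pubid and notation apply.
        my $entity = $event->data;
        print "Data entity ", $entity->name,
            " (", $entity->type, ")\n";
        print "  system id: ", $entity->sysid, "\n";
        print "  public id: ", $entity->pubid, "\n";
        print "  notation:  ", $entity->notation->name, "\n";
    } elsif ($type eq 'start_subdoc') {
        # Always of type SUBDOC: sysid and pubid apply.
        my $entity = $event->data;
        print "Subdocument ", $entity->name,
            " (system id: ", $entity->sysid, ")\n";
    }
}
\end{verbatim}}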




\section{What do I do with an {\tt SGMLS\_Notation}?}
\label{SGMLSNOTATION}


The fourth data class is the notation, which is available only
as a return value from the {\tt notation} method of an
{\tt SGMLS\_Entity}
or the {\tt value} method of an {\tt SGMLS\_Attribute}
with type {\tt 'NOTATION'}.  You can use the notation to
decide how to treat non-SGML data (such as graphics).  An object
belonging to the {\tt SGMLS\_Notation} class will have
access to the methods listed in table \ref{TABLE.CLASS.SGMLS.NOTATION}.

\begin{table}[htbp]
\footnotesize
\caption{The {\tt SGMLS\_Notation} class}
\label{TABLE.CLASS.SGMLS.NOTATION}
\vspace{2ex}\begin{tabular}{l|l|l}
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Method\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return Type\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Description\vspace{4pt}}	\\ \hline\hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt name}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The notation's name.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt sysid}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The notation's system identifier.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt pubid}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The notation's public identifier.\vspace{4pt}}	\\ \hline
\end{tabular}\end{table}

What you do with this information is
{\em entirely\/} up to you.
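
One possibility, sketched here, is to keep a table of handlers keyed
by notation name.  The notation names {\tt GIF} and {\tt EPS}, the
hash {\tt \%handlers}, and the subroutine {\tt handle\_entity} are all
invented for the illustration, and the sketch assumes that the parser
reports notation names in upper-case.

{\footnotesize\begin{verbatim}
# Map notation names to handlers; GIF and EPS are
# hypothetical names that your DTD would have to declare.
%handlers = (
    'GIF' => sub { print "bitmap: $_[0]\n"; },
    'EPS' => sub { print "PostScript: $_[0]\n"; },
);

# Call with the SGMLS_Entity from an 'entity' event.
sub handle_entity {
    my ($entity) = @_;
    my $notation = $entity->notation;
    my $handler =
        $notation ? $handlers{$notation->name} : undef;
    if ($handler) {
        &$handler($entity->sysid);
    } else {
        print "skipping ", $entity->name, "\n";
    }
}
\end{verbatim}}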




\section{Is there any extra information available from the {\sc SGML}
document?}
\label{XTRAINFO}


The {\tt SGMLS}
object which you created at the beginning of the parse has several
methods available in addition to {\tt next\_event} {---}
you will find them all listed in table \ref{TABLE.CLASS.SGMLS.EXTRA}.  There should normally be no need to
use the {\tt notation} and {\tt entity}
methods, since {\sc SGMLS.pm} will look up entities and notations for you
automatically as needed.

\begin{table}[htbp]
\footnotesize
\caption{Additional methods for the {\tt SGMLS}
class}
\label{TABLE.CLASS.SGMLS.EXTRA}
\vspace{2ex}\begin{tabular}{l|l|l}
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Method\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return Type\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Description\vspace{4pt}}	\\ \hline\hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt next\_event}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Event}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return the next event.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt appinfo}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return the APPINFO parameter from the {\sc SGML} declaration, if
any.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt notation({\tt\sl nname\/})}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Notation}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Look up a notation by name.\vspace{4pt}}	\\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt entity({\tt\sl ename\/})}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Entity}\vspace{4pt}}	 & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Look up an entity by name.\vspace{4pt}}	\\ \hline
\end{tabular}\end{table}
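
As an illustration, the following sketch prints the APPINFO
parameter once the whole document has been read.  Waiting until the
end of the parse is a cautious assumption, since this documentation
does not say how early the value becomes available.

{\footnotesize\begin{verbatim}
use SGMLS;

$parse = new SGMLS(STDIN);

# Read the whole document first, so that anything the
# parser reports has been seen.
while ($event = $parse->next_event) {
    # process events here
}

my $appinfo = $parse->appinfo;
if (defined($appinfo) && $appinfo ne '') {
    print "APPINFO: $appinfo\n";
} else {
    print "No APPINFO parameter\n";
}
\end{verbatim}}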




\section{How about a simple example?}
\label{EXAMPLE}


OK.  The following script simply reports its events:

{\footnotesize\begin{verbatim}
#!/usr/bin/perl

use SGMLS;

$this_parse = new SGMLS(STDIN); # Read from standard input.

while ($this_event = $this_parse->next_event) {
    my $type = $this_event->type;
    my $data = $this_event->data;
  SWITCH: {
      $type eq 'start_element' && do {
          print "Beginning element: " . $data->name . "\n";
          last SWITCH;
      };
      $type eq 'end_element' && do {
          print "Ending element: " . $data->name . "\n";
          last SWITCH;
      };
      $type eq 'cdata' && do {
          print "Character data: " . $data . "\n";
          last SWITCH;
      };
      $type eq 'sdata' && do {
          print "Special data: " . $data . "\n";
          last SWITCH;
      };
      $type eq 're' && do {
          print "Record End\n";
          last SWITCH;
      };
      $type eq 'pi' && do {
          print "Processing Instruction: " . $data . "\n";
          last SWITCH;
      };
      $type eq 'entity' && do {
          print "External Data Entity: " . $data->name .
              " with notation " . $data->notation->name . "\n";
          last SWITCH;
      };
      $type eq 'start_subdoc' && do {
          print "Beginning Subdocument Entity: " . $data->name . "\n";
          last SWITCH;
      };
      $type eq 'end_subdoc' && do {
          print "Ending Subdocument Entity: " . $data->name . "\n";
          last SWITCH;
      };
      $type eq 'conforming' && do {
          print "This is a conforming SGML document\n";
          last SWITCH;
      };
  }
}

\end{verbatim}}

To use it under Unix, try something like

{\footnotesize\begin{verbatim}
sgmls document.sgml | perl sample.pl
\end{verbatim}}

and watch the output scroll down.




\section{How do I design my {\em own\/} classes?}
\label{EXTEND}


In addition to the methods listed above, all of the classes used
in {\sc SGMLS.pm} have an {\tt ext} method which returns a
reference to an initially-empty hash table.  You are free to use this
hash table to store {\em anything\/} you want {---} it
should be especially useful if you are building your own, derived
classes from the ones provided here.  The following example builds a
derived class {\tt My\_Element} from the {\tt SGMLS\_Element}
class, adding methods to set and get the current font:

{\footnotesize\begin{verbatim}
use SGMLS;

package My_Element;
@ISA = qw(SGMLS_Element);

sub new {
  my ($class,$element,$font) = @_;
  $element->ext->{'font'} = $font;
  return bless $element, $class;
}

sub get_font {
  my ($self) = @_;
  return $self->ext->{'font'};
}

sub set_font {
  my ($self,$font) = @_;
  $self->ext->{'font'} = $font;
}
\end{verbatim}}

Note that the derived class does not need to have any knowledge
about the underlying structure of the {\tt SGMLS\_Element}
class, and need only avoid shadowing any of the methods currently
existing there.
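
A script might then use the derived class along the following lines.
This is only a sketch: it assumes that the {\tt My\_Element} package
above appears earlier in the same file, and the element name
{\tt EM} and the font labels are arbitrary.

{\footnotesize\begin{verbatim}
# Assumes the My_Element package shown above is already
# defined earlier in the same file.
package main;

$parse = new SGMLS(STDIN);

while ($event = $parse->next_event) {
    next unless $event->type eq 'start_element';
    # Rebless the element and give it a starting font;
    # 'roman' and 'italic' are arbitrary labels.
    my $element = new My_Element($event->data, 'roman');
    $element->set_font('italic')
        if $element->name eq 'EM';
    print $element->name, ": ", $element->get_font, "\n";
}
\end{verbatim}}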

If you decide to create a derived class from the {\tt SGMLS} class, please note that in
addition to the methods listed above, that class uses internal methods
named {\tt element}, {\tt line}, and
{\tt file}, similar to the same methods in {\tt SGMLS\_Event} {---}
it is essential that you not shadow these method names.




\section{Are there any bugs?}
\label{BUGS}


Of course!  Right now, {\sc SGMLS.pm} silently ignores link attributes
({\sc nsgmls} only) and data attributes, and there may be many other bugs
which I have not yet found.


\end{document}