\documentstyle[11pt]{article} \setlength{\parskip}{3ex} \raggedright \title{SGMLS.pm: a perl5 class library for handling output from the SGMLS and NSGMLS parsers (version 1.03)} \author{David Megginson \\ Department of English, \\ University of Ottawa, \\ Email: {\tt dmeggins@aix1.uottawa.ca} \\ } \begin{document} \maketitle Welcome to {\sc SGMLS.pm}, an extensible {\sc perl5} class library for processing the output from the {\sc sgmls} and {\sc nsgmls} parsers. {\sc SGMLS.pm} is free, copyrighted software available by anonymous ftp in the directory ftp://aix1.uottawa.ca/pub/dmeggins/. You might also want to look at the documentation for {\sc sgmlspl}, a simple sample script which uses {\sc SGMLS.pm} to convert documents from {\sc SGML} to other formats. {\em\section{Terms} \label{TERMS} This program, along with its documentation, is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. } \section{What is {\sc SGMLS.pm}?} \label{DEFINITION} {\sc SGMLS.pm} is an extensible {\sc perl5} class library for parsing the output from James Clark's popular {\sc sgmls} and {\sc nsgmls} parsers, available on the Internet at {\tt ftp://jclark.com}. 
This is {\em not\/} a complete system for translating documents written in the {\em Standard Generalised Markup Language\/} ({\sc SGML}) into other formats, but it can easily form the basis of such a system (for a simple example, see the {\sc sgmlspl} program included in this package). The library recognises four basic types of {\sc SGML} objects: the {\em element\/}, the {\em attribute\/}, the {\em notation\/}, and the {\em entity\/}; each of these is a fully-developed class with methods for accessing important information.

\section{How do I produce {\sc SGML} documents?} \label{SGML}
I am presuming here that you are already experienced with {\sc SGML} and the {\sc sgmls} or {\sc nsgmls} parser. For help with the parsers see the manual pages accompanying each one; for help with {\sc SGML} see Robin Cover's SGML Web Page at {\tt http://www.sil.org/sgml/sgml.html} on the Internet.

\section{How do I program in {\sc perl5}?} \label{PERL5}
If you have to ask this question, you probably should not be trying to use this library right now, since it is intended only for experienced {\sc perl5} programmers. That said, however, you can find the {\sc perl5} documentation with the {\sc perl5} source distribution or on the World-Wide Web at {\tt http://www.metronet.com/0/perlinfo/perl5/manual/perl.html}. {\em Please\/} do not write to me for help on using {\sc perl5}.

\section{How do I use {\sc SGMLS.pm}?} \label{SGMLS}
First, you need to copy the file {\sc SGMLS.pm} to a directory where perl can find it (on a Unix system, it might be {\tt /usr/lib/perl5} or {\tt /usr/local/lib/perl5}, or whatever the environment variable {\tt PERL5LIB} is set to) and make certain that it is world-readable.
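To check which directories your perl actually searches, you can print the built-in {\tt @INC} array. This is a general perl check rather than anything specific to {\sc SGMLS.pm}, and the variable name {\tt @search\_path} is only illustrative:

{\footnotesize\begin{verbatim}
# Copy perl's library search path and print one directory per line.
my @search_path = @INC;
print "$_\n" for @search_path;
\end{verbatim}}

If the directory holding {\sc SGMLS.pm} does not appear in the list, perl will not find the library.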
Next, near the top of your {\sc perl5} program, type the following line:
{\footnotesize\begin{verbatim}
use SGMLS;
\end{verbatim}}
You must then open up a file handle from which {\sc SGMLS.pm} can read the data from an {\sc sgmls} or {\sc nsgmls} process, unless you are reading from a standard handle like {\tt STDIN} {---} for example, if you are piping the output from {\sc sgmls} to a {\sc perl5} script, using something like
{\footnotesize\begin{verbatim}
sgmls foo.sgml | perl myscript.pl
\end{verbatim}}
then the predefined filehandle {\tt STDIN} will be sufficient. In DOS, you might want to dump the sgmls output to a file and use it as standard input (or open it explicitly in perl), and in Unix, you might actually want to open a pipe or socket for the input. {\sc SGMLS.pm} doesn't need to seek, so any input stream should work.

To parse the {\sc sgmls} or {\sc nsgmls} output from the handle, create a new object instance of the {\tt SGMLS} class with the handle as an argument, i.e.
{\footnotesize\begin{verbatim}
$parse = new SGMLS(STDIN);
\end{verbatim}}
(You may create more than one {\tt SGMLS} object at once, but each object {\em must\/} have a unique handle pointing to a unique stream, or {\em chaos\/} will result.) Now, you can retrieve and process events using the {\tt next\_event} method:
{\footnotesize\begin{verbatim}
while ($event = $parse->next_event) {
    #do something with each event
}
\end{verbatim}}

\section{So what do I do with an event?} \label{SGMLSEVENT}
The {\tt next\_event} method for the {\tt SGMLS} class returns an object belonging to the class {\tt SGMLS\_Event}. This class has several methods available, as listed in table \ref{TABLE.CLASS.SGMLS}.
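Putting the steps so far together, the sketch below opens a pipe from {\sc nsgmls} instead of reading {\tt STDIN}, then prints the type of each event. The handle name {\tt PARSER} and the file name {\tt foo.sgml} are hypothetical, and the script assumes that {\tt nsgmls} is installed and on your path.

{\footnotesize\begin{verbatim}
use SGMLS;

# Hypothetical input file; assumes nsgmls can be found on the PATH.
open(PARSER, "nsgmls -l foo.sgml |") || die "Cannot run nsgmls: $!\n";

$parse = new SGMLS(PARSER);
while ($event = $parse->next_event) {
    print $event->type, "\n";    # one event type per line
}
close(PARSER);
\end{verbatim}}

Each {\tt SGMLS} object here has its own handle, as required above.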
\begin{table}[htbp] \footnotesize \caption{The {\tt SGMLS\_Event} class} \label{TABLE.CLASS.SGMLS} \vspace{2ex}\begin{tabular}{l|l|l} \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Method\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return Type\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Description\vspace{4pt}} \\ \hline\hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt type}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return the type of the event.\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt data}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string, {\tt SGMLS\_Element}, or {\tt SGMLS\_Entity}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return any data associated with the event.\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt file}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return the name of the {\sc SGML} source file which generated the event, if available.\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt line}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return the line number of the {\sc SGML} source file which generated the event, if available.\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt element}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Element}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return the element in force when the event was generated.\vspace{4pt}} \\ \hline 
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt parse}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return the {\tt SGMLS} object for the current parse.\vspace{4pt}} \\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt entity({\tt\sl ename\/})}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Entity}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Look up an entity from those currently known to the parse: an alias for {\tt ->parse->entity(\$ename)}.\vspace{4pt}} \\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt notation({\tt\sl nname\/})}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Notation}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Look up a notation from those currently known to the parse: an alias for {\tt ->parse->notation(\$nname)}.\vspace{4pt}} \\ \hline
\end{tabular}\end{table}

The {\tt file} and {\tt line} methods will return useful information only if you called {\sc sgmls} or {\sc nsgmls} with the {\tt\sl -l\/} flag to include file and line-number information.

\section{What are the different event types and data?} \label{EVENTS}
Table \ref{TABLE.CLASS.SGMLS.EVENT} lists the ten different event types returned by the {\tt next\_event} method of an {\tt SGMLS} object and the different types of data associated with each of these (note that these do {\em not\/} correspond to the standard {\sc ESIS} events).
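Because every event reports its type as a plain string, one quick way to get a feel for a document is to tally its events by type. The following is a minimal sketch that reads the parser output from {\tt STDIN}; the hash name {\tt \%count} is only illustrative.

{\footnotesize\begin{verbatim}
use SGMLS;

$parse = new SGMLS(STDIN);
my %count;                       # event type => number of events seen
while ($event = $parse->next_event) {
    $count{$event->type}++;
}
foreach my $type (sort keys %count) {
    print "$type: $count{$type}\n";
}
\end{verbatim}}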
\begin{table}[htbp] \footnotesize \caption{The {\tt SGMLS\_Event} types} \label{TABLE.CLASS.SGMLS.EVENT} \vspace{2ex}\begin{tabular}{l|l|l} \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Event Type\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Event Data\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Description\vspace{4pt}} \\ \hline\hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt 'start\_element'}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Element}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The beginning of an element.\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt 'end\_element'}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Element}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The end of an element.\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt 'cdata'}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Regular character data.\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt 'sdata'}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Special system data.\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt 're'}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} [none]\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} A record-end (i.e., a newline).\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt 'pi'}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} A 
processing instruction\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt 'entity'}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Entity}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} A non-SGML external entity.\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt 'start\_subdoc'}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Entity}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The beginning of an SGML subdocument.\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt 'end\_subdoc'}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Entity}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The end of an SGML subdocument.\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt 'conforming'}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} [none]\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The document was valid.\vspace{4pt}} \\ \hline \end{tabular}\end{table} For example, if {\tt \$event->type} returns {\tt 'start\_element'}, then {\tt \$event->data} will return an object belonging to the {\tt SGMLS\_Element} class (which will contain a list of attributes, etc. {---} see below), {\tt \$event->file} and {\tt \$event->line} will return the file and line-number in which the element appeared (if you called {\sc sgmls} or {\sc nsgmls} with the {\tt\sl -l\/} flag), and {\tt \$event->element} will return the element currently in force (in this case, the same as {\tt \$event->data}). 
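The relationship between {\tt data} and {\tt element} can be sketched as follows. The script assumes that the parser was run with the {\tt -l} flag, so that {\tt file} and {\tt line} return something useful; without it the position will simply be blank.

{\footnotesize\begin{verbatim}
use SGMLS;

$parse = new SGMLS(STDIN);
while ($event = $parse->next_event) {
    if ($event->type eq 'start_element') {
        # For a start_element event, data and element are the same.
        print "<", $event->data->name, "> at ",
              $event->file, ":", $event->line, "\n";
    } elsif ($event->type eq 'cdata') {
        # For character data, element is the element in force.
        print "text inside <", $event->element->name, ">: ",
              $event->data, "\n";
    }
}
\end{verbatim}}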
\section{What do I do with an {\tt SGMLS\_Element}?} \label{SGMLSELEMENT} Altogether, there are six classes in {\sc SGMLS.pm}, each with its own methods: in addition to {\tt SGMLS} (for the parse) and {\tt SGMLS\_Event} (for a specific event), the classes are {\tt SGMLS\_Element}, {\tt SGMLS\_Attribute}, {\tt SGMLS\_Entity}, and {\tt SGMLS\_Notation}. Like all of these, {\tt SGMLS\_Element} has a number of methods available for obtaining different types of information. For example, if you were to use {\footnotesize\begin{verbatim} my $element = $event->data \end{verbatim}} to retrieve the data for a {\tt 'start\_element'} or {\tt 'end\_element'} event, then you could use the methods listed in table \ref{TABLE.CLASS.SGMLS.ELEMENT} to find more information about the element. \begin{table}[htbp] \footnotesize \caption{The {\tt SGMLS\_Element} class} \label{TABLE.CLASS.SGMLS.ELEMENT} \vspace{2ex}\begin{tabular}{l|l|l} \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Method\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return Type\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Description\vspace{4pt}} \\ \hline\hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt name}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The name (or GI), in upper-case.\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt parent}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Element}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The parent element, or {\tt ''} if this is the top element.\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt attributes}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} HASH\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return 
a reference to a hash table of {\tt SGMLS\_Attribute} objects, keyed by the attribute names (in upper-case).\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt attribute\_names}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} ARRAY\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} A list of all attribute names for the current element (in upper-case).\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt attribute({\tt\sl aname\/})}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Attribute}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return the attribute named ANAME.\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt set\_attribute({\tt\sl attribute\/})}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} [none]\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The {\tt\sl attribute\/} argument should be an object belonging to the {\tt SGMLS\_Attribute} class. 
Add it to the element, replacing any previous attribute with the same name.\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt in({\tt\sl name\/})}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Element}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} If the current element's parent is named {\tt\sl name\/}, return the parent; otherwise, return {\tt ''}.\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt within({\tt\sl name\/})}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Element}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} If any ancestor of the current element is named {\tt\sl name\/}, return it; otherwise, return {\tt ''}.\vspace{4pt}} \\ \hline \end{tabular}\end{table} \section{What do I do with an {\tt SGMLS\_Attribute}?} \label{SGMLSATTRIBUTE} Note that objects of the {\tt SGMLS\_Attribute} class do not have events in their own right, and are available only through the {\tt attributes} or {\tt attribute({\tt\sl aname\/})} methods for {\tt SGMLS\_Element} objects. An object belonging to the {\tt SGMLS\_Attribute} class will recognise the methods listed in table \ref{TABLE.CLASS.SGMLS.ATTRIBUTE}. 
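As an illustration of reaching attributes through an element, the sketch below walks the hash returned by the {\tt attributes} method for every {\tt 'start\_element'} event and prints each attribute that has an explicit value. It uses only methods listed in the tables of this document; the output format is an arbitrary choice.

{\footnotesize\begin{verbatim}
use SGMLS;

$parse = new SGMLS(STDIN);
while ($event = $parse->next_event) {
    next unless $event->type eq 'start_element';
    my $element    = $event->data;
    my $attributes = $element->attributes;    # reference to a hash
    foreach my $aname (sort keys %$attributes) {
        my $attribute = $attributes->{$aname};
        next if $attribute->is_implied;       # no explicit value
        # Only CDATA and TOKEN values are plain strings.
        if ($attribute->type eq 'CDATA' || $attribute->type eq 'TOKEN') {
            print $element->name, " $aname = ", $attribute->value, "\n";
        } else {
            print $element->name, " $aname has type ",
                  $attribute->type, "\n";
        }
    }
}
\end{verbatim}}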
\begin{table}[htbp] \footnotesize \caption{The {\tt SGMLS\_Attribute} class} \label{TABLE.CLASS.SGMLS.ATTRIBUTE} \vspace{2ex}\begin{tabular}{l|l|l} \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Method\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return Type\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Description\vspace{4pt}} \\ \hline\hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt name}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The name of the attribute (in upper-case).\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt type}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The type of the attribute: {\tt 'IMPLIED'}, {\tt 'CDATA'}, {\tt 'NOTATION'}, {\tt 'ENTITY'}, or {\tt 'TOKEN'}.\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt value}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string, {\tt SGMLS\_Entity}, or {\tt SGMLS\_Notation}.\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The value of the attribute. 
If the type is {\tt 'CDATA'} or {\tt 'TOKEN'}, it will be a simple string; if it is {\tt 'NOTATION'} it will be an object belonging to the {\tt SGMLS\_Notation} class, and if it is {\tt 'ENTITY'} it will be an object belonging to the {\tt SGMLS\_Entity} class.\vspace{4pt}} \\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt is\_implied}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} boolean\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return true if the value of the attribute is implied, or false if it has an explicit value.\vspace{4pt}} \\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt set\_type({\tt\sl type\/})}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} [none]\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Provide a new type for the current attribute -- no sanity checking will be performed, so be careful.\vspace{4pt}} \\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt set\_value({\tt\sl value\/})}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} [none]\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Provide a new value for the current attribute -- no sanity checking will be performed, so be careful.\vspace{4pt}} \\ \hline
\end{tabular}\end{table}

Note that the type {\tt 'TOKEN'} includes both individual tokens and lists of tokens (i.e., {\tt 'TOKENS'}, {\tt 'IDS'}, or {\tt 'IDREFS'} in the original {\sc SGML} document), so you might need to use the perl function {\tt split} to break the value string into a list.
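For instance, a list-valued attribute can be broken up as in the following sketch. The string is a made-up stand-in for what the {\tt value} method might return for an {\tt IDREFS} attribute, and the sketch assumes that the tokens are separated by whitespace.

{\footnotesize\begin{verbatim}
# Hypothetical value of an IDREFS attribute, as returned by ->value.
my $value = 'FIG1 FIG2 FIG3';

# split ' ' splits on runs of whitespace, ignoring leading whitespace.
my @tokens = split(' ', $value);

print scalar(@tokens), " tokens: ", join(', ', @tokens), "\n";
\end{verbatim}}

For this input the script prints {\tt 3 tokens: FIG1, FIG2, FIG3}.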
\section{What do I do with an {\tt SGMLS\_Entity}?} \label{SGMLSENTITY} An {\tt SGMLS\_Entity} object can come in an {\tt 'entity'} event (in which case it is always external), in a {\tt 'start\_subdoc'} or {\tt 'end\_subdoc'} event (in which case it always has the type {\tt 'SUBDOC'}), or as the value of an attribute (in which case it may be internal or external). An object belonging to the {\tt SGMLS\_Entity} class may use the methods listed in table \ref{TABLE.CLASS.SGMLS.ENTITY}. \begin{table}[htbp] \footnotesize \caption{The {\tt SGMLS\_Entity} class} \label{TABLE.CLASS.SGMLS.ENTITY} \vspace{2ex}\begin{tabular}{l|l|l} \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Method\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return Type\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Description\vspace{4pt}} \\ \hline\hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt name}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The entity name.\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt type}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The entity type: {\tt 'CDATA'}, {\tt 'SDATA'}, {\tt 'NDATA'}, or {\tt 'SUBDOC'}.\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt value}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The entity replacement text (internal entities only).\vspace{4pt}} \\ \hline \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt sysid}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The system identifier 
(external entities only).\vspace{4pt}} \\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt pubid}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The public identifier (external entities only).\vspace{4pt}} \\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt filenames}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} ARRAY\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} A list of file names generated from the sysid and pubid (external entities only).\vspace{4pt}} \\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt notation}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Notation}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The associated notation (external data entities only).\vspace{4pt}} \\ \hline
\end{tabular}\end{table}

An entity of type {\tt 'SUBDOC'} will have a sysid and pubid, an external data entity will have a sysid, pubid, filenames, and a notation, and an internal data entity will have a value.

\section{What do I do with an {\tt SGMLS\_Notation}?} \label{SGMLSNOTATION}
The fourth data class is the notation, which is available only as a return value from the {\tt notation} method of an {\tt SGMLS\_Entity} or the {\tt value} method of an {\tt SGMLS\_Attribute} with type {\tt 'NOTATION'}. You can use the notation to decide how to treat non-SGML data (such as graphics). An object belonging to the {\tt SGMLS\_Notation} class will have access to the methods listed in table \ref{TABLE.CLASS.SGMLS.NOTATION}.
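One way to act on a notation is to test its name when an {\tt 'entity'} event arrives, as in the sketch below. The notation name {\tt EPS} is hypothetical (use whatever notations your own DTD declares), and the printed messages merely stand in for real processing.

{\footnotesize\begin{verbatim}
use SGMLS;

$parse = new SGMLS(STDIN);
while ($event = $parse->next_event) {
    next unless $event->type eq 'entity';
    my $entity   = $event->data;         # an SGMLS_Entity
    my $notation = $entity->notation;    # an SGMLS_Notation
    # 'EPS' is a hypothetical notation name.
    if ($notation->name eq 'EPS') {
        print "Include graphic ", $entity->sysid, "\n";
    } else {
        print "Skipping entity ", $entity->name,
              " (notation ", $notation->name, ")\n";
    }
}
\end{verbatim}}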
\begin{table}[htbp] \footnotesize \caption{The {\tt SGMLS\_Notation} class} \label{TABLE.CLASS.SGMLS.NOTATION} \vspace{2ex}\begin{tabular}{l|l|l}
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Method\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return Type\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Description\vspace{4pt}} \\ \hline\hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt name}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The notation's name.\vspace{4pt}} \\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt sysid}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The notation's system identifier.\vspace{4pt}} \\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt pubid}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} The notation's public identifier.\vspace{4pt}} \\ \hline
\end{tabular}\end{table}

What you do with this information is {\em entirely\/} up to you.

\section{Is there any extra information available from the {\sc SGML} document?} \label{XTRAINFO}
The {\tt SGMLS} object which you created at the beginning of the parse has several methods available in addition to {\tt next\_event} {---} you will find them all listed in table \ref{TABLE.CLASS.SGMLS.EXTRA}. There should normally be no need to use the {\tt notation} and {\tt entity} methods, since {\sc SGMLS.pm} will look up entities and notations for you automatically as needed.
\begin{table}[htbp] \footnotesize \caption{Additional methods for the {\tt SGMLS} class} \label{TABLE.CLASS.SGMLS.EXTRA} \vspace{2ex}\begin{tabular}{l|l|l}
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Method\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return Type\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Description\vspace{4pt}} \\ \hline\hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt next\_event}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Event}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return the next event.\vspace{4pt}} \\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt appinfo}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} string\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Return the APPINFO parameter from the {\sc SGML} declaration, if any.\vspace{4pt}} \\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt notation({\tt\sl nname\/})}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Notation}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Look up a notation by name.\vspace{4pt}} \\ \hline
\parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt entity({\tt\sl ename\/})}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} {\tt SGMLS\_Entity}\vspace{4pt}} & \parbox[c]{1.48333333333333in}{\raggedright\vspace{4pt} Look up an entity by name.\vspace{4pt}} \\ \hline
\end{tabular}\end{table}

\section{How about a simple example?} \label{EXAMPLE}
OK. The following script simply reports its events:
{\footnotesize\begin{verbatim}
#!/usr/bin/perl

use SGMLS;

$this_parse = new SGMLS(STDIN); # Read from standard input.

while ($this_event = $this_parse->next_event) {
    my $type = $this_event->type;
    my $data = $this_event->data;
  SWITCH: {
      $type eq 'start_element' && do {
          print "Beginning element: " . $data->name . "\n";
          last SWITCH;
      };
      $type eq 'end_element' && do {
          print "Ending element: " . $data->name . "\n";
          last SWITCH;
      };
      $type eq 'cdata' && do {
          print "Character data: " . $data . "\n";
          last SWITCH;
      };
      $type eq 'sdata' && do {
          print "Special data: " . $data . "\n";
          last SWITCH;
      };
      $type eq 're' && do {
          print "Record End\n";
          last SWITCH;
      };
      $type eq 'pi' && do {
          print "Processing Instruction: " . $data . "\n";
          last SWITCH;
      };
      $type eq 'entity' && do {
          print "External Data Entity: " . $data->name .
              " with notation " . $data->notation->name . "\n";
          last SWITCH;
      };
      $type eq 'start_subdoc' && do {
          print "Beginning Subdocument Entity: " . $data->name . "\n";
          last SWITCH;
      };
      $type eq 'end_subdoc' && do {
          print "Ending Subdocument Entity: " . $data->name . "\n";
          last SWITCH;
      };
      $type eq 'conforming' && do {
          print "This is a conforming SGML document\n";
          last SWITCH;
      };
  }
}
\end{verbatim}}
To use it under Unix, try something like
{\footnotesize\begin{verbatim}
sgmls document.sgml | perl sample.pl
\end{verbatim}}
and watch the output scroll down.

\section{How do I design my {\em own\/} classes?} \label{EXTEND}
In addition to the methods listed above, all of the classes used in {\sc SGMLS.pm} have an {\tt ext} method which returns a reference to an initially-empty hash table. You are free to use this hash table to store {\em anything\/} you want {---} it should be especially useful if you are building your own, derived classes from the ones provided here.
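Even without a derived class, the {\tt ext} hash can carry your own bookkeeping. The sketch below counts the child elements of each element and reports the total when the element ends. The key {\tt 'children'} is hypothetical, and the script assumes that the object delivered with an {\tt 'end\_element'} event is the same one that was returned earlier as the parent of its children.

{\footnotesize\begin{verbatim}
use SGMLS;

$parse = new SGMLS(STDIN);
while ($event = $parse->next_event) {
    if ($event->type eq 'start_element') {
        my $parent = $event->data->parent;
        # parent returns '' for the top element, so test it first.
        $parent->ext->{'children'}++ if $parent;
    } elsif ($event->type eq 'end_element') {
        my $element  = $event->data;
        my $children = $element->ext->{'children'} || 0;
        print $element->name, " had $children child element(s)\n";
    }
}
\end{verbatim}}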
The following example builds a derived class {\tt My\_Element} from the {\tt SGMLS\_Element} class, adding methods to set and get the current font:
{\footnotesize\begin{verbatim}
use SGMLS;

package My_Element;
@ISA = qw(SGMLS_Element);

sub new {
    my ($class,$element,$font) = @_;
    $element->ext->{'font'} = $font;
    return bless $element, $class;   # rebless into the derived class
}

sub get_font {
    my ($self) = @_;
    return $self->ext->{'font'};
}

sub set_font {
    my ($self,$font) = @_;
    $self->ext->{'font'} = $font;
}
\end{verbatim}}
Note that the derived class does not need to have any knowledge about the underlying structure of the {\tt SGMLS\_Element} class, and need only avoid shadowing any of the methods currently existing there.

If you decide to create a derived class from the {\tt SGMLS} class, please note that in addition to the methods listed above, that class uses internal methods named {\tt element}, {\tt line}, and {\tt file}, similar to the same methods in {\tt SGMLS\_Event} {---} it is essential that you not shadow these method names.

\section{Are there any bugs?} \label{BUGS}
Of course! Right now, {\sc SGMLS.pm} silently ignores link attributes ({\sc nsgmls} only) and data attributes, and there may be many other bugs which I have not yet found.

\end{document}