<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
 <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.66">
 <TITLE>LinuxDoc-Tools User's Guide: How LinuxDoc-Tools Works</TITLE>
 <LINK HREF="guide-5.html" REL=previous>
 <LINK HREF="guide.html#toc6" REL=contents>
</HEAD>
<BODY>
<IMG SRC="next.png" ALT="Next">
<A HREF="guide-5.html"><IMG SRC="prev.png" ALT="Previous"></A>
<A HREF="guide.html#toc6"><IMG SRC="toc.png" ALT="Contents"></A>
<HR>
<H2><A NAME="sgml"></A> <A NAME="s6">6.</A> <A HREF="guide.html#toc6">How LinuxDoc-Tools Works</A></H2>

<P>Technically, the tags and conventions we've explored in previous
sections of this user's guide are what is called a <EM>markup
language</EM> -- a way to embed formatting information in a document
so that programs can do useful things with it.  HTML, TeX, and Unix
manual-page macros are well-known examples of markup languages.</P>

<H2><A NAME="ss6.1">6.1</A> <A HREF="guide.html#toc6.1">Overview of SGML</A>
</H2>

<P>LinuxDoc-Tools uses a way of describing markup languages called SGML
(Standard Generalized Markup Language).  SGML itself doesn't describe 
a markup language; rather, it's a language for writing specifications 
for markup languages.  The reason SGML is useful is that an SGML markup 
specification for a language can be used to generate programs that 
``know'' that language with much less effort (and a much lower bugginess 
rate!) than if they had to be coded by hand.</P>
<P>In SGML jargon, a markup language specification is called a ``DTD''
(Document Type Definition).  A DTD allows you to specify the
<EM>structure</EM> of a kind of document; that is, what parts, in what
order, make up a document of that kind.  Given a DTD, an SGML parser
can check a document for correctness.  An SGML-parser/DTD combination
can also make it easy to write programs that translate that structure
into another markup language -- and this is exactly how LinuxDoc-Tools
actually works.</P>
<P>LinuxDoc-Tools provides an SGML DTD called ``linuxdoc'' and a set of
``replacement files'' which convert the linuxdoc documents to groff,
LaTeX, HTML, GNU info, LyX, and RTF source.  This is why the example 
document has a magic cookie at the top of it that says ``linuxdoc system'';
that is how one tells an SGML parser what DTD to use.</P>
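<P>As a minimal illustration, the declaration at the top of a linuxdoc
document typically reads as follows (SGML keywords are not case-sensitive,
so the capitalization may vary from document to document):</P>
<P>
<BLOCKQUOTE><CODE>
<PRE>
&lt;!doctype linuxdoc system>
</PRE>
</CODE></BLOCKQUOTE>
</P>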
<P>Actually, LinuxDoc-Tools provides a couple of closely related DTDs, but
the ones other than linuxdoc are still experimental, and you probably
do not want to try working with them unless you are a LinuxDoc-Tools guru.</P>
<P>If you are an SGML guru, you may find it interesting to know that the 
LinuxDoc-Tools DTDs are based heavily on the QWERTZ DTD by Tom Gordon,
<CODE>thomas.gordon@gmd.de</CODE>.</P>
<P>If you are not an SGML guru, you may not know that HTML (the markup
language used on the World Wide Web) is itself defined by a DTD.</P>

<H2><A NAME="ss6.2">6.2</A> <A HREF="guide.html#toc6.2">How SGML Works</A>
</H2>

<P>An SGML DTD like linuxdoc specifies the names of ``elements'' within a
document type.  An element is just a bit of structure, such as a
section, a subsection, a paragraph, or even something smaller like
<EM>emphasized text</EM>.</P>
<P>Unlike in LaTeX, however, these elements are not in any way intrinsic to
SGML itself.  The linuxdoc DTD happens to define elements that look a
lot like their LaTeX counterparts---you have sections, subsections,
verbatim ``environments'', and so forth.  However, using SGML you can
define any kind of structure for the document that you like.  In a
way, SGML is like low-level TeX, while the linuxdoc DTD is like LaTeX.</P>
<P>Don't be confused by this analogy.  SGML is <EM>not</EM> a text-formatting system.
There is no ``SGML formatter'' per se.  SGML source is <EM>only</EM> converted
to other formats for processing.  Furthermore, SGML itself is used only to 
specify the document structure.  There are no text-formatting facilities or
``macros'' intrinsic to SGML itself.  All of those things are defined within
the DTD.  You can't use SGML without a DTD; the DTD defines what SGML does.</P>

<H2><A NAME="ss6.3">6.3</A> <A HREF="guide.html#toc6.3">What Happens When LinuxDoc-Tools Processes A Document</A>
</H2>

<P>Here's how processing a document with LinuxDoc-Tools works.  First, you
need a DTD, which sets up the structure of the document.  A small
portion of the normal (linuxdoc) DTD looks like this:</P>
<P>
<BLOCKQUOTE><CODE>
<PRE>
&lt;!element article - -
    (titlepag, header?, 
     toc?, lof?, lot?, p*, sect*, 
     (appendix, sect+)?, biblio?) +(footnote)>
</PRE>
</CODE></BLOCKQUOTE>
</P>
<P>This part sets up the overall structure for an ``article'', which is like
a ``documentstyle'' within LaTeX.  The article consists of a title page
(<CODE>titlepag</CODE>), an optional header (<CODE>header</CODE>), an optional table of 
contents (<CODE>toc</CODE>), optional lists of figures (<CODE>lof</CODE>) and tables
(<CODE>lot</CODE>), any number of paragraphs (<CODE>p</CODE>), any number of top-level
sections (<CODE>sect</CODE>), an optional appendix marker (<CODE>appendix</CODE>) followed by
one or more sections, and an optional bibliography (<CODE>biblio</CODE>).  Footnotes
(<CODE>footnote</CODE>) may appear anywhere inside the article.</P>
<P>As you can see, the DTD doesn't say anything about how the document should
be formatted or what it should look like.  It just defines what parts make
up the document.  Elsewhere in the DTD the structure of the 
<CODE>titlepag</CODE>, <CODE>header</CODE>, <CODE>sect</CODE>, and other elements is defined.  </P>
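<P>Those declarations use the same notation.  The following sketch is a
hypothetical ``memo'' document type invented purely for illustration (it is
not part of linuxdoc, and the element names are assumptions); it shows the
most common pieces of content-model syntax:</P>
<P>
<BLOCKQUOTE><CODE>
<PRE>
&lt;!-- hypothetical DTD fragment, not taken from linuxdoc -->
&lt;!element memo    - - (to, from, subject?, para+)>
&lt;!element to      - o (#PCDATA)>
&lt;!element from    - o (#PCDATA)>
&lt;!element subject - o (#PCDATA)>
&lt;!element para    - o (#PCDATA)>
</PRE>
</CODE></BLOCKQUOTE>
</P>
<P>In a content model, a comma means ``followed by'', <CODE>?</CODE> marks an
optional element, <CODE>*</CODE> means zero or more, and <CODE>+</CODE> means
one or more.  The two characters after the element name say whether the start
and end tags may be omitted: <CODE>-</CODE> means the tag is required and
<CODE>o</CODE> means it may be left out.  The <CODE>- -</CODE> in the
<CODE>article</CODE> declaration above therefore requires both tags.</P>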
<P>You don't need to know anything about the syntax of the DTD in order
to write documents.  We're just presenting it here so you know what it
looks like and what it does.  You <EM>do</EM> need to be familiar with the
document <EM>structure</EM> that the DTD defines.  If not, you might
violate the structure when attempting to write a document, and be very
confused about the resulting error messages.</P>
<P>The next step is to write a document using the structure defined by
the DTD.  Again, the linuxdoc DTD makes documents look a lot like
LaTeX or HTML -- it's very easy to follow.  In SGML jargon a single
document written using a particular DTD is known as an ``instance'' of
that DTD.</P>
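<P>A minimal instance of the linuxdoc DTD might look like the sketch below.
The title, author, date, and text are placeholders invented for illustration.</P>
<P>
<BLOCKQUOTE><CODE>
<PRE>
&lt;!doctype linuxdoc system>

&lt;article>

&lt;title>An Example Document
&lt;author>A. N. Author
&lt;date>1 January 2000

&lt;sect>Introduction
&lt;p>
This is the first paragraph of the first section.

&lt;sect1>A Subsection
&lt;p>
Here is some &lt;em>emphasized text&lt;/em> and a short list:
&lt;itemize>
&lt;item>First item
&lt;item>Second item
&lt;/itemize>

&lt;/article>
</PRE>
</CODE></BLOCKQUOTE>
</P>
<P>Note how the instance follows the order required by the
<CODE>article</CODE> declaration shown earlier: the title page material comes
first, followed by the sections.</P>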
<P>In order to translate the SGML source into another format (such as LaTeX
or groff) for processing, the SGML source (the document that you wrote)
is <EM>parsed</EM> along with the DTD by the SGML <EM>parser</EM>.  LinuxDoc-Tools 
uses the <CODE>onsgmls</CODE> parser from OpenJade or the <CODE>nsgmls</CODE> parser from Jade;
the former is the successor of the latter.  The original <CODE>sgmls</CODE> parser was written 
by James Clark, <CODE>jjc@jclark.com</CODE>, who also happens to be the author 
of <CODE>groff</CODE>.  We're in good hands.
The parser (<CODE>onsgmls</CODE> or <CODE>nsgmls</CODE>) simply picks through your document
and verifies that it follows the structure set forth by the DTD.  
It also spits out a more explicit form of your document, with all 
``macros'' and elements expanded, which is understood by <CODE>sgmlsasp</CODE>, 
the next part of the process.  </P>
<P><CODE>sgmlsasp</CODE> is responsible for converting the output of the parser to
another format (such as LaTeX).  It does this using <EM>replacement files</EM>,
which describe how to convert elements in the original SGML document into
corresponding source in the ``target'' format (such as LaTeX or groff).  </P>
<P>For example, part of the replacement file for LaTeX looks like:
<BLOCKQUOTE><CODE>
<PRE>
&lt;itemize>    +    "\\begin{itemize}"   +
&lt;/itemize>   +    "\\end{itemize}"    +
</PRE>
</CODE></BLOCKQUOTE>

This says that whenever you begin an <CODE>itemize</CODE> element in the 
SGML source, it should be replaced with (the doubled backslash in the
replacement file stands for a single backslash in the output)
<BLOCKQUOTE><CODE>
<PRE>
\begin{itemize}
</PRE>
</CODE></BLOCKQUOTE>

in the LaTeX source.  (As I said, elements in the DTD
are very similar to their LaTeX counterparts).  </P>
<P>So, to convert the SGML to another format, all you have to do is write
a new replacement file for that format that gives the appropriate 
analogies to the SGML elements in that new format.  In practice, it's not
that simple---for example, if you're trying to convert to a format that
isn't structured at all like your DTD, you're going to have trouble.  In 
any case, it's much easier to do than writing individual parsers and
translators for many kinds of output formats; SGML provides a generalized
system for converting one source to many formats.</P>
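<P>As a rough sketch of what a replacement file for a different target could
contain, here is a hypothetical fragment that maps the same list elements to
HTML.  It is written for illustration only and is not copied from the
replacement files shipped with LinuxDoc-Tools:</P>
<P>
<BLOCKQUOTE><CODE>
<PRE>
&lt;itemize>    +    "&lt;UL>"     +
&lt;/itemize>   +    "&lt;/UL>"    +
&lt;item>       +    "&lt;LI>"
</PRE>
</CODE></BLOCKQUOTE>
</P>
<P>The SGML document itself stays unchanged; only the replacement text on the
right-hand side differs from one output format to the next.</P>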
<P>Once <CODE>sgmlsasp</CODE> has completed its work, you have LaTeX source which
corresponds to your original SGML document, which you can format using
LaTeX as you normally would.</P>

<H2><A NAME="ss6.4">6.4</A> <A HREF="guide.html#toc6.4">Further Information</A>
</H2>

<P>
<UL>
<LI>The QWERTZ User's Guide is available from 
<CODE>
<A HREF="ftp://ftp.cs.cornell.edu/pub/mdw/SGML">ftp://ftp.cs.cornell.edu/pub/mdw/SGML</A></CODE>.
QWERTZ (and hence, LinuxDoc-Tools) supports many features such as 
mathematical formulae, tables, figures, and so forth.
If you'd like to write general 
documentation in SGML, I suggest using the original QWERTZ DTD instead 
of the hacked-up linuxdoc DTD, which I've modified for use 
particularly by the Linux HOWTOs and other such documentation.  
</LI>
<LI>Tom Gordon's original QWERTZ tools can be found at 
<CODE>
<A HREF="ftp://ftp.gmd.de/GMD/sgml">ftp://ftp.gmd.de/GMD/sgml</A></CODE>.
</LI>
<LI>More information on SGML can be found at the following WWW 
pages: 
<OL>
<LI><CODE>
<A HREF="http://www.w3.org/hypertext/WWW/MarkUp/SGML/">SGML and the Web</A></CODE></LI>
<LI><CODE>
<A HREF="http://www.sil.org/sgml/sgml.html">SGML Web Page</A></CODE></LI>
<LI><CODE>
<A HREF="http://www.yahoo.com/Computers_and_Internet/Software/Data_Formats/SGML">Yahoo's SGML Page</A></CODE></LI>
</OL>

</LI>
<LI>James Clark's <CODE>sgmls</CODE> parser, its successor <CODE>nsgmls</CODE>,
and other tools can be found at
<CODE>
<A HREF="ftp://ftp.jclark.com">ftp://ftp.jclark.com</A></CODE> and at <CODE>
<A HREF="http://www.jclark.com">James Clark's WWW Page</A></CODE>.
</LI>
<LI>The emacs psgml package can be found at
<CODE>
<A HREF="ftp://ftp.lysator.liu.se/pub/sgml">ftp://ftp.lysator.liu.se/pub/sgml</A></CODE>.  This package
provides a lot of SGML functionality.
</LI>
<LI>More information on <CODE>LyX</CODE> can be found at the
<CODE>
<A HREF="http://wsiserv.informatik.uni-tuebingen.de/~ettrich/">LyX WWW Page</A></CODE>.  <CODE>LyX</CODE> is a high-level word processor 
frontend to LaTeX.  It offers a quasi-WYSIWYG interface and generates many
LaTeX styles and layouts automatically, which speeds up learning LaTeX and
makes complicated layouts easy and intuitive.
</LI>
</UL>
</P>
<HR>
<IMG SRC="next.png" ALT="Next">
<A HREF="guide-5.html"><IMG SRC="prev.png" ALT="Previous"></A>
<A HREF="guide.html#toc6"><IMG SRC="toc.png" ALT="Contents"></A>
</BODY>
</HTML>