<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <HTML ><HEAD ><TITLE >Automatic Indexing with the DocBook DSSSL Stylesheets</TITLE ><META NAME="GENERATOR" CONTENT="Modular DocBook HTML Stylesheet Version 1.79"></HEAD ><BODY CLASS="ARTICLE" BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#840084" ALINK="#0000FF" ><DIV CLASS="ARTICLE" ><DIV CLASS="TITLEPAGE" ><H1 CLASS="TITLE" ><A NAME="AEN2" >Automatic Indexing with the DocBook DSSSL Stylesheets</A ></H1 ><H3 CLASS="AUTHOR" ><A NAME="AEN4" >Norman Walsh</A ></H3 ><P CLASS="PUBDATE" >17 Nov 1998<BR></P ><DIV ><DIV CLASS="ABSTRACT" ><P ></P ><A NAME="AEN8" ></A ><P >Automatic indexing is an often-requested feature. This article describes how it is implemented in the DocBook DSSSL Stylesheets.</P ><P ></P ></DIV ></DIV ><HR></DIV ><DIV CLASS="SECT1" ><H1 CLASS="SECT1" ><A NAME="AEN10" >Authoring for Indexing</A ></H1 ><P >There are two parts to building an index automatically: creating the index terms and incorporating the generated index into your document.</P ><DIV CLASS="SECT2" ><H2 CLASS="SECT2" ><A NAME="AEN13" >Creating Index Terms</A ></H2 ><P >The generated index is constructed from <CODE CLASS="SGMLTAG" >IndexTerm</CODE >s in your document. DocBook <CODE CLASS="SGMLTAG" >IndexTerm</CODE >s are not part of the flow: they mark a location in the text but are not rendered where they occur.</P ><PRE CLASS="SCREEN" >&lt;para&gt;
This paragraph contains an interesting thing&lt;indexterm id="thing"&gt;
&lt;primary&gt;thing&lt;/primary&gt;&lt;secondary&gt;interesting&lt;/secondary&gt;&lt;/indexterm&gt;
that will appear in the index.
&lt;/para&gt;</PRE ><P >It is not absolutely necessary to provide an ID for each index term, but the performance of the print backends may degrade significantly if you have a large number of index terms that do not have IDs.</P ></DIV ><DIV CLASS="SECT2" ><H2 CLASS="SECT2" ><A NAME="AEN20" >Incorporating the Index</A ></H2 ><P >The index will be generated as a separate file. You must arrange to have this file incorporated into your document. 
The easiest way to do this is by file entity reference. At the top of your document, add an internal subset that defines the index file entity:</P ><PRE CLASS="SCREEN" >&lt;!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [
&lt;!ENTITY genindex.sgm SYSTEM "genindex.sgm"&gt;
]&gt;
&lt;book&gt;
...
&amp;genindex.sgm;
&lt;!-- Put this after the end tag of the last chapter or appendix, or --&gt;
&lt;!-- wherever you want the index to appear. It must be a valid location --&gt;
&lt;!-- for an index. --&gt;
&lt;/book&gt;</PRE ><P >Before you can process this document, you must make sure that <TT CLASS="FILENAME" >genindex.sgm</TT > exists. This is a chicken-and-egg problem, but it can be solved with the <B CLASS="COMMAND" >collateindex.pl</B > command:</P ><PRE CLASS="SCREEN" >perl collateindex.pl -N -o genindex.sgm</PRE ><P >The <CODE CLASS="OPTION" >-N</CODE > option creates a new index; <CODE CLASS="OPTION" >-o</CODE > identifies the name of the output file. This name must be the same as the name you specified in the internal subset.</P ></DIV ></DIV ><DIV CLASS="SECT1" ><H1 CLASS="SECT1" ><A NAME="AEN31" >Creating an Index</A ></H1 ><P >Creating an index is a multi-step, two-pass process:</P ><DIV CLASS="PROCEDURE" ><OL TYPE="1" ><LI CLASS="STEP" ><P >To create an index, you must first generate the raw index data. 
This is done with the HTML Stylesheet (<SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >even if you want print output</I ></SPAN >).</P ><P >Process your document with <B CLASS="COMMAND" >jade</B > using the HTML Stylesheet with the <CODE CLASS="OPTION" >-V html-index</CODE > option:</P ><PRE CLASS="SCREEN" >jade -t sgml -d <TT CLASS="REPLACEABLE" ><I >html/docbook.dsl</I ></TT > -V html-index <TT CLASS="REPLACEABLE" ><I >yourdocument.sgm</I ></TT ></PRE ><P >This will produce a file called <TT CLASS="FILENAME" >HTML.index</TT > that contains raw index data.</P ><P >If you're planning to generate your final document as a single HTML file using the <CODE CLASS="OPTION" >nochunks</CODE > option, make sure you generate the <TT CLASS="FILENAME" >HTML.index</TT > file with that option as well:</P ><PRE CLASS="SCREEN" >jade -t sgml -d <TT CLASS="REPLACEABLE" ><I >html/docbook.dsl</I ></TT > -V html-index -V nochunks <TT CLASS="REPLACEABLE" ><I >yourdocument.sgm</I ></TT ></PRE ></LI ><LI CLASS="STEP" ><P >Generate an index document with <B CLASS="COMMAND" >collateindex.pl</B >:</P ><PRE CLASS="SCREEN" >perl collateindex.pl -o genindex.sgm HTML.index</PRE ><P >There are a multitude of options to <B CLASS="COMMAND" >collateindex.pl</B >; see <A HREF="collateindex.html" TARGET="_top" >the reference page</A > for more information.</P ></LI ><LI CLASS="STEP" ><P >Process your original document again, using whichever stylesheet is appropriate. The new document will contain the generated index.</P ></LI ></OL ></DIV ></DIV ><DIV CLASS="SECT1" ><H1 CLASS="SECT1" ><A NAME="AEN61" >Drawbacks</A ></H1 ><P >Any generated index is perhaps better than none, but there are still a few things that <SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >cannot</I ></SPAN > be accomplished:</P ><P ></P ><OL TYPE="1" ><LI ><P >Duplicate page numbers are not suppressed in the index. 
If the document contains three indexing hits on page 4, the generated index will contain “4, 4, 4”.</P ></LI ><LI ><P >Ranges are not automatically constructed. If the document contains indexing hits on pages 4, 5, 6, and 7, the generated index will contain “4, 5, 6, 7” instead of “4–7”.</P ></LI ></OL ><P >It is possible that the TeX backend could be made smart enough to do these things automatically. (Sebastian will probably kill me for suggesting that.) For the RTF backend, at least in MS Word, it's probably possible to write a WordBasic macro that would automatically fix the index. (If someone does, please pass it along.)</P ></DIV ></DIV ></BODY ></HTML >