%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% tex4ht_doc.tex 2008-02-12-09:30                        %
% Copyright (C) 2005, 2008 Kapil H. Paranjape            %
%                                                        %
% This work may be distributed and/or modified under the %
% conditions of the General Public License, either       %
% version 2 of this license or (at your option) any      %
% later version. The latest version of this license is   %
% in                                                     %
% http://www.gnu.org/gpl.txt                             %
% and version 2 or later is part of all distributions    %
% of Debian.                                             %
%                                                        %
% This Current Maintainer of this work                   %
% is Kapil H. Paranjape.                                 %
%                                                        %
% If you modify this work your changing its signature    %
% with a directive of the following form will be         %
% appreciated.                                           %
% kapil@imsc.res.in                                      %
% http://www.imsc.res.in/~kapil                          %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\documentclass{amsart}
\usepackage{hyperref}
\begin{document}
\title{A brief introduction to TeX4ht}
\author{Kapil Hari Paranjape}
\maketitle

\section{What do we have here?}
What follows is a brief introduction to the TeX4ht system designed
and currently maintained by Eitan M. Gurari. The source for this
document is in the file \verb|tex4ht_doc.tex| and can be processed
using the command \verb|htlatex tex4ht_doc.tex| as explained below.
It is hoped that such processing will prove instructive as well.

\section{Executive summary}
TeX4ht is a system to convert TeX input into hypertext documents of
different kinds.

TeX4ht operates on input that is ``standard'' \TeX\ or \LaTeX\ (but
please check the last section for some differences). This input is
processed by \verb|tex| in the usual way except that certain
additional macros are loaded which create some hooks in the output
that can be used to produce the hypertext. The output is then
post-processed by the program \verb|tex4ht| which produces the
hypertext. Auxiliary files such as \verb|.css| files and image files
are produced by the program \verb|t4ht|.
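
As an illustration of the first stage, the following is a minimal
sketch of a source file that requests the additional macros itself.
It assumes the \verb|html| package option of \verb|tex4ht.sty| as
described in the main documentation \cite{authdoc}; the file name
\verb|hello.tex| and its contents are hypothetical.
\begin{verbatim}
% hello.tex (hypothetical file name)
\documentclass{article}
% Assumption: loading tex4ht.sty with the html option
% inserts the hooks that the tex4ht program later reads.
\usepackage[html]{tex4ht}
\begin{document}
Hello, hypertext.
\end{document}
\end{verbatim}
Under that assumption, running \verb|latex|, \verb|tex4ht| and
\verb|t4ht| on this file, in that order, should correspond to the
three stages described above.
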
Usage is simplified via the Perl script \verb|mk4ht|, which can be
called directly to combine the above operations transparently. For
example, the source of this document can be processed using
\begin{verbatim}
mk4ht htlatex tex4ht_doc.tex
\end{verbatim}
This will produce \verb|tex4ht_doc.html| and some supplementary
files, which together form the HTML version of this documentation.
Similarly,
\begin{verbatim}
mk4ht xhmlatex tex4ht_doc.tex
\end{verbatim}
will produce the XML version with MathML and
\begin{verbatim}
mk4ht mzlatex tex4ht_doc.tex
\end{verbatim}
will produce MathML which uses fonts that are rendered well via the
``Gecko'' engine of \verb|mozilla|. Additional such commands are
\begin{verbatim}
mk4ht oolatex tex4ht_doc.tex
\end{verbatim}
to produce a format that can be read by \verb|OpenOffice|,
\begin{verbatim}
mk4ht dblatex tex4ht_doc.tex
\end{verbatim}
for DocBook, and
\begin{verbatim}
mk4ht teilatex tex4ht_doc.tex
\end{verbatim}
for TEI format XML output.

The broad structure of the \verb|mk4ht| command-line is
\begin{verbatim}
mk4ht #1 #2 #3 #4 #5
\end{verbatim}
The first argument is the type of conversion required. Using
\verb|mk4ht| without arguments lists the conversions available. The
second argument is the name of the file that is to be processed. The
third, fourth and fifth arguments are optional and are described in
some detail below.

The rest of this document introduces the system in a little more
detail. See~\cite{authdoc} and \cite{website} for authoritative
information. The first of the following sections
(Section~\ref{style}) examines the options for modifying the way in
which \TeX\ processes the source; specifically, these can be thought
of as options for the macros in \verb|tex4ht.sty|. The next section
(Section~\ref{postproc}) deals with the post-processing that
converts \TeX's output into hypertext.
The final section (Section~\ref{supple}) shows how one can change
the way the system generates the supplementary files like images and
style-sheets for the hypertext output.

This document assumes that the reader has some familiarity with the
\TeX\ and \LaTeX\ systems; see \cite{tex} and \cite{latex} for more
information.

\section{Options for Styles}\label{style}
Options for \TeX\ and \LaTeX\ processing can be added as the first
optional argument (\verb|#3| above) to the \verb|mk4ht| command. For
example, the command
\begin{verbatim}
mk4ht xhmlatex tex4ht_doc.tex
\end{verbatim}
is in fact similar\footnote{The differences lie in the font files
chosen, as described in Section~\ref{postproc}.} to the command
\begin{verbatim}
mk4ht htlatex tex4ht_doc.tex "xhtml,mathml"
\end{verbatim}
Similarly,
\begin{verbatim}
mk4ht oolatex tex4ht_doc.tex
\end{verbatim}
is in fact similar to the command
\begin{verbatim}
mk4ht htlatex tex4ht_doc.tex "xhtml,ooffice"
\end{verbatim}
In most cases this list of options begins with \verb|html| or
\verb|xhtml|. Additional options available can be found by searching
for the string \verb|--- Note ---| at the start of a line in the
resulting log file. For example,
\begin{verbatim}
mk4ht htlatex tex4ht_doc.tex
grep -A 1 '^--- Note ---' tex4ht_doc.log
\end{verbatim}
will list all the available options for \verb|html| conversion.

When this list of options does not start with \verb|html| or
\verb|xhtml|, the system looks for a file with the name given by the
first option and the \verb|.cfg| extension. The simplest use of this
feature is as follows. Create a file called \verb|bgimage.cfg|
containing the lines
\begin{verbatim}
\Preamble{html}
\begin{document}
\Css{BODY { background-image : url(background.png); }}
\EndPreamble
\end{verbatim}
After this,
\begin{verbatim}
mk4ht htlatex tex4ht_doc.tex "bgimage"
\end{verbatim}
will add an additional line to \verb|tex4ht_doc.css| incorporating
the image \verb|background.png|.
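
Other style rules can be added in the same way. The following is
only a sketch: the file name \verb|sansbody.cfg| and the rule it
adds are hypothetical, chosen to mirror the \verb|bgimage.cfg|
example.
\begin{verbatim}
% sansbody.cfg (hypothetical file name)
\Preamble{html}
\begin{document}
% Assumption: \Css accepts any further CSS rule in the
% same way as the background-image rule in bgimage.cfg.
\Css{BODY { font-family : sans-serif; }}
\EndPreamble
\end{verbatim}
It would be selected with
\verb|mk4ht htlatex tex4ht_doc.tex "sansbody"|, and the rule should
then appear in \verb|tex4ht_doc.css|.
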
See the main documentation \cite{authdoc} for more details on
creating configuration files.

\section{Post-processing}\label{postproc}
The optional arguments \verb|#4| and \verb|#5| refer to options for
the \verb|tex4ht| and \verb|t4ht| commands respectively. Both these
commands make use of the configuration file \verb|tex4ht.env| (which
may be overridden by \verb|.tex4ht| in the current directory or the
user's home directory). This configuration file is called the
``environment file'' in the main documentation \cite{authdoc} in
order to avoid confusing it with the configuration file described in
the previous section.

The program \verb|tex4ht| has to look for ``font descriptions'' that
describe how various non-standard glyphs are to be ``rendered'' in
hypertext. The TeX4ht system provides a number of possibilities,
such as using Unicode or fonts suited to the Gecko engine of the
Mozilla browser. So the command
\begin{verbatim}
mk4ht mzlatex tex4ht_doc.tex
\end{verbatim}
is almost\footnote{There is an additional option as explained in
Section~\ref{supple} below.} equivalent to
\begin{verbatim}
mk4ht htlatex tex4ht_doc.tex "xhtml,mozilla" "-cmozhtf"
\end{verbatim}
The \verb|-c<tagname>| option for \verb|tex4ht| picks up the tagged
section from the \verb|tex4ht.env| environment file. Any other
command-line option of \verb|tex4ht| can also be used as part of
\verb|#4|, which is just a space-separated list of options for this
command.

\section{Creating Supplementary Files}\label{supple}
The final step of conversion is the creation of supplementary files
like image files for formulae and equations like
\[
\frac{x^n-1}{x-1} = \sum_{i=0}^{n-1} x^i
\]
which is the rendering of the \LaTeX\ input string
\begin{verbatim}
\[
\frac{x^n-1}{x-1} = \sum_{i=0}^{n-1} x^i
\]
\end{verbatim}
In most cases such \TeX\ constructions can only be rendered as
images. The \verb|tex4ht| program creates a series of instructions
for the \verb|t4ht| program in a \verb|.lg| file.
The latter carries out these instructions by making use of external
programs like \verb|dvipng| or \verb|convert| to create these
images.

The most useful option in the argument list \verb|#5| is \verb|-p|,
which prevents images from being generated. Another useful option is
\verb|-cvalidate|, which causes the net output to be validated using
an external validation program such as \verb|xmllint|. All the
options in the argument list \verb|#5| are passed on to \verb|t4ht|.

\section{Some differences between TeX4ht and TeX}
We document some differences between the systems. For more
up-to-date information please see the author's
documentation~\cite{authdoc}.

\subsection{Regarding filenames}
In short, do \emph{not} use special characters in your filenames;
ideally stick with filenames which are composed of standard ASCII
alphanumerics wherever possible. Some explanations follow.

\TeX\ nowadays accepts files with names that contain all manner of
characters and so it is natural to imagine that TeX4ht will do so
too. However, one has to be concerned with the filenames used in
output as well as those used for input. Since output filenames are
derived from input filenames and appear in URLs within the
hypertext, special characters will cause hyperlinks to break. Thus
TeX4ht does not currently behave well if special characters are used
in input file names.

\subsection{Extra braces required}
In short, when in doubt, enclose subscripts and superscripts in
braces if they are longer than a single character. In this respect
the syntax of the TeX language that is accepted by TeX4ht is
stricter than that accepted by \TeX\ and \LaTeX.

\begin{thebibliography}{00}
\bibitem[1]{authdoc}
\url{http://www.cse.ohio-state.edu/~gurari/mn.html}
The authoritative documentation maintained by Eitan M.~Gurari.
\bibitem[2]{website}
\url{http://www.cse.ohio-state.edu/~gurari}
Eitan M.~Gurari's web page that discusses related projects.
\bibitem[3]{tex}
\url{http://www.tug.org/}
The \TeX\ Users Group primary web site.
\bibitem[4]{latex}
\url{http://www.latex-project.org/}
The \LaTeX\ project's primary web site.
\end{thebibliography}
\end{document}