CATDOC version 0.90

CATDOC is a program which reads an MS-Word file and prints readable
ASCII text to stdout, just like the Unix cat command.
It is also able to produce correct escape sequences if some Unicode
characters have to be represented specially in your typesetting system,
such as (La)TeX.
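
The idea of replacing special characters with escape sequences can be
sketched as follows. This is only an illustration in Python, not catdoc's
own code or file format: the table name TEX_ESCAPES, the function to_tex
and the entries in the table are all assumptions made for the example.

```python
# Hypothetical substitution table: Unicode character -> TeX escape sequence.
# The entries are illustrative and are not copied from catdoc's own files.
TEX_ESCAPES = {
    "\u00e9": r"\'e",   # e with acute accent
    "\u00fc": r'\"u',   # u with diaeresis
    "\u00a0": "~",      # non-breaking space
    "%": r"\%",         # a bare percent sign would start a TeX comment
    "&": r"\&",         # a bare ampersand is a TeX alignment character
}


def to_tex(text):
    """Replace each character that has a table entry; pass the rest through."""
    return "".join(TEX_ESCAPES.get(ch, ch) for ch in text)


print(to_tex("caf\u00e9 & 50%"))  # prints: caf\'e \& 50\%
```

A per-character lookup keeps the table easy to extend: supporting another
symbol only means adding one more entry.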

This is a completely new version of catdoc, rewritten from scratch.
It features runtime configuration, proper charset handling,
user-definable output formats, and support
for Word97 files, which contain Unicode internally.

catdoc doesn't even attempt to parse OLE streams. It is too complicated
and not worth the effort, because Microsoft would change everything when
a new version of MS-Word comes out.

This rough approach inevitably results in some garbage in the output file,
especially near the end of the file and if the file contains embedded OLE
objects, such as pictures or equations.
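
To see why scanning a file without parsing its structure lets garbage
through, consider the sketch below. It is not catdoc's actual algorithm.
It assumes the text is stored as UTF-16LE (as a stand-in for the Unicode
used by Word97 files) and simply collects every run of printable
characters. The function crude_text, the pattern RUN and the sample bytes
are all made up for the illustration.

```python
import re

# Assumption: text is stored as UTF-16LE, so each ASCII-range character
# appears as one printable byte followed by a zero byte.
RUN = re.compile(rb"(?:[\x20-\x7e\r\n\t]\x00){4,}")


def crude_text(data):
    """Return every run of 4 or more ASCII-range UTF-16LE characters."""
    return [m.group().decode("utf-16-le") for m in RUN.finditer(data)]


sample = (
    b"\xd0\xcf\x11\xe0"                      # arbitrary binary header bytes
    + "Hello, world".encode("utf-16-le")     # real document text
    + b"\x00\xff\x03\x00"                    # binary padding
    + "Equation.3".encode("utf-16-le")       # made-up name of an embedded object
)

print(crude_text(sample))  # prints: ['Hello, world', 'Equation.3']
```

The scanner cannot tell the document text from the name of the embedded
object, so both end up in the output.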

So, if you are looking for a purely automatic way to convert Word to LaTeX,
you would do better to investigate word2x, mswordview or LAOLA.


Catdoc is distributed under the GNU General Public License, version 2 or later.


Your bug reports and suggestions are welcome.

There is also major work to do: defining correct TeX commands
for accented Latin letters in the tex.specchars file, as well as commands
for mathematical symbols (Unicode 20xx-25xx).


Contributions are welcome.

See the files INSTALL and INSTALL.dos for information about compiling and
installing catdoc.

Catdoc is documented in its UNIX-style manual page. For those who don't
have the man command (e.g. MS-DOS users), plain text and PostScript versions
of the manual are provided in the doc directory.
                    Victor Wagner <vitus@ice.ru>