XMLPPM 0.95 README 

James Cheney 11/30/2000

ABOUT XMLPPM

This directory contains version 0.95 of XMLPPM, an XML-specific compressor.
XMLPPM reads well-formed XML text from standard input, compresses
it, and sends the compressed bits to standard output.  The companion
decompressor, XMLUNPPM, restores the text version of the XML data from
the compressed bits.  (The restored version may differ slightly from the
original; for example, some whitespace may be stripped.)

XMLPPM is *experimental*.  I do *not* recommend that you use XMLPPM to
archive important files, as XMLPPM is not fully tested and future 
versions of XMLPPM may not be compatible with this initial version.
This version is being made available for research purposes.



COPYRIGHT and LICENSE TERMS

Portions of the XMLPPM source code are based on Alistair Moffat's
arithmetic coding sources and Bill Teahan's sources for the PPMD+
text compressor, used with permission.  Those files are copyright their
respective authors as described in the source files.  The rest of the
source code is copyright James Cheney, November 2000.

This code (or whatever portions of it I speak for) is covered by the
GNU General Public License.


INSTALLATION

This is the XMLPPM source code distribution, so to use XMLPPM you need
to compile the sources.  XMLPPM uses version 1.95 of the "expat" XML
parser, and so you need to get and install the development version of that
parser before you can compile XMLPPM.  In the future, if there is demand,
I may make statically linked binaries available for selected platforms.

Expat, along with its installation instructions, is available at:
http://expat.sourceforge.net/

Once you have installed expat, go to src/xmlppm-0.95 (or wherever you
installed the XMLPPM sources) and do:

make all

This should create two binary files, xmlppm and xmlunppm.

Because XMLPPM is still undergoing development, I don't recommend
performing further installation steps like putting xmlppm in /usr/bin, 
because then other users of your machine might think it's a "real"
(i.e. fully tested) utility.

USING XMLPPM

XMLPPM and its companion decompressor XMLUNPPM are command-line driven
and interact only with stdin and stdout.  Also, XMLPPM only reads and
compresses XML text files.  What counts as an XML text file actually
depends on the underlying XML parser, expat; if expat does not know how
to parse a document, XMLPPM will print expat's error message and quit.
If XMLPPM won't compress your document, it's most likely due to a problem
in expat, not in XMLPPM, so I may not be able to do anything about it.


Supposing you do have an XML file that expat likes, to compress it do:

./xmlppm < doc.xml > doc.xppm

You can of course call the compressed file anything you like, but I'm
planning on making xppm the default extension (xpm already being taken).

To expand the compressed document, do:

./xmlunppm < doc.xppm > doc.new.xml

(I don't recommend overwriting the original document.)
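
Because the restored file may differ from the original in whitespace, a
byte-for-byte comparison can report a difference even when the round
trip worked.  Here is a minimal sketch of a looser check, written in
Python (3.8 or later) rather than as part of XMLPPM.  The function name
same_xml is made up for illustration.  It compares canonical forms with
surrounding whitespace stripped from text, which is an assumption about
what counts as "the same" for your data.  Canonicalization also drops
the DTD and, by default, comments, so this check says nothing about
those.

```python
import xml.etree.ElementTree as ET


def same_xml(original: str, restored: str) -> bool:
    """Return True if two XML texts have the same canonical form.

    Hypothetical helper, not part of XMLPPM.  strip_text=True removes
    whitespace before and after text content, so indentation-only
    differences are ignored.
    """
    a = ET.canonicalize(xml_data=original, strip_text=True)
    b = ET.canonicalize(xml_data=restored, strip_text=True)
    return a == b
```

ET.canonicalize also accepts a from_file= argument in place of the
text, which is convenient for comparing doc.xml with doc.new.xml
directly.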

BUGS

As far as I know, XMLPPM works on all XML documents.  I have tested
it on a wide variety of XML documents, and found and fixed many bugs.
It's likely that some bugs remain.

XMLPPM doesn't compress the XML text directly, but rather the SAX events
generated by expat as it parses.  This makes XMLPPM slightly lossy:
some information, such as exact whitespace (in particular in internal
DTDs), is not reported in these events.
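
To see what kind of detail never reaches an expat-based program, it
helps to look at the events themselves.  The sketch below uses Python's
standard xml.parsers.expat binding, not XMLPPM's own code, and the
function name sax_events and the tuple layout are made up for
illustration; they say nothing about how XMLPPM actually encodes
events.

```python
import xml.parsers.expat


def sax_events(xml_text: str) -> list:
    """Collect the start, end and text events expat reports for a document.

    Hypothetical helper for illustration only.
    """
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # Deliver adjacent character data in one piece where possible.
    parser.buffer_text = True
    parser.StartElementHandler = (
        lambda name, attrs: events.append(("start", name, attrs)))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda data: events.append(("text", data))
    parser.Parse(xml_text, True)
    return events


# Spacing inside the tag, the quote style, and the empty-element
# shorthand differ here, yet the event lists are identical.
spaced = sax_events('<a  x = "1" ></a>')
compact = sax_events("<a x='1'/>")
```

Since both documents yield the same events, a compressor that stores
only the events cannot tell which of the two spellings it was given.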

Also, XMLPPM runs into problems with entities.  Currently, XMLPPM 
conservatively replaces all occurrences of reserved characters such as 
&, <, and > with their predefined entity references.  This may change 
your document in an essential way.  Be warned!
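
One way such a replacement can alter the text is sketched below.  This
is an assumption about the general mechanism, not a description of
XMLPPM's code: the parser hands over resolved character data, so a
character written inside a CDATA section looks the same as one written
as an entity reference, and writing it back with escapes produces
different text.  The example uses Python's standard library; the
function names parsed_text and escaped_text are made up, and
xml.sax.saxutils.escape merely stands in for whatever escaping the
decompressor performs.

```python
import xml.parsers.expat
from xml.sax.saxutils import escape


def parsed_text(xml_text: str) -> str:
    """Return the character data expat reports, joined into one string."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_text, True)
    return "".join(chunks)


def escaped_text(xml_text: str) -> str:
    """Re-escape the parsed character data (escape handles &, < and >)."""
    return escape(parsed_text(xml_text))


original = "<t><![CDATA[a < b & c]]></t>"
rewritten = "<t>" + escaped_text(original) + "</t>"
```

Here rewritten no longer contains the CDATA section, although a parser
reports the same character data for both documents.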

TO DO

* Fix the above entity bug so that the XML text is preserved as exactly
as possible

* Factor the XMLUNPPM component into an "event decoder" that decodes the
compressed event stream and calls SAX event handlers, and a "printing"
event handler.

* Port to other XML parsing libraries

* Add the capability to directly compress/decompress XML stored in memory
as DOM trees

* Generally clean up the code; it's pretty messy and ugly

CONTACT 

James Cheney, jcheney@cs.cornell.edu