<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<!--*** This is a generated file.  Do not edit.  ***-->
<link rel="stylesheet" href="../skin/tigris.css" type="text/css">
<link rel="stylesheet" href="../skin/mysite.css" type="text/css">
<link rel="stylesheet" href="../skin/site.css" type="text/css">
<link media="print" rel="stylesheet" href="../skin/print.css" type="text/css">
<title>POI-HPBF - A Guide to the Publisher File Format</title>
</head>
<body bgcolor="white" class="composite">
<!--================= start Banner ==================-->
<div id="banner">
<table width="100%" cellpadding="8" cellspacing="0" summary="banner" border="0">
<tbody>
<tr>
<!--================= start Group Logo ==================-->
<td width="50%" align="left">
<div class="groupLogo">
<a href="http://poi.apache.org"><img border="0" class="logoImage" alt="Apache POI" src="../resources/images/group-logo.jpg"></a>
</div>
</td>
<!--================= end Group Logo ==================-->
<!--================= start Project Logo ==================--><td width="50%" align="right">
<div align="right" class="projectLogo">
<a href="http://poi.apache.org/"><img border="0" class="logoImage" alt="POI" src="../resources/images/project-logo.jpg"></a>
</div>
</td>
<!--================= end Project Logo ==================-->
</tr>
</tbody>
</table>
</div>
<!--================= end Banner ==================-->
<!--================= start Main ==================-->
<table width="100%" cellpadding="0" cellspacing="0" border="0" summary="nav" id="breadcrumbs">
<tbody>
<!--================= start Status ==================-->
<tr class="status">
<td>
<!--================= start BreadCrumb ==================--><a href="http://www.apache.org/">Apache</a> | <a href="http://poi.apache.org/">POI</a><a href=""></a>
<!--================= end BreadCrumb ==================--></td><td id="tabs">
<!--================= start Tabs ==================-->
<div class="tab">
<span class="selectedTab"><a class="base-selected" href="../index.html">Home</a></span> | <script language="Javascript" type="text/javascript">
function printit() {  
if (window.print) {
    window.print() ;  
} else {
    var WebBrowser = '<OBJECT ID="WebBrowser1" WIDTH="0" HEIGHT="0" CLASSID="CLSID:8856F961-340A-11D0-A96B-00C04FD705A2"></OBJECT>';
document.body.insertAdjacentHTML('beforeEnd', WebBrowser);
    WebBrowser1.ExecWB(6, 2); // Use a 1 vs. a 2 for a prompting dialog box
    WebBrowser1.outerHTML = "";
}
}
</script><script language="Javascript" type="text/javascript">
var NS = (navigator.appName == "Netscape");
var VERSION = parseInt(navigator.appVersion);
if (VERSION > 3) {
    document.write('  <a title="PRINT this page OUT" href="javascript:printit()">PRINT</a>');
}
</script>
</div>
<!--================= end Tabs ==================-->
</td>
</tr>
</tbody>
</table>
<!--================= end Status ==================-->
<table id="main" width="100%" cellpadding="8" cellspacing="0" summary="" border="0">
<tbody>
<tr valign="top">
<!--================= start Menu ==================-->
<td id="leftcol">
<div id="navcolumn">
<div class="menuBar">
<div class="menu">
<span class="menuLabel">Apache POI</span>
        
<div class="menuItem">
<a href="../index.html">Top</a>
</div>
    
</div>
<div class="menu">
<span class="menuLabel">HPBF</span>
        
<div class="menuItem">
<a href="index.html">Overview</a>
</div>
        
<div class="menuItem">
<a href="file-format.xml">File Format</a>
</div>
	
</div>
</div>
</div>
<form target="_blank" action="http://www.google.com/search" method="get">
<table summary="search" border="0" cellspacing="0" cellpadding="0">
<tr>
<td><img height="1" width="1" alt="" src="../skin/images/spacer.gif" class="spacer"></td><td nowrap="nowrap">
                          Search Apache POI<br>
<input value="poi.apache.org" name="sitesearch" type="hidden"><input size="10" name="q" id="query" type="text"><img height="1" width="5" alt="" src="../skin/images/spacer.gif" class="spacer"><input name="Search" value="GO" type="submit"></td><td><img height="1" width="1" alt="" src="../skin/images/spacer.gif" class="spacer"></td>
</tr>
<tr>
<td colspan="3"><img height="7" width="1" alt="" src="../skin/images/spacer.gif" class="spacer"></td>
</tr>
<tr>
<td class="bottom-left-thick"></td><td bgcolor="#a5b6c6"><img height="1" width="1" alt="" src="../skin/images/spacer.gif" class="spacer"></td><td class="bottom-right-thick"></td>
</tr>
</table>
</form>
</td>
<!--================= end Menu ==================-->
<!--================= start Content ==================--><td>
<div id="bodycol">
<div class="app">
<div align="center">
<h1>POI-HPBF - A Guide to the Publisher File Format</h1>
</div>
<div class="h3">
    

    
        
<a name="Document+Streams"></a>
<div class="h3">
<h3>Document Streams</h3>
</div>
		
<p>
		A Publisher file is made up of a number of POIFS streams. A typical
        file is laid out as follows:
		</p>

<pre class="code">
Root Entry -
  Objects -
    (no children)
  SummaryInformation &lt;(0x05)SummaryInformation&gt;
  DocumentSummaryInformation &lt;(0x05)DocumentSummaryInformation&gt;
  Escher -
    EscherStm
    EscherDelayStm
  Quill -
    QuillSub -
      CONTENTS
      CompObj &lt;(0x01)CompObj&gt;
  Envelope
  Contents
  Internal &lt;(0x03)Internal&gt;
  CompObj &lt;(0x01)CompObj&gt;
  VBA -
    (no children)
</pre>
		
        
<a name="Changing+Text"></a>
<div class="h3">
<h3>Changing Text</h3>
</div>
		
<p>If you make a change to the text of a file, but do not change
          how much text there is, then the <em>CONTENTS</em> stream
          will undergo a small change, and the <em>Contents</em> stream
          will undergo a large change.</p>
        
<p>If you make a change to the text of a file, and change the
          amount of text there is, then both the <em>Contents</em> and
          the <em>CONTENTS</em> streams change.</p>
		
        
<a name="Changing+Shapes"></a>
<div class="h3">
<h3>Changing Shapes</h3>
</div>
        
<p>If you alter the size of a textbox, but make no text changes,
          then both <em>Contents</em> and <em>CONTENTS</em> streams
          change. There are no changes to the Escher streams.</p>
        
<p>If you set the background colour of a textbox, but make
          no changes to the text, (to finish off)</p>
		
        
<a name="Structure+of+CONTENTS"></a>
<div class="h3">
<h3>Structure of CONTENTS</h3>
</div>
        
<p>First we have "CHNKINK ", followed by 24 bytes.</p>
        
<p>Next we have 20 sequences of 24 bytes each. If the first two bytes
         are 18 00, then that sequence entry exists, but if they are 00 00 then
         the entry doesn't exist. If it does exist, we then have 4 bytes of
         upper case ASCII text, followed by three little endian shorts.
         The first of these seems to be the count of that type, the second is
         usually 1, the third is usually zero. Then we have another 4 bytes of
         upper case ASCII text, normally but not always the same as the first
         text. Finally, we have an unsigned little endian 32 bit offset to
         the start of the data for this entry, then an unsigned little endian
         32 bit length of that section.</p>
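<p>As a rough illustration of the entry layout just described, here is a
         minimal Python sketch that decodes one 24 byte sequence entry. The
         function name <span class="codefrag">parse_entry</span> and the dictionary
         keys are made up for this example and are not part of POI; the field
         meanings are only the guesses given above.</p>

```python
def parse_entry(raw):
    """Parse one 24 byte sequence entry; return None for a blank slot."""
    if len(raw) != 24:
        raise ValueError("a sequence entry is exactly 24 bytes")
    # The first two bytes are 18 00 when the entry exists, 00 00 otherwise.
    if raw[0:2] != b"\x18\x00":
        return None

    def u16(pos):
        return int.from_bytes(raw[pos:pos + 2], "little")

    def u32(pos):
        return int.from_bytes(raw[pos:pos + 4], "little")

    return {
        "type": raw[2:6].decode("ascii"),      # first 4 bytes of ASCII text
        "count": u16(6),                       # seems to be the count of that type
        "short2": u16(8),                      # usually 1
        "short3": u16(10),                     # usually 0
        "format": raw[12:16].decode("ascii"),  # second 4 bytes of ASCII text
        "offset": u32(16),                     # start of the data
        "length": u32(20),                     # length of the data
    }


# A TEXT entry whose data starts at 0x200 and is 0x3d0 bytes long.
sample = (b"\x18\x00" + b"TEXT" + bytes([0, 0, 1, 0, 0, 0])
          + b"TEXT" + bytes([0x00, 0x02, 0, 0, 0xd0, 0x03, 0, 0]))
entry = parse_entry(sample)
print(entry["type"], hex(entry["offset"]), hex(entry["length"]))
```

<p>A blank slot (24 zero bytes) makes the sketch return
         <span class="codefrag">None</span>, so a caller can simply skip it.</p>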
        
<p>Normally, the first sequence entry is for TEXT, and the text data
         will start at 0x200. After that there are normally two or three STSH
         entries (so the first short has the values 0, then 1, then 2). After
         that it seems to vary.</p>
        
<p>At 0x200 we have the text, stored as little endian 16 bit unicode.</p>
        
<p>After the text comes all sorts of other stuff, presumably as 
         described by the sequences.</p>
        
<p>For a <em>CONTENTS</em> stream of length 7168 / 0x1c00 bytes, the start 
         looks something like:</p>

<pre class="code">
CHNKINK       // "CHNKINK "
04 00 07 00   // Normally 04 00 07 00
13 00 00 03   // Normally ## 00 00 03
00 02 00 00   // Normally 00 ## 00 00
00 1c 00 00   // Normally length of the stream
f8 01 13 00   // Normally f8 01 11/13 00
ff ff ff ff   // Normally seems to be ffffffff

18 00 
TEXT 00 00 01 00 00 00       // TEXT 0 1 0
TEXT 00 02 00 00 d0 03 00 00 // TEXT from: 200 (512), len: 3d0 (976)
18 00 
STSH 00 00 01 00 00 00       // STSH 0 1 0
STSH d0 05 00 00 1e 00 00 00 // STSH from: 5d0 (1488), len: 1e (30)
18 00 
STSH 01 00 01 00 00 00       // STSH 1 1 0
STSH ee 05 00 00 b8 01 00 00 // STSH from: 5ee (1518), len: 1b8 (440)
18 00 
STSH 02 00 01 00 00 00       // STSH 2 1 0
STSH a6 07 00 00 3c 00 00 00 // STSH from: 7a6 (1958), len: 3c (60)
18 00 
FDPP 00 00 01 00 00 00       // FDPP 0 1 0
FDPP 00 08 00 00 00 02 00 00 // FDPP from: 800 (2048), len: 200 (512)
18 00 
FDPC 00 00 01 00 00 00       // FDPC 0 1 0
FDPC 00 0a 00 00 00 02 00 00 // FDPC from: a00 (2560), len: 200 (512)
18 00 
FDPC 01 00 01 00 00 00       // FDPC 1 1 0
FDPC 00 0c 00 00 00 02 00 00 // FDPC from: c00 (3072), len: 200 (512)
18 00 
SYID 00 00 01 00 00 00       // SYID 0 1 0
SYID 00 0e 00 00 20 00 00 00 // SYID from: e00 (3584), len: 20 (32)
18 00 
SGP  00 00 01 00 00 00       // SGP  0 1 0
SGP  20 0e 00 00 0a 00 00 00 // SGP  from: e20 (3616), len: a (10)
18 00 
INK  00 00 01 00 00 00       // INK  0 1 0
INK  2a 0e 00 00 04 00 00 00 // INK  from: e2a (3626), len: 4 (4)
18 00 
BTEP 00 00 01 00 00 00       // BTEP 0 1 0
PLC  2e 0e 00 00 18 00 00 00 // PLC  from: e2e (3630), len: 18 (24)
18 00 
BTEC 00 00 01 00 00 00       // BTEC 0 1 0
PLC  46 0e 00 00 20 00 00 00 // PLC  from: e46 (3654), len: 20 (32)
18 00 
FONT 00 00 01 00 00 00       // FONT 0 1 0
FONT 66 0e 00 00 48 03 00 00 // FONT from: e66 (3686), len: 348 (840)
18 00 
TCD  03 00 01 00 00 00       // TCD  3 1 0
PLC  ae 11 00 00 24 00 00 00 // PLC  from: 11ae (4526), len: 24 (36)
18 00 
TOKN 04 00 01 00 00 00       // TOKN 4 1 0
PLC  d2 11 00 00 0a 01 00 00 // PLC  from: 11d2 (4562), len: 10a (266)
18 00 
TOKN 05 00 01 00 00 00       // TOKN 5 1 0
PLC  dc 12 00 00 2a 01 00 00 // PLC  from: 12dc (4828), len: 12a (298)
18 00 
STRS 00 00 01 00 00 00       // STRS 0 1 0
PLC  06 14 00 00 46 00 00 00 // PLC  from: 1406 (5126), len: 46 (70)
18 00 
MCLD 00 00 01 00 00 00       // MCLD 0 1 0
MCLD 4c 14 00 00 16 06 00 00 // MCLD from: 144c (5196), len: 616 (1558)
18 00 
PL   00 00 01 00 00 00       // PL   0 1 0
PL   62 1a 00 00 48 00 00 00 // PL   from: 1a62 (6754), len: 48 (72)
00 00                        // Blank entry follows
00 00 00 00 00 00
00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00

(the text will then start)
</pre>
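<p>The numbers in the listing above can be cross-checked with a little
         arithmetic. The following Python sketch is only a sanity check on this
         one sample file: it confirms that the header plus the 20 entry table
         ends exactly at 0x200, and looks for gaps between consecutive data
         sections. The variable names are invented for the example, and nothing
         here explains why any gap exists.</p>

```python
HEADER_LEN = 8 + 24   # "CHNKINK " plus the 24 bytes that follow it
TABLE_LEN = 20 * 24   # 20 sequence entries of 24 bytes each
first_data = HEADER_LEN + TABLE_LEN
print(hex(first_data))

STREAM_LEN = 0x1c00

# (type, offset, length) copied from the sample listing
sections = [
    ("TEXT", 0x200, 0x3d0), ("STSH", 0x5d0, 0x1e), ("STSH", 0x5ee, 0x1b8),
    ("STSH", 0x7a6, 0x3c), ("FDPP", 0x800, 0x200), ("FDPC", 0xa00, 0x200),
    ("FDPC", 0xc00, 0x200), ("SYID", 0xe00, 0x20), ("SGP ", 0xe20, 0xa),
    ("INK ", 0xe2a, 0x4), ("BTEP", 0xe2e, 0x18), ("BTEC", 0xe46, 0x20),
    ("FONT", 0xe66, 0x348), ("TCD ", 0x11ae, 0x24), ("TOKN", 0x11d2, 0x10a),
    ("TOKN", 0x12dc, 0x12a), ("STRS", 0x1406, 0x46), ("MCLD", 0x144c, 0x616),
    ("PL  ", 0x1a62, 0x48),
]

# Record every place where one section does not end where the next begins.
gaps = []
for (name, start, length), (next_name, next_start, _unused) in zip(sections, sections[1:]):
    gap = next_start - (start + length)
    if gap != 0:
        gaps.append((name, next_name, gap))

end_of_last = sections[-1][1] + sections[-1][2]
print(gaps, STREAM_LEN - end_of_last)
```

<p>For this sample, every section starts where the previous one ends, apart
         from 30 unused bytes between the last STSH and FDPP, and 342 bytes are
         left over after the final PL section.</p>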
		
<p>We think that the first 4 bytes of text describe
		 the function of the data at the offset. The first short is
		 then a zero-based count of that type, e.g. the second entry
		 of a type will have 1. We think that the second 4 bytes of
		 text describe the format of the data block at the offset.
		 The format of the text block is easy, but we're still trying
		 to figure out the others.</p>

           
<a name="Structure+of+TEXT+bit"></a>
<div class="h4">
<h4>Structure of TEXT bit</h4>
</div>
           
<p>This is very simple. All the text for the document is
            stored in a single bit of the Quill CONTENTS. The text
            is stored as little endian 16 bit unicode strings.</p>
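<p>A minimal Python sketch of reading the TEXT bit, assuming the offset
            and length have already been taken from the TEXT sequence entry.
            The helper name <span class="codefrag">extract_text</span> and the
            stand-in stream are hypothetical and only serve the example.</p>

```python
def extract_text(contents, offset, length):
    """Decode the TEXT bit of a Quill CONTENTS stream as little endian UTF-16."""
    return contents[offset:offset + length].decode("utf-16-le")


# Hypothetical stand-in for a CONTENTS stream: 0x200 bytes of header and
# entry table (zeroed here), then the text, then some unrelated trailing data.
text = "Hello, Publisher!\r"
encoded = text.encode("utf-16-le")
fake_contents = bytes(0x200) + encoded + bytes(16)

print(repr(extract_text(fake_contents, 0x200, len(encoded))))
```

<p>Because each character takes two bytes here, the length in the
            sequence entry is twice the number of characters.</p>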
           
           
<a name="Structure+of+PLC+bit"></a>
<div class="h4">
<h4>Structure of PLC bit</h4>
</div>
           
<p>The first four bytes seem to hold the count of the 
            entries in the bit, and the second four bytes seem to hold
            the type. There is then some pre-data, and then data for
            each of the entries, the exact format being dependent on the type.</p>
           
<p>Type 0 has four 2 byte unsigned ints of pre-data, then a pair of
            2 byte unsigned ints for each entry.</p>
           
<p>Type 4 has four 2 byte unsigned ints of pre-data, then a pair of
            4 byte unsigned ints for each entry.</p>
           
<p>Type 8 has seven 2 byte unsigned ints of pre-data, then a pair of
            4 byte unsigned ints for each entry.</p>
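<p>The three simple types can be sketched in Python as below. This is an
            illustration of the layout as described here, not the POI
            implementation: it assumes all values are little endian, and the
            names <span class="codefrag">PLC_LAYOUTS</span> and
            <span class="codefrag">parse_plc</span> are invented for the example.
            The sample bytes are made up and do not come from a real file.</p>

```python
# type: (number of 2 byte pre-data values, width in bytes of each value in a pair)
PLC_LAYOUTS = {0: (4, 2), 4: (4, 4), 8: (7, 4)}


def parse_plc(data):
    """Parse a type 0, 4 or 8 PLC bit (assumed little endian throughout)."""
    def uint(pos, width):
        return int.from_bytes(data[pos:pos + width], "little")

    count = uint(0, 4)     # first four bytes: number of entries
    plc_type = uint(4, 4)  # second four bytes: type
    if plc_type not in PLC_LAYOUTS:
        raise ValueError("unhandled PLC type %d" % plc_type)

    pre_count, width = PLC_LAYOUTS[plc_type]
    pre_data = [uint(8 + 2 * i, 2) for i in range(pre_count)]
    base = 8 + 2 * pre_count  # the per-entry pairs start after the pre-data
    pairs = [(uint(base + 2 * width * i, width),
              uint(base + 2 * width * i + width, width))
             for i in range(count)]
    return {"type": plc_type, "pre_data": pre_data, "pairs": pairs}


def le(value, width):
    return value.to_bytes(width, "little")


# Made-up type 4 bit holding a single entry.
sample = (le(1, 4) + le(4, 4)
          + b"".join(le(v, 2) for v in (10, 20, 30, 40))
          + le(0x100, 4) + le(0x200, 4))
plc = parse_plc(sample)
print(plc)
```

<p>Type 12 is deliberately left out of the sketch, since its layout is
            much more involved.</p>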
           
<p>Type 12 holds hyperlinks, and is very much more complex.
            See <span class="codefrag">org.apache.poi.hpbf.model.qcbits.QCPLCBit</span>
            for our best guess as to how the contents match up.</p>
           
		
	

<div id="authors" align="right">by&nbsp;Nick Burch</div>
</div>
</div>
</div>
</td>
<!--================= end Content ==================-->
</tr>
</tbody>
</table>
<!--================= end Main ==================-->
<!--================= start Footer ==================-->
<div id="footer">
<table summary="footer" cellspacing="0" cellpadding="4" width="100%" border="0">
<tbody>
<tr>
<!--================= start Copyright ==================-->
<td colspan="2">
<div align="center">
<div class="copyright">
              Copyright &copy; 2002-2012&nbsp;The Apache Software Foundation. All rights reserved.<br>
              Apache POI, POI, Apache, the Apache feather logo, and the Apache 
              POI project logo are trademarks of The Apache Software Foundation.
            </div>
</div>
</td>
<!--================= end Copyright ==================-->
</tr>
<tr>
<td align="left">
<!--================= start Host ==================-->
<!--================= end Host ==================--></td><td align="right">
<!--================= start Credits ==================-->
<div align="right">
<div class="credit"></div>
</div>
<!--================= end Credits ==================-->
</td>
</tr>
</tbody>
</table>
</div>
<!--================= end Footer ==================-->
</body>
</html>