<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<!--*** This is a generated file.  Do not edit.  ***-->
<link rel="stylesheet" href="../skin/tigris.css" type="text/css">
<link rel="stylesheet" href="../skin/mysite.css" type="text/css">
<link rel="stylesheet" href="../skin/site.css" type="text/css">
<link media="print" rel="stylesheet" href="../skin/print.css" type="text/css">
<title>POI-HSLF - A Quick Guide</title>
</head>
<body bgcolor="white" class="composite">
<!--================= start Banner ==================-->
<div id="banner">
<table width="100%" cellpadding="8" cellspacing="0" summary="banner" border="0">
<tbody>
<tr>
<!--================= start Group Logo ==================-->
<td width="50%" align="left">
<div class="groupLogo">
<a href="http://poi.apache.org"><img border="0" class="logoImage" alt="Apache POI" src="../resources/images/group-logo.jpg"></a>
</div>
</td>
<!--================= end Group Logo ==================-->
<!--================= start Project Logo ==================--><td width="50%" align="right">
<div align="right" class="projectLogo">
<a href="http://poi.apache.org/"><img border="0" class="logoImage" alt="POI" src="../resources/images/project-logo.jpg"></a>
</div>
</td>
<!--================= end Project Logo ==================-->
</tr>
</tbody>
</table>
</div>
<!--================= end Banner ==================-->
<!--================= start Main ==================-->
<table width="100%" cellpadding="0" cellspacing="0" border="0" summary="nav" id="breadcrumbs">
<tbody>
<!--================= start Status ==================-->
<tr class="status">
<td>
<!--================= start BreadCrumb ==================--><a href="http://www.apache.org/">Apache</a> | <a href="http://poi.apache.org/">POI</a><a href=""></a>
<!--================= end BreadCrumb ==================--></td><td id="tabs">
<!--================= start Tabs ==================-->
<div class="tab">
<span class="selectedTab"><a class="base-selected" href="../index.html">Home</a></span> | <script language="Javascript" type="text/javascript">
function printit() {  
if (window.print) {
    window.print() ;  
} else {
    var WebBrowser = '<OBJECT ID="WebBrowser1" WIDTH="0" HEIGHT="0" CLASSID="CLSID:8856F961-340A-11D0-A96B-00C04FD705A2"></OBJECT>';
document.body.insertAdjacentHTML('beforeEnd', WebBrowser);
    WebBrowser1.ExecWB(6, 2); // Use a 1 vs. a 2 for a prompting dialog box
    WebBrowser1.outerHTML = "";
}
}
</script><script language="Javascript" type="text/javascript">
var NS = (navigator.appName == "Netscape");
var VERSION = parseInt(navigator.appVersion);
if (VERSION > 3) {
    document.write('  <a title="PRINT this page OUT" href="javascript:printit()">PRINT</a>');
}
</script>
</div>
<!--================= end Tabs ==================-->
</td>
</tr>
</tbody>
</table>
<!--================= end Status ==================-->
<table id="main" width="100%" cellpadding="8" cellspacing="0" summary="" border="0">
<tbody>
<tr valign="top">
<!--================= start Menu ==================-->
<td id="leftcol">
<div id="navcolumn">
<div class="menuBar">
<div class="menu">
<span class="menuLabel">Apache POI</span>
        
<div class="menuItem">
<a href="../index.html">Top</a>
</div>
    
</div>
<div class="menu">
<span class="menuLabel">HSLF</span>
        
<div class="menuItem">
<a href="index.html">Overview</a>
</div>
        
<div class="menuItem">
<span class="menuSelected">Quick Guide</span>
</div>
        
<div class="menuItem">
<a href="how-to-shapes.html">HSLF Cookbook</a>
</div>
        
<div class="menuItem">
<a href="xslf-cookbook.html">XSLF Cookbook</a>
</div>
        
<div class="menuItem">
<a href="ppt-file-format.html">PPT File Format</a>
</div>
	
</div>
</div>
</div>
<form target="_blank" action="http://www.google.com/search" method="get">
<table summary="search" border="0" cellspacing="0" cellpadding="0">
<tr>
<td><img height="1" width="1" alt="" src="../skin/images/spacer.gif" class="spacer"></td><td nowrap="nowrap">
                          Search Apache POI<br>
<input value="poi.apache.org" name="sitesearch" type="hidden"><input size="10" name="q" id="query" type="text"><img height="1" width="5" alt="" src="../skin/images/spacer.gif" class="spacer"><input name="Search" value="GO" type="submit"></td><td><img height="1" width="1" alt="" src="../skin/images/spacer.gif" class="spacer"></td>
</tr>
<tr>
<td colspan="3"><img height="7" width="1" alt="" src="../skin/images/spacer.gif" class="spacer"></td>
</tr>
<tr>
<td class="bottom-left-thick"></td><td bgcolor="#a5b6c6"><img height="1" width="1" alt="" src="../skin/images/spacer.gif" class="spacer"></td><td class="bottom-right-thick"></td>
</tr>
</table>
</form>
</td>
<!--================= end Menu ==================-->
<!--================= start Content ==================--><td>
<div id="bodycol">
<div class="app">
<div align="center">
<h1>POI-HSLF - A Quick Guide</h1>
</div>
<div class="h3">
    

    
        
<a name="Basic+Text+Extraction"></a>
<div class="h3">
<h3>Basic Text Extraction</h3>
</div>
        
<p>For basic text extraction, use 
<span class="codefrag">org.apache.poi.hslf.extractor.PowerPointExtractor</span>, which accepts a file or an input
stream. The <span class="codefrag">getText()</span> method returns the text from the slides, and the <span class="codefrag">getNotes()</span> method returns the text
from the notes. Finally, <span class="codefrag">getText(true,true)</span> returns the text
from both.
		</p>
		
		
		
<a name="Specific+Text+Extraction"></a>
<div class="h3">
<h3>Specific Text Extraction</h3>
</div>
		
<p>To get specific bits of text, first create an <span class="codefrag">org.apache.poi.hslf.usermodel.SlideShow</span>
(from an <span class="codefrag">org.apache.poi.hslf.HSLFSlideShow</span>, which accepts a file or an input
stream). Use <span class="codefrag">getSlides()</span> and <span class="codefrag">getNotes()</span> to get the slides and notes.
Each of these can be queried for its page ID, although they should already be returned
in the right order.</p>
		
<p>You can then call <span class="codefrag">getTextRuns()</span> on these to get 
their blocks of text. One <span class="codefrag">TextRun</span> normally holds all the text in a 
given area of the page, e.g. in the title bar or in a box.
From the <span class="codefrag">TextRun</span>, you can extract the text and check
what type of text it is (e.g. Body, Title). You can also call
<span class="codefrag">getRichTextRuns()</span>, which will return the 
<span class="codefrag">RichTextRun</span>s that make up the <span class="codefrag">TextRun</span>. A 
<span class="codefrag">RichTextRun</span> is made up of a sequence of text, all having the
same character and paragraph formatting.
		</p>
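<p>As a rough illustration of how one block of text breaks down into runs of uniform formatting, the sketch below groups characters by a single bold flag. The <span class="codefrag">RunSplitSketch</span> and <span class="codefrag">StyledRun</span> names are hypothetical and are not part of POI; a real <span class="codefrag">RichTextRun</span> carries full character and paragraph formatting, not just one flag.</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model, NOT a POI class: splits text into runs of uniform formatting. */
class RunSplitSketch {
    /** One stretch of text whose characters all share the same (hypothetical) bold flag. */
    static final class StyledRun {
        final String text;
        final boolean bold;

        StyledRun(String text, boolean bold) {
            this.text = text;
            this.bold = bold;
        }

        @Override
        public String toString() {
            return (bold ? "[bold]" : "[plain]") + text;
        }
    }

    /** Groups consecutive characters that have the same bold flag into one run. */
    static List<StyledRun> split(String text, boolean[] bold) {
        if (text.length() != bold.length) {
            throw new IllegalArgumentException("one flag per character is required");
        }
        List<StyledRun> runs = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= text.length(); i++) {
            // Close the current run at the end of the text or where the formatting changes.
            if (i == text.length() || bold[i] != bold[start]) {
                runs.add(new StyledRun(text.substring(start, i), bold[start]));
                start = i;
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        boolean[] bold = { true, true, true, true, true, false, false, false, false };
        for (StyledRun run : split("Title: hi", bold)) {
            System.out.println(run);
        }
    }
}
```

<p>With the sample input in <span class="codefrag">main</span>, the block splits into a bold run holding "Title" and a plain run holding ": hi".</p>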
		
		
        
<a name="Poor+Quality+Text+Extraction"></a>
<div class="h3">
<h3>Poor Quality Text Extraction</h3>
</div>
        
<p>If speed is the most important thing for you, and you do not mind
		getting duplicate blocks of text, text from master sheets, and
		old (deleted) text, then 
		<span class="codefrag">org.apache.poi.hslf.extractor.QuickButCruddyTextExtractor</span>
		might be of use.</p>
		
<p>QuickButCruddyTextExtractor does not use the normal record 
		parsing code. Instead, it uses a blind search of the record tree 
		to find all text-holding records. You will get all the text,
		including lots of text you normally would never want. However,
		you will get it back very fast.</p>
		
<p>There are two ways of getting the text back. 
		<span class="codefrag">getTextAsString()</span> will return a single string with all
		the text in it. <span class="codefrag">getTextAsVector()</span> will return a 
		vector of strings, one for each text record found in the file.
		</p>
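<p>The blind search idea can be sketched in plain Java, without POI. This is a simplified illustration and not the extractor's actual code. It assumes the input is the raw record stream (a real .ppt file wraps that stream in an OLE2 container), that each record starts with an 8-byte little-endian header, and that types 4000 and 4008 mark text records; these layout details and the <span class="codefrag">BlindTextScanSketch</span> name are assumptions of the sketch, not statements from this guide.</p>

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Simplified illustration of a blind record scan; NOT the POI implementation. */
class BlindTextScanSketch {
    // Record type codes as assumed by this sketch.
    static final int TEXT_CHARS_ATOM = 4000; // body is UTF-16LE text
    static final int TEXT_BYTES_ATOM = 4008; // body is one byte per character

    /** Walks record headers front to back and collects the body of every text record. */
    static List<String> findText(byte[] data) {
        List<String> found = new ArrayList<>();
        int pos = 0;
        while (pos + 8 <= data.length) {
            boolean container = (data[pos] & 0x0F) == 0x0F;
            int type = (data[pos + 2] & 0xFF) | ((data[pos + 3] & 0xFF) << 8);
            long len = (data[pos + 4] & 0xFFL) | ((data[pos + 5] & 0xFFL) << 8)
                    | ((data[pos + 6] & 0xFFL) << 16) | ((data[pos + 7] & 0xFFL) << 24);
            pos += 8; // step past the header
            if (container) {
                continue; // child records start right here, so just keep scanning
            }
            if (len > data.length - pos) {
                break; // length runs past the end of the data: stop rather than guess
            }
            if (type == TEXT_CHARS_ATOM) {
                found.add(new String(data, pos, (int) len, StandardCharsets.UTF_16LE));
            } else if (type == TEXT_BYTES_ATOM) {
                found.add(new String(data, pos, (int) len, StandardCharsets.ISO_8859_1));
            }
            pos += (int) len; // skip the body, whether or not it held text
        }
        return found;
    }

    /** Builds one record (8-byte header plus body) for the demo. */
    static byte[] record(boolean container, int type, byte[] body) {
        byte[] out = new byte[8 + body.length];
        out[0] = (byte) (container ? 0x0F : 0x00);
        out[2] = (byte) type;
        out[3] = (byte) (type >>> 8);
        out[4] = (byte) body.length;
        out[5] = (byte) (body.length >>> 8);
        out[6] = (byte) (body.length >>> 16);
        out[7] = (byte) (body.length >>> 24);
        System.arraycopy(body, 0, out, 8, body.length);
        return out;
    }

    /** Joins two byte arrays. */
    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = new byte[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] title = record(false, TEXT_BYTES_ATOM, "Title".getBytes(StandardCharsets.ISO_8859_1));
        byte[] body = record(false, TEXT_CHARS_ATOM, "Body".getBytes(StandardCharsets.UTF_16LE));
        byte[] stream = record(true, 1000, concat(title, body)); // container type value is arbitrary here
        List<String> found = findText(stream);
        System.out.println(found);                    // one entry per text record
        System.out.println(String.join("\n", found)); // everything as a single string
    }
}
```

<p>The list and the joined string in <span class="codefrag">main</span> loosely mirror the two return styles described above. Because the scan never checks which slide or sheet a record belongs to, it also picks up duplicate and stale text, which is the trade-off this extractor makes for speed.</p>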
		

		
<a name="Changing+Text"></a>
<div class="h3">
<h3>Changing Text</h3>
</div>
		
<p>It is possible to change the text via 
		<span class="codefrag">TextRun.setText(String)</span> or
		<span class="codefrag">RichTextRun.setText(String)</span>. It is not yet possible
		to add additional TextRuns or RichTextRuns.</p>
		
<p>When calling <span class="codefrag">TextRun.setText(String)</span>, all
		the text will end up with the same formatting. When calling
		<span class="codefrag">RichTextRun.setText(String)</span>, the text will retain
		the old formatting of that <span class="codefrag">RichTextRun</span>.
		</p>
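<p>The difference between the two calls can be pictured with a small toy model. The classes below are hypothetical stand-ins and not the POI classes: <span class="codefrag">Block</span> plays the part of a <span class="codefrag">TextRun</span> and <span class="codefrag">Run</span> the part of a <span class="codefrag">RichTextRun</span>, with a single bold flag standing in for all formatting. Which formatting survives a block-level replacement is an assumption of this sketch (here, the first run's).</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the two setText behaviours; these are NOT the POI classes. */
class SetTextSketch {
    /** Stands in for RichTextRun: some text plus one (hypothetical) formatting flag. */
    static final class Run {
        String text;
        final boolean bold;

        Run(String text, boolean bold) {
            this.text = text;
            this.bold = bold;
        }

        /** Replaces only the characters; the run keeps its formatting. */
        void setText(String newText) {
            this.text = newText;
        }
    }

    /** Stands in for TextRun: an ordered list of runs. */
    static final class Block {
        final List<Run> runs = new ArrayList<>();

        Block(Run... initial) {
            for (Run run : initial) {
                runs.add(run);
            }
        }

        /** Replaces everything with a single run, so all the text shares one formatting. */
        void setText(String newText) {
            // Assumption of this sketch: the first run's formatting is the one kept.
            boolean keptBold = !runs.isEmpty() && runs.get(0).bold;
            runs.clear();
            runs.add(new Run(newText, keptBold));
        }

        String getText() {
            StringBuilder sb = new StringBuilder();
            for (Run run : runs) {
                sb.append(run.text);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        Block block = new Block(new Run("Old ", true), new Run("text", false));
        block.runs.get(1).setText("words"); // run-level change: still two runs
        System.out.println(block.getText() + " / runs: " + block.runs.size());
        block.setText("All new");           // block-level change: collapses to one run
        System.out.println(block.getText() + " / runs: " + block.runs.size());
    }
}
```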
		

		
<a name="Adding+Slides"></a>
<div class="h3">
<h3>Adding Slides</h3>
</div>
		
<p>You may add new slides by calling
		<span class="codefrag">SlideShow.createSlide()</span>, which will add a new slide
		to the end of the SlideShow. It is not currently possible to
		re-order slides, nor to add new text to slides (currently only
		adding Escher objects to new slides is supported).
		</p>
		
		
		
<a name="Guide+to+key+classes"></a>
<div class="h3">
<h3>Guide to key classes</h3>
</div>
		
<ul>
		
<li>
<span class="codefrag">org.apache.poi.hslf.HSLFSlideShow</span>
		Handles reading in and writing out files. Calls 
		<span class="codefrag">org.apache.poi.hslf.record.Record</span> to build a tree
		of all the records in the file, which it allows access to.
  		</li>
		
<li>
<span class="codefrag">org.apache.poi.hslf.record.Record</span>
		Base class of all records. Also provides the main record generation
		code, which will build up a tree of records for a file.
  		</li>
  		
<li>
<span class="codefrag">org.apache.poi.hslf.usermodel.SlideShow</span>
  Builds up model entries from the records, and presents a user-facing
  view of the file.
  		</li>
  		
<li>
<span class="codefrag">org.apache.poi.hslf.model.Slide</span>
  A user-facing view of a Slide in a slideshow. Allows you to get at the 
  text of the slide, and at any drawing objects on it.
  		</li>
  		
<li>
<span class="codefrag">org.apache.poi.hslf.model.TextRun</span>
  Holds all the Text in a given area of the Slide, and will
  contain one or more <span class="codefrag">RichTextRun</span>s.
  		</li>
  		
<li>
<span class="codefrag">org.apache.poi.hslf.usermodel.RichTextRun</span>
  Holds a run of text, all having the same character and
  paragraph stylings. It is possible to modify text, and/or text stylings.
  		</li>
  		
<li>
<span class="codefrag">org.apache.poi.hslf.extractor.PowerPointExtractor</span>
  Uses the model code to allow extraction of text from files.
		</li>
		
<li>
<span class="codefrag">org.apache.poi.hslf.extractor.QuickButCruddyTextExtractor</span>
  Uses the record code to extract all the text from files very fast, 
  but including deleted text (and other bits of Crud).
		</li>
		
</ul>
		
	

<div id="authors" align="right">by&nbsp;Nick Burch</div>
</div>
</div>
</div>
</td>
<!--================= end Content ==================-->
</tr>
</tbody>
</table>
<!--================= end Main ==================-->
<!--================= start Footer ==================-->
<div id="footer">
<table summary="footer" cellspacing="0" cellpadding="4" width="100%" border="0">
<tbody>
<tr>
<!--================= start Copyright ==================-->
<td colspan="2">
<div align="center">
<div class="copyright">
              Copyright &copy; 2002-2012&nbsp;The Apache Software Foundation. All rights reserved.<br>
              Apache POI, POI, Apache, the Apache feather logo, and the Apache 
              POI project logo are trademarks of The Apache Software Foundation.
            </div>
</div>
</td>
<!--================= end Copyright ==================-->
</tr>
<tr>
<td align="left">
<!--================= start Host ==================-->
<!--================= end Host ==================--></td><td align="right">
<!--================= start Credits ==================-->
<div align="right">
<div class="credit"></div>
</div>
<!--================= end Credits ==================-->
</td>
</tr>
</tbody>
</table>
</div>
<!--================= end Footer ==================-->
</body>
</html>