<?xml version="1.0" encoding="utf-8"?>
<!--

This file allows you to use external programs to extract text from
more structured file formats.  For example, you could use pdftotext to
extract data from PDF files.  (Beagle includes a PDF filter, so this
isn't necessary, but you get the idea.)

There are some limitations to this system: the external program must
take a filename on the command line and must write the extracted text
to standard output.  You cannot extract any metadata using this
system.  Using our PDF example, you could extract all of the text, but
you couldn't extract the author of the document as a special field.
For that, you will have to write a more traditional filter.

Now, an example entry:

<filter>
  <mimetype>text/plain</mimetype>
  <extension>.txt</extension>
  <command>cat</command>
  <arguments>%s</arguments>
</filter>

mimetype - The MIME type handled by this filter.  You may have 0 or
more of these for any filter.

extension - The file extension handled by this filter.  You may have 0
or more of these for any filter.

command - The filename of the command to run.  Do not put any command
line arguments in this.  This item is required.

arguments - The arguments to pass to the given command.  The special
token "%s" is replaced with the name of the file being processed.
This item is required.
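
As a further illustration, the pdftotext case mentioned above might be
written roughly as follows.  This is only a sketch: it assumes that
pdftotext (from poppler or xpdf) is installed, and it relies on
pdftotext's convention that an output file name of "-" sends the text
to standard output.  The mimetype and extension values are the usual
ones for PDF, not something this file prescribes.

<filter>
  <mimetype>application/pdf</mimetype>
  <extension>.pdf</extension>
  <command>pdftotext</command>
  <arguments>%s -</arguments>
</filter>

Without the trailing "-", pdftotext would write its output to a text
file instead of standard output, so the filter would have nothing to
hand back.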

Here are some sample filters from the Wiki.  To activate one, move it
between the 'external-filters' tags at the bottom of this file.

Simple TeX filter

    * Author: Stephan Hegel
    * Description: untex to remove LaTeX commands from input
    * Dependencies: untex 
<filter>
  <mimetype>text/x-tex</mimetype>
  <extension>.tex</extension>
  <command>untex</command>
  <arguments>-gascii %s</arguments>
</filter> 

Simple DVI filter

    * Author: Dav
    * Description: DVI to text using the "-q" option of dvi2tty
    * Dependencies: dvi2tty 
<filter>
  <mimetype>application/x-dvi</mimetype>
  <extension>.dvi</extension>
  <command>dvi2tty</command>
  <arguments>-q %s</arguments>
</filter>

Simple PostScript filter

    * Author: Ben Lee
    * Description: ps2ascii to extract text from PostScript files
    * Dependencies: ps2ascii 
<filter>
  <mimetype>application/postscript</mimetype>
  <extension>.ps</extension>
  <extension>.ai</extension>
  <extension>.eps</extension>
  <command>ps2ascii</command>
  <arguments>%s</arguments>
</filter>

Simple DjVu filter

    * Author: Ben Lee
    * Description: djvutxt to extract text from DjVu files
    * Dependencies: djvutxt 
<filter>
  <mimetype>image/vnd.djvu</mimetype>
  <extension>.djvu</extension>
  <extension>.djv</extension>
  <command>djvutxt</command>
  <arguments>%s</arguments>
</filter>
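
To activate one of the samples above, the end of this file might end
up looking roughly like the following.  The DVI filter is used here
purely as an illustration, and the sketch assumes that dvi2tty is
installed where Beagle can find it.  Note that the filter element has
to sit outside this comment block, or it will be ignored.

<external-filters>

  <filter>
    <mimetype>application/x-dvi</mimetype>
    <extension>.dvi</extension>
    <command>dvi2tty</command>
    <arguments>-q %s</arguments>
  </filter>

</external-filters>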

-->

<external-filters>

</external-filters>