<HTML><HEAD><TITLE>Clara Book</TITLE></HEAD>
<BODY BGCOLOR=#D0D0D0>
<TABLE WIDTH=100% BORDER=1 BGCOLOR=#E2D3FC><TR><TD><CENTER><H1><BR>Clara OCR Tutorial<BR></H1></CENTER></TD></TR></TABLE>
<P>
<CENTER>
[<A href=index.html>Main</A>]
[<A href=clara-faq.html>FAQ</A>]
[<A href=clara-tut.html>Tutorial</A>]
[<A href=clara-adv.html>User's Manual</A>]
[<A href=clara-dev.html>Developer's Guide</A>]
</CENTER>

<P>
Welcome. Clara OCR is a free OCR program, written for systems supporting
the C library and the X Window System. Clara OCR is intended for the
cooperative OCR of books. There are some screenshots available at
<A HREF=http://www.claraocr.org/>http://www.claraocr.org/</A>.

<P>
This documentation is extracted automatically from the comments
of the Clara OCR source code. It is known as "The Clara OCR
Tutorial". There is also an advanced manual known as "The Clara
OCR Advanced User's Manual" (man page clara-adv(1), also
available in HTML format). Developers must read "The Clara OCR
Developer's Guide" (man page clara-dev(1), also available in HTML
format).

<P>
<P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#79BEC6><FONT SIZE=+1><B> CONTENTS</B></FONT></TD></TR></TABLE>
<UL>
<P>
<LI> <A HREF=#1.>1. Making OCR</A>
<UL>
<P>
<LI> <A HREF=#1.1>    1.1 Starting Clara</A>
<LI> <A HREF=#1.2>    1.2 Some few command-line switches</A>
<LI> <A HREF=#1.3>    1.3 Training symbols</A>
<LI> <A HREF=#1.4>    1.4 Saving the session</A>
<LI> <A HREF=#1.5>    1.5 OCR steps</A>
<LI> <A HREF=#1.6>    1.6 Classification</A>
<LI> <A HREF=#1.7>    1.7 Note about how Clara OCR classification works</A>
<LI> <A HREF=#1.8>    1.8 Building the output</A>
<LI> <A HREF=#1.9>    1.9 Handling broken symbols</A>
<LI> <A HREF=#1.10>    1.10 Handling accents</A>
<LI> <A HREF=#1.11>    1.11 Browsing the book font</A>
<LI> <A HREF=#1.12>    1.12 Useful hints</A>
<LI> <A HREF=#1.13>    1.13 Fun codes</A>
<P>
</UL>
<LI> <A HREF=#2.>2. AVAILABILITY</A>
<UL>
<P>
</UL>
<LI> <A HREF=#3.>3. CREDITS</A>
<UL>
</UL>
</UL>
<A NAME=1.>
<P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#79BEC6><FONT SIZE=+1><B>1. Making OCR</B></FONT></TD></TR></TABLE>
<P>
This section is a tutorial on the basic OCR features offered by
Clara OCR. Clara OCR is not simple to use, and a basic knowledge
of how it works is required to use it. Most complex
features are not covered by this tutorial. If you need to compile
Clara from the source code, read the INSTALL file and check (if
necessary) the compilation hints in the Clara OCR Advanced User's
Manual.

<P>

<P>
<A NAME=1.1>
<P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.1 Starting Clara</B></FONT></TD></TR></TABLE>
<P>
So let's try it. The Clara distribution package contains one
small PBM file that you can use for a first test. The name of
this file is imre.pbm. If you cannot locate it, download it or
other files from <A HREF=http://www.claraocr.org/>http://www.claraocr.org/</A>. Alternatively, you can produce your
own 600-dpi PBM files by scanning any printed document (hints for
scanning pages and converting them to PBM are given in the
section "Scanning books" of the Clara OCR Advanced User's
Manual).
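
<P>
Before loading a page, it may help to confirm that the file really
is a PBM. In the Netpbm formats, a PBM file starts with the magic
number "P1" (plain, ASCII) or "P4" (raw, binary). The shell sketch
below tests those two bytes. Note that is_pbm is a hypothetical
helper, not part of Clara OCR, and that this tutorial does not say
which of the two variants Clara accepts.

<P>
<TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>

    # Hypothetical helper, not part of Clara OCR: succeed if the
    # file starts with a PBM magic number (P1 = plain, P4 = raw).
    is_pbm() {
        magic=$(head -c 2 "$1")
        case "$magic" in
            P1|P4) return 0 ;;
            *) return 1 ;;
        esac
    }</PRE>
</TD></TR></TABLE>
For instance, "is_pbm imre.pbm" should succeed for the sample page.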

<P>
Once you have a PBM file to try, cd to the directory where the
file resides and fire up Clara. Example:

<P>
<TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>

    $ cd /tmp/clara
    $ clara &</PRE>
</TD></TR></TABLE></CENTER>
To run OCR tests, Clara needs to write files in
that directory, so write permission and some
free space are required.
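
<P>
Both requirements can be checked before starting Clara. A minimal
sketch, assuming a POSIX shell: the function below (check_workdir is
a hypothetical name, not part of Clara OCR) fails if the directory
is missing or not writable, and otherwise prints its free space
with df.

<P>
<TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>

    # Hypothetical helper, not part of Clara OCR: verify that a
    # directory is writable and show its free space.
    check_workdir() {
        dir=${1:-.}
        if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
            echo "cannot write to $dir"
            return 1
        fi
        df -k "$dir"
    }</PRE>
</TD></TR></TABLE>
With the directory of the example above, the call would be
"check_workdir /tmp/clara".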

<P>
Note: as of version 0.9.8, the Clara OCR heuristics are tuned
to handle 600 dpi bitmaps. When using a different resolution,
specify it with the -y switch:

<P>
<TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>

    $ clara -y 300 &</PRE>
</TD></TR></TABLE></CENTER>
Then a window with menus and buttons will appear on your X
display:

<P>

<P>
<TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>

    +-----------------------------------------------+
    | File Edit OCR ...                             |
    +-----------------------------------------------+
    | +--------+     +----+ +--------+ +-------+    |
    | |  zoom  |     |page| |patterns| | tune  |    |
    | +--------+   +-+    +-+        +-+       +-+  |
    | +--------+   | +-------------------------+ |  |
    | |  zone  |   | |                         | |  |
    | +--------+   | |                         | |  |
    | +--------+   | |                         | |  |
    | |  OCR   |   | |        WELCOME TO       | |  |
    | +--------+   | |                         | |  |
    | +--------+   | |    C L A R A    O C R   | |  |
    | |  stop  |   | |                         | |  |
    | +--------+   | |                         | |  |
    |      .       | |                         | |  |
    |      .       | |                         | |  |
    |              | |                         | |  |
    |              | |                         | |  |
    |              | +-------------------------+ |  |
    |              +-----------------------------+  |
    |                                               |
    | (status line)                                 |
    +-----------------------------------------------+</PRE>
</TD></TR></TABLE></CENTER>
Welcome aboard! The rectangle with the welcome message is called
"the plate". As you already guessed, the small rectangles with
the labels "zoom", "OCR", "stop", etc, are "the buttons". The
"tabs" are those flaps labelled "page", "patterns"
and "tune". On the menu bar you'll find the File menu, the Edit
menu, and so on. Pop up the "Options" menu and change the current
font size for better visualization, if required.

<P>
Press "L" to read the GPL, or select the "page" tab and
then select the imre.pbm page (or any other
PBM file, if there is one) on the plate. The OCR will load that file, showing the
progress of this operation on the status line at the bottom of
the window.

<P>
Note: the "page" tab is the flap labelled "page". This is
unrelated to the "tab" key.

<P>
When the load operation completes, Clara will show the loaded
file and two other windows (empty for now) on the plate. Move the
pointer along the plate and you'll see the tab label follow the
current window: "page", "page (output)" or "page (symbol)". Move
the pointer along the entire application window, and, for most
components, you'll see a short context help message on the status
line when the pointer reaches it (the buttons, for
instance). Dialogs (user confirmations) also use the status line
(like Emacs), instead of dialog boxes.

<P>
You can resize both the Clara application window and each of the
three windows currently on the plate ("page", "page (output)" and
"page (symbol)"). To resize the windows, select any point between
two of them and drag the mouse. The scrollbars can be hidden
(use the "hide scrollbars" item on the View menu).

<P>
When the tab label is "page", press the "zoom" button using
mouse button 1 and the scanned image will zoom out. If you use
mouse button 2, the image will zoom in (the behaviour of the
"zoom" button depends on the current window).

<P>
Now try selecting the "page" tab many times, and you will
cycle through the various display modes shared by this tab. These
modes are referred to as "PAGE", "PAGE (fatbits)" and
"PAGE (list)". Each display mode may have one or more windows.
We've chosen this uncommon approach because an excess of tabs
turns them into a useless decoration. The other tabs also
offer various modes, some of which will be presented later in this
tutorial.

<P>

<P>
<A NAME=1.2>
<P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.2 Some few command-line switches</B></FONT></TD></TR></TABLE>
<P>
Besides the -y option used in the last subsection, Clara accepts
many others, documented in the Clara OCR Advanced User's
Manual. For now, of the various ways to start Clara,
we'll limit ourselves to a few examples:

<P>
<TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>

  clara
  clara -h</PRE>
</TD></TR></TABLE></CENTER>
In the first case, Clara is just started. In the second, it will
display a short help and exit.

<P>
<TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>

  clara -f path
  clara -f path -w workdir</PRE>
</TD></TR></TABLE></CENTER>
The option -f gives the relative or absolute path of a scanned
page or a directory with scanned pages (PBM files). The option -w
gives the relative or absolute path of a work directory (where
Clara will create the output and data files).

<P>
<TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>

  clara -i -f path -w workdir
  clara -b -f path -w workdir</PRE>
</TD></TR></TABLE></CENTER>
The option -i activates dead key emulation for composition of
accents and characters. The -b switch is for batch
processing. Clara will automatically perform one OCR run on the
file given through -f (or on all files found, if it is the
path of a directory) and exit without displaying its window.
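
<P>
The -b switch combines naturally with -f and -w when several books
must be processed unattended. The sketch below is only an
illustration: batch_books is a hypothetical helper, and the layout
(each book directory holding a "pages" subdirectory with the PBM
files and a "work" subdirectory for the results) is an assumption,
not a Clara OCR convention. Only the switches -b, -f and -w
described above are used.

<P>
<TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>

    # Sketch: one batch OCR run per book directory. The layout
    # book/pages and book/work is an assumption of this example.
    batch_books() {
        for book in "$@"; do
            mkdir -p "$book/work"
            clara -b -f "$book/pages" -w "$book/work"
        done
    }</PRE>
</TD></TR></TABLE>
Keeping one work directory per book follows the advice, given later
in this tutorial, that only pages with the same graphic
characteristics should share the same patterns.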

<P>
<TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>

  clara -Z 1 -F 7x13</PRE>
</TD></TR></TABLE></CENTER>
Clara will start with the smallest possible window size.

<P>
A full reference of command-line switches is given in the section
"Reference of command-line switches" of the Clara OCR Advanced
User's Manual.

<P>

<P>
<A NAME=1.3>
<P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.3 Training symbols</B></FONT></TD></TR></TABLE>
<P>
Yes, Clara OCR must be trained. Training is a tedious procedure,
but it's a must for those who need a customizable OCR, apt to
adapt to a perhaps uncommon printing style.

<P>
On the "page" tab, observe the image of the document presented on
the top window. You'll see the symbols greyed, because the OCR
currently does not know their transliterations. Try to select one
symbol using the mouse (click the mouse button 1 over it). A
black elliptic cursor will appear around that symbol. This cursor
is called the "graphic cursor". You can move the graphic cursor
around the document using the arrow keys.

<P>
Now observe the bottom window on the "page" tab. That window
presents some detailed information on the current symbol (the
one identified by the graphic cursor). When the "show web clip"
option on the "View" menu is selected, a clip of the document
around the current symbol is displayed too. In some cases, this
clip is useful for better visualization. It is called "web clip"
because this same image is exported to the Clara OCR web
interface when cooperative training and revision through the
Internet is being performed.

<P>
To inform the OCR about the transliteration of one symbol, just
type the corresponding key. For instance, if the current symbol
is a letter "a", just type the "a" key. Observe that the trained
symbol becomes black. Each symbol trained will be learned by the
OCR, its bitmap will be called a "pattern", and it will be used
as such when trying to deduce the transliteration of unknown
symbols.

<P>
Note: in our test, the user chose the symbol to be trained. However,
Clara OCR can choose by itself the symbols to be trained. This feature
is called "build the bookfont automatically" (found on the "tune"
tab). To use it, select the corresponding checkbox and classify the
symbols as explained later.

<P>
Finally, when the transliteration cannot be entered with a
single keystroke or composition (for instance when you wish to
enter a TeX macro as the transliteration of the current
symbol), type the transliteration into the text input
field on the bottom window (select it with the mouse first).

<P>

<P>
<A NAME=1.4>
<P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.4 Saving the session</B></FONT></TD></TR></TABLE>
<P>
Before going further, it's important to know how to save your
work. The "File" menu contains one item labelled "save
session". When selected, it will create or overwrite three files
in the working directory: "patterns", "acts" and "page.session",
where "page" is the name of the file currently loaded, without
the "pbm" extension (in our example, "imre"). So, to remove all
data produced by OCR sessions, manually remove the files
"*.session", "patterns" and "acts".

<P>
Note that the files "patterns" and "acts" are shared by all PBM
pages, so a symbol trained from one page is reused on the other
pages. The ".session" files, however, are per-page. Pages with the
same graphic characteristics, and only those, must be put in the
same directory, in order to share the same patterns.
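
<P>
Since all data produced by OCR sessions lives in the files named
above, starting over amounts to deleting them from the work
directory. A minimal shell sketch follows. The name clean_sessions
is a hypothetical helper, not part of Clara OCR, and the scanned
PBM pages are left untouched.

<P>
<TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>

    # Sketch: remove the data produced by OCR sessions from a
    # work directory (default: the current directory).
    clean_sessions() {
        dir=${1:-.}
        rm -f -- "$dir"/*.session "$dir/patterns" "$dir/acts"
    }</PRE>
</TD></TR></TABLE>
With the directory used earlier in this tutorial, the call would be
"clean_sessions /tmp/clara".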

<P>
When the "quit" option of the "File" menu is selected, the OCR
prompts the user for saving the session (answer pressing the key
"y" or "n"), unless there are no unsaved changes.

<P>

<P>

<P>
<A NAME=1.5>
<P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.5 OCR steps</B></FONT></TD></TR></TABLE>
<P>
The OCR process is divided into various steps, for instance
"classification", "build", etc. These steps are accessible by clicking
mouse button 2 over the OCR button. Each one can be started
independently and/or repeated at any moment. In fact, the more
you know about these steps, the better you'll use them.

<P>
Clicking the "OCR" button with mouse button 1 starts all steps
in sequence. The "OCR" button remains in the
"selected" state while some step is running.

<P>
Although we won't cover this in the tutorial, a basic knowledge
of what each step performs is required for fine-tuning Clara OCR.
The tuning is an interactive effort where the usage of the
heuristics alternates with training and revision, guided by the
user experience and feeling.

<P>

<P>
<A NAME=1.6>
<P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.6 Classification</B></FONT></TD></TR></TABLE>
<P>
After training some symbols, we're ready to apply the just
acquired knowledge to deduce the transliteration of non-trained
symbols. For that, Clara OCR will compare the non-trained symbols
with those trained ("patterns"). Clara OCR offers nice visual
modes to present the comparison of each symbol with each
pattern. To activate the visual modes, enter the View menu and
select (for instance) the "show comparisons" option.

<P>
Now start the "classification" step (click the mouse button 2
over the OCR button and select the "classification" item) and
observe what happens. Depending on your hardware and on the size
of the document, this operation may take a long time to complete
(e.g. 5 minutes). Hopefully it'll be much faster (say, 30
seconds).

<P>
When the classification finishes, observe that some non-trained
symbols became black. Each such symbol was found similar to some
pattern. Select one black symbol, and Clara will draw a gray
ellipse around each class member (except the selected symbol,
identified by the black graphic cursor). You can switch off this
feature by unselecting the "Show current class" item on the "View"
menu.

<P>
In some cases, Clara will classify incorrectly some symbols. For
instance, a defective "e" may be classified as "c". If that
happens, you can inform Clara about the correct transliteration
of that symbol training it as explained before (in this example,
select the symbol and press "e"). This action will remove that
symbol from its current class, and will define a new class,
currently unitary and containing just that symbol.

<P>

<P>
<A NAME=1.7>
<P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.7 Note about how Clara OCR classification works</B></FONT></TD></TR></TABLE>
<P>
The usual meaning of "classification" for OCRs is to deduce for
each symbol if it is a letter "a" or a letter "b", or a digit
"1", etc. As the total number of different symbols is small (some
tens), there will be a small quantity of classes.

<P>
However, instead of classifying each symbol as being the letter
"a", or the digit "1", or whatever, Clara OCR builds classes of
symbols with similar shapes, not necessarily assigning a
transliteration for each symbol. Because the bitmap
comparison heuristics sometimes consider two true letters "a" dissimilar
(due to printing differences or defects), the Clara OCR
classifier will break the set of all letters "a" into various
untransliterated subclasses.

<P>
Therefore, the classification result may be a much larger number
of classes (thousands or more), not only because of those small
differences or defects, but also because the classification
heuristics are currently unable to scale symbols or to "boldfy"
or "italicize" a symbol.

<P>
Note that each untransliterated subclass of letters "a" depends
on a specific human revision effort to become transliterated
(trained). This is not an absurd strategy, because the revision
of each subset corresponds to part of the unavoidable human
revision effort required by any real-life digitization
project. This is one of the principles that make it possible to see
Clara OCR not as a traditional OCR, but as a productivity tool
able to reduce costs. Anyway, we expect future
improvements to the Clara OCR classifier, in order to lessen the
number of subclasses created.

<P>

<P>
<A NAME=1.8>
<P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.8 Building the output</B></FONT></TD></TR></TABLE>
<P>
Now we're ready to build the OCR output. Just start the
"build" step. The action performed will be basically
to detect text words and lines, and output the transliterations,
trained or deduced, of all symbols. The output will be presented
on the "PAGE (output)" window.

<P>
Each character on the "PAGE (output)" window behaves like an
HTML hyperlink. Click it to select the current symbol both
on the "PAGE" window and on the "PAGE (symbol)" window. Note
that the transliteration of unknown symbols is substituted by
their internal IDs (for instance "[133]").

<P>
The result of the word detection heuristic can be visualized
by checking the "show words" item on the "View" menu. As of version
0.9.8, Clara OCR does not offer controls to tune the word
detection techniques, so this visualization is currently useful
to diagnose problems but not to solve them.

<P>

<P>
<A NAME=1.9>
<P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.9 Handling broken symbols</B></FONT></TD></TR></TABLE>
<P>
Note: as of version 0.9.8, the merging heuristics are only
partially implemented, and in most cases they won't produce any effect.

<P>
The build heuristics also try to merge the pieces of broken
symbols, such as the "u", the "h" and the "E" on the figure
(observe the absent pixels). Some letters have thin parts, and
depending on the paper and printing quality, these parts will
break more or less frequently.

<P>

<P>
<TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>

                 XXX            XXXXXXXXXXX
                  XX             XXX      X
                  XX             XXX
                  XX             XXX
    XXX   XXX     XX   XXX       XXX     X
     XX    XX     XXX     X      XXX  XXXX
     XX    XX     XX      XX     XXX     X
     XX    XX     XX      XX     XXX
     XX    XX     XX      XX     XXX
     XX    XX     XX      XX     XXX      X
      XX  XXXX   XXXX     XXX   XXXXXXXXXXX</PRE>
</TD></TR></TABLE></CENTER>
Clara OCR offers three symbol merging heuristics:
geometric-based, recognition-based and learned. Each one may be
activated or deactivated using the "tune" tab.

<P>
Geometric merging applies to fragments on the interior of the
symbol bounding box, like the "E" on the figure, and to some other
cases too.

<P>
Recognition merging searches for unrecognized
symbols and, for each one, tries to merge it with some
neighbour(s), and checks if the result becomes similar to some
pattern.

<P>
Finally, learned merging will try to reproduce the
cases trained by the user. To train merging, just select the
symbol using the mouse button 1
(say, the left part of the "u" on the figure), click the mouse
button 2 on the fragment (the right part of the "u"), and select
the "merge with current symbol" entry. On the other hand, the
"disassemble" entry may be used to break a symbol into its
components.

<P>

<P>
<A NAME=1.10>
<P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.10 Handling accents</B></FONT></TD></TR></TABLE>
<P>
Now let's talk about accents.

<P>
As a general rule, Clara OCR does not consider accents as
parts of letters, so merging does not apply to accents. Accents
are considered individual symbols, and must be trained separately.
Clara OCR will compose accents with the corresponding letters when
generating the output. The exception is when the accent is
graphically joined to the letter:

<P>
<TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>

           XXX
           XX          XXX
          XX           XX
                      XX
       XXXX         XXXX
     XX    XX     XX    XX
    XX      XX   XX      XX
    XXXXXXXXXX   XXXXXXXXXX
    XX           XX
    XX           XX
     XX    XX     XX    XX
       XXXX         XXXX</PRE>
</TD></TR></TABLE></CENTER>
In the figure we have two samples of the letter "e" with an acute
accent. In the first one, the accent is graphically separated
from the letter. So the accent transliteration will be trained or
deduced as being "'", and the letter transliteration
will be trained or deduced as being "e". When generating the output,
Clara OCR will compose them as the macro "\'e" (or as the ISO
character 233, as soon as we provide this alternative behaviour).
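
<P>
The composition in the first case can be pictured with a small
shell sketch. This is only an illustration of the rule described
above, not the code Clara OCR uses, and compose_accent is a
hypothetical name: it prefixes the accent with a backslash and
appends the letter, giving a TeX-style macro.

<P>
<TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>

    # Sketch: join a separately recognized accent ($1) and
    # letter ($2) into a TeX-style macro such as \'e.
    compose_accent() {
        printf '%s\n' "\\$1$2"
    }</PRE>
</TD></TR></TABLE>
Called with the accent "'" and the letter "e", the function prints
the macro \'e.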

<P>
In the second case the accent isn't graphically separable from
the letter, so we'll need to train the accented character as the
corresponding ISO character (code 233) or as the macro "\'e". As
the generation of accented characters depends on the local X
settings, the "Emulate deadkeys" item on the "Options" menu may
be useful in this case. It will enable the composition of accents
and letters performed directly by Clara OCR (like the Emacs
iso-accents-mode feature).

<P>

<P>
<A NAME=1.11>
<P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.11 Browsing the book font</B></FONT></TD></TR></TABLE>
<P>
As explained earlier, trained symbols become patterns (unless you
mark them "bad"). The collection of all patterns is called the "book
font" (the term "book" is to distinguish it from the GUI
font). Clara OCR stores all patterns in the "patterns" file on the
work directory, when the "save session" entry on the "File" menu
is selected.

<P>
Clara OCR itself can choose the patterns and populate the book
font. To do so, just select the "Build the font automatically"
item on the "tune" tab, and classify the symbols.

<P>
To browse the patterns, click the "patterns" tab one or more times
to enter the "Pattern (list)" window. The "PATTERN (list)" mode
displays the bitmap and the properties
of each pattern in a (perhaps very long) form.
Click the "zoom" button to
adjust the size of the pattern bitmaps. Use the scrollbar or
the Next (Page Down) or Previous (Page Up) keys to navigate. Use
the sort options on the "Edit" menu to change the presentation order.

<P>
Now press the "patterns" tab again to reach the "Pattern" window. It
presents the "current" pattern with detailed properties. Try
activating the "show web clip" option on the "View" menu to
visualize the pattern context. The left and
right arrows will move to the previous and to the next patterns. To
train the current pattern (being exhibited on the "Pattern" window),
just press the key corresponding to its transliteration (Clara will
automatically move to the next pattern) or fill the
input field. There is no need to press ENTER to submit the input
field contents.

<P>

<P>
<A NAME=1.12>
<P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.12 Useful hints</B></FONT></TD></TR></TABLE>
<P>
If the GUI becomes trashed or blank, press C-l to redraw it.

<P>
For now, the GUI does not support cut-and-paste. To save to a file
the contents of the "PAGE (list)" window, use the "Write report"
item on the "File" menu.

<P>
The "OCR" button will enter the "pressed" state in some unexpected
situations, like during dialogs. This behaviour will be fixed
soon.

<P>
The "STOP" button does not immediately stop the OCR operation in
progress (e.g. classification). Clara OCR only stops the operation
in progress at "secure" points, where all data structures are
consistent.

<P>
The zone button allows the creation of only one zone, but the OCR
won't become restricted to that zone. For now, the zone is useful
only to be saved as a PBM file using the "save zone" option on
the "File" menu.

<P>
The OCR output is automatically saved to the file page.html, where
"page" is the name of the currently loaded page, without the "pbm"
extension. This file is created by the "generate output" step on the
menu that appears when the mouse button 2 is pressed over the OCR
button.
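
<P>
The naming rule can be sketched in the shell as follows. The name
output_name is a hypothetical helper, not part of Clara OCR. It only
computes the file name and assumes nothing about the directory
where the file is written.

<P>
<TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>

    # Sketch: derive the name of the output file from the name
    # of a page by replacing the "pbm" extension with "html".
    output_name() {
        echo "$(basename "$1" .pbm).html"
    }</PRE>
</TD></TR></TABLE>
For the sample page, "output_name imre.pbm" prints imre.html.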

<P>
The following OCR steps are currently unfinished and perform no
action: "generate spelling hints", "detect blocks", and
"Geometric merging".

<P>

<P>
<A NAME=1.13>
<P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.13 Fun codes</B></FONT></TD></TR></TABLE>
<P>
Clara OCR "fun codes" are similar to videogame "codes" (for those
who have never heard about that, videogame "codes" are special
sequences of mouse or key clicks that make your player
invulnerable, or obtain maximum energy, or perform an unexpected
action, etc).

<P>
The difference is that Clara OCR "fun codes" are not secret
(videogame "codes" are normally secret and very hard to discover
by chance). Clara OCR contains no secret feature. Fun codes are
intended to be used during public presentations. For now there is
only one fun code: just click one or more times the banner on the
welcome window to make it scroll.

<P>

<P>

<P>
<A NAME=2.>
<P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#79BEC6><FONT SIZE=+1><B>2. AVAILABILITY</B></FONT></TD></TR></TABLE>
<P>
Clara OCR is free software. Its source code is distributed under
the terms of the GNU GPL (General Public License), and is
available at <A HREF=http://www.claraocr.org/>http://www.claraocr.org/</A>. If you don't know what the GPL is,
please read it and check the GPL FAQ at
<A HREF=http://www.gnu.org/copyleft/gpl-faq.html>http://www.gnu.org/copyleft/gpl-faq.html</A>. You should have
received a copy of the GNU General Public License along with this
software; if not, write to the Free Software Foundation, Inc., 59
Temple Place - Suite 330, Boston, MA 02111-1307, USA. The Free
Software Foundation can be found at <A HREF=http://www.fsf.org>http://www.fsf.org</A>.

<P>

<P>
<A NAME=3.>
<P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#79BEC6><FONT SIZE=+1><B>3. CREDITS</B></FONT></TD></TR></TABLE>
<P>
Clara OCR was written by Ricardo Ueda Karpischek. Imre Simon
contributed high-volume tests, discussions with experts,
selection of bibliographic resources, publicity and many ideas
on how to make the software more useful.

<P>
Ricardo authored various free materials, some included in
Conectiva, Debian, FreeBSD and SuSE (the verb conjugator
"conjugue", the ispell dictionary br.ispell and the proxy
axw3). He recently ported the EiC interpreter to the Psion 5
handheld. Imre Simon promotes the usage and development of free
technologies and information from his research, teaching and
administrative labour at the University.

<P>
Ricardo Ueda Karpischek works as an independent developer and
instructor, and received no financial support to develop Clara
OCR. He's not an employee of any company or organization.

<P>
Roberto Hirata Junior and Marcelo Marcilio Silva contributed
ideas on character isolation and recognition. Richard Stallman
suggested improvements on how to generate HTML output. Marius
Vollmer is helping to add Guile support. Jacques Le Marois helped
with the announcement process. We acknowledge Mike O'Donnell and Junior
Barrera for their good criticism. We acknowledge Peter Lyman for
his remarks about the Berkeley Digital Library, and Wanderley
Antonio Cavassin, Janos Simon and Roberto Marcondes Cesar Junior
for some web and bibliographic pointers. Bruno Barbieri Gnecco
provided hints and explanations about GOCR (main author: Jorg
Schulenburg). Luis Jose Cearra Zabala (author of OCRE) is kindly
supporting our attempts to use portions of his code. Adriano
Nagelschmidt Rodrigues and Carlos Juiti Watanabe carefully tried
the tutorial before the first announcement. Eduardo Marcel Macan
packaged Clara OCR for Debian and suggested some
improvements. Mandrakesoft is hosting claraocr.org. We
acknowledge Conectiva and SuSE for providing copies of their
outstanding distributions. Finally, we acknowledge the late Jose
Hugo de Oliveira Bussab for his interest in our work.

<P>
The fonts used by the "view alphabet map" feature came from
Roman Czyborra's "The ISO 8859 Alphabet Soup" page at
<A HREF=http://czyborra.com/charsets/iso8859.html>http://czyborra.com/charsets/iso8859.html</A>.

<P>
See also the Changelog (<A HREF=http://www.claraocr.org/CHANGELOG>http://www.claraocr.org/CHANGELOG</A>).

<P>
</HR></BODY></HTML>