<HTML><HEAD><TITLE>Clara Book</TITLE></HEAD> <BODY BGCOLOR=#D0D0D0> <TABLE WIDTH=100% BORDER=1 BGCOLOR=#E2D3FC><TR><TD><CENTER><H1><BR>Clara OCR Tutorial<BR></H1></CENTER></TD></TR></TABLE> <P> <CENTER> [<A href=index.html>Main</A>] [<A href=clara-faq.html>FAQ</A>] [<A href=clara-tut.html>Tutorial</A>] [<A href=clara-adv.html>User's Manual</A>] [<A href=clara-dev.html>Developer's Guide</A>] </CENTER>
<P> Welcome. Clara OCR is a free OCR program, written for systems supporting the C library and the X Window System. Clara OCR is intended for the cooperative OCR of books. There are some screenshots available at <A HREF=http://www.claraocr.org/>http://www.claraocr.org/</A>.
<P> This documentation is extracted automatically from the comments of the Clara OCR source code. It is known as "The Clara OCR Tutorial". There is also an advanced manual known as "The Clara OCR Advanced User's Manual" (man page clara-adv(1), also available in HTML format). Developers must read "The Clara OCR Developer's Guide" (man page clara-dev(1), also available in HTML format).
<P> <P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#79BEC6><FONT SIZE=+1><B> CONTENTS</B></FONT></TD></TR></TABLE>
<UL> <P> <LI> <A HREF=#1.>1. Making OCR</A>
<UL> <P>
<LI> <A HREF=#1.1> 1.1 Starting Clara</A>
<LI> <A HREF=#1.2> 1.2 Some few command-line switches</A>
<LI> <A HREF=#1.3> 1.3 Training symbols</A>
<LI> <A HREF=#1.4> 1.4 Saving the session</A>
<LI> <A HREF=#1.5> 1.5 OCR steps</A>
<LI> <A HREF=#1.6> 1.6 Classification</A>
<LI> <A HREF=#1.7> 1.7 Note about how Clara OCR classification works</A>
<LI> <A HREF=#1.8> 1.8 Building the output</A>
<LI> <A HREF=#1.9> 1.9 Handling broken symbols</A>
<LI> <A HREF=#1.10> 1.10 Handling accents</A>
<LI> <A HREF=#1.11> 1.11 Browsing the book font</A>
<LI> <A HREF=#1.12> 1.12 Useful hints</A>
<LI> <A HREF=#1.13> 1.13 Fun codes</A>
<P> </UL>
<LI> <A HREF=#2.>2. AVAILABILITY</A> <UL> <P> </UL>
<LI> <A HREF=#3.>3.
CREDITS</A> <UL> </UL> </UL> <A NAME=1.> <P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#79BEC6><FONT SIZE=+1><B>1. Making OCR</B></FONT></TD></TR></TABLE>
<P> This section is a tutorial on the basic OCR features offered by Clara OCR. Clara OCR is not simple to use. A basic knowledge of how it works is required for using it. The most complex features are not covered by this tutorial. If you need to compile Clara from the source code, read the INSTALL file and check (if necessary) the compilation hints in the Clara OCR Advanced User's Manual.
<P> <P> <A NAME=1.1> <P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.1 Starting Clara</B></FONT></TD></TR></TABLE>
<P> So let's try it. The Clara distribution package contains one small PBM file that you can use for a first test. The name of this file is imre.pbm. If you cannot locate it, download it or other files from <A HREF=http://www.claraocr.org/>http://www.claraocr.org/</A>. Alternatively, you can produce your own 600-dpi PBM files by scanning any printed document (hints for scanning pages and converting them to PBM are given in the section "Scanning books" of the Clara OCR Advanced User's Manual).
<P> Once you have a PBM file to try, cd to the directory where the file resides and fire up Clara. Example:
<P> <TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>
    $ cd /tmp/clara
    $ clara &</PRE>
</TD></TR></TABLE></CENTER> In order to make OCR tests, Clara will need to write files to that directory, so write permission and some free space are required.
<P> Note: as of version 0.9.8, the Clara OCR heuristics are tuned to handle 600-dpi bitmaps. When using a different resolution, specify it with the -y switch:
<P> <TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>
    $ clara -y 300 &</PRE>
</TD></TR></TABLE></CENTER> Then a window with menus and buttons will appear on your X display:
<P> <P> <TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>
    +-----------------------------------------------+
    | File Edit OCR ...                             |
    +-----------------------------------------------+
    | +--------+   +----+ +--------+ +-------+      |
    | |  zoom  |   |page| |patterns| | tune  |      |
    | +--------+ +-+    +-+        +-+       +-+    |
    | +--------+ | +-------------------------+ |    |
    | |  zone  | | |                         | |    |
    | +--------+ | |                         | |    |
    | +--------+ | |                         | |    |
    | |  OCR   | | |       WELCOME TO        | |    |
    | +--------+ | |                         | |    |
    | +--------+ | |     C L A R A  O C R    | |    |
    | |  stop  | | |                         | |    |
    | +--------+ | |                         | |    |
    |      .     | |                         | |    |
    |      .     | |                         | |    |
    |            | |                         | |    |
    |            | |                         | |    |
    |            | +-------------------------+ |    |
    |            +-----------------------------+    |
    |                                               |
    | (status line)                                 |
    +-----------------------------------------------+</PRE>
</TD></TR></TABLE></CENTER> Welcome aboard! The rectangle with the welcome message is called "the plate". As you have already guessed, the small rectangles with the labels "zoom", "OCR", "stop", etc., are "the buttons". The "tabs" are the flaps labelled "page", "patterns" and "tune". On the menu bar you'll find the File menu, the Edit menu, and so on. Pop up the "Options" menu and change the current font size for better visualization, if required.
<P> Press "L" to read the GPL, or select the "page" tab and then select the imre.pbm page on the plate (or any other PBM file, if there is one). The OCR will load that file, showing the progress of this operation on the status line at the bottom of the window.
<P> Note: the "page" tab is the flap labelled "page". This is unrelated to the "tab" key.
<P> When the load operation completes, Clara will show the loaded file and two other windows (empty for now) on the plate. Move the pointer along the plate and you'll see the tab label follow the current window: "page", "page (output)" or "page (symbol)". Move the pointer along the entire application window and, for most components (the buttons, for instance), you'll see a short context help message on the status line when the pointer reaches them. Dialogs (user confirmations) also use the status line (like Emacs), instead of dialog boxes.
<P> You can resize either the Clara application window or each of the three windows currently on the plate ("page", "page (output)" and "page (symbol)"). To resize the windows, select any point between two of them and drag the mouse. The scrollbars can be hidden (use the "hide scrollbars" item on the View menu).
<P> When the tab label is "page", press the "zoom" button using mouse button 1 and the scanned image will zoom out. If you use mouse button 2, the image will zoom in (the behaviour of the "zoom" button depends on the current window).
<P> Now try selecting the "page" tab many times, and you will cycle through the various display modes shared by this tab. These modes are referred to as "PAGE", "PAGE (fatbits)" and "PAGE (list)". Each display mode may have one or more windows. We've chosen this uncommon approach because an excess of tabs turns them into a useless decoration. The other tabs also offer various modes, some of which are presented later in this tutorial.
<P> <P> <A NAME=1.2> <P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.2 Some few command-line switches</B></FONT></TD></TR></TABLE>
<P> Besides the -y option used in the last subsection, Clara accepts many others, documented in the Clara OCR Advanced User's Manual. For now, of the many ways to start Clara, we'll limit ourselves to a few examples:
<P> <TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>
    clara
    clara -h</PRE>
</TD></TR></TABLE></CENTER> In the first case, Clara is just started. In the second, it will display a short help and exit.
<P> <TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>
    clara -f path
    clara -f path -w workdir</PRE>
</TD></TR></TABLE></CENTER> The option -f gives the relative or absolute path of a scanned page or of a directory with scanned pages (PBM files). The option -w gives the relative or absolute path of a work directory (where Clara will create the output and data files).
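<P> The switches seen so far can be combined on one command line. The shell sketch below only prints the command line it would run, so it can be tried without starting Clara. The helper name "clara_cmd" and its argument order are hypothetical (they are not part of Clara OCR), and the sketch assumes that -y may be given together with -f and -w, which this tutorial does not state explicitly.
<P> <TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>
```shell
#!/bin/sh
# clara_cmd PAGE WORKDIR [DPI]  (hypothetical helper, not part of Clara OCR)
# Prints the clara command line for one scanned page without running it.
clara_cmd() {
    page=$1
    workdir=$2
    dpi=${3:-600}    # 600 dpi is the resolution the heuristics are tuned for
    if [ "$dpi" -eq 600 ]; then
        printf 'clara -f %s -w %s\n' "$page" "$workdir"
    else
        # Assumption: -y may be given together with -f and -w.
        printf 'clara -y %s -f %s -w %s\n' "$dpi" "$page" "$workdir"
    fi
}

clara_cmd imre.pbm /tmp/clara
clara_cmd imre.pbm /tmp/clara 300
```
</PRE>
</TD></TR></TABLE> The first call prints a command for a 600-dpi page, the second one adds -y for a 300-dpi scan. Check the printed line against the switch reference in the Clara OCR Advanced User's Manual before running it.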
<P> <TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>
    clara -i -f path -w workdir
    clara -b -f path -w workdir</PRE>
</TD></TR></TABLE></CENTER> The option -i activates dead-key emulation for the composition of accents and characters. The -b switch is for batch processing. Clara will automatically perform one OCR run on the file given through -f (or on all files found, if it is the path of a directory) and exit without displaying its window.
<P> <TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>
    clara -Z 1 -F 7x13</PRE>
</TD></TR></TABLE></CENTER> Clara will start with the smallest possible window size.
<P> A full reference of command-line switches is given in the section "Reference of command-line switches" of the Clara OCR Advanced User's Manual.
<P> <P> <A NAME=1.3> <P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.3 Training symbols</B></FONT></TD></TR></TABLE>
<P> Yes, Clara OCR must be trained. Training is a tedious procedure, but it's a must for those who need a customizable OCR, able to adapt to a perhaps uncommon printing style.
<P> On the "page" tab, observe the image of the document presented on the top window. You'll see the symbols greyed, because the OCR currently does not know their transliterations. Try to select one symbol using the mouse (click mouse button 1 over it). A black elliptic cursor will appear around that symbol. This cursor is called the "graphic cursor". You can move the graphic cursor around the document using the arrow keys.
<P> Now observe the bottom window on the "page" tab. That window presents some detailed information on the current symbol (the one identified by the graphic cursor). When the "show web clip" option on the "View" menu is selected, a clip of the document around the current symbol is displayed too. In some cases, this clip is useful for better visualization.
The name "web clip" comes from the fact that this same image is exported to the Clara OCR web interface when cooperative training and revision through the Internet are being performed.
<P> To tell the OCR the transliteration of a symbol, just type the corresponding key. For instance, if the current symbol is a letter "a", just type the "a" key. Observe that the trained symbol becomes black. Each trained symbol will be learned by the OCR: its bitmap will be called a "pattern", and it will be used as such when trying to deduce the transliteration of unknown symbols.
<P> Note: in our test, the user chose the symbol to be trained. However, Clara OCR can choose by itself the symbols to be trained. This feature is called "build the bookfont automatically" (found on the "tune" tab). To use it, select the corresponding checkbox and classify the symbols as explained later.
<P> Finally, when the transliteration cannot be entered with one single keystroke or composition (for instance when you wish to give a TeX macro as the transliteration of the current symbol), write down the transliteration using the text input field on the bottom window (select it with the mouse first).
<P> <P> <A NAME=1.4> <P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.4 Saving the session</B></FONT></TD></TR></TABLE>
<P> Before going further, it's important to know how to save your work. The "File" menu contains one item labelled "save session". When selected, it will create or overwrite three files on the working directory: "patterns", "acts" and "page.session", where "page" is the name of the file currently loaded, without the "pbm" extension (in our example, "imre"). So, to remove all data produced by OCR sessions, manually remove the files "*.session", "patterns" and "acts".
<P> Note that the files "patterns" and "acts" are shared by all PBM pages, so a symbol trained from one page is reused on the other pages. The ".session" files however are per-page.
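<P> The manual cleanup described above can be sketched as a small shell function. The helper name "clean_sessions" is hypothetical (it is not part of Clara OCR); the sketch assumes the work directory holds only the files named in this subsection plus the scanned pages, and it leaves the PBM files untouched.
<P> <TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>
```shell
#!/bin/sh
# clean_sessions WORKDIR  (hypothetical helper, not part of Clara OCR)
# Removes the files written by "save session" from one work directory:
# every per-page *.session file plus the shared "patterns" and "acts".
clean_sessions() {
    workdir=$1
    # -f keeps rm quiet when one of the files does not exist
    rm -f "$workdir"/*.session "$workdir/patterns" "$workdir/acts"
}

# Example (commented out, since it deletes training data):
# clean_sessions /tmp/clara
```
</PRE>
</TD></TR></TABLE> Because "patterns" and "acts" are shared by all pages of the directory, running this discards the training done for every page stored there, not just one.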
Pages with the same graphic characteristics, and only them, must be put in the same directory, in order to share the same patterns.
<P> When the "quit" option of the "File" menu is selected, the OCR prompts the user to save the session (answer by pressing the key "y" or "n"), unless there are no unsaved changes.
<P> <P> <P> <A NAME=1.5> <P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.5 OCR steps</B></FONT></TD></TR></TABLE>
<P> The OCR process is divided into various steps, for instance "classification", "build", etc. These steps are accessible by clicking mouse button 2 over the OCR button. Each one can be started independently and/or repeated at any moment. In fact, the more you know about these steps, the better you'll use them.
<P> Clicking the "OCR" button with mouse button 1 starts all steps in sequence. The "OCR" button remains in the "selected" state while some step is running.
<P> Although we won't cover these steps in the tutorial, a basic knowledge of what each step performs is required for fine-tuning Clara OCR. The tuning is an interactive effort where the usage of the heuristics alternates with training and revision, guided by the user's experience and feeling.
<P> <P> <A NAME=1.6> <P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.6 Classification</B></FONT></TD></TR></TABLE>
<P> After training some symbols, we're ready to apply the knowledge just acquired to deduce the transliteration of non-trained symbols. For that, Clara OCR will compare the non-trained symbols with the trained ones ("patterns"). Clara OCR offers nice visual modes to present the comparison of each symbol with each pattern. To activate the visual modes, enter the View menu and select (for instance) the "show comparisons" option.
<P> Now start the "classification" step (click mouse button 2 over the OCR button and select the "classification" item) and observe what happens.
Depending on your hardware and on the size of the document, this operation may take a long time to complete (e.g. 5 minutes). Hopefully it'll be much faster (say, 30 seconds).
<P> When the classification finishes, observe that some non-trained symbols became black. Each such symbol was found similar to some pattern. Select one black symbol, and Clara will draw a gray ellipse around each class member (except the selected symbol, identified by the black graphic cursor). You can switch off this feature by unselecting the "Show current class" item on the "View" menu.
<P> In some cases, Clara will classify some symbols incorrectly. For instance, a defective "e" may be classified as "c". If that happens, you can tell Clara the correct transliteration of that symbol by training it as explained before (in this example, select the symbol and press "e"). This action will remove that symbol from its current class, and will define a new class, currently unitary and containing just that symbol.
<P> <P> <A NAME=1.7> <P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.7 Note about how Clara OCR classification works</B></FONT></TD></TR></TABLE>
<P> The usual meaning of "classification" for OCRs is to deduce for each symbol whether it is a letter "a", a letter "b", a digit "1", etc. As the total number of different symbols is small (a few dozen), there will be a small number of classes.
<P> However, instead of classifying each symbol as being the letter "a", or the digit "1", or whatever, Clara OCR builds classes of symbols with similar shapes, not necessarily assigning a transliteration to each symbol. So, as the bitmap comparison heuristics sometimes consider two true letters "a" dissimilar (due to printing differences or defects), the Clara OCR classifier will break the set of all letters "a" into various untransliterated subclasses.
<P> Therefore, the classification result may be a much larger number of classes (thousands or more), not only because of those small differences or defects, but also because the classification heuristics are currently unable to scale symbols or to "boldify" or "italicize" a symbol.
<P> Note that each untransliterated subclass of letters "a" depends on a specific human revision effort to become transliterated (trained). This is not an absurd strategy, because the revision of each subset corresponds to part of the unavoidable human revision effort required by any real-life digitization project. This is one of the principles that make it possible to see Clara OCR not as a traditional OCR, but as a productivity tool able to reduce costs. Anyway, we expect future improvements to the Clara OCR classifier, in order to lessen the number of subclasses created.
<P> <P> <A NAME=1.8> <P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.8 Building the output</B></FONT></TD></TR></TABLE>
<P> Now we're ready to build the OCR output. Just start the "build" step. The action performed will basically be to detect text words and lines, and to output the transliterations, trained or deduced, of all symbols. The output will be presented on the "PAGE (output)" window.
<P> Each character on the "PAGE (output)" window behaves like an HTML hyperlink. Click it to select the current symbol both on the "PAGE" window and on the "PAGE (symbol)" window. Note that the transliteration of unknown symbols is substituted by their internal IDs (for instance "[133]").
<P> The result of the word detection heuristic can be visualized by checking the "show words" item on the "View" menu. As of version 0.9.8, Clara OCR does not offer controls to tune the word detection techniques, so this visualization is currently useful to diagnose problems but not to solve them.
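<P> The bracketed internal IDs mentioned above give a rough measure of how many symbols are still untransliterated. The following shell sketch counts them in a piece of text. It is only an illustration: the helper name "count_unknown" is hypothetical, it assumes the text uses exactly the "[133]" form shown on the "PAGE (output)" window, and it relies on the -o option of grep, which is widely available but not required by POSIX.
<P> <TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>
```shell
#!/bin/sh
# count_unknown TEXT  (hypothetical helper, not part of Clara OCR)
# Counts bracketed numeric IDs such as [133] in TEXT.
count_unknown() {
    # grep -o prints each match on its own line, wc -l counts those lines,
    # and tr strips the padding that some wc implementations add.
    printf '%s\n' "$1" | grep -o '\[[0-9][0-9]*\]' | wc -l | tr -d ' '
}

count_unknown 'Th[133] qu[7]ck brown fox'
```
</PRE>
</TD></TR></TABLE> A count that shrinks after each round of training and classification suggests the book font is converging.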
<P> <P> <A NAME=1.9> <P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.9 Handling broken symbols</B></FONT></TD></TR></TABLE>
<P> Note: as of version 0.9.8 the merging heuristics are only partially implemented, and in most cases they won't produce any effect.
<P> The build heuristics also try to merge the pieces of broken symbols, just like the "u", the "h" and the "E" on the figure (observe the absent pixels). Some letters have thin parts, and depending on the paper and printing quality, these parts will break more or less frequently.
<P> <P> <TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>
    [Figure: bitmaps of the letters "u", "h" and "E", each broken into pieces by absent pixels]</PRE>
</TD></TR></TABLE></CENTER> Clara OCR offers three symbol merging heuristics: geometric-based, recognition-based and learned. Each one may be activated or deactivated using the "tune" tab.
<P> Geometric merging applies to fragments in the interior of the symbol bounding box, like the "E" on the figure, and to some other cases too.
<P> Recognition merging searches for unrecognized symbols and, for each one, tries to merge it with some neighbour(s), and checks whether the result becomes similar to some pattern.
<P> Finally, learned merging will try to reproduce the cases trained by the user. To train merging, just select the symbol using mouse button 1 (say, the left part of the "u" on the figure), click mouse button 2 on the fragment (the right part of the "u"), and select the "merge with current symbol" entry. On the other hand, the "disassemble" entry may be used to break a symbol into its components.
<P> <P> <A NAME=1.10> <P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.10 Handling accents</B></FONT></TD></TR></TABLE>
<P> Now let's talk about accents.
<P> As a general rule, Clara OCR does not consider accents as parts of letters, so merging does not apply to accents. Accents are considered individual symbols, and must be trained separately. Clara OCR will compose accents with the corresponding letters when generating the output. The exception is when the accent is graphically joined to the letter:
<P> <TABLE WIDTH=100%><TR><TD BGCOLOR=#E0E0E0><PRE>
    [Figure: two bitmaps of the letter "e" with an acute accent; in the first the accent is separated from the letter, in the second it is joined to it]</PRE>
</TD></TR></TABLE></CENTER> In the figure we have two samples of the letter "e" with an acute accent. In the first one, the accent is graphically separated from the letter. So the accent transliteration will be trained or deduced as being "'", and the letter transliteration will be trained or deduced as being "e". When generating the output, Clara OCR will compose them as the macro "\'e" (or as the ISO character 233, as soon as we provide this alternative behaviour).
<P> In the second case the accent isn't graphically separable from the letter, so we'll need to train the accented character as the corresponding ISO character (code 233) or as the macro "\'e". As the generation of accented characters depends on the local X settings, the "Emulate deadkeys" item on the "Options" menu may be useful in this case. It will enable the composition of accents and letters performed directly by Clara OCR (like the Emacs iso-accents-mode feature).
<P> <P> <A NAME=1.11> <P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.11 Browsing the book font</B></FONT></TD></TR></TABLE>
<P> As explained earlier, trained symbols become patterns (unless you mark them "bad"). The collection of all patterns is called the "book font" (the term "book" is to distinguish it from the GUI font). Clara OCR stores all patterns in the "patterns" file on the work directory, when the "save session" entry on the "File" menu is selected.
<P> Clara OCR itself can choose the patterns and populate the book font. To do so, just select the "Build the font automatically" item on the "tune" tab, and classify the symbols.
<P> To browse the patterns, click the "patterns" tab one or more times to enter the "PATTERN (list)" window. The "PATTERN (list)" mode displays the bitmap and the properties of each pattern in a (perhaps very long) form. Click the "zoom" button to adjust the size of the pattern bitmaps. Use the scrollbar or the Next (Page Down) or Previous (Page Up) keys to navigate. Use the sort options on the "Edit" menu to change the presentation order.
<P> Now press the "patterns" tab again to reach the "PATTERN" window. It presents the "current" pattern with detailed properties. Try activating the "show web clip" option on the "View" menu to visualize the pattern context. The left and right arrows will move to the previous and to the next patterns. To train the current pattern (the one being exhibited on the "PATTERN" window), just press the key corresponding to its transliteration (Clara will automatically move to the next pattern) or fill in the input field. There is no need to press ENTER to submit the input field contents.
<P> <P> <A NAME=1.12> <P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.12 Useful hints</B></FONT></TD></TR></TABLE>
<P> If the GUI becomes garbled or blank, press C-l to redraw it.
<P> For now, the GUI does not support cut-and-paste. To save the contents of the "PAGE (list)" window to a file, use the "Write report" item on the "File" menu.
<P> The "OCR" button will enter the "pressed" state in some unexpected situations, like during dialogs. This behaviour will be fixed soon.
<P> The "STOP" button does not immediately stop the OCR operation in progress (e.g. classification). Clara OCR only stops the operation in progress at "safe" points, where all data structures are consistent.
<P> The zone button allows the creation of only one zone, but the OCR won't become restricted to that zone. For now, the zone is useful only to be saved as a PBM file using the "save zone" option on the "File" menu.
<P> The OCR output is automatically saved to the file page.html, where "page" is the name of the currently loaded page, without the "pbm" extension. This file is created by the "generate output" step on the menu that appears when mouse button 2 is pressed over the OCR button.
<P> The following OCR steps are currently unfinished and perform no action: "generate spelling hints", "detect blocks", and "Geometric merging".
<P> <P> <A NAME=1.13> <P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#E2D3FC><FONT SIZE=+1><B>1.13 Fun codes</B></FONT></TD></TR></TABLE>
<P> Clara OCR "fun codes" are similar to videogame "codes" (for those who have never heard about them, videogame "codes" are special sequences of mouse clicks or key presses that make your player invulnerable, or obtain maximum energy, or perform an unexpected action, etc).
<P> The difference is that Clara OCR "fun codes" are not secret (videogame "codes" are normally secret and very hard to discover by chance). Clara OCR contains no secret feature. Fun codes are intended to be used during public presentations. For now there is only one fun code: just click the banner on the welcome window one or more times to make it scroll.
<P> <P> <P> <A NAME=2.> <P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#79BEC6><FONT SIZE=+1><B>2. AVAILABILITY</B></FONT></TD></TR></TABLE>
<P> Clara OCR is free software. Its source code is distributed under the terms of the GNU GPL (General Public License), and is available at <A HREF=http://www.claraocr.org/>http://www.claraocr.org/</A>. If you don't know what the GPL is, please read it and check the GPL FAQ at <A HREF=http://www.gnu.org/copyleft/gpl-faq.html>http://www.gnu.org/copyleft/gpl-faq.html</A>.
You should have received a copy of the GNU General Public License along with this software; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. The Free Software Foundation can be found at <A HREF=http://www.fsf.org>http://www.fsf.org</A>.
<P> <P> <A NAME=3.> <P><TABLE BORDER=1 WIDTH=100%><TR><TD BGCOLOR=#79BEC6><FONT SIZE=+1><B>3. CREDITS</B></FONT></TD></TR></TABLE>
<P> Clara OCR was written by Ricardo Ueda Karpischek. Imre Simon contributed high-volume tests, discussions with experts, selection of bibliographic resources, publicity and many ideas on how to make the software more useful.
<P> Ricardo authored various free materials, some included in Conectiva, Debian, FreeBSD and SuSE (the verb conjugator "conjugue", the ispell dictionary br.ispell and the proxy axw3). He recently ported the EiC interpreter to the Psion 5 handheld. Imre Simon promotes the usage and development of free technologies and information through his research, teaching and administrative work at the University.
<P> Ricardo Ueda Karpischek works as an independent developer and instructor, and received no financial support to develop Clara OCR. He's not an employee of any company or organization.
<P> Roberto Hirata Junior and Marcelo Marcilio Silva contributed ideas on character isolation and recognition. Richard Stallman suggested improvements on how to generate HTML output. Marius Vollmer is helping to add Guile support. Jacques Le Marois helped with the announcement process. We acknowledge Mike O'Donnell and Junior Barrera for their good criticism. We acknowledge Peter Lyman for his remarks about the Berkeley Digital Library, and Wanderley Antonio Cavassin, Janos Simon and Roberto Marcondes Cesar Junior for some web and bibliographic pointers. Bruno Barbieri Gnecco provided hints and explanations about GOCR (main author: Jorg Schulenburg).
Luis Jose Cearra Zabala (author of OCRE) is kindly supporting our attempts to use portions of his code. Adriano Nagelschmidt Rodrigues and Carlos Juiti Watanabe carefully tried the tutorial before the first announcement. Eduardo Marcel Macan packaged Clara OCR for Debian and suggested some improvements. Mandrakesoft is hosting claraocr.org. We acknowledge Conectiva and SuSE for providing copies of their outstanding distributions. Finally, we acknowledge the late Jose Hugo de Oliveira Bussab for his interest in our work.
<P> The fonts used by the "view alphabet map" feature came from Roman Czyborra's "The ISO 8859 Alphabet Soup" page at <A HREF=http://czyborra.com/charsets/iso8859.html>http://czyborra.com/charsets/iso8859.html</A>.
<P> Note: see also the Changelog (<A HREF=http://www.claraocr.org/CHANGELOG>http://www.claraocr.org/CHANGELOG</A>).
<P> </HR></BODY></HTML>