Sophie

Sophie

distrib > Mandriva > 2008.1 > x86_64 > by-pkgid > bc542aa8d7e880741a812d5695dce412 > files > 7

gocr-0.44-3mdv2008.1.x86_64.rpm

History: (Changes,ChangeLog)
 0.44pre7
       add volume to boxes (negative means white areas inside black areas)
       Fix overflow in despeckling routine (verbose mode, durt removing)
       reactivate composed chars, fix merge_boxes
       fix problems with uncertain line detection and not recognized "7"
       option -a has an effect now for the output
       adaptions to MICR E13-B font (see GnuMICR), ToDo: 4 extra-chars
       fix num_boxes in merge_boxes (affects line detection)
       reduce 2 prompts to one per char in database mode, ^A for skip all
       fix problem with smaller headlines
       fix problems with tall font (4)
       fix includes for non-linux-platforms

 0.43  fix problem with dark frame around image
       support multiple images, ex: giftopnm -image=all a.gif | gocr -
       invert if obviously white on black (black_mass>=4*white_mass)
       improve thresholding for discrete histograms
         (note: this can particularly lead to bad results, will be fixed later)
       speedup for big boxes (especially dark background)
       fix memory leak (setas(same string) + detect_barcode)
       fix uninitialized variables after insert spaces (num_frames)
       fix frame_vector for single pixels (twice + ERROR idx out of range)

 0.42  further parts of recognition engine relaced by vector version
       changed colored debug output for out??.png
       division of glued chars replaced (slower but more accurate)
       fix framing of small font
       fix problem with uninitialized pnm_readpaminit call (CPS 21Nov06)
       better progress output (see progress.[ch]), new image debug output
       switch to the new improved rotation detection

 0.41  (buggy if --with-netpbm=no, apply the pgm-patch!)
       otsu.c concentrates now only on high contrast regions
       fix pnm reads for 2 byte pixels (--with-libpbm=no) 
       update man-page (mail me your suggestions)
       fix g++ warnings, float-OPs replaced by int-OPs
       spacing reviewed; make distance() more sensitive
       xml-objects (barcode, melted chars) now also handled with weights
       fix division by zero bug for vertical positioned characters
       default output is UTF8 now, UTF-encoding bug fixed
       added certainty option 
       added uninstall to Makefile
       debug image format changed to png (using pipe) or ppm (fall-back)
       much better word spacing (line-by-line based)
       better DOT_ABOVE recognition
       fix output of char groups or strings stored in database, utf8 input
       fix buffer overflow in barcode decode39
       fix lost comma on end of line
       internal vector format added for future use (faster, scalable, rotable)
       line detection extended
       internal list management rewritten to fix memory leaks and segfaults

 0.40  update PNM file reader to maxval > 255
       (make rpm) updated
       barcode-patch UPC_addon by Michael van Rooyen
       CAPITAL_LETTER_A_WITH_OGONEK added
       no "(PICTURE)" output for UTF8+ASCII (better for Mobile OCR project)
       smooth_borders() bug fixed and reworked
       5x7 and prop10 font adaptions
       objects now detected by flood-fill algorithm (better?)
       XML-output changed
       changed auto dust detection (not final)

 0.39  XML output added (subject of change, suggestions are welcome)
       netpbm-link-error fixed in gocr.c and configure.in:
         gocr.c: <config.h> changed to "config.h"
       configure-option --with-netpbm=PATH and --without-netpbm added
       update configure.in according to autoconf 2.57  
       wchar_h miss-configuration fixed in pgm2asc.c
       fix compiler warnings
       char filter accepts abbreviations now, like "0-9A-F" (but slow)
       update READMEde.txt
       output barcode tags (also improved recognition)
       fix pnm.c for files like example.eps.pbm
       fix detect.c for barcodes
       fix ocr0n.c 0<->8g

 0.38  move UTF/HTML/TeX decoding to getTextLine, return (char *) now
       out_format HTML step towards detailed XML output
       correct line detection for footnotes (detect.c)
       "y" now seen as vowel (pgm2asc.c), I<vowel> susbtituted by l<vowel>
       &eacute;-detection, &aacute;-output fixed
       default dust_size is -1 now (auto detection = mean_size/10)
       char filter added
        ex: -C 0123456789ABCDEF    - recognize only hexcodes
       man page updated (hopefully correct syntax)
       database bug fixed (small fonts, example by Chris)
       several bugs fixed by W. Webber (thanks)
       speed improved by 3rd-pass matrix filter in pixel() (pixel.c) (code from W. Webber)
       bug in remove_dust (remove.c) fixed
       for fonts bigger than 20x40 smooth_borders() changed (b/w-scans)
       bug in O0-detection fixed

 0.37   best-fit generates probability, not perfect but better results
        bug in line detection removed (happens for lot of small boxes)
        progress output (option -x <fileID|fname>)
        counting versions number as floating point now
        MACRON and DOT_ABOVE (not complete) defined (latin2)
        adaptions for 5x7 and 6x12 screen font
	doc/ocr.tex changed to doc/gocr.html (now independent of LaTeX)
        symbols {} added
        OCR-B font tested succesfull
        better headline/picture distinction
        bug removed (struct box.modifier is wchar_t now)

        known bugs: to much newlines

 0.3.6
        CARON and Omega defined,
        output of not defined chars (HTML="&#xxx;", TeX="\symbol{xxx}")
        system dependend bug: isupper(>255) SIGSEGV fixed
        better line detection for lines with lowercase chars only
        lot of possible SIGSEGV in list_del() fixed
        barcode recognition (UPC,code128)
        .ps .eps via pstopnm supported
        -m 256 switches off the main ocr engine (usefull together with -m 2 for identical chars)
        strings added to database ("ff","ft","special-symbol")
        gocr.tcl adapted to gocr v0.3
        internal detection probability introduced

 0.3.5
        minor and major fixes (string\0 bugs)
        memory leak fixes by Duncan Edwards
        layout analysis or zoning (-m 4) improved, 
        now it detects pictures and columns much better
        the behavior of setting threshold (-l) is slightly changed
        wcsdup defined for non-gnu-systems (BSD), further Problems?
        better context correction for 10 (IO,lO)
        Fixes for S.Koledin examples "GlS"
        Euro-currency-sign detection added
        better pitch estimation for proportional font (needs to be improved)
        make install DESTDIR= instead configure --prefix= (better?)
        use wchar_t by default, more simple code and -f works with nonLinuxOS
        line detection more robust against vertical glued chars (js)
        -f UTF8 added (usefull for xterm -u8), should be default?
        handle vertical glued boxes (ex: g over T)
 0.3.4
        some BSD adaptions (no WCHAR?), tell me if there are still problems
        use unicode in database (4-8 hex digits)
        new option: -p database_path/
        TILDE fixed, #, &AElig;, &Aring;, etc. added (swedish,norwegian)
        layout analysis improved
 0.3.3
        database (-m 2) bug fixed and interactive mode (-m 130) added
          its not finished, but you can test it
          result should be ok for machine generated images (no scans)           
        engine improved a bit
 0.3.2
        ocr-engine improved for screen fonts (thanks for examples)
        option -f [HTML,TeX,...] added
 0.3.1
        make install updated
 0.3.0  some parts of the code reviewed (most work done by Bruno Barberi Gnecco)
        tkispell patch from David Pinson (exec bug fixed)
	gnome frontend added (Dany De Bontrider)
        acute, grave, circumflex ... detection
        C++ parts rewritten into C, and much more (see REVIEW)
 0.2.7  lib-patch from Klaas Freitag inserted, engine improved
        option -n 1 detect only numbers, get threshold value by otsu.cc
        xxx.pnm.bz2 can be used on linux systems bzip2 installed
 0.2.6  pipes used on POSIX2-systems for easier use of jpg,gif,tiff,pnm.gz-files
        example: gocr text.jpg; gocr text.pnm.gz
        verbose output on stderr, text output on stdout,
        redirection of output possible (-e, -o, example: -e /dev/stdout)
        engine upgraded a bit  (thx for the new sample files)
        gocr.tcl upgraded (save options, save text)
        DOS/WIN95-EXE created, download GOCREXE.ZIP (v0.2.5)
 0.2.5  program convert renamed to jconv
        you can choose stdin as input now, for using conversion tools
	example: djpeg -pnm -gray text.jpg | gocr -i -
        option "--help" added, some bugs removed
	amiga.h added for SAS/C under AmigaOS (suggested by Uffe Holst)
	line detection changed (faster?)
	importing gocr in your C++ application is easier now (see fkt pgm2asc)
	argument can be given instead of option -i (this is more natural)
	some reorganization of code (not finished)
	2000 downloads counted !!! Jun2000
        SourceForge.net used for gocr (project: jocr, other gocr exist there)
        bugs in dust removing, line detection and zoning fixed (rewritten)
        first version of tcl/tk-GUI, test it!
        rekursive function frame_nn() replaced by labyrint-algorithm (no extensiv stack used)
        gluing of broken chars added, removing glued serifs (on small fonts)
        new bugs added :;
 0.2.4a2 some details are added (better dust removing and char division)
 0.2.4  three char division (connected chars), dust removing
 0.2.3	add layout analysis (very slowly, try -m 4), engine modified
        better distance function, engine updated, database added for testing
	1000 downloads counted !!! May2000
 0.2.2	gocr_0_2.tgz expands into gocr_0_2 directory (thanks to zz99zz)
	engine upgraded a bit, some bugs fixed (umlaut, thin lines)
	short documentation added (ocr.tex)
	colored output (out30.bmp) for test/development-mode
	bug: read ASC-PBM and PCX (1 bit) fixed
 0.2.1	first official release on freshmeat.net March 2000
 0.2	line scanning added
 0.1	project started (not documented), autumn 1998 - summer 1999