voikko-fi - Finnish dictionary for Voikko
=========================================

General information
-------------------

Voikko-fi (previously known as Suomi-malaga) is a description of Finnish
morphology written for libvoikko. The implementation uses the unweighted VFST
format and provides a format 5 Finnish dictionary for libvoikko 4.0 or later.

For Voikko the morphology supports spell checking, hyphenation and grammar
checking.

Special support is also included for the text indexer Sukija. This includes
support for common spelling mistakes, old spellings, old inflection types and
old or rare words.

Build and installation
----------------------

Building voikko-fi from this package requires foma, libvoikko, python and make.
No configuration is required: to build the code for Voikko, you only need to run

    make vvfst

Installation can be done by running

    make vvfst-install DESTDIR=/usr/lib/voikko

(Replace /usr/lib/voikko with the directory you want to install the files to.
Installing to ~/.voikko will cause libvoikko to use this version of voikko-fi
only for the user who does the installation.)

Building the code for Sukija can be done by running

    make vvfst-sukija

Installation can be done by running

    make vvfst-install-sukija DESTDIR=/usr/lib/voikko

You should install the Sukija binaries to the same directory where you install
the Voikko files.

Supported Make targets
----------------------

- vvfst
  Builds the binary files for the Finnish dictionary (libvoikko format version 5).
- vvfst-sukija
  Builds the binary files for the Finnish dictionary (libvoikko format version 5)
  used in the Sukija indexer.
- vvfst-install DESTDIR=/usr/lib/voikko
  Installs the version 5 binary files needed by libvoikko to the directory
  specified by DESTDIR. DESTDIR is optional and defaults to /usr/lib/voikko.
- vvfst-install-sukija DESTDIR=/usr/lib/voikko
  Like vvfst-install but installs the binary files built by the vvfst-sukija
  target.
- dist-gzip
  Builds the full source package.
- clean
  Removes all files generated by other targets.
- update-vocabulary
  Updates the XML vocabulary from the nightly snapshot at
  joukahainen.puimula.org. This target requires wget to be available.

Variables for tuning the build process
--------------------------------------

- make vvfst:

  * VVFST_BUILDDIR=path/to/directory
    Specifies the directory where build files are written to while building
    for Voikko.
    Default: vvfst (build within source directory)

  * VVFST_BASEFORMS=yes|no
    Include information needed for generating BASEFORM attribute. Setting this
    to "no" will result in a smaller dictionary file. Note that BASEFORM
    attribute will still be produced but its values will likely be incorrect.
    This option should only be disabled for application specific (embedded)
    dictionaries that are known to be used only for spell checking, grammar
    checking or hyphenation.
    Default: yes

  * GENLEX_OPTS="--option1=xxx --option2=yyy ..."
    Sets options string for the lexicon generator script. The available
    options are

    + --min-frequency=n
      Limits the words to be included in the .lex files to the specified or
      higher frequency class. Default is 9.

    + --extra-usage=usage1,usage2,...
      If a word has usage flags (it belongs to a special vocabulary), it is
      included in the vocabulary only if at least one of the usage flags is
      listed here. Available usage flags are listed in file
      vocabulary/flags.txt. Listing "sukija" here causes application specific
      exclusions to be ignored (words marked with not_voikko will also be
      included). By default, no special vocabularies are included.

    + --style=style1,style2,...
      If a word has style flags (such as old, foreign or dialect), it is
      included in the vocabulary only if all of the style flags are listed
      here. Available style flags are listed in file vocabulary/flags.txt.
      Default: old,international,inappropriate

    + --sourceid
      Insert word identifiers from Joukahainen to lexicon and return them
      during morphological analysis.
      This option has no effect unless VOIKKO_DEBUG=yes is set.
      By default source ids are not preserved.

  * VANHAT_MUODOT=yes|no
    Accept word forms that were present in old Finnish but are no longer
    considered valid in standard Finnish.
    Default: no

  * VOIKKO_VARIANT=variant
    Set the short name for the language variant of this vocabulary. The name
    should match the regular expression [a-z][a-z0-9_]*
    Default: standard

  * VOIKKO_DESCRIPTION="Description of the vocabulary"
    Set the long description for the language variant of this vocabulary.

  * SM_PATCHINFO="Information about applied patches"
    If you have modified the source code or are distributing prerelease
    versions, describe any modifications made to the released version here.
    It may be best to change this directly in the Makefile.

Copyright and license information
---------------------------------

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 2, or (at your option) any later version. See file
COPYING for details.

Copyright (©) 2006 - 2017 Hannu Väisänen (Email: Hannu.Vaisanen@uef.fi)
and 2006 - 2017 Harri Pitkänen (hatapitk@iki.fi).

Contributors listed in file CONTRIBUTORS hold copyrights to the vocabulary
data.