pspell-0.12.2-7mdk.ppc.rpm

Subsections
   
  * 4.1 Overview
  * 4.2 Usage
  * 4.3 Class Reference
  * 4.4 Available Options
  * 4.5 Format of the PWLI Files
  * 4.6 Examples
  * 4.7 Rationale
      + 4.7.1 store_repl method

--------------------------------------------------------------------------

4. Library Interface

4.1 Overview

The Pspell library contains two main classes and several helper classes.
The two main classes are PspellConfig and PspellManager. The PspellConfig
class is used to set initial defaults and to change spell checker specific
options. The PspellManager class does most of the real work. It is
responsible for managing the dictionaries, checking if a word is in the
dictionary, and coming up with suggestions, among other things. There are
many helper classes; the important ones are PspellWordList,
PspellMutableWordList, and Pspell*Emulation. The PspellWordList class is
used for accessing the suggestion list, as well as the personal and
session word lists currently in use. The PspellMutableWordList class is
used to manage the personal, and perhaps other, word lists. The
Pspell*Emulation classes are used for iterating through a list.

Both a C and a C++ interface are provided. I recommend using the C
interface, even if your program is in C++, to avoid some of the nasty
issues associated with C++ linkage. In general one can only use C++
linkage if both the library and the program were created with the same
compiler. I may eventually provide C++ wrapper classes, including a few
STL-like ones, for the C library and remove the existing C++ interface
altogether.

The mapping between the C and C++ interface is pretty straightforward and
from C++ to C goes as follows:

    <class name in lowercase with underscores>_<method name>(
        [const] <Class> *, <other parameters if any>)

For example "PspellManager::lang_name() const" would become
"pspell_manager_lang_name(const PspellManager *)".

Methods that return a bool will instead return an int in the C interface.
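
The naming rule above can be sketched in code. The helper below is purely
illustrative and is not part of Pspell: the name cxx_to_c_name and its
signature are assumptions made for this example. It turns a C++ method
written as "Class::method" into the corresponding C function name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Pspell: writes the C function name for
   a C++ method given as "Class::method" into out.
   Returns 0 on success, -1 on malformed input or if out is too small. */
int cxx_to_c_name(const char *cxx_name, char *out, size_t out_size)
{
    const char *sep = strstr(cxx_name, "::");
    const char *p;
    size_t n = 0;

    if (sep == NULL || sep == cxx_name || sep[2] == '\0')
        return -1;

    /* Class part: CamelCase becomes lowercase_with_underscores. */
    for (p = cxx_name; p < sep; ++p) {
        unsigned char c = (unsigned char)*p;
        if (isupper(c) && p != cxx_name) {
            if (n + 1 >= out_size) return -1;
            out[n++] = '_';
        }
        if (n + 1 >= out_size) return -1;
        out[n++] = (char)tolower(c);
    }

    /* One underscore separates the class part from the method name. */
    if (n + 1 >= out_size) return -1;
    out[n++] = '_';

    /* Method part is copied unchanged. */
    for (p = sep + 2; *p != '\0'; ++p) {
        if (n + 1 >= out_size) return -1;
        out[n++] = *p;
    }
    out[n] = '\0';
    return 0;
}
```

Given "PspellManager::lang_name" the helper produces
"pspell_manager_lang_name", matching the example above. The object
pointer that becomes the first parameter of the C function is not modelled
here.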

4.2 Usage

To use Pspell your application should include "pspell/pspell.h". To
ensure that all the necessary libraries are linked in, libtool should be
used to perform the linking. When using libtool, simply linking with
"-lpspell" should be all that is necessary. When using shared libraries
you might be able to link with "-lpspell" without libtool, but this is
not recommended. This version of Pspell uses the CVS version of libtool
(multi-language-branch); however, released versions of libtool should also
work.

When your application first starts you should get a new configuration
class with the command:

    PspellConfig * spell_config = new_pspell_config();

which will create a new PspellConfig class. It is allocated with new and
it is your responsibility to delete it with delete_pspell_config. The
standard C++ delete can be used if the compiler is compatible with the one
used to create the Pspell library. Once you have the config class you
should set some variables. The most important one is the language
variable. To do so use the command:

    pspell_config_replace(spell_config, "language-tag", "en_US");

which will set the default language to American English. The language is
expected to be the standard two letter ISO 639 language code, with an
optional two letter ISO 3166 country code after an underscore. You can set
the preferred spelling via the "spelling" option, any extra information
via the "jargon" option, and the encoding via the "encoding" option.
Other things you might want to set are the preferred spell checker to use,
the search path for dictionaries, and the like; see section 4.4 for the
available options.
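
As a rough illustration of the expected language-tag shape, the sketch
below splits a tag into its language and country parts before it is handed
to the library. The function split_language_tag is hypothetical and not
part of Pspell. It accepts either an underscore or a dash as the separator,
as section 4.4 allows, and it only checks the shape of the tag, not whether
the codes are real ISO codes.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not part of Pspell: splits "ll" or "ll_CC" (or
   "ll-CC") into a language code and an optional country code.
   Returns 0 on success and -1 if the tag does not have that shape.
   On failure lang and country are left untouched. */
int split_language_tag(const char *tag, char lang[3], char country[3])
{
    size_t len = strlen(tag);

    if (len != 2 && len != 5)
        return -1;
    if (!isalpha((unsigned char)tag[0]) || !isalpha((unsigned char)tag[1]))
        return -1;
    if (len == 5) {
        if (tag[2] != '_' && tag[2] != '-')
            return -1;
        if (!isalpha((unsigned char)tag[3]) ||
            !isalpha((unsigned char)tag[4]))
            return -1;
    }

    /* The tag is well formed, so it is now safe to copy the parts out. */
    lang[0] = tag[0];
    lang[1] = tag[1];
    lang[2] = '\0';
    country[0] = '\0';
    if (len == 5) {
        country[0] = tag[3];
        country[1] = tag[4];
        country[2] = '\0';
    }
    return 0;
}
```

For "en_US" this yields the language "en" and the country "US"; for "nl"
the country is left as an empty string.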

Whenever a new document is created, a new PspellManager class should also
be created. There should be one manager class per document. To create a
new manager class, call new_pspell_manager and then convert the result
using to_pspell_manager like so:

    PspellCanHaveError * possible_err = new_pspell_manager(spell_config);
    PspellManager * spell_checker = 0;
    if (pspell_error_number(possible_err) != 0)
      puts(pspell_error_message(possible_err));
    else
      spell_checker = to_pspell_manager(possible_err);

which will create a new PspellManager class using the defaults found in
spell_config. If C++ is being used AND the compiler is compatible with the
one used to create the Pspell library a normal cast can be used instead of
to_pspell_manager.

If for some reason you want to use different defaults simply clone
spell_config and change the setting like so:

    PspellConfig * spell_config2 = pspell_config_clone(spell_config);
    pspell_config_replace(spell_config2, "language-tag","nl");
    possible_err = new_pspell_manager(spell_config2);
    delete_pspell_config(spell_config2);

Once again, in C++ delete_pspell_config can be replaced with a simple C++
delete. Once the manager class is created, you can use the check method to
see if a word in the document is correct, like so:

    int correct = pspell_manager_check(spell_checker, <word>, <size>);

<word> is expected to be a const char * character string. If the encoding
is set to "machine unsigned 16" or "machine unsigned 32", <word> is
expected to be a cast from either const u16int * or const u32int *,
respectively. u16int and u32int are generally unsigned short and unsigned
int, respectively. <size> is the length of the string, or -1 if the string
is null terminated. If the string is a cast from const u16int * or const
u32int *, then <size> is the amount of space in bytes the string takes up
after being cast to const char *, not the number of characters in the
string. pspell_manager_check will return 0 if the word is not found and
non-zero otherwise.
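
To make the byte-count rule concrete, here is a minimal sketch of how the
size for a 16-bit string might be computed. The typedef and the function
u16_size_in_bytes are stand-ins written for this example, assuming that
unsigned short is the 16-bit type on the platform. Neither is taken from
the Pspell headers.

```c
#include <stddef.h>

/* Stand-in for the 16-bit character type; the text above says it is
   generally unsigned short, which is assumed here. */
typedef unsigned short u16int;

/* Hypothetical helper, not part of Pspell: returns the space in bytes
   taken up by a zero-terminated 16-bit string, not counting the
   terminator. This is the value to pass as the size, rather than the
   number of characters. */
int u16_size_in_bytes(const u16int *s)
{
    size_t len = 0;

    while (s[len] != 0)
        ++len;
    return (int)(len * sizeof(u16int));
}
```

The result would be passed as <size> together with the string cast to
const char *.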

If the word is not correct then the suggest method can be used to come up
with likely replacements:

    PspellWordList * suggestions = pspell_manager_suggest(spell_checker, 
                                                          <word>, <size>);
    PspellStringEmulation * elements =
      pspell_word_list_elements(suggestions);
    const char * word;
    while ( (word = pspell_string_emulation_next(elements)) != NULL ) {
      // add word to the suggestion list
    }
    delete_pspell_string_emulation(elements);

Notice how elements is deleted but suggestions is not. The value returned
by suggest is only valid until the next call to suggest. Once a
replacement is made, the store_repl method should be used to communicate
the replacement pair back to the spell checker (see section 4.7.1 for
why). Its usage is as follows:

    pspell_manager_store_repl(spell_checker, 
                              <misspelled word>, <size>,
                              <correctly spelled word>, <size>);

If the user decides to add the word to the session or personal dictionary,
the word can be added using the add_to_session or add_to_personal
methods, respectively, like so:

    pspell_manager_add_to_session|personal(spell_checker, <word>, <size>);

It is better to let the spell checker manage these words rather than doing
it yourself, so that the words have a chance of appearing in the
suggestion list.

Finally, when the document is closed the PspellManager class should be
deleted like so.

    delete_pspell_manager(spell_checker);

The standard C++ delete should NOT be used here because it will not unload
any shared libraries pulled in when the manager class is created.

4.3 Class Reference

Methods that return a bool generally return false on error and true
otherwise. To find out what went wrong use the error_number and
error_message methods. Unless otherwise stated, methods that return a
const char * will return null on error. The character string returned is
only valid until the next method which returns a const char * is called.

All methods are virtual and abstract, thus these classes are really
abstract base classes. Therefore you cannot simply store the object
directly. In order to make copies of the objects use the clone and assign
methods if they are provided.

For the details of the various classes please see the header files. In the
future I will generate class references using some automated tool.


4.4 Available Options

The following options are available to control which word list Pspell
selects.

language-tag <string>
    the language code which consists of the two letter ISO 639 language
    code and an optional two letter ISO 3166 country code after a dash or
    underscore.
spelling <string>
    the requested spelling for languages with more than one spelling such
    as English. Known values are "american", "british", and
    "canadian". This information is normally inferred from the
    language-tag option. For example the language tag "en_GB" will set
    spelling to "british".
jargon <string>
    extra information to distinguish two different word lists that
    have the same language-tag and spelling.
word-list-path <list>
    search path for word list information files
module-search-order <list>
    list of available modules, modules that come first on this list have a
    higher priority

The following options control the behavior of the selected module. Not all
modules support all options.

encoding <string>
    encoding that words are expected to be in. Valid values are "utf-8",
    "iso8859-*", "koi8-r", "viscii", "cp1252", "machine unsigned
    16", "machine unsigned 32".
ignore <int>
    ignore all words which are not at least as long as the value for this
    setting
personal <file>
    file name of the personal word list to use. Start it with "./" to
    look for the file in the current directory rather than the home
    directory.
repl <file>
    file name of the replacement word list to use. Start it with "./" to
    look for the file in the current directory rather than the home
    directory.
save-repl <boolean>
    save the replacement word list on calls to save_all_word_lists.
ignore-repl <boolean>
    ignore calls to PspellManager::store_repl.
sug-mode <string>
    the suggestion mode, known values are fast, normal, and bad-spellers
run-together <boolean>
    consider run-together words as legal compounds.

The following options may be examined to tell exactly what word list or
module was selected

master
    the full path of the word list selected
master-flags
    any special flags that were passed on to the module
module
    the module selected

The spelling and jargon options can also be examined.

<string> options may be set to anything, including in some cases an empty
string. <int> options must be set to a valid integer string. <boolean>
options must be set to "true" or "false". <list> options cannot be set
directly; you must use the option add-<option> to add an item to the
list, rem-<option> to remove an item, or rem-all-<option> to remove all
the items. In the case of rem-all-<option> the value should be an empty
string. Although the standard retrieve method will return a string for a
<list> option, it should not be used, as the format of the string is
implementation dependent. Use the retrieve_list method instead.
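
The add-, rem-, and rem-all- prefixes can be illustrated with a small
sketch that builds the option name to use for a list option. The enum and
the function list_option_key are hypothetical names chosen for this
example and do not exist in Pspell.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical names for the three list operations described above. */
enum list_action { LIST_ADD, LIST_REM, LIST_REM_ALL };

/* Hypothetical helper, not part of Pspell: writes "add-<option>",
   "rem-<option>" or "rem-all-<option>" into out.
   Returns 0 on success, -1 if the action is unknown or out is too
   small. */
int list_option_key(enum list_action action, const char *option,
                    char *out, size_t out_size)
{
    const char *prefix;
    int n;

    switch (action) {
    case LIST_ADD:     prefix = "add";     break;
    case LIST_REM:     prefix = "rem";     break;
    case LIST_REM_ALL: prefix = "rem-all"; break;
    default:           return -1;
    }

    n = snprintf(out, out_size, "%s-%s", prefix, option);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

For the word-list-path option and LIST_ADD this gives
"add-word-list-path". The resulting name would then presumably be set like
any other option, with the item to add as the value, or with an empty
string in the rem-all- case.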

4.5 Format of the PWLI Files

In order for Pspell to know which word lists to use, each word list must
have at least one PWLI file in the pspell data directory, which is normally
/usr/local/share/pspell/. Use "pspell-config pkgdatadir" to find out
what it is on your system.

Each PWLI file has the following name:

    <language>[-[<spelling>][-<jargon>]]-<module>.pwli

Where <language> is the two letter language code, <spelling> is the
particular spelling you're interested in if the language has multiple
spellings in different parts of the world, such as English, <jargon> is any
extra information to distinguish the word list from other ones with the
same language and spelling, and <module> is the pspell module the main
word list is for.

For example:

    en-aspell.pwli
    en-american-aspell.pwli
    en-american-medical-ispell.pwli
    en-american-xlg-ispell.pwli
    de--medical-ispell.pwli

Notice how, if the spelling is left out but the jargon is not, there need
to be two dashes between the language and the jargon.
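
The naming pattern, including the two-dash case, can be sketched as
follows. The function pwli_file_name is a hypothetical helper written for
this example and is not part of Pspell. A missing spelling or jargon is
passed as NULL or an empty string.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of Pspell: builds a PWLI file name of the
   form <language>[-[<spelling>][-<jargon>]]-<module>.pwli.
   Returns 0 on success, -1 if out is too small. */
int pwli_file_name(const char *language, const char *spelling,
                   const char *jargon, const char *module,
                   char *out, size_t out_size)
{
    int has_spelling = spelling != NULL && spelling[0] != '\0';
    int has_jargon = jargon != NULL && jargon[0] != '\0';
    int n;

    if (has_jargon) {
        /* With a jargon the spelling slot is always written, even when
           empty, which yields the two dashes. */
        n = snprintf(out, out_size, "%s-%s-%s-%s.pwli", language,
                     has_spelling ? spelling : "", jargon, module);
    } else if (has_spelling) {
        n = snprintf(out, out_size, "%s-%s-%s.pwli",
                     language, spelling, module);
    } else {
        n = snprintf(out, out_size, "%s-%s.pwli", language, module);
    }
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Called with the language "de", no spelling, the jargon "medical", and the
module "ispell", it produces "de--medical-ispell.pwli", as in the list
above.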

Each PWLI file then contains exactly one line which contains the full path
of the main word list, white space, then any additional options to pass
onto the module.
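
A reader for that single line might look like the sketch below. The struct
pwli_entry and the function parse_pwli_line are hypothetical and not part
of Pspell. The sketch assumes the path itself contains no white space,
since white space is what separates it from the options.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: both members point into the parsed line. */
struct pwli_entry {
    char *path;     /* full path of the main word list */
    char *options;  /* remaining options, "" if there are none */
};

/* Hypothetical helper, not part of Pspell: splits one PWLI line in place
   into the path and the options that follow it.
   Returns 0 on success, -1 if the line is empty or only white space. */
int parse_pwli_line(char *line, struct pwli_entry *entry)
{
    char *p = line;
    size_t len = strlen(line);

    /* Drop the trailing newline and any other trailing white space. */
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        line[--len] = '\0';

    /* Skip leading white space. */
    while (isspace((unsigned char)*p))
        ++p;
    if (*p == '\0')
        return -1;

    /* The path runs up to the first white space character. */
    entry->path = p;
    while (*p != '\0' && !isspace((unsigned char)*p))
        ++p;
    if (*p != '\0') {
        *p++ = '\0';
        while (isspace((unsigned char)*p))
            ++p;
    }

    /* Whatever is left is passed on to the module as is. */
    entry->options = p;
    return 0;
}
```

The line must be in a writable buffer, because the function places a
terminator after the path rather than copying it.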

4.6 Examples

Two simple examples are included in the examples directory. Pspell must be
installed before they will compile and at least one pspell module must be
installed before they will run. To build the C example type "make
example-c" and to build the C++ example type "make example-cxx".

4.7 Rationale


4.7.1 store_repl method

This method is needed because Aspell (http://aspell.sourceforge.net/) is
able to learn from users' misspellings. For example, on the first pass a
user misspells beginning as beging, so aspell suggests:

    begging, begin, being, Beijing, bagging, ....

However the user then tries "begning" and aspell suggests

    beginning, beaning, begging, ...

so the user selects beginning. However, later on in the document the
user misspells it as begng (NOT beging). Normally aspell would suggest:

    began, begging, begin, begun, ....

However, because it knows the user misspelled beginning as beging, it will
instead suggest:

    beginning, began, begging, begin, begun ...

I myself often misspelled beginning (and still do) as something close to
begging and too many times wind up writing sentences such as "begging with
....".
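
To picture why reporting replacement pairs helps, here is a toy sketch. It
is NOT how Aspell works internally. The names repl_store, repl_store_add,
and repl_store_lookup, the fixed table sizes, and the rule "within one
edit of a remembered misspelling" are all assumptions made up for this
illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PAIRS 16
#define MAX_WORD  32

/* Toy table of remembered (misspelling, correction) pairs. */
struct repl_store {
    char wrong[MAX_PAIRS][MAX_WORD];
    char right[MAX_PAIRS][MAX_WORD];
    size_t count;
};

/* Remember one pair; returns 0 on success, -1 if it does not fit. */
int repl_store_add(struct repl_store *s, const char *wrong,
                   const char *right)
{
    if (s->count >= MAX_PAIRS ||
        strlen(wrong) >= MAX_WORD || strlen(right) >= MAX_WORD)
        return -1;
    strcpy(s->wrong[s->count], wrong);
    strcpy(s->right[s->count], right);
    s->count++;
    return 0;
}

/* Plain Levenshtein distance using a single row; over-long words get
   MAX_WORD, which is treated as "far apart". */
static size_t edit_distance(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b), i, j;
    size_t row[MAX_WORD];

    if (la >= MAX_WORD || lb >= MAX_WORD)
        return MAX_WORD;
    for (j = 0; j <= lb; ++j)
        row[j] = j;
    for (i = 1; i <= la; ++i) {
        size_t diag = row[0];          /* cell (i-1, j-1) */
        row[0] = i;
        for (j = 1; j <= lb; ++j) {
            size_t up = row[j];        /* cell (i-1, j) */
            size_t best = diag + (a[i - 1] != b[j - 1]);
            if (up + 1 < best) best = up + 1;
            if (row[j - 1] + 1 < best) best = row[j - 1] + 1;
            diag = up;
            row[j] = best;
        }
    }
    return row[lb];
}

/* Returns the correction of the first remembered misspelling that is
   within one edit of word, or NULL if there is none. */
const char *repl_store_lookup(const struct repl_store *s, const char *word)
{
    size_t i;

    for (i = 0; i < s->count; ++i)
        if (edit_distance(s->wrong[i], word) <= 1)
            return s->right[i];
    return NULL;
}
```

With the pair beging and beginning remembered, a lookup of begng finds
beginning, because begng is one edit away from beging. A caller could then
place that word at the front of the suggestion list.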

--------------------------------------------------------------------------