Collected release notes for the various distributed versions of mmorph
======================================================================

(See also the summary of changes in file 00CHANGES).

Release notes for mmorph version 2.3.1
--------------------------------------

Version 2.3.1 has 2 new options to handle lookup of capitalized words:  -b
and -k.  See manual page mmorph(1).

Together with option -B, this is a first attempt at the capitalization
problem and will probably change in the future.  For the moment the
assumption is that converting uppercase letters to lowercase in a word will
help in looking it up.  For now it handles languages where no information
is lost during capitalization, such as English or Canadian French.  A more
robust mechanism should be provided for the cases where information is
lost, for example when a letter loses its accent when capitalized, as in
French (é -> E).
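
The limitation can be illustrated with a small Python sketch.  This is not
mmorph code: the toy lexicon and the function name fold_lookup are made up
for illustration, and mmorph itself looks words up in a compiled database.
Folding to lowercase recovers the entry when capitalization kept the accent,
but not when the accent was dropped.

```python
# Hypothetical toy lexicon, for illustration only.
LEXICON = {"été", "the", "word"}


def fold_lookup(word, lexicon=LEXICON):
    """Look a word up as is, then with uppercase folded to lowercase."""
    if word in lexicon:
        return word
    folded = word.lower()
    return folded if folded in lexicon else None


print(fold_lookup("The"))  # found after folding
print(fold_lookup("Été"))  # accent kept when capitalized: found
print(fold_lookup("Ete"))  # accent lost when capitalized: not found
```

Recovering "été" from "Ete" would need extra knowledge (for example trying
accented variants), which simple case folding does not provide.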


Release notes for mmorph version 2.3
------------------------------------

Version 2.3 has 4 new options to handle record/field mode for lookup:
-C classes, -B class, -U and -E.  See manual page mmorph(1).

Here are a few examples of use:

- to look up words in record/field mode, only those in records of class T,
  Compd, Abbr, Enc, Proc, Init, Tit:

    mmorph -C T,Compd,Abbr,Enc,Proc,Init,Tit -m rules out.seg out.lex

- the same, also annotating the other records and marking unknown words
  with ??\?? (option -U):

    mtlexpunct < out.seg \
    | mtlexnum \
    | mmorph -U -C T,Compd,Abbr,Enc,Proc,Init,Tit -m rules > out.lex

- the same, but also looking up the lowercase-folded form of capitalized
  words that start a sentence.  Option -B specifies the record class that
  precedes the first word of a sentence (e.g. Otag):

    mmorph -B Otag -C T,Compd,Abbr,Enc,Proc,Init,Tit -m rules out.seg out.lex

  This does not yet work with capitals that have lost their accent.  Conversion
  of uppercase to lowercase is done according to the character set in
  effect given by the environment variable LC_CTYPE (cf.  setlocale(3) and
  locale(5)).

- two passes to extend annotations (option -E):

    mmorph -C T,Compd,Abbr,Enc,Proc,Init,Tit -m rules out.seg \
    | mmorph -E -C Abbr -m rules_abbreviations > out.lex

The number of options is getting ridiculous.  The next version will probably
be split into three programs:  one for generation, one for simple lookup
and one for record/field lookup.

If you have problems with these changes, contact
Mr. Dominique Petitpierre    | Internet: petitp@divsun.unige.ch
ISSCO, University of Geneva  | X400: C=ch; ADMD=arcom; PRMD=switch;
54 route des Acacias         |       O=unige; OU=divsun; S=petitp
CH-1227 GENEVA (Switzerland) | Tel: +41/22/705 7117 | Fax: +41/22/300 1086


Release notes for mmorph version 2.2
------------------------------------

This version is faster and creates smaller files, and lets you factorize the
typed feature structures in the lexical declarations.

To allow this factorization, the syntax of the descriptions in @Lexicon,
which was like this:

             <LexDef>          ::= LEXICALSTRING <BaseForm>? <Tfs>+

is replaced by this (cf "man 5 mmorph"):

             <LexDef>          ::= <Tfs> <Lexical>+
             <Lexical>         ::= LEXICALSTRING <BaseForm>?

In order to convert your files written for earlier versions of mmorph (2.1
and before), you can use the utilities you'll find in the directory ./util:

    swap        swaps the strings and typed feature structures of lexical
                entries
    factorize   factorizes the lexical entries' TFS with respect to the strings

For each file containing lexical entries (@Lexicon section only, in whole
or part) you can do the following (description file name is "rules",
lexical entries file name is "lex"):

    1) swap lex >lex.new
    2) mv lex lex.old
    3) factorize rules lex.new >lex

If the lexical entries are at the end of the file "rules", extract them into
a separate file "lex", proceed as above and then replace the lexical
entries in "rules" with the new content of "lex".

The utility "swap" does not handle #include directives.  It might also need
some adjustments (it is a sed script) or pre-processing if your lexical
entries have an unusual layout.

Tell me if you have problems with this conversion.  If you send me your
mmorph description files I can do this for you.

You should get a substantial reduction in the size of the descriptions and
of the generated database.  Typically the description is only one fifth of
the original size, and the generated database one third.  Your mileage may
vary (measure the database size with "du -a" instead of "ls" to avoid
counting the holes in the files).
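
The difference between the two measurements can be seen with a short Python
sketch (Unix only; the temporary file and the function name are made up for
illustration and have nothing to do with mmorph's own database files).  It
creates a file with a hole and compares the size "ls -l" would report with
the space actually allocated, which is roughly what "du" counts.

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Return (size as shown by "ls -l", bytes actually allocated on disk)."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on POSIX systems (absent on Windows).
    return st.st_size, st.st_blocks * 512


# Build a file with a hole: seek far past the start and write a single byte.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.seek(1024 * 1024 - 1)
    f.write(b"\0")
    sparse_path = f.name

apparent, allocated = apparent_and_allocated(sparse_path)
print("apparent size :", apparent)   # what "ls -l" shows
print("allocated size:", allocated)  # roughly what "du" counts
os.remove(sparse_path)
```

On file systems that support holes the allocated size is usually much
smaller than the apparent size; on others the two may be close.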


The utility "factorize" can be used on its own in order to restructure
lexical entries written independently, or to merge two lexical description
files.
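
The idea behind the factorized layout (one typed feature structure followed
by all the strings that share it) can be sketched in Python.  This is only
an illustration of the grouping, not the "factorize" utility itself: the
function name group_by_tfs is hypothetical and the TFS strings are
placeholders, not real mmorph syntax.

```python
from collections import defaultdict


def group_by_tfs(entries):
    """Group (lexical_string, tfs) pairs so that each TFS appears once."""
    grouped = defaultdict(list)
    for lexical, tfs in entries:
        if lexical not in grouped[tfs]:  # drop exact duplicates
            grouped[tfs].append(lexical)
    return dict(grouped)


# Two independently written lists; every string repeats its TFS.
# The TFS text is a made-up placeholder.
lex_a = [("cat", "Noun[num=sg]"), ("run", "Verb[form=base]")]
lex_b = [("dog", "Noun[num=sg]"), ("cat", "Noun[num=sg]")]

merged = group_by_tfs(lex_a + lex_b)
for tfs, strings in merged.items():
    print(tfs, " ".join('"%s"' % s for s in strings))
```

Concatenating the two lists before grouping is what makes merging two
lexical files a special case of the same operation.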


Version 2.2 has four new options (cf "man mmorph"):

 -p to print the list of the projected tfs contained in a database:
        mmorph.new -p -m morph-lexicon.fr >tfs

 -q to print the list of all forms with their projected tfs:
        mmorph.new -q -m morph-lexicon.fr >forms
    The forms are not listed in order of generation (for that use 
    "mmorph -n").

If option "-d 16" is used together with option -p or -q some statistics are
displayed.

 -y parse only. Do not generate anything, just check the syntax.

 -z normalize, implies -y.  Print on standard output the lexical entries, in
    normalized form.

If you have problems with these changes, contact
Mr. Dominique Petitpierre    | Internet: petitp@divsun.unige.ch
ISSCO, University of Geneva  | X400: C=ch; ADMD=400net; PRMD=switch;
54 route des Acacias         |       O=unige; OU=divsun; S=petitp
CH-1227 GENEVA (Switzerland) | Tel: +41/22/705 7117 | Fax: +41/22/300 1086