Collected release notes for the various distributed versions of mmorph
======================================================================

(See also the summary of changes in file 00CHANGES).


Release notes for mmorph version 2.3.1
--------------------------------------

Version 2.3.1 has 2 new options to handle lookup of capitalized words:
-b and -k.  See manual page mmorph(1).

Together with option -B, this is a first attempt at the capitalization
problem and will probably change in the future.  For the moment the
assumption is that converting uppercase letters to lowercase in a word
will help looking it up.  For now it handles languages where no
information is lost during capitalization, such as English or Canadian
French.  A more robust mechanism should be provided for the cases where
information is lost, for example when a letter loses its accent when
capitalized, as in French (é -> E).


Release notes for mmorph version 2.3
------------------------------------

Version 2.3 has 4 new options to handle record/field mode for lookup:
-C classes, -B class, -U and -E.  See manual page mmorph(1).

Here are a few examples of use:

- to look up words in record/field mode, only those in records of class
  T, Compd, Abbr, Enc, Proc, Init, Tit:

    mmorph -C T,Compd,Abbr,Enc,Proc,Init,Tit -m rules out.seg out.lex

- the same, also annotating the other records and marking unknown words
  with ??\?? (option -U):

    mtlexpunct < out.seg \
    | mtlexnum \
    | mmorph -U -C T,Compd,Abbr,Enc,Proc,Init,Tit -m rules > out.lex

- the same, but also looking up folded capitalized words that start
  sentences.  Option -B specifies the record class that precedes the
  first word of a sentence (e.g. Otag):

    mmorph -B Otag -C T,Compd,Abbr,Enc,Proc,Init,Tit -m rules out.seg out.lex

  Does not yet work with capitals that have lost their accent.
  Conversion of uppercase to lowercase is done according to the
  character set in effect, given by the environment variable LC_CTYPE
  (cf. setlocale(3) and locale(5)).

- two passes to extend annotations (option -E):

    mmorph -C T,Compd,Abbr,Enc,Proc,Init,Tit -m rules out.seg \
    | mmorph -E -C Abbr -m rules_abbreviations > out.lex

The number of options is starting to get ridiculous.  The next version
will probably have three programs: for generation, simple lookup, and
record/field lookup.

If you have problems with these changes, contact

Mr. Dominique Petitpierre     | Internet: petitp@divsun.unige.ch
ISSCO, University of Geneva   | X400: C=ch; ADMD=arcom; PRMD=switch;
54 route des Acacias          |       O=unige; OU=divsun; S=petitp
CH-1227 GENEVA (Switzerland)  | Tel: +41/22/705 7117
                              | Fax: +41/22/300 1086


Release notes for mmorph version 2.2
------------------------------------

This version is faster and creates smaller files, and lets you factorize
the typed feature structures in the lexical declarations.

To allow this factorisation, the syntax of the descriptions in @Lexicon,
which was like this:

    <LexDef>  ::= LEXICALSTRING <BaseForm>? <Tfs>+

is replaced by this (cf. "man 5 mmorph"):

    <LexDef>  ::= <Tfs> <Lexical>+
    <Lexical> ::= LEXICALSTRING <BaseForm>?

In order to convert your files written for earlier versions of mmorph
(2.1 and before), you can use the utilities you'll find in the
directory ./util:

    swap        swaps the strings and typed feature structures of
                lexical entries
    factorize   factorizes the lexical entries' TFS with respect to
                the strings

For each file containing lexical entries (@Lexicon section only, in
whole or part) you can do the following (description file name is
"rules", lexical entries file name is "lex"):

    1) swap lex >lex.new
    2) mv lex lex.old
    3) factorize rules lex.new >lex

If the lexical entries are at the end of the file "rules", extract them
into a separate file "lex", proceed as above and then replace the
lexical entries in "rules" with the new content of "lex".

The utility "swap" does not handle #include directives.  It might also
need some adjustments (it is a sed script) or pre-processing if you
have a fancy layout of lexical entries.

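The three conversion steps above can be wrapped in a small shell
function.  This is only a sketch: the function name convert_lexicon, the
variable UTIL_DIR and the pattern used to spot #include directives are
illustrative assumptions, not part of the mmorph distribution.  It
assumes "swap" and "factorize" are executable in the ./util directory.

```shell
#!/bin/sh
# Sketch only: convert one pre-2.2 lexical entries file, keeping a backup.
# UTIL_DIR (assumption) is where the mmorph "swap" and "factorize"
# utilities live; it defaults to ./util.
UTIL_DIR=${UTIL_DIR:-./util}

# convert_lexicon RULES LEX   (hypothetical helper name)
convert_lexicon() {
    rules=$1    # description file, e.g. "rules"
    lex=$2      # lexical entries file, e.g. "lex"

    # "swap" does not handle #include directives, so refuse such files.
    # The exact form of the directive is assumed here.
    if grep -q '#include' "$lex"; then
        echo "convert_lexicon: $lex uses #include; convert it by hand" >&2
        return 1
    fi

    # 1) swap the strings and typed feature structures
    "$UTIL_DIR/swap" "$lex" > "$lex.new" || return 1
    # 2) keep the original file as a backup
    mv "$lex" "$lex.old" || return 1
    # 3) factorize the TFS with respect to the strings
    if ! "$UTIL_DIR/factorize" "$rules" "$lex.new" > "$lex"; then
        mv "$lex.old" "$lex"    # restore the original on failure
        return 1
    fi
}
```

After sourcing this, "convert_lexicon rules lex" would leave the
converted entries in "lex" and the original in "lex.old".  Files with a
fancy layout may still need pre-processing before "swap" is run.
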
Tell me if you have problems with this conversion.  If you send me your
mmorph description files I can do this for you.

You should get a substantial reduction in the size of the descriptions
and generated database.  Typically the description is only one fifth of
the original size, and the generated database one third.  Your mileage
may vary (measure the database size with "du -a" instead of "ls" to
avoid counting the holes in the files).

The utility "factorize" can be used on its own in order to restructure
lexical entries written independently, or to merge two lexical
description files.

Version 2.2 has four new options (cf. "man mmorph"):

-p  to print the list of the projected tfs contained in a database:

        mmorph.new -p -m morph-lexicon.fr >tfs

-q  to print the list of all forms with their projected tfs:

        mmorph.new -q -m morph-lexicon.fr >forms

    The forms are not listed in order of generation (for that use
    "mmorph -n").

If option "-d 16" is used together with option -p or -q, some
statistics are displayed.

-y  parse only.  Do not generate anything, just check the syntax.

-z  normalize, implies -y.  Print on standard output the lexical
    entries, in normalized form.

If you have problems with these changes, contact

Mr. Dominique Petitpierre     | Internet: petitp@divsun.unige.ch
ISSCO, University of Geneva   | X400: C=ch; ADMD=400net; PRMD=switch;
54 route des Acacias          |       O=unige; OU=divsun; S=petitp
CH-1227 GENEVA (Switzerland)  | Tel: +41/22/705 7117
                              | Fax: +41/22/300 1086