Sophie

Distribution: Mandriva 10.2 (i586, contrib)

tokenizer-5.4.1-1mdk.i586.rpm

Description:

Tokenizer segments a text into tokens, then into word-forms. The tokens
match regular expressions, and the word-forms match lexical entries compiled
with lexed. A compound word-form is a concatenation of several tokens.
Ambiguity between simple and compound words is represented as a directed
acyclic graph (DAG).
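The idea of representing simple/compound ambiguity as a DAG can be illustrated with a small Python sketch. This is not the Tokenizer package's API. The token regex, the plain Python set standing in for a lexicon compiled with lexed, the max_len limit and all function names are hypothetical choices made for illustration. Nodes are token boundaries, and each edge is one candidate word-form, so every path from the first to the last node is one possible segmentation.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical token pattern: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

Dag = Dict[int, List[Tuple[int, str]]]


def tokenize(text: str) -> List[str]:
    """Split text into tokens that match TOKEN_RE."""
    return TOKEN_RE.findall(text)


def build_dag(tokens: List[str], lexicon: Set[str], max_len: int = 4) -> Dag:
    """Build a DAG whose nodes are token boundaries 0..len(tokens).

    An edge i -> j labelled w means that tokens[i:j], joined by spaces,
    forms the word-form w. Every single token gets an edge (a simple
    word-form), so each position stays reachable even for words missing
    from the lexicon. Multi-token spans (up to max_len tokens) get an
    edge only if they are lexical entries (compound word-forms).
    """
    dag: Dag = {i: [] for i in range(len(tokens) + 1)}
    for i in range(len(tokens)):
        dag[i].append((i + 1, tokens[i]))  # simple word-form: one token
        for j in range(i + 2, min(i + max_len, len(tokens)) + 1):
            candidate = " ".join(tokens[i:j])
            if candidate in lexicon:
                dag[i].append((j, candidate))  # compound word-form
    return dag


def segmentations(dag: Dag, start: int, end: int) -> List[List[str]]:
    """Enumerate every path from start to end, i.e. every reading."""
    if start == end:
        return [[]]
    paths: List[List[str]] = []
    for nxt, form in dag[start]:
        for rest in segmentations(dag, nxt, end):
            paths.append([form] + rest)
    return paths


if __name__ == "__main__":
    tokens = tokenize("une pomme de terre cuite")
    dag = build_dag(tokens, {"pomme de terre"})
    for reading in segmentations(dag, 0, len(tokens)):
        print(reading)
```

With the compound "pomme de terre" in the lexicon, the graph keeps both readings side by side: the five simple word-forms, and "une" + "pomme de terre" + "cuite". Keeping the ambiguity in the graph, rather than picking one reading early, leaves the choice to a later processing stage. Enumerating all paths is exponential in the worst case and is shown here only to make the structure visible.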
