Sophie

tokenizer-5.4.1-1mdk.src.rpm

Description:

Tokenizer segments a text into tokens, then into word-forms. The tokens
match regular expressions, and the word-forms match lexical entries compiled
with lexed. For a compound name, a word-form is a concatenation of tokens.
Ambiguity between simple and compound words is represented as a directed
acyclic graph (DAG).
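
The two-stage segmentation described above can be illustrated with a short
Python sketch. It is not the package's code or interface: the token pattern,
the small in-memory lexicon (standing in for entries compiled with lexed), and
the function names tokenize, build_dag and readings are all assumptions made
for illustration only.

```python
import re

# Assumed token pattern: a run of word characters, or one punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

# Assumed toy lexicon of compound entries, keyed by token sequence.
# The real package matches word-forms against entries compiled with lexed.
COMPOUNDS = {("hot", "dog")}


def tokenize(text):
    """Stage 1: split the text into tokens matching the regular expression."""
    return TOKEN_RE.findall(text.lower())


def build_dag(tokens, compounds, max_len=3):
    """Stage 2: build edges (start, end, word_form) over token positions.

    Every single token yields an edge; a span of several tokens yields an
    edge only if it is a known compound. Edges always point forward
    (end > start), so the resulting graph is acyclic.
    """
    edges = []
    for start in range(len(tokens)):
        last = min(start + max_len, len(tokens))
        for end in range(start + 1, last + 1):
            span = tuple(tokens[start:end])
            if end - start == 1 or span in compounds:
                edges.append((start, end, " ".join(span)))
    return edges


def readings(edges, n, start=0):
    """Enumerate every path through the DAG from position start to n."""
    if start == n:
        return [[]]
    found = []
    for s, e, form in edges:
        if s == start:
            for rest in readings(edges, n, e):
                found.append([form] + rest)
    return found


if __name__ == "__main__":
    toks = tokenize("He eats a hot dog")
    dag = build_dag(toks, COMPOUNDS)
    for path in readings(dag, len(toks)):
        print(path)
```

In this sketch the input "He eats a hot dog" produces two paths through the
graph: one where "hot" and "dog" are separate simple words, and one where
"hot dog" is a single compound word-form. Both share the same edges for the
unambiguous part of the sentence, which is the point of using a DAG rather
than listing each reading separately.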
