- Name: tokenizer
- Version: 5.4.1
- Release: 1mdk
- Epoch:
- Group: Sciences/Computer science
- License: GPL
- Url: http://atoll.inria.fr/~lclement/tokenizer-main.html
- Summary: Text segmenter
- Architecture: i586
- Size: 94807
- Distribution: Mandrakelinux
- Vendor: Mandrakesoft
- Packager: Guillaume Rousse <guillomovitch@mandrake.org>
Description:
Tokenizer segments a text into tokens, then into word-forms. The tokens
match regular expressions, and the word-forms match lexical entries compiled
with lexed. For a compound name, a word-form is a concatenation of several
tokens. Ambiguity between simple and compound words is represented as a
directed acyclic graph (DAG).
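
The segmentation described above can be illustrated with a minimal Python sketch. It is not the package's code or interface: the token pattern, the COMPOUNDS lexicon and the function names are all hypothetical, and a plain Python set stands in for entries compiled with lexed. Nodes of the graph are token boundaries, and each edge carries one word-form.

```python
import re

# Hypothetical token pattern: runs of word characters, or one punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

# Hypothetical lexicon of compound word-forms, stored as tuples of tokens.
# (Stands in for lexical entries compiled with lexed.)
COMPOUNDS = {("New", "York"), ("ice", "cream")}
MAX_COMPOUND_LEN = max(len(c) for c in COMPOUNDS)


def tokenize(text):
    """Split text into tokens matching TOKEN_RE."""
    return TOKEN_RE.findall(text)


def build_dag(tokens):
    """Return edges (start, end, word_form) over token boundaries 0..len(tokens)."""
    edges = []
    for i, tok in enumerate(tokens):
        # Every token is a simple word-form on its own.
        edges.append((i, i + 1, tok))
        # A run of tokens found in the lexicon adds a competing compound edge.
        for n in range(2, MAX_COMPOUND_LEN + 1):
            span = tuple(tokens[i:i + n])
            if len(span) == n and span in COMPOUNDS:
                edges.append((i, i + n, " ".join(span)))
    return edges


def paths(edges, start, end):
    """Enumerate every segmentation, i.e. every path from start to end."""
    if start == end:
        return [[]]
    result = []
    for s, e, form in edges:
        if s == start:
            for rest in paths(edges, e, end):
                result.append([form] + rest)
    return result


tokens = tokenize("I like New York.")
dag = build_dag(tokens)
for segmentation in paths(dag, 0, len(tokens)):
    print(segmentation)
```

Because every edge runs from a lower boundary to a higher one, the graph cannot contain a cycle, and each path through it is one possible reading of the text.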
- BuildArch:
- ExcludeArch:
- ExclusiveArch:
- Cookie: n2.mandrakesoft.com 1101393863
- Buildhost: n2.mandrakesoft.com