Sophie

perl-Text-DeDuper-1.01-1.noarch.rpm

Description:

This module uses the resemblance measure proposed by Andrei Z. Broder et al.
(http://www.ra.ethz.ch/CDstore/www6/Technical/Paper205/Paper205.html) to detect
similar (near-duplicate) documents based on their text.
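
As a rough illustration of the idea, the sketch below computes a resemblance
score as the Jaccard overlap between the sets of w-word shingles (contiguous
word sequences) of two texts. It is written in Python for brevity and is not
the module's own code or API: the function names, the default shingle width
of 4, the lower-casing, and the handling of very short or empty texts are all
assumptions made for this example.

```python
import re


def shingles(text, w=4):
    """Return the set of w-word shingles (contiguous word runs) in text."""
    # Assumption: words are runs of letters, compared case-insensitively.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if len(words) < w:
        # Assumption: a text shorter than w words forms a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a, b, w=4):
    """Jaccard overlap of the two shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        # Assumption: two texts with no words count as identical.
        return 1.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    doc1 = "the quick brown fox jumps over the lazy dog"
    doc2 = "the quick brown fox jumps over the lazy cat"
    print(resemblance(doc1, doc2))
```

A score near 1 suggests near-duplicates and a score near 0 suggests unrelated
texts; where to draw the line is a choice left to the caller.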

Note of caution: The module only works correctly with languages whose text can
be tokenised into words by detecting sequences of alphabetic characters. It may
therefore give poor results for languages such as Chinese.
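
The limitation can be seen with a small Python sketch. It assumes a tokeniser
that treats every unbroken run of letters as one word, which is a stand-in for
this example and not the module's actual tokeniser; the function name
`letter_runs` is hypothetical.

```python
import re


def letter_runs(text):
    """Split text into runs of letters (hypothetical stand-in tokeniser)."""
    # Digits, underscores, punctuation and whitespace act as separators.
    return re.findall(r"[^\W\d_]+", text)


english = letter_runs("The cat sat on the mat.")
chinese = letter_runs("猫坐在垫子上。")

print(len(english), english)
print(len(chinese), chinese)
```

The English sentence splits into six words, while the Chinese sentence, which
has no spaces between words, comes out as a single token, so word-based
comparison has very little to work with.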
