<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <html> <head> <title> ht://Dig: htfuzzy </title> </head> <body bgcolor="#eef7ff"> <h1> htfuzzy </h1> <p> ht://Dig Copyright © 1995-2004 <a href="THANKS.html">The ht://Dig Group</a><br> Please see the file <a href="COPYING">COPYING</a> for license information. </p> <hr size="4" noshade> <dl> <dd> <h2> Synopsis </h2> </dd> <dd> htfuzzy [-c <em>configfile</em>][-v] <em>algorithm</em> ... </dd> </dl> <dl> <dd> <h2> Description </h2> </dd> <dd> Htfuzzy creates indexes for different "fuzzy" search algorithms. These indexes can then be used by the <a href="htsearch.html" target="_top">htsearch</a> program. </dd> </dl> <dl> <dd> <h2> Options </h2> </dd> <dd> <dl compact> <dt> -c <em>configfile</em> </dt> <dd> Use the specified configuration file instead of the default. </dd> <dt> -v </dt> <dd> Verbose mode. Used once will provide progress feedback, used more than once will overflow even the biggest buffers. :-) </dd> </dl> </dd> </dl> <dl> <dd> <h2> Algorithms </h2> </dd> <dd> Indexes for the following search algorithms can currently be created: <dl> <dt> <strong>soundex</strong> </dt> <dd> Creates a slightly modified <a href="http://www.sog.org.uk/cig/vol6/605tdrake.pdf">soundex</a> key database. A soundex key encodes letters as digits, with similar sounding letters (c, k, q) given the same digit. Vowels are not coded. Differences with the standard soundex algorithm are: <ul> <li> Keys are 6 digits. </li> <li> The first letter is also encoded. </li> </ul> </dd> <dt> <strong>metaphone</strong> </dt> <dd> Creates a metaphone key database. This algorithm is more specific to English, but will get fewer "weird" matches than the soundex algorithm. </dd> <dt> <strong>accents</strong> </dt> <dd> Creates an accents key database. This algorithm will map all accented letters to their unaccented counterparts, so that a search for the unaccented word will yield all variations of this word with accents. </dd> <dt> <strong>endings</strong> </dt> <dd> Creates two databases which can be used to match common word endings. The creation of these databases requires a list of affix rules and a dictionary which uses those affix rules. The format of the affix rules and dictionary files are the ones used by the <a href="http://fmg-www.cs.ucla.edu/fmg-members/geoff/ispell.html"> ispell</a> program. Included with the distribution are the affix rules for English and a fairly small English dictionary. Other languages can be supported by getting the appropriate affix rules and dictionaries. These are available for many languages; check the ispell distribution for more details. </dd> <dt> <strong>synonyms</strong> </dt> <dd> Creates a database of synonyms for words. It reads a text database of synonyms and creates a database that htsearch can then use. Each line of the text database consists of words where the first word will have the other words on that line as synonyms. </dd> </dl> </dd> </dl> <dl> <dd> <h2> Files </h2> </dd> <dd> <dl> <dt> <a href="attrs.html#config_dir">CONFIG_DIR</a>/htdig.conf </dt> <dd> The default configuration file. </dd> </dl> <dl> <dt> <a href="attrs.html#database_dir">DATABASE_DIR</a>/db.accents.db </dt> <dd> (Output) Maps between characters with and without accents for accents fuzzy rule </dd> </dl> <dl> <dt> <a href="attrs.html#database_dir">DATABASE_DIR</a>/db.metaphone.db </dt> <dd> (Output) Database of similar-sounding words for metaphone fuzzy rule </dd> </dl> <dl> <dt> <a href="attrs.html#database_dir">DATABASE_DIR</a>/db.soundex.db </dt> <dd> (Output) Database of similar-sounding words for soundex fuzzy rule </dd> </dl> <dl> <dt> <a href="attrs.html#common_dir">COMMON_DIR</a>/english.0, <a href="attrs.html#common_dir">COMMON_DIR</a>/english.aff </dt> <dd> (Input) List of words and affix rules used to generate endings </dd> </dl> <dl> <dt> <a href="attrs.html#common_dir">COMMON_DIR</a>/root2word.db, <a href="attrs.html#common_dir">COMMON_DIR</a>/word2rood.db </dt> <dd> (Output) Database used for endings fuzzy rule </dd> </dl> <dl> <dt> <a href="attrs.html#common_dir">COMMON_DIR</a>/synonyms </dt> <dd> (Input) List of groups of words considered synonymous </dd> </dl> <dl> <dt> <a href="attrs.html#common_dir">COMMON_DIR</a>/synonyms.db </dt> <dd> (Output) Database used for synonyms fuzzy rule </dd> </dl> </dd> </dl> <dl> <dd> <h2> See Also </h2> </dd> <dd> <a href="htdig.html">htdig</a>, <a href="htmerge.html">htmerge</a>, <a href="htsearch.html" target="_top">htsearch</a>, <a href="attrs.html">Configuration file format</a>, and <a href="http://fmg-www.cs.ucla.edu/fmg-members/geoff/ispell.html"> ispell</a>. </dd> </dl> <hr size="4" noshade> Last modified: $Date: 2004/06/12 13:39:13 $ </body> </html>