<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <html> <head> <title> ht://Dig: Overview of Programs </title> </head> <body bgcolor="#eef7ff"> <h1> Overview of Programs </h1> <p> ht://Dig Copyright © 1995-2001 <a href="THANKS.html">The ht://Dig Group</a><br> Please see the file <a href="COPYING">COPYING</a> for license information. </p> <hr size="4" noshade> <p> There are several programs in the ht://Dig package. </p> <h3> <a href="htdig.html">htdig</a> </h3> <p> Digging is the first step in creating a search database. This system uses the word <em>digging</em> while other systems call it <em>harvesting</em> or <em>gathering</em>. In the ht://Dig system, the program <a href="htdig.html">htdig</a> performs the information gathering stage. In this process, the program acts as a regular web user, except that it follows <em>all</em> hyperlinks that it comes across. (Actually, it does not follow all of them, just those that are within the domain it needs to gather information on.)<br> Each document it visits is examined, and all the unique words in that document are extracted and stored. </p> <p> The digging process <em>only</em> follows links and has no notion of JavaScript, applets, or user-input forms. </p> <hr noshade> <h3> <a href="htsearch.html" target="_top">htsearch</a> </h3> <p> Searching is where the users actually get to use all the information that was gathered during the dig and merge stages. The <a href="htsearch.html" target="_top">htsearch</a> program performs the actual searches. It typically produces <code>HTML</code> output which will be seen by the users, though other text formats could be generated by editing the output templates. </p> <hr noshade> <h3> <a href="htmerge.html">htmerge</a> </h3> <p> Merging does exactly that: it merges one database into another. In previous versions of ht://Dig, the htmerge program also formed databases for use by htsearch from the htdig output. 
This step is now largely unnecessary; the one part that is still needed, removal of invalid URLs, is now done by the htpurge program. </p> <hr noshade> <h3> <a href="htpurge.html">htpurge</a> </h3> <p> Purging removes documents and the associated words from the databases. Run it after htdig to remove invalid URLs, documents marked not to be indexed, old versions of modified documents, and so on. You can also give htpurge specific URLs to remove explicitly. </p> <hr noshade> <h3> <a href="htload.html">htload</a> </h3> <p> Loading involves importing the contents of the databases from formatted ASCII text documents, as created by htdump or by the -t flag of htdig. This is destructive by nature: data from the text files will replace any conflicting data in the databases. </p> <hr noshade> <h3> <a href="htdump.html">htdump</a> </h3> <p> Dumping involves exporting the contents of the databases to formatted ASCII text documents. This can be useful for backups, transferring databases between different operating systems, changing the compression or encodings in the ht://Dig configuration, or parsing by external utilities. Editing these files by hand is <em>not</em> recommended, so be warned! (Minor edits will probably be fine.) </p> <hr noshade> <h3> <a href="htstat.html">htstat</a> </h3> <p> The htstat program returns statistics on the databases, similar to the -s flag of some of the programs. In addition, it can return a list of URLs in the databases. </p> <hr noshade> <h3> <a href="htnotify.html">htnotify</a> </h3> <p> The ht://Dig system includes a handy reminder service which allows HTML authors to add some ht://Dig-specific <a href="meta.html">meta information</a> to HTML documents. This meta information is used to email authors after a specified date. This is very useful for maintaining lists that mark new items with those annoying "new" graphics. 
(Hint: Things really aren't all that new anymore after 6 months!) </p> <hr noshade> <h3> <a href="htfuzzy.html">htfuzzy</a> </h3> <p> To allow searches to match words with "fuzzy" algorithms, the <a href="htfuzzy.html">htfuzzy</a> program can create indexes for several different algorithms. </p> <hr size="4" noshade> Last modified: $Date: 2001/02/15 17:05:33 $ </body> </html>