WWWOFFLE - World Wide Web Offline Explorer - Version 2.6
========================================================


The program ht://Dig is a free (GPL) internet indexing and search program.  The
ht://Dig documentation describes itself as follows:

        The ht://Dig system is a complete world wide web indexing and searching
        system for a small domain or intranet. This system is *not* meant to
        replace the need for powerful internet-wide search systems like Lycos,
        Infoseek, Webcrawler and AltaVista. Instead it is meant to cover the
        search needs for a single company, campus, or even a particular sub
        section of a web site.

        As opposed to some WAIS-based or web-server based search engines,
        ht://Dig can span several web servers at a site. The type of these
        different web servers doesn't matter as long as they understand the
        HTTP 1.0 protocol.

        ht://Dig was developed at San Diego State University as a way to search
        the various web servers on the campus network.

I have written WWWOFFLE so that ht://Dig can be used with it, allowing the
entire cache of pages to be indexed.  There are three stages to using the
program, each described in this document: installation, digging and searching.


Installing ht://Dig
-------------------

Note: If you already have version 3.1.0b4 or later of htdig installed and
      working then you can skip this section.

To be able to use this program it must be installed.  The instructions below
give a step-by-step guide to this process, assuming that version 3.1.0b4 of
ht://Dig is used; later versions should also work.

1) Get the ht://Dig source code

Download the source for the ht://Dig programs from

        http://www.htdig.org/files/

2) Unpack the source code

Use

        tar -xvzf htdig-3.1.0b4.tar.gz

to create the directory htdig-3.1.0b4 containing the program source files.

3) Configure the ht://Dig program

Move to the htdig-3.1.0b4 directory and run the configuration program

        cd htdig-3.1.0b4
        ./configure

4) Compile ht://Dig

Run make to compile and install htdig

        make
        make install

Any problems at this stage will require the use of the ht://Dig documentation
to solve.


Configure WWWOFFLE to run with ht://Dig
---------------------------------------

The configuration files for the ht://Dig programs as used with WWWOFFLE will
have been installed in /var/spool/wwwoffle/html/search/htdig/conf when WWWOFFLE
was installed.  The scripts used to run the htdig programs will have been
installed in /var/spool/wwwoffle/html/search/htdig/scripts at the same time.

These files should be correct if the information in the WWWOFFLE Makefile
(LOCALHOST and SPOOLDIR) was set correctly.  Check that they have the spool
directory and the proxy hostname and port set correctly.  Also check that the
ht://Dig programs are on the path (you can edit the PATH variable in these
files if the programs are not in /usr/local/bin).  The merging process can use
a lot of disk space when the sort program is run; you can change the location
of the temporary directory used for this with the TMPDIR variable.


The Fuzzy Database
------------------

The ht://Dig programs use a database of fuzzy word endings and synonyms.  This
needs to be created just once, using a script provided with WWWOFFLE:

        /var/spool/wwwoffle/html/search/htdig/scripts/wwwoffle-htfuzzy

If you have an existing ht://Dig installation then this step will probably have
already been performed and is not required again.

Note: When you do this it will take a *long* time since it produces two
      databases that htsearch uses to help in matching words.


Digging and Merging
-------------------

Digging is the name that is given to the process of searching through the
web-pages to make the list of words.
Merging is the process of converting the raw list of
words into a database that can be searched.  The ht://Dig installation
includes a script called 'rundig' that demonstrates how digging and merging
are supposed to work.

To work with WWWOFFLE I have produced my own scripts that should be used
instead.

        /var/spool/wwwoffle/html/search/htdig/scripts/wwwoffle-htdig-full
        /var/spool/wwwoffle/html/search/htdig/scripts/wwwoffle-htdig-incr
        /var/spool/wwwoffle/html/search/htdig/scripts/wwwoffle-htdig-lasttime

The first of these scripts does a full search and indexes all of the URLs in
the cache.  The second does an incremental search and only indexes those URLs
that have changed since the last full search was done.  The third adds the
files in the lasttime index to the database.

Unfortunately, due to the way that the htmerge program works, an incremental
search or a lasttime search takes almost as long as a full search.  The only
difference is that for the incremental and lasttime searches the WWWOFFLE cache
is only accessed for the files that have changed.


Searching
---------

The search page for ht://Dig is located at
http://localhost:8080/search/htdig/ and is linked to from the "Welcome Page".
The word or words that you want to search for should be entered here.

This form actually calls the script

        /var/spool/wwwoffle/html/search/htdig/scripts/wwwoffle-htsearch

to do the searching, so you can edit this script if you need to change how the
search is run.


Thanks to
---------

I would like to thank the htdig maintainer (Geoffrey.R.Hutchison@williams.edu)
for the help that he has provided to get me started with htdig and the patches
and comments that he has accepted from me into the htdig program.


Andrew M. Bishop 13th Aug 2000
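

Example: choosing between the digging scripts
---------------------------------------------

As an illustration of how the full and incremental digging scripts described
in this document might be combined, here is a minimal sketch.  It is not part
of WWWOFFLE: the choose_script function and the weekly policy (full index on
Sunday, incremental on every other day) are assumptions made for this example.
Only the script paths are taken from this document.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with WWWOFFLE.
# Directory holding the WWWOFFLE ht://Dig scripts (path as given in this document).
SCRIPTS=/var/spool/wwwoffle/html/search/htdig/scripts

# choose_script DAY
# Print the path of the script to run for day-of-week DAY, where DAY is the
# value printed by `date +%u` (1 = Monday ... 7 = Sunday).
# Assumed policy: full index on Sunday, incremental index otherwise.
choose_script() {
    if [ "$1" -eq 7 ]; then
        echo "$SCRIPTS/wwwoffle-htdig-full"
    else
        echo "$SCRIPTS/wwwoffle-htdig-incr"
    fi
}

# Example use (left commented out so that loading this file runs nothing):
#   "$(choose_script "$(date +%u)")"
```

Since an incremental run takes almost as long as a full one, a policy like this
mainly reduces how much of the WWWOFFLE cache is read on most days, not the
total run time.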