          WWWOFFLE - World Wide Web Offline Explorer - Version 2.6
          ========================================================


ht://Dig is a free (GPL) internet indexing and search program.  The ht://Dig
documentation describes it as follows:

        The ht://Dig system is a complete world wide web indexing and
        searching system for a small domain or intranet. This system
        is *not* meant to replace the need for powerful internet-wide
        search systems like Lycos, Infoseek, Webcrawler and AltaVista.
        Instead it is meant to cover the search needs for a single
        company, campus, or even a particular sub section of a web site.

        As opposed to some WAIS-based or web-server based search
        engines, ht://Dig can span several web servers at a site.  The
        type of these different web servers doesn't matter as long as
        they understand the HTTP 1.0 protocol.

        ht://Dig was developed at San Diego State University as a way
        to search the various web servers on the campus network.


I have written WWWOFFLE so that ht://Dig can be used with it to index the
entire cache of pages.  There are three stages to using the program, which are
described in this document: installation, digging and searching.


Installing ht://Dig
-------------------

Note: If you already have version 3.1.0b4 or later of htdig installed and
      working then you can skip this section.

ht://Dig must be installed before it can be used.  The instructions below give
a step-by-step guide to this process, assuming that version 3.1.0b4 of ht://Dig
is used; later versions should also work.

1) Get the ht://Dig source code

Download the source for the ht://Dig programs from

        http://www.htdig.org/files/


2) Unpack the source code

Use

        tar -xvzf htdig-3.1.0b4.tar.gz

to create the directory htdig-3.1.0b4 containing the program source files.


3) Configure the ht://Dig program

Move to the htdig-3.1.0b4 directory and run the configuration program

        cd htdig-3.1.0b4
        ./configure


4) Compile and install ht://Dig

Run make to compile htdig and then to install it

        make
        make install

Any problems at this stage will need to be solved by consulting the ht://Dig
documentation.


Configure WWWOFFLE to run with ht://Dig
---------------------------------------

When WWWOFFLE was installed, the configuration files for the ht://Dig programs
as used with WWWOFFLE will have been placed in
/var/spool/wwwoffle/html/search/htdig/conf and the scripts used to run the
htdig programs in /var/spool/wwwoffle/html/search/htdig/scripts.

These files should be correct if the information in the WWWOFFLE Makefile
(LOCALHOST and SPOOLDIR) was set correctly.  Check that they have the spool
directory and the proxy hostname and port set correctly.
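
One way to carry out this check is to list every line in the installed files
that mentions the spool directory or the proxy address and read through the
result.  The sketch below is only an illustration: the function name is made up
for this example, and the default values (/var/spool/wwwoffle and
localhost:8080) are assumptions that must be changed if WWWOFFLE was built with
different settings.

```shell
#!/bin/sh
# Sketch only: show the lines in each file of a directory that mention the
# spool directory or the proxy host:port, so they can be checked by eye.
# show_settings, SPOOLDIR and PROXY are names chosen for this example.

show_settings() {
    # $1 = directory to search, $2 = spool directory, $3 = proxy host:port
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        # /dev/null is a second file argument so grep prints the file name.
        grep -n -F -e "$2" -e "$3" "$f" /dev/null || true
    done
    return 0
}

SPOOLDIR=${SPOOLDIR:-/var/spool/wwwoffle}
PROXY=${PROXY:-localhost:8080}

for sub in conf scripts; do
    d=$SPOOLDIR/html/search/htdig/$sub
    [ -d "$d" ] || continue
    show_settings "$d" "$SPOOLDIR" "$PROXY"
done
```

Each line of output has the form file:line-number:text.  A file that prints
nothing does not mention either value at all.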

The scripts should also be checked to ensure that the ht://Dig programs are on
the path (you can edit the PATH variable in the scripts if the programs are not
in /usr/local/bin).  The merging process can use a lot of disk space when the
sort program is run; you can change the location of the temporary directory
used for this with the TMPDIR variable.
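
A quick test of the path can be made from the command line before any of the
scripts are run.  The following is a minimal sketch, not part of WWWOFFLE: the
function name is made up, and the list of program names (htdig, htmerge,
htfuzzy, htsearch) is an assumption that may need adjusting for the installed
ht://Dig version.

```shell
#!/bin/sh
# Sketch only: report which ht://Dig programs cannot be found on the PATH.
# missing_programs is a name chosen for this example.

missing_programs() {
    # Print the name of each argument that is not a command on the PATH.
    for prog in "$@"; do
        command -v "$prog" >/dev/null 2>&1 || echo "$prog"
    done
    return 0
}

# The same kind of settings that can be edited in the WWWOFFLE scripts.
PATH=$PATH:/usr/local/bin
TMPDIR=${TMPDIR:-/tmp}
export PATH TMPDIR

missing=$(missing_programs htdig htmerge htfuzzy htsearch)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
fi
```

If nothing is printed then all of the named programs were found.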


The Fuzzy Database
------------------

The ht://Dig programs use a database of fuzzy word endings and synonyms.  This
needs to be created just once; a script provided with WWWOFFLE does this.

        /var/spool/wwwoffle/html/search/htdig/scripts/wwwoffle-htfuzzy

If you have an existing ht://Dig installation then this step will probably have
already been performed and is not required again.

Note: When you do this it will take a *long* time since it produces two
      databases that htsearch uses to help in matching words.
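
Because the step is slow and only needed once, it can be convenient to run it
at low priority and to record that it has finished.  The sketch below shows one
way of doing so.  The marker file is purely an invention of this example;
neither WWWOFFLE nor ht://Dig creates or looks for such a file.

```shell
#!/bin/sh
# Sketch only: run a slow one-off command at low priority and skip it if a
# marker file shows that it has already completed.  run_once and the marker
# file name are assumptions made for this example.

run_once() {
    # $1 = marker file, remaining arguments = command to run
    marker=$1
    shift
    if [ -e "$marker" ]; then
        echo "already done: $marker exists"
        return 0
    fi
    # Create the marker only if the command succeeds.
    nice "$@" && : > "$marker"
}

SCRIPTS=/var/spool/wwwoffle/html/search/htdig/scripts
if [ -x "$SCRIPTS/wwwoffle-htfuzzy" ]; then
    run_once "$HOME/.wwwoffle-htfuzzy-done" "$SCRIPTS/wwwoffle-htfuzzy"
fi
```

Deleting the marker file allows the databases to be rebuilt the next time that
the sketch is run.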


Digging and Merging
-------------------

Digging is the name given to the process of searching through the web pages to
make the list of words.  Merging is the process of converting the raw list of
words into a database that can be searched.

The ht://Dig installation will include a script called 'rundig' that
demonstrates how digging and merging are supposed to work.  To work with
WWWOFFLE I have produced my own scripts that should be used instead.

        /var/spool/wwwoffle/html/search/htdig/scripts/wwwoffle-htdig-full
        /var/spool/wwwoffle/html/search/htdig/scripts/wwwoffle-htdig-incr
        /var/spool/wwwoffle/html/search/htdig/scripts/wwwoffle-htdig-lasttime

The first of these scripts does a full search and indexes all of the URLs in
the cache.  The second does an incremental search and indexes only those that
have changed since the last full search was done.  The third adds the files in
the lasttime index to the database.

Unfortunately, due to the way that the htmerge program works, an incremental
search or a lasttime search takes almost as long as a full search.  The only
difference is that for the incremental and lasttime searches the WWWOFFLE cache
is only accessed for the files that have changed.
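
The scripts can be run by hand or from a scheduler such as cron.  As an
illustration only, the sketch below chooses between the full and incremental
scripts according to the day of the week.  The schedule (full on Sunday,
incremental on other days) and the function name are assumptions of this
example and are not something that WWWOFFLE prescribes.

```shell
#!/bin/sh
# Sketch only: pick one of the WWWOFFLE digging scripts by day of the week.
# choose_dig_script and the Sunday schedule are assumptions for this example.

SCRIPTS=/var/spool/wwwoffle/html/search/htdig/scripts

choose_dig_script() {
    # $1 = day of week as printed by `date +%u` (1 = Monday ... 7 = Sunday)
    case $1 in
        7) echo "$SCRIPTS/wwwoffle-htdig-full" ;;
        *) echo "$SCRIPTS/wwwoffle-htdig-incr" ;;
    esac
}

script=$(choose_dig_script "$(date +%u)")
if [ -x "$script" ]; then
    "$script"
else
    # The script is not installed here, so only report what would be run.
    echo "would run: $script"
fi
```

Since the merge takes about as long in either case, the main saving from the
incremental runs is in the number of cached pages that are read.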


Searching
---------

The search page for ht://Dig is located at http://localhost:8080/search/htdig/
and is linked to from the "Welcome Page".  The word or words that you want to
search for should be entered here.

This form actually calls the script

        /var/spool/wwwoffle/html/search/htdig/scripts/wwwoffle-htsearch

to do the searching, so the script can be edited if the search needs to be
modified.


Thanks to
---------

I would like to thank the htdig maintainer (Geoffrey.R.Hutchison@williams.edu)
for the help that he has provided to get me started with htdig and the patches
and comments that he has accepted from me into the htdig program.


Andrew M. Bishop
13th Aug 2000