
<HTML
><HEAD
><TITLE
>Extended indexing features</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.73
"><LINK
REL="HOME"
TITLE="mnoGoSearch 3.2 reference manual"
HREF="index.html"><LINK
REL="UP"
TITLE="Indexing"
HREF="msearch-indexing.html"><LINK
REL="PREVIOUS"
TITLE="indexer configuration"
HREF="msearch-indexer-configuration.html"><LINK
REL="NEXT"
TITLE="Using syslog

"
HREF="msearch-syslog.html"><LINK
REL="STYLESHEET"
TYPE="text/css"
HREF="mnogo.css"><META
NAME="Description"
CONTENT="mnoGoSearch - Full Featured Web site Open Source Search Engine Software over the Internet and Intranet Web Sites Based on SQL Database. It is a Free search software covered by GNU license."><META
NAME="Keywords"
CONTENT="shareware, freeware, download, internet, unix, utilities, search engine, text retrieval, knowledge retrieval, text search, information retrieval, database search, mining, intranet, webserver, index, spider, filesearch, meta, free, open source, full-text, udmsearch, website, find, opensource, search, searching, software, udmsearch, engine, indexing, system, web, ftp, http, cgi, php, SQL, MySQL, database, php3, FreeBSD, Linux, Unix, mnoGoSearch, MacOS X, Mac OS X, Windows, 2000, NT, 95, 98, GNU, GPL, url, grabbing"></HEAD
><BODY
CLASS="sect1"
BGCOLOR="#EEEEEE"
TEXT="#000000"
LINK="#000080"
VLINK="#800080"
ALINK="#FF0000"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
>mnoGoSearch 3.2 reference manual: Full-featured search engine software</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="msearch-indexer-configuration.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
>Chapter 3. Indexing</TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="msearch-syslog.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="sect1"
><H1
CLASS="sect1"
><A
NAME="extended-indexing"
>Extended indexing features</A
></H1
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="news"
>News extensions
<A
NAME="AEN1425"
></A
></A
></H2
><P
>By Heiko Stoermer <TT
CLASS="email"
>&#60;<A
HREF="mailto:heiko.stoermer@innominate.de"
>heiko.stoermer@innominate.de</A
>&#62;</TT
>
	</P
><P
>mnoGoSearch comes with an integrated extension for
archiving news servers (currently MySQL only;
see <A
HREF="msearch-extended-indexing.html#news-restr"
>the Section called <I
>Restrictions</I
></A
>). This means that you can
download all messages from a news server and save them completely in a
database. </P
><DIV
CLASS="sect3"
><H3
CLASS="sect3"
><A
NAME="news-benefits"
>Benefits</A
></H3
><P
></P
><UL
><LI
><P
>You can expire the messages on the news server to keep it small and fast. </P
></LI
><LI
><P
>You can search the complete message base with all the features that regular mnoGoSearch offers. </P
></LI
><LI
><P
>You can still browse discussion threads across the complete archive. </P
></LI
></UL
></DIV
><DIV
CLASS="sect3"
><H3
CLASS="sect3"
><A
NAME="news-restr"
>Restrictions</A
></H3
><P
></P
><UL
><LI
><P
>Currently MySQL only. (I would
have liked to do this for PostgreSQL, but some annoying
restrictions concerning query size and field size in PostgreSQL
finally made me switch to MySQL.) </P
></LI
><LI
><P
>Perl front-end only. </P
></LI
><LI
><P
>Single dict mode only (the mysql-perl front-end does not support multi-dict). </P
></LI
></UL
></DIV
><DIV
CLASS="sect3"
><H3
CLASS="sect3"
><A
NAME="news-todo"
>To be implemented</A
></H3
><P
>No new features are planned for this extension. It
works as it is (at least as far as I can see) and does everything
I wanted it to do. What I will do is make the code a bit more
portable to other databases and fix the few very small bugs in the
front-end. Of course, newly discovered bugs will be fixed. I am
maintaining it as well as I can. </P
></DIV
><DIV
CLASS="sect3"
><H3
CLASS="sect3"
><A
NAME="news-perf"
>Performance</A
></H3
><P
>Of course, the important questions are always: how fast? how big? how long? </P
><P
></P
><UL
><LI
><P
>Our local intranet installation of mnoGoSearch reports the following:
				<PRE
CLASS="programlisting"
>&#13;

          mnoGoSearch statistics 
  
    Status    Expired      Total 
   ----------------------------- 
       200      76132      76132 OK 
       404        119        119 Not found 
       503         17         17 Service Unavailable 
       504        802        802 Gateway Timeout 
   ----------------------------- 
     Total      77070      77070 

</PRE
>

				</P
><P
>which means that roughly 77,000 messages are archived in the database.</P
></LI
><LI
><P
>The current database size is 423 megabytes. </P
></LI
><LI
><P
>The dict table has 6,076,462 entries. </P
></LI
><LI
><P
>It runs on an AMD K6 400 with 64 MB of RAM (a very modest machine). </P
></LI
><LI
><P
>&#13;					<SPAN
CLASS="emphasis"
><I
CLASS="emphasis"
>typical queries take between 2 and 10 seconds. </I
></SPAN
>
				</P
></LI
></UL
></DIV
><DIV
CLASS="sect3"
><H3
CLASS="sect3"
><A
NAME="news-install"
>Installation</A
></H3
><P
></P
><OL
TYPE="1"
><LI
><P
>Compile:</P
><P
>Unpack the mnoGoSearch
distribution archive. Run the configure script with the option
<TT
CLASS="literal"
>--with-mysql</TT
>, then <TT
CLASS="literal"
>make</TT
> and
<TT
CLASS="literal"
>make install</TT
> as described in the regular installation
instructions. </P
></LI
><LI
><P
>Create Database:</P
><P
>The news extension uses a
slightly different database layout. The create files can be found in
<TT
CLASS="filename"
>frontends/mysql-perl-news/create/</TT
> (Of course you
have to do <TT
CLASS="literal"
>mysqladmin create mnoGoSearch</TT
> first and
set permissions to the account the web front-end and indexer are run
as) </P
></LI
><LI
><P
>Install <TT
CLASS="filename"
>indexer.conf</TT
>:</P
><P
>an
<TT
CLASS="filename"
>indexer.conf</TT
> for incremental news archiving
(messages hardly ever change...) can be found in
<TT
CLASS="filename"
>frontends/mysql-perl-news/etc/</TT
> together with a
sample cron shell script that can be run once a day or so. Please see
<TT
CLASS="filename"
>indexer.conf</TT
> for detailed description of the
indexing process. </P
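><P
>As a sketch, a crontab entry for nightly incremental indexing might
look like this (the script name and path here are hypothetical; use the
actual script shipped in
<TT
CLASS="filename"
>frontends/mysql-perl-news/etc/</TT
>):
		<PRE
CLASS="programlisting"
>&#13;# run incremental news archiving every night at 03:00
0 3 * * * /usr/local/mnogosearch/etc/run-news-indexer.sh
</PRE
>
		</P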
></LI
><LI
><P
>Install perl front-end:</P
><P
>Copy
<TT
CLASS="filename"
>frontends/mysql-perl-news/*.pl</TT
> and
<TT
CLASS="filename"
>frontends/mysql-perl-news/*.htm*</TT
> to your cgi-bin
directory. </P
><P
>Copy
<TT
CLASS="filename"
>frontends/mysql-perl-news/*.pm</TT
> to your site's
Perl library directory (<TT
CLASS="filename"
>site_perl</TT
> or similar), where the
modules can be found by the Perl scripts. </P
><P
>Edit
<TT
CLASS="filename"
>search.htm</TT
> and change the included database login
information. The Perl front-end has additional features that allow you
to browse message threads. </P
></LI
><LI
><P
>Now you are set up and can run
indexer for the first time according to the instructions
in <TT
CLASS="filename"
>indexer.conf</TT
>.</P
></LI
></OL
><P
>I hope this is a useful feature for you. If anyone
is interested in porting this to other databases, multi-dict mode, or the
PHP front-end, PLEASE DO SO! I would be pleased and will assist
you. </P
></DIV
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="htdb"
>Indexing SQL database tables (htdb: virtual URL scheme)
<A
NAME="AEN1506"
></A
></A
></H2
><P
>mnoGoSearch can index SQL database text fields via the so-called htdb: virtual URL scheme.</P
><P
>Using the htdb:/ virtual scheme you can build a full-text
index of your SQL tables as well as index your database-driven web
server.</P
><DIV
CLASS="note"
><BLOCKQUOTE
CLASS="note"
><P
><B
>Note: </B
>Currently mnoGoSearch can index only those
tables that are in the same database as the mnoGoSearch tables. MySQL
users may, however, specify a database in the query. The table you want
to index must also have a PRIMARY key.</P
></BLOCKQUOTE
></DIV
><DIV
CLASS="sect3"
><H3
CLASS="sect3"
><A
NAME="htdb-indexer"
>HTDB indexer.conf commands</A
></H3
><P
>Two <TT
CLASS="filename"
>indexer.conf</TT
> commands implement HTDB: HTDBList and HTDBDoc. </P
><P
>&#13;			<B
CLASS="command"
>HTDBList
<A
NAME="AEN1519"
></A
>
</B
> is the SQL query that
generates the list of all URLs corresponding to records in the table,
using the PRIMARY key field. You may use either absolute or relative URLs
in the HTDBList command:</P
><P
>For example:
		<PRE
CLASS="programlisting"
>&#13;HTDBList SELECT concat('htdb:/',id) FROM messages
    or
HTDBList SELECT id FROM messages
</PRE
>
</P
><P
>&#13;			<B
CLASS="command"
>HTDBDoc
<A
NAME="AEN1526"
></A
>
</B
> is the query that fetches a single record from the database using a PRIMARY key value.</P
><P
>The HTDBList SQL query is used for all URLs that
end with a '/' sign. For all other URLs, the SQL query given in HTDBDoc is
used.</P
><DIV
CLASS="note"
><BLOCKQUOTE
CLASS="note"
><P
><B
>Note: </B
>The HTDBDoc query must return a FULL HTTP
response with headers. This lets you build a very flexible indexing system
by returning different HTTP statuses from the query. See the HTTP response
codes section of the documentation to understand indexer's behavior when it
receives different HTTP statuses.</P
></BLOCKQUOTE
></DIV
><P
>If the HTDBDoc query returns no result, or returns
several records, the HTDB retrieval system generates "HTTP 404 Not
Found". This may happen at reindex time if a record has been deleted from
your table since the last reindexing. You may use "DeleteBad yes" to
delete such records from the mnoGoSearch tables as well.</P
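><P
>A minimal sketch of this setting in <TT
CLASS="filename"
>indexer.conf</TT
>:
		<PRE
CLASS="programlisting"
>&#13;# remove documents whose HTDBDoc query no longer returns a record
DeleteBad yes
</PRE
>
		</P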
><P
>You may use several HTDBDoc/HTDBList pairs in one
<TT
CLASS="filename"
>indexer.conf</TT
> with corresponding Server
commands.</P
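><P
>A sketch of such a configuration (the table and field names here are
hypothetical; note that with URLs like htdb:/news/1 the PRIMARY key value
is the second PATH part, $2):
		<PRE
CLASS="programlisting"
>&#13;HTDBList SELECT id FROM news
HTDBDoc  SELECT concat('HTTP/1.0 200 OK\\r\\nContent-type: text/plain\\r\\n\\r\\n',body) FROM news WHERE id='$2'
Server htdb:/news/

HTDBList SELECT id FROM articles
HTDBDoc  SELECT concat('HTTP/1.0 200 OK\\r\\nContent-type: text/plain\\r\\n\\r\\n',txt) FROM articles WHERE id='$2'
Server htdb:/articles/
</PRE
>
		</P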
></DIV
><DIV
CLASS="sect3"
><H3
CLASS="sect3"
><A
NAME="htdb-var"
>HTDB variables
<A
NAME="AEN1537"
></A
></A
></H3
><P
>You may use the PATH parts of the URL as parameters of
both HTDBList and HTDBDoc SQL queries. The parts are referenced as $1,
$2, ... $n, where n is the position of the PATH part:

		<PRE
CLASS="programlisting"
>&#13;htdb:/part1/part2/part3/part4/part5
         $1    $2    $3    $4    $5
</PRE
>
</P
><P
>For example, suppose you have this <TT
CLASS="filename"
>indexer.conf</TT
> command:
		<PRE
CLASS="programlisting"
>&#13;HTDBList SELECT id FROM catalog WHERE category='$1'
</PRE
>
		</P
><P
>When the htdb:/cars/ URL is indexed, $1 will be replaced with 'cars':
		<PRE
CLASS="programlisting"
>&#13;SELECT id FROM catalog WHERE category='cars'
</PRE
>
		</P
><P
>You may use long URLs to provide several
parameters to both HTDBList and HTDBDoc queries. For example, take
<TT
CLASS="literal"
>htdb:/path1/path2/path3/path4/id</TT
> with the query:
		<PRE
CLASS="programlisting"
>&#13;HTDBList SELECT id FROM table WHERE field1='$1' AND field2='$2' and field3='$3'
</PRE
>
		</P
><P
>This query will generate the following URLs:
		<PRE
CLASS="programlisting"
>&#13;htdb:/path1/path2/path3/path4/id1
...
htdb:/path1/path2/path3/path4/idN
</PRE
>
</P
><P
>for all values of the "id" field returned by the HTDBList query.</P
></DIV
><DIV
CLASS="sect3"
><H3
CLASS="sect3"
><A
NAME="htdb-fulltext"
>Creating full text index</A
></H3
><P
>Using htdb:/ scheme you can create full text
index and use it further in your application. Lets imagine you have a
big SQL table which stores for example web board messages in plain
text format. You also want to build an application with messages
search facility. Lets say messages are stored in "messages" table with
two fields "id" and "msg". "id" is an integer primary key and "msg"
big text field contains messages themselves. Using usual SQL LIKE
search may take long time to answer:
		<PRE
CLASS="programlisting"
>&#13;SELECT id, msg FROM messages WHERE msg LIKE '%someword%'
</PRE
>
		</P
><P
>Using the mnoGoSearch htdb: scheme you can
create a full-text index on the "messages" table. Install
mnoGoSearch as usual, then edit your
<TT
CLASS="filename"
>indexer.conf</TT
>:
		<PRE
CLASS="programlisting"
>&#13;DBAddr mysql://foo:bar@localhost/database/
DBMode single

HTDBList SELECT id FROM messages

HTDBDoc SELECT concat(\
'HTTP/1.0 200 OK\\r\\n',\
'Content-type: text/plain\\r\\n',\
'\\r\\n',\
msg) \
FROM messages WHERE id='$1'

Server htdb:/
</PRE
>
		</P
><P
>On startup, indexer will insert the 'htdb:/' URL
into the database and run the SQL query given in HTDBList, which
produces the values 1, 2, 3, ..., N. Those values are
treated as links relative to the 'htdb:/' URL, so a list of new URLs of
the form htdb:/1, htdb:/2, ..., htdb:/N is added to the
database. Then the HTDBDoc SQL query is executed for each new
URL, producing an HTTP document for each record in the
form:
		<PRE
CLASS="programlisting"
>&#13;HTTP/1.0 200 OK
Content-Type: text/plain

&#60;some text from the 'msg' field here&#62;
</PRE
>
		</P
><P
>This document will be used to create a full-text
index using the words from the 'msg' field. The words will be stored in the
'dict' table, assuming we are using 'single' storage mode.</P
><P
>After indexing you can use the mnoGoSearch tables to perform a search:
		<PRE
CLASS="programlisting"
>&#13;SELECT url.url 
FROM url,dict 
WHERE dict.url_id=url.rec_id 
AND dict.word='someword';
</PRE
>
		</P
><P
>Since the mnoGoSearch 'dict' table has an index
on the 'word' field, this query will execute much faster than queries
using an SQL LIKE search on the 'messages' table.</P
><P
>You can also search for several words:
		<PRE
CLASS="programlisting"
>&#13;SELECT url.url, count(*) as c 
FROM url,dict
WHERE dict.url_id=url.rec_id 
AND dict.word IN ('some','word')
GROUP BY url.url
ORDER BY c DESC;
</PRE
>
		</P
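><P
>To require that all of the words occur in a document (AND semantics
instead of the OR ranking above), one possible sketch, assuming the same
'single' mode tables:
		<PRE
CLASS="programlisting"
>&#13;SELECT url.url, count(distinct dict.word) as c
FROM url,dict
WHERE dict.url_id=url.rec_id
AND dict.word IN ('some','word')
GROUP BY url.url
HAVING c=2;
</PRE
>
		</P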
><P
>Both queries return 'htdb:/XXX' values in the
url.url field. Your application then has to strip the leading 'htdb:/' from
those values to get the PRIMARY key values of your 'messages'
table.</P
></DIV
><DIV
CLASS="sect3"
><H3
CLASS="sect3"
><A
NAME="htdb-web"
>Indexing SQL database driven web server</A
></H3
><P
>You can also use the htdb:/ scheme to index your
database-driven web server. It allows you to create indexes without having
to invoke your web server during indexing, which is much faster and
requires fewer CPU resources than indexing directly from the web
server. </P
><P
>The main idea of indexing a database-driven web
server is to build the full-text index in the usual way; the only difference is
that search must produce real URLs instead of URLs in 'htdb:/...'
form. This can be achieved using the mnoGoSearch aliasing tools.</P
><P
>Take a look at the sample
<TT
CLASS="filename"
>indexer.conf</TT
> in
<TT
CLASS="filename"
>doc/samples/htdb.conf</TT
>. It is an
<TT
CLASS="filename"
>indexer.conf</TT
> used to index <A
HREF="http://mnogosearch.org/"
TARGET="_top"
>our webboard</A
>.</P
><P
>The HTDBList command generates URLs in the form:
		<PRE
CLASS="programlisting"
>&#13;http://search.mnogo.ru/board/message.php?id=XXX
</PRE
>
		</P
><P
>where XXX is a "messages" table primary key value.</P
><P
>For each primary key value, the HTDBDoc command generates a text/html document with HTTP headers and content like this:
		<PRE
CLASS="programlisting"
>&#13;&#60;HTML&#62;
&#60;HEAD&#62;
&#60;TITLE&#62; ... subject field here .... &#60;/TITLE&#62;
&#60;META NAME="Description" Content=" ... author here ..."&#62;
&#60;/HEAD&#62;
&#60;BODY&#62; ... message text here ... &#60;/BODY&#62;
</PRE
>
</P
><P
>At the end of <TT
CLASS="filename"
>doc/samples/htdb.conf</TT
> we wrote three commands:
		<PRE
CLASS="programlisting"
>&#13;Server htdb:/
Realm  http://search.mnogo.ru/board/message.php?id=*
Alias  http://search.mnogo.ru/board/message.php?id=  htdb:/
</PRE
>
		</P
><P
>The first command tells indexer to execute the HTDBList query, which generates a list of message URLs in the form:
		<PRE
CLASS="programlisting"
>&#13;http://search.mnogo.ru/board/message.php?id=XXX
</PRE
>
		</P
><P
>The second command allows indexer to accept such message URLs, using a string match with a '*' wildcard at the end.</P
><P
>The third command replaces the
"http://search.mnogo.ru/board/message.php?id=" substring in the URL with
"htdb:/" when indexer retrieves the message documents. It means that
"http://search.mnogo.ru/board/message.php?id=xxx" URLs will be shown
in search results, but the "htdb:/xxx" URL will be indexed instead, where
xxx is the PRIMARY key value, the ID of the record in the "messages"
table.</P
></DIV
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="exec"
>Indexing binaries output (exec: and cgi: virtual URL
schemes)
<A
NAME="AEN1592"
></A
></A
></H2
><P
>mnoGoSearch supports exec: and cgi: virtual URL
schemes. They allow running an external program. This program must
write its result to stdout. The result must be a standard HTTP
response, i.e. HTTP response headers followed by the document's content.</P
><P
>For example, when indexing both
<TT
CLASS="literal"
>cgi:/usr/local/bin/myprog</TT
> and
<TT
CLASS="literal"
>exec:/usr/local/bin/myprog</TT
>, indexer will execute
the <TT
CLASS="filename"
>/usr/local/bin/myprog</TT
> program.</P
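><P
>For illustration, a minimal program producing the required full HTTP
response on stdout could look like this (a hypothetical sketch, not
shipped with mnoGoSearch):
		<PRE
CLASS="programlisting"
>&#13;#!/bin/sh
# print a complete HTTP response: status line, headers, blank line, body
echo "HTTP/1.0 200 OK"
echo "Content-Type: text/plain"
echo ""
echo "Hello from myprog"
</PRE
>
		</P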
><DIV
CLASS="sect3"
><H3
CLASS="sect3"
><A
NAME="exec-cgi"
>Passing parameters to cgi: virtual scheme</A
></H3
><P
>When executing a program given in the cgi: virtual
scheme, indexer emulates the environment of a program running under an HTTP
server: it sets the REQUEST_METHOD environment variable to "GET" and
the QUERY_STRING variable according to the CGI standard. For example, if
<TT
CLASS="literal"
>cgi:/usr/local/apache/cgi-bin/test-cgi?a=b&#38;d=e</TT
>
is being indexed, indexer creates QUERY_STRING with
<TT
CLASS="literal"
>a=b&#38;d=e</TT
> value. The cgi: virtual URL scheme allows
indexing your site without having to invoke the web server, even if you
want to index CGI scripts. For example, suppose you have a web site with
static documents under <TT
CLASS="filename"
>/usr/local/apache/htdocs/</TT
>
and with CGI scripts under
<TT
CLASS="filename"
>/usr/local/apache/cgi-bin/</TT
>. Use the following
configuration:
		<PRE
CLASS="programlisting"
>&#13;Server http://localhost/
Alias  http://localhost/cgi-bin/	cgi:/usr/local/apache/cgi-bin/
Alias  http://localhost/		file:/usr/local/apache/htdocs/
</PRE
>
		</P
></DIV
><DIV
CLASS="sect3"
><H3
CLASS="sect3"
><A
NAME="exec-exec"
>Passing parameters to exec: virtual scheme</A
></H3
><P
>For the exec: scheme, indexer does not create a QUERY_STRING
variable as in the cgi: scheme. Instead, it builds a command line with the
argument given in the URL after the '?' sign. For example, when indexing
<TT
CLASS="literal"
>exec:/usr/local/bin/myprog?a=b&#38;d=e</TT
>, this
command will be executed:
		<PRE
CLASS="programlisting"
>&#13;/usr/local/bin/myprog "a=b&#38;d=e" 
</PRE
>
		</P
></DIV
><DIV
CLASS="sect3"
><H3
CLASS="sect3"
><A
NAME="exec-ext"
>Using exec: virtual scheme as an external retrieval system</A
></H3
><P
>The exec: virtual scheme can be used as an
external retrieval system, supporting protocols which are not
handled natively by mnoGoSearch. For example, you can use the curl
program, which is available from <A
HREF="http://curl.haxx.se/"
TARGET="_top"
>http://curl.haxx.se/</A
> to index HTTPS sites.</P
><P
>Put this short script into
<TT
CLASS="literal"
>/usr/local/mnogosearch/etc/</TT
> under the name
<TT
CLASS="filename"
>curl.sh</TT
>.
		<PRE
CLASS="programlisting"
>&#13;#!/bin/sh
/usr/local/bin/curl -i $1 2&#62;/dev/null
</PRE
>
</P
><P
>This script takes a URL as a command-line
argument and executes the curl program to download it. The -i option tells
curl to output the result together with the HTTP headers.</P
><P
>Now use these commands in your <TT
CLASS="filename"
>indexer.conf</TT
>:
		<PRE
CLASS="programlisting"
>&#13;Server https://some.https.site/
Alias  https://  exec:/usr/local/mnogosearch/etc/curl.sh?https://
</PRE
>
		</P
><P
>When indexing
<TT
CLASS="filename"
>https://some.https.site/path/to/page.html</TT
>,
indexer will translate this URL to 
		<PRE
CLASS="programlisting"
>&#13;exec:/usr/local/mnogosearch/etc/curl.sh?https://some.https.site/path/to/page.html
</PRE
>
		</P
><P
>execute the <TT
CLASS="filename"
>curl.sh</TT
> script:
		<PRE
CLASS="programlisting"
>&#13;/usr/local/mnogosearch/etc/curl.sh "https://some.https.site/path/to/page.html"
</PRE
>
		</P
><P
>and take its output.</P
></DIV
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="mirror"
>Mirroring
<A
NAME="AEN1633"
></A
></A
></H2
><P
>&#13;<A
NAME="AEN1636"
></A
>
You may specify the path to a root directory to enable site mirroring:
	<PRE
CLASS="programlisting"
>&#13;MirrorRoot /path/to/mirror
</PRE
>
	</P
><P
>&#13;<A
NAME="AEN1641"
></A
>
You may also specify a root directory for the mirrored documents' headers; indexer will then store the HTTP headers on local disk too:
	<PRE
CLASS="programlisting"
>&#13;MirrorHeadersRoot /path/to/headers
</PRE
>
	</P
><P
>&#13;<A
NAME="AEN1646"
></A
>
You may specify the period during which previously mirrored files will be used during indexing instead of being downloaded again:
	<PRE
CLASS="programlisting"
>&#13;MirrorPeriod &#60;time&#62;
</PRE
>
	</P
><P
>This is very useful when you experiment with
mnoGoSearch indexing the same hosts and do not want much Internet
traffic. If MirrorHeadersRoot is not specified and headers
are not stored on local disk, the default Content-Types given in
AddType commands will be used. The default value of MirrorPeriod is
-1, which means <TT
CLASS="literal"
>do not use mirrored files</TT
>.</P
><P
>&#60;time&#62; is in the form
<TT
CLASS="literal"
>xxxA[yyyB[zzzC]]</TT
> (spaces are allowed between xxx
and A, between yyy and B, and so on), where xxx, yyy, zzz are numbers (which can be
negative!). A, B, C can be one of the following:
	<PRE
CLASS="programlisting"
>&#13;		s - second
		M - minute
		h - hour
		d - day
		m - month
		y - year
</PRE
>
</P
><P
>(these letters are the same as in the strptime/strftime functions)</P
><P
>Examples:
	<PRE
CLASS="programlisting"
>&#13;15s - 15 seconds
4h30M - 4 hours and 30 minutes
1y6m-15d - 1 year and six months minus 15 days
1h-10M+1s - 1 hour minus 10 minutes plus 1 second
</PRE
>
</P
><P
>If you specify only a number without any suffix, the
time is assumed to be in seconds (this behavior is for
compatibility with versions prior to 3.1.7).</P
><P
>The following command forces the use of local copies for one day:
	<PRE
CLASS="programlisting"
>&#13;MirrorPeriod 1d
</PRE
>
	</P
><P
>If your pages are already indexed and you re-index
with -a, indexer will check the headers and only download files that
have been modified since the last indexing. Thus, pages that are
not modified will not be downloaded and therefore not mirrored
either. To create a full mirror you need to either (a) start again with a
clean database or (b) use the -m switch. </P
><P
>You can actually use the created files as a
full-featured mirror of your site. However, be careful: indexer will not
download a document that is larger than MaxDocSize. If a document is
larger, it will only be partially downloaded. If your site has no large
documents, everything will be fine.</P
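><P
>For example, to mirror documents up to 10 megabytes completely, raise
the limit in <TT
CLASS="filename"
>indexer.conf</TT
> (the value here is an arbitrary example):
		<PRE
CLASS="programlisting"
>&#13;MaxDocSize 10485760
</PRE
>
		</P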
></DIV
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="msearch-indexer-configuration.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="msearch-syslog.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>indexer configuration</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="msearch-indexing.html"
ACCESSKEY="U"
>Up</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Using syslog
<A
NAME="AEN1665"
></A
></TD
></TR
></TABLE
></DIV
></BODY
></HTML
>