<HTML
><HEAD
><TITLE
>Storing mnoGoSearch data </TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.73
"><LINK
REL="HOME"
TITLE="mnoGoSearch 3.2 reference manual"
HREF="index.html"><LINK
REL="PREVIOUS"
TITLE="Comments"
HREF="msearch-htmlparser-comments.html"><LINK
REL="NEXT"
TITLE="Cache mode storage"
HREF="msearch-cachemode.html"><LINK
REL="STYLESHEET"
TYPE="text/css"
HREF="mnogo.css"><META
NAME="Description"
CONTENT="mnoGoSearch - Full Featured Web site Open Source Search Engine Software over the Internet and Intranet Web Sites Based on SQL Database. It is a Free search software covered by GNU license."><META
NAME="Keywords"
CONTENT="shareware, freeware, download, internet, unix, utilities, search engine, text retrieval, knowledge retrieval, text search, information retrieval, database search, mining, intranet, webserver, index, spider, filesearch, meta, free, open source, full-text, udmsearch, website, find, opensource, search, searching, software, udmsearch, engine, indexing, system, web, ftp, http, cgi, php, SQL, MySQL, database, php3, FreeBSD, Linux, Unix, mnoGoSearch, MacOS X, Mac OS X, Windows, 2000, NT, 95, 98, GNU, GPL, url, grabbing"></HEAD
><BODY
CLASS="chapter"
BGCOLOR="#EEEEEE"
TEXT="#000000"
LINK="#000080"
VLINK="#800080"
ALINK="#FF0000"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
>mnoGoSearch 3.2 reference manual: Full-featured search engine software</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="msearch-htmlparser-comments.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
></TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="msearch-cachemode.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="chapter"
><H1
><A
NAME="howstore"
>Chapter 5. Storing mnoGoSearch data </A
></H1
><DIV
CLASS="TOC"
><DL
><DT
><B
>Table of Contents</B
></DT
><DT
><A
HREF="msearch-howstore.html#sql-stor"
>SQL storage types</A
></DT
><DT
><A
HREF="msearch-cachemode.html"
>Cache mode storage</A
></DT
><DT
><A
HREF="msearch-perf.html"
>mnoGoSearch performance issues
<A
NAME="AEN2196"
></A
></A
></DT
><DT
><A
HREF="msearch-searchd.html"
>SearchD support
<A
NAME="AEN2225"
></A
></A
></DT
><DT
><A
HREF="msearch-oracle.html"
>Oracle notes
<A
NAME="AEN2279"
></A
></A
></DT
><DT
><A
HREF="msearch-db2.html"
>IBM DB2 notes
<A
NAME="AEN2357"
></A
></A
></DT
></DL
></DIV
><DIV
CLASS="sect1"
><H1
CLASS="sect1"
><A
NAME="sql-stor"
>SQL storage types</A
></H1
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="sql-stor-general"
>General storage information</A
></H2
><P
>mnoGoSearch stores each unique word found in a
document only once. If a word appears several times in the same document, all
its weights from the different parts of the document are binary ORed. This
means that the number of times a word appears in the document does not affect
its weight. However, whether the word appears in the more important
parts of the document (title, description, etc.) is taken into account.</P
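><P
>As a minimal sketch of the OR-combination described above, the Python
fragment below merges the section weights of repeated words. The bit values
assigned to BODY, TITLE, KEYWORDS and DESCRIPTION and the function name
combine_weights are assumptions made for illustration; they are not taken
from the mnoGoSearch sources.</P
><PRE
CLASS="programlisting"
>
```python
# Hypothetical section weights, one bit per document part (values are assumptions).
BODY = 1
TITLE = 2
KEYWORDS = 4
DESCRIPTION = 8


def combine_weights(occurrences):
    """OR together the section weights of every occurrence of each word."""
    weights = {}
    for word, section in occurrences:
        weights[word] = weights.get(word, 0) | section
    return weights


# "search" occurs twice in the body and once in the title.
occurrences = [
    ("search", BODY),
    ("search", BODY),
    ("search", TITLE),
    ("engine", BODY),
]
weights = combine_weights(occurrences)
print(weights)
```
</PRE
><P
>In this sketch the repeated body occurrences of "search" leave its weight
unchanged, while the single title occurrence adds the TITLE bit.</P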
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="sql-stor-modes"
>Various modes of words storage</A
></H2
><P
>mnoGoSearch currently supports several word storage modes:
"single", "multi", "crc" and "crc-multi". The default mode is "single". The mode is
selected with the <B
CLASS="command"
>DBMode</B
> command in both
<TT
CLASS="filename"
>indexer.conf</TT
> and <TT
CLASS="filename"
>search.htm</TT
>
files.</P
><PRE
CLASS="programlisting"
>Examples:
DBMode single
DBMode multi
DBMode crc
DBMode crc-multi
</PRE
><P
>When compiled with the built-in database, mnoGoSearch
supports only the "single", "crc" and "crc-multi" modes. The "multi" mode is
not implemented for the built-in database.</P
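><P
>The rule above can be sketched as a small check on a DBMode line. The helper
parse_dbmode is hypothetical and is not part of mnoGoSearch; it covers only
the four modes listed in this section and, as a simplification, matches the
command name exactly.</P
><PRE
CLASS="programlisting"
>
```python
# Modes listed in this section; which back-end accepts which follows the text.
SQL_MODES = {"single", "multi", "crc", "crc-multi"}
BUILTIN_MODES = SQL_MODES - {"multi"}


def parse_dbmode(line, builtin=False):
    """Return the mode named by a 'DBMode ...' line, or raise ValueError."""
    parts = line.split()
    if len(parts) != 2 or parts[0] != "DBMode":
        raise ValueError("not a DBMode command: " + repr(line))
    mode = parts[1]
    allowed = BUILTIN_MODES if builtin else SQL_MODES
    if mode not in allowed:
        raise ValueError("mode " + repr(mode) + " is not supported by this back-end")
    return mode


print(parse_dbmode("DBMode crc-multi"))
```
</PRE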
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="sql-stor-single"
>Storage mode - single
                 <A
NAME="AEN1973"
></A
></A
></H2
><P
>When "single" is specified, all words are stored
in one table (or in a text file in the built-in database) with the structure
(url_id,word,weight), where url_id is the ID of the document and refers
to the rec_id field of the "url" table. The word field has the <TT
CLASS="literal"
>variable
char(32)</TT
> SQL type.</P
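><P
>To make the (url_id,word,weight) structure concrete, the sketch below builds
it in an in-memory SQLite database using Python's sqlite3 module. The table
name "dict", the column types, the "url" column and the sample URL are
assumptions for illustration; the real definitions are the ones shipped in
the create directory.</P
><PRE
CLASS="programlisting"
>
```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed minimal schema: one row per (document, unique word).
conn.execute("CREATE TABLE url (rec_id INTEGER PRIMARY KEY, url TEXT)")
conn.execute("CREATE TABLE dict (url_id INTEGER, word VARCHAR(32), weight INTEGER)")

conn.execute("INSERT INTO url VALUES (1, 'http://example.com/')")
conn.executemany(
    "INSERT INTO dict VALUES (?, ?, ?)",
    [(1, "search", 3), (1, "engine", 1)],
)

# Exact-word lookup: dict.url_id points at url.rec_id.
rows = conn.execute(
    "SELECT url.url, dict.weight FROM dict "
    "JOIN url ON url.rec_id = dict.url_id WHERE dict.word = ?",
    ("search",),
).fetchall()
print(rows)
```
</PRE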
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="sql-stor-multi"
>Storage mode - multi
                 <A
NAME="AEN1980"
></A
></A
></H2
><P
>If "multi" is selected, words are stored in
13 different tables depending on their lengths. The structure of these
tables is the same as in "single" mode, but a fixed-length char type is
used, which is usually faster in most databases. This usually makes
"multi" mode faster than "single" mode. This mode is
not implemented for the built-in database.</P
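><P
>The length-based split can be sketched as follows. The bucket sizes (2 to 12,
then 16 and 32) and the table names of the form dictN are assumptions chosen
only so that the sketch yields 13 tables; check the table definitions in the
create directory for the actual layout.</P
><PRE
CLASS="programlisting"
>
```python
# Assumed bucket sizes: 2 to 12, then 16 and 32, which gives 13 tables.
BUCKET_SIZES = list(range(2, 13)) + [16, 32]


def table_for(word):
    """Name of the (hypothetical) table whose fixed width fits the word."""
    length = len(word)
    for size in BUCKET_SIZES:
        if size >= length:
            return "dict" + str(size)
    return None  # longer than the widest bucket; not handled in this sketch


for word in ["a", "search", "internationalization"]:
    print(word, table_for(word))
```
</PRE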
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="sql-stor-crc"
>Storage mode - crc
                 <A
NAME="AEN1986"
></A
></A
></H2
><P
>If "crc" mode is selected, mnoGoSearch stores
32-bit integer word IDs calculated by the CRC32 algorithm instead of
the words themselves. This mode requires less disk space and is faster than the "single"
and "multi" modes. mnoGoSearch relies on the fact that CRC32 produces
nearly unique checksums for different words. According to our tests,
only 250 pairs of words share the same CRC in a list of
about 1,600,000 unique words. Most of these pairs (&#62;90%) contain at
least one misspelled word. Word information is stored in the
structure (url_id,word_id,weight), where word_id is the 32-bit integer ID
calculated by the CRC32 algorithm. This mode is recommended for big search
engines. </P
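><P
>The idea can be illustrated with Python's zlib.crc32. This is a sketch under
assumptions: the CRC32 variant and the byte encoding used by mnoGoSearch may
differ from zlib's, and the helper names are hypothetical. The last function
gives a rough birthday-style estimate of how many colliding pairs a uniform
32-bit ID would produce; for 1,600,000 words it comes out near 300, the same
order of magnitude as the 250 pairs reported above.</P
><PRE
CLASS="programlisting"
>
```python
import zlib
from collections import defaultdict


def word_id(word):
    """32-bit word ID: CRC32 of the UTF-8 bytes (zlib's variant, an assumption)."""
    return zlib.crc32(word.encode("utf-8"))


def find_collisions(words):
    """Group distinct words by ID and keep only IDs shared by two or more words."""
    by_id = defaultdict(set)
    for word in words:
        by_id[word_id(word)].add(word)
    return {wid: group for wid, group in by_id.items() if len(group) > 1}


def expected_colliding_pairs(n, bits=32):
    """Rough estimate: number of word pairs divided by number of possible IDs."""
    return n * (n - 1) / 2 / 2 ** bits


# A stored record in this mode: (url_id, word_id, weight).
record = (1, word_id("search"), 3)
print(record)
print(round(expected_colliding_pairs(1_600_000)))
```
</PRE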
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="sql-stor-crcmulti"
>Storage mode - crc-multi
                 <A
NAME="AEN1992"
></A
></A
></H2
><P
>When "crc-multi" mode is selected, mnoGoSearch
stores CRC32 word IDs in several tables (or binary files in the built-in
database) with the same structure as in "crc" mode, split by word length
as in "multi" mode. This mode is usually the fastest and is
recommended for big search engines.</P
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="sql-stor-cache"
>Storage mode - cache
                 <A
NAME="AEN1998"
></A
></A
></H2
><P
>There is also a new "cache" storage mode. It is the
fastest one and makes it possible to index and quickly search through several
million documents. See <A
HREF="msearch-cachemode.html"
>the Section called <I
>Cache mode storage</I
></A
> for an
explanation. </P
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="sql-stor-structure"
>SQL structure notes</A
></H2
><P
>Please note that we develop mnoGoSearch with
MySQL as the back-end and often cannot test each version
with all of the other supported databases. So, if there is no table
definition in the create/you_database directory, you may find the MySQL
definition for the same table and adapt it for your
back-end. MySQL table definitions are always up-to-date.</P
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="sql-stor-noncrc"
>Additional features of non-CRC storage modes</A
></H2
><P
>"single" mode in both SQL and the built-in database,
as well as "multi" mode with an SQL database, supports substring search. Because
"crc" and "crc-multi" do not store the words themselves and use
integer values generated by the CRC32 algorithm instead,
substring search is not possible in these modes.</P
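><P
>A short sketch of why this is so: when the words themselves are stored, a
fragment can be matched against them, whereas a set of CRC32 IDs can only
answer exact-word lookups. The sample words and helper names are hypothetical,
and zlib.crc32 stands in for whatever CRC32 implementation mnoGoSearch uses.</P
><PRE
CLASS="programlisting"
>
```python
import zlib

words = ["search", "research", "engine"]

# Non-CRC modes keep the words; CRC modes keep only their 32-bit IDs.
stored_words = set(words)
stored_ids = {zlib.crc32(w.encode("utf-8")) for w in words}


def substring_matches(fragment):
    """Possible only when the words themselves are stored."""
    return sorted(w for w in stored_words if fragment in w)


def crc_lookup(term):
    """Exact-word lookup by ID; the ID of a fragment says nothing about the word."""
    return zlib.crc32(term.encode("utf-8")) in stored_ids


print(substring_matches("sea"))
print(crc_lookup("search"), crc_lookup("sea"))
```
</PRE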
></DIV
></DIV
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="msearch-htmlparser-comments.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="msearch-cachemode.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Comments</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
>&nbsp;</TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Cache mode storage</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>