<HTML ><HEAD ><TITLE >Searching documents</TITLE ><META NAME="GENERATOR" CONTENT="Modular DocBook HTML Stylesheet Version 1.73 "><LINK REL="HOME" TITLE="mnoGoSearch 3.2 reference manual" HREF="index.html"><LINK REL="PREVIOUS" TITLE="Segmenters for chinese and japanese languages " HREF="msearch-cjk.html"><LINK REL="NEXT" TITLE="How to write search result templates " HREF="msearch-templates.html"><LINK REL="STYLESHEET" TYPE="text/css" HREF="mnogo.css"><META NAME="Description" CONTENT="mnoGoSearch - Full Featured Web site Open Source Search Engine Software over the Internet and Intranet Web Sites Based on SQL Database. It is a Free search software covered by GNU license."><META NAME="Keywords" CONTENT="shareware, freeware, download, internet, unix, utilities, search engine, text retrieval, knowledge retrieval, text search, information retrieval, database search, mining, intranet, webserver, index, spider, filesearch, meta, free, open source, full-text, udmsearch, website, find, opensource, search, searching, software, udmsearch, engine, indexing, system, web, ftp, http, cgi, php, SQL, MySQL, database, php3, FreeBSD, Linux, Unix, mnoGoSearch, MacOS X, Mac OS X, Windows, 2000, NT, 95, 98, GNU, GPL, url, grabbing"></HEAD ><BODY CLASS="chapter" BGCOLOR="#EEEEEE" TEXT="#000000" LINK="#000080" VLINK="#800080" ALINK="#FF0000" ><DIV CLASS="NAVHEADER" ><TABLE SUMMARY="Header navigation table" WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0" ><TR ><TH COLSPAN="3" ALIGN="center" >mnoGoSearch 3.2 reference manual: Full-featured search engine software</TH ></TR ><TR ><TD WIDTH="10%" ALIGN="left" VALIGN="bottom" ><A HREF="msearch-cjk.html" ACCESSKEY="P" >Prev</A ></TD ><TD WIDTH="80%" ALIGN="center" VALIGN="bottom" ></TD ><TD WIDTH="10%" ALIGN="right" VALIGN="bottom" ><A HREF="msearch-templates.html" ACCESSKEY="N" >Next</A ></TD ></TR ></TABLE ><HR ALIGN="LEFT" WIDTH="100%"></DIV ><DIV CLASS="chapter" ><H1 ><A NAME="doingsearch" >Chapter 8. 
Searching documents</A ></H1 ><DIV CLASS="TOC" ><DL ><DT ><B >Table of Contents</B ></DT ><DT ><A HREF="msearch-doingsearch.html#search" >Using search front-ends</A ></DT ><DT ><A HREF="msearch-templates.html" >How to write search result templates <A NAME="AEN3083" ></A ></A ></DT ><DT ><A HREF="msearch-html.html" >Designing search.html</A ></DT ><DT ><A HREF="msearch-rel.html" >Relevancy <A NAME="AEN3548" ></A ></A ></DT ><DT ><A HREF="msearch-track.html" >Search queries tracking <A NAME="AEN3618" ></A ></A ></DT ><DT ><A HREF="msearch-srcache.html" >Search results cache <A NAME="AEN3638" ></A ></A ></DT ><DT ><A HREF="msearch-fuzzy.html" >Fuzzy search</A ></DT ></DL ></DIV ><DIV CLASS="sect1" ><H1 CLASS="sect1" ><A NAME="search" >Using search front-ends</A ></H1 ><DIV CLASS="sect2" ><H2 CLASS="sect2" ><A NAME="search-perform" >Performing search</A ></H2 ><P >Open your preferred front-end in a Web browser: <PRE CLASS="programlisting" >
http://your.web.server/path/to/search.cgi
or
http://your.web.server/path/to/search.php3
or
http://your.web.server/path/to/search.pl
</PRE > </P ><P >To search, type the words you want to find and press the SUBMIT button. For example, "<TT CLASS="userinput" ><B >mysql odbc</B ></TT >". Do not type the quotes " in the query; they are used here only to set the query apart from the surrounding text. mnoGoSearch will find all documents that contain the word "mysql" and/or the word "odbc". Documents with higher weights are displayed first.</P ></DIV ><DIV CLASS="sect2" ><H2 CLASS="sect2" ><A NAME="search-params" >Search parameters <A NAME="AEN2947" ></A ></A ></H2 ><P >mnoGoSearch front-ends support the following parameters, passed in the CGI query string. You may use them in the HTML form on your search page.</P ><DIV CLASS="table" ><A NAME="AEN2950" ></A ><P ><B >Table 8-1. 
Available search parameters</B ></P ><TABLE BORDER="1" CLASS="CALSTABLE" ><TBODY ><TR ><TD ALIGN="LEFT" VALIGN="MIDDLE" >q</TD ><TD ALIGN="LEFT" VALIGN="MIDDLE" >text parameter with the search query</TD ></TR ><TR ><TD ALIGN="LEFT" VALIGN="MIDDLE" >ps</TD ><TD ALIGN="LEFT" VALIGN="MIDDLE" >page size: the number of search results displayed on one page, 20 by default. The maximum page size is 100. This limit prevents very large page sizes from overloading the server; it can be changed with the MAX_PS definition in <TT CLASS="filename" >search.c</TT >. </TD ></TR ><TR ><TD ALIGN="LEFT" VALIGN="MIDDLE" >np</TD ><TD ALIGN="LEFT" VALIGN="MIDDLE" >page number, 0 by default (first page)</TD ></TR ><TR ><TD ALIGN="LEFT" VALIGN="MIDDLE" >m</TD ><TD ALIGN="LEFT" VALIGN="MIDDLE" >search mode. Currently the values "all", "any" and "bool" are supported.</TD ></TR ><TR ><TD ALIGN="LEFT" VALIGN="MIDDLE" >wm</TD ><TD ALIGN="LEFT" VALIGN="MIDDLE" >word match. Use this parameter to choose the word match type. The values "wrd", "beg", "end" and "sub" mean whole word, word beginning, word ending and word substring match, respectively.</TD ></TR ><TR ><TD ALIGN="LEFT" VALIGN="MIDDLE" >t</TD ><TD ALIGN="LEFT" VALIGN="MIDDLE" >tag limit. Limits the search to documents with the given tag. This parameter has the same effect as the -t indexer option.</TD ></TR ><TR ><TD ALIGN="LEFT" VALIGN="MIDDLE" >cat</TD ><TD ALIGN="LEFT" VALIGN="MIDDLE" >Category limit. See <A HREF="msearch-subsections.html#categories" >the Section called <I >Categories <A NAME="AEN2372" ></A ></I > in Chapter 6</A > for details.</TD ></TR ><TR ><TD ALIGN="LEFT" VALIGN="MIDDLE" >ul</TD ><TD ALIGN="LEFT" VALIGN="MIDDLE" >URL limit: a URL substring that limits the search to a subsection of the database. It supports the SQL LIKE wildcards % and _. This parameter has the same effect as the -u indexer option. 
If a relative URL is specified, <TT CLASS="filename" >search.cgi</TT > inserts % signs before and after the "ul" value when compiled with SQL support. This lets you write a plain URL substring in the HTML form to limit the search, for example &lt;OPTION VALUE="/manual/"&gt; instead of VALUE="%/manual/%". When a full URL with a scheme is specified, <TT CLASS="filename" >search.cgi</TT > adds a % sign only after the value. For example, for &lt;OPTION VALUE="http://localhost/"&gt; <TT CLASS="filename" >search.cgi</TT > will pass <TT CLASS="literal" >http://localhost/%</TT > to the SQL LIKE comparison.</TD ></TR ><TR ><TD ALIGN="LEFT" VALIGN="MIDDLE" >wf</TD ><TD ALIGN="LEFT" VALIGN="MIDDLE" >Weight factors. Allows changing the weights of different document sections at search time. Must be passed as a hex number. See the explanation below.</TD ></TR ><TR ><TD ALIGN="LEFT" VALIGN="MIDDLE" >g</TD ><TD ALIGN="LEFT" VALIGN="MIDDLE" >Language limit. A language abbreviation that limits search results by the url.lang field.</TD ></TR ><TR ><TD ALIGN="LEFT" VALIGN="MIDDLE" >tmplt</TD ><TD ALIGN="LEFT" VALIGN="MIDDLE" >Template filename (without path). Use it to specify a template file other than the standard <TT CLASS="filename" >search.htm</TT >. </TD ></TR ><TR ><TD ALIGN="LEFT" VALIGN="MIDDLE" >type</TD ><TD ALIGN="LEFT" VALIGN="MIDDLE" >Content-Type limit. A content type that limits search results by the url.content_type field. For cache mode storage it must be an exact match. For SQL modes it may be an SQL-like pattern. </TD ></TR ><TR ><TD ALIGN="LEFT" VALIGN="MIDDLE" >sp</TD ><TD ALIGN="LEFT" VALIGN="MIDDLE" >Word forms limit. Set to 1 to search for all forms of the entered words, or to 0 to search only for the words as entered. The default value is 1. </TD ></TR ><TR ><TD ALIGN="LEFT" VALIGN="MIDDLE" >sy</TD ><TD ALIGN="LEFT" VALIGN="MIDDLE" >Synonyms limit. Set to 1 to add synonyms for the entered words, or to 0 to not use synonyms. The default value is 1. 
</TD ></TR ></TBODY ></TABLE ></DIV ></DIV ><DIV CLASS="sect2" ><H2 CLASS="sect2" ><A NAME="search-changeweight" >Changing the weights of different document parts at search time</A ></H2 ><P >You can pass the "wf" HTML form variable to <TT CLASS="filename" >search.cgi</TT >. The "wf" variable represents weight factors for specific document parts. Currently the body, title, keywords, description and url parts, crosswords, and user-defined META and HTTP headers are supported. See the "Section" part of <TT CLASS="filename" >indexer.conf-dist</TT >.</P ><P >To use this feature, it is recommended to assign different section IDs to different document parts in the "Section" command of <TT CLASS="filename" >indexer.conf</TT >. Currently up to 256 different sections are supported.</P ><P >Imagine that we have these default sections in <TT CLASS="filename" >indexer.conf</TT >: <PRE CLASS="programlisting" >
Section body        1 256
Section title       2 128
Section keywords    3 128
Section description 4 128
</PRE > </P ><P >The "wf" value is a string of hex digits ABCD. Each digit is a factor for the weight of the corresponding section. The rightmost digit corresponds to section 1. 
For the section configuration given above:</P ><P CLASS="literallayout" ><br> D is a factor for section 1 (body)<br> C is a factor for section 2 (title)<br> B is a factor for section 3 (keywords)<br> A is a factor for section 4 (description)<br> </P ><P >Examples:</P ><P CLASS="literallayout" ><br> wf=0001 will search through the body only.<br> <br> wf=1110 will search through the title, keywords and description, but not <br> through the body.<br> <br> wf=F421 will search through:<br> Description with factor 15 (F hex)<br> Keywords with factor 4<br> Title with factor 2<br> Body with factor 1<br> </P ><P >By default, if the "wf" variable is omitted from the query, all section factors are 1, which means all sections have the same weight.</P ></DIV ><DIV CLASS="sect2" ><H2 CLASS="sect2" ><A NAME="search-scriptname" >Using front-end with an shtml page</A ></H2 ><P >When <TT CLASS="filename" >search.cgi</TT > is called from a dynamic shtml page through SSI rather than directly as a CGI program, you need to override Apache's SCRIPT_NAME environment variable so that all links on the search pages lead to the dynamic page and not to <TT CLASS="filename" >search.cgi</TT >.</P ><P >For example, when an shtml page contains the line <TT CLASS="literal" >&lt;!--#include virtual="search.cgi" --&gt;</TT >, the SCRIPT_NAME variable will still point to <TT CLASS="filename" >search.cgi</TT > rather than to the shtml page.</P ><P >To override the SCRIPT_NAME variable we implemented the UDMSEARCH_SELF variable, which you may add to Apache's <TT CLASS="filename" >httpd.conf</TT > file. <TT CLASS="filename" >search.cgi</TT > checks the UDMSEARCH_SELF variable first and then SCRIPT_NAME. 
Here is an example of using the UDMSEARCH_SELF environment variable with the <B CLASS="command" >SetEnv/PassEnv</B > commands in Apache's <TT CLASS="filename" >httpd.conf</TT >:</P ><P > <PRE CLASS="programlisting" >
SetEnv UDMSEARCH_SELF /path/to/search.cgi
PassEnv UDMSEARCH_SELF
</PRE > </P ></DIV ><DIV CLASS="sect2" ><H2 CLASS="sect2" ><A NAME="search-templates" >Using several templates</A ></H2 ><P >It is often necessary to use several templates with the same <TT CLASS="filename" >search.cgi</TT >. There are several ways to do this. They are listed here in the order in which <TT CLASS="filename" >search.cgi</TT > detects the template name.</P ><P ></P ><OL TYPE="1" ><LI ><P > <TT CLASS="filename" >search.cgi</TT > checks the UDMSEARCH_TEMPLATE environment variable, so you can put the path to the desired search template into this variable. </P ></LI ><LI ><P > <TT CLASS="filename" >search.cgi</TT > also supports Apache internal redirects. It checks the REDIRECT_STATUS and REDIRECT_URL environment variables. To enable this method, add these lines to Apache's <TT CLASS="filename" >srm.conf</TT >: <PRE CLASS="programlisting" >
AddType text/html .zhtml
AddHandler zhtml .zhtml
Action zhtml /cgi-bin/search.cgi
</PRE > </P ><P >Put <TT CLASS="filename" >search.cgi</TT > into your <TT CLASS="filename" >/cgi-bin/</TT > directory. Then put the HTML template anywhere in your site directory structure under any name with the .zhtml extension, for example template.zhtml. Now you can open the search page: <TT CLASS="literal" >http://www.site.com/path/to/template.zhtml</TT > You can of course use any unused extension instead of .zhtml. </P ></LI ><LI ><P >If the two methods above fail, search.cgi opens the template that has the same name as the script being executed, as given by the SCRIPT_NAME environment variable. 
<TT CLASS="filename" >search.cgi</TT > will open the template <TT CLASS="filename" >ETC/search.htm</TT >, <TT CLASS="filename" >search1.cgi</TT > will open <TT CLASS="filename" >ETC/search1.htm</TT > and so on, where ETC is the mnoGoSearch /etc directory (usually <TT CLASS="filename" >/usr/local/mnoGoSearch/etc</TT >). So you can use the same <TT CLASS="filename" >search.cgi</TT > with different templates without recompiling it. Just create one or more hard or symbolic links to <TT CLASS="filename" >search.cgi</TT >, or copy it, and put the corresponding search templates into the /etc directory of the mnoGoSearch installation.</P ><P >See also the "Making multi-language search pages" section.</P ></LI ></OL ></DIV ><DIV CLASS="sect2" ><H2 CLASS="sect2" ><A NAME="search-bool" >Advanced boolean search <A NAME="AEN3064" ></A ></A ></H2 ><P >For more advanced searches you may use the query language. Select the "bool" match mode in the search form.</P ><P >mnoGoSearch understands the following boolean operators:</P ><P ><TT CLASS="userinput" ><B >&amp;</B ></TT > - logical AND. For example, "mysql &amp; odbc". mnoGoSearch will find any URLs that contain both "mysql" and "odbc". You can also use <TT CLASS="userinput" ><B >+</B ></TT > for this operator.</P ><P ><TT CLASS="userinput" ><B >|</B ></TT > - logical OR. For example, "mysql|odbc". mnoGoSearch will find any URLs that contain the word "mysql" or the word "odbc".</P ><P ><TT CLASS="userinput" ><B >~</B ></TT > - logical NOT. For example, "mysql &amp; ~odbc". mnoGoSearch will find URLs that contain the word "mysql" and do not contain the word "odbc". Note that ~ only excludes the given word from the results. The query "~odbc" on its own will find nothing! You can also use <TT CLASS="userinput" ><B >-</B ></TT > for this operator.</P ><P ><TT CLASS="userinput" ><B >()</B ></TT > - grouping, to compose more complex queries. For example, "(mysql | msql) &amp; ~postgres". The query language is simple and powerful at the same time. 
Just treat a query as an ordinary boolean expression.</P ></DIV ><DIV CLASS="sect2" ><H2 CLASS="sect2" ><A NAME="search-exp" >How search handles expired documents</A ></H2 ><P >Expired documents are still searchable with their old content.</P ></DIV ></DIV ></DIV ><DIV CLASS="NAVFOOTER" ><HR ALIGN="LEFT" WIDTH="100%"><TABLE SUMMARY="Footer navigation table" WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0" ><TR ><TD WIDTH="33%" ALIGN="left" VALIGN="top" ><A HREF="msearch-cjk.html" ACCESSKEY="P" >Prev</A ></TD ><TD WIDTH="34%" ALIGN="center" VALIGN="top" ><A HREF="index.html" ACCESSKEY="H" >Home</A ></TD ><TD WIDTH="33%" ALIGN="right" VALIGN="top" ><A HREF="msearch-templates.html" ACCESSKEY="N" >Next</A ></TD ></TR ><TR ><TD WIDTH="33%" ALIGN="left" VALIGN="top" >Segmenters for chinese and japanese languages</TD ><TD WIDTH="34%" ALIGN="center" VALIGN="top" > </TD ><TD WIDTH="33%" ALIGN="right" VALIGN="top" >How to write search result templates <A NAME="AEN3083" ></A ></TD ></TR ></TABLE ></DIV ></BODY ></HTML >