Sophie

mnogosearch-3.2.8-1mdk.ppc.rpm

<HTML
><HEAD
><TITLE
>Segmenters for Chinese and Japanese languages
</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.73
"><LINK
REL="HOME"
TITLE="mnoGoSearch 3.2 reference manual"
HREF="index.html"><LINK
REL="UP"
TITLE="Languages support"
HREF="msearch-international.html"><LINK
REL="PREVIOUS"
TITLE="Making multi-language search pages"
HREF="msearch-multilang.html"><LINK
REL="NEXT"
TITLE="Searching documents"
HREF="msearch-doingsearch.html"><LINK
REL="STYLESHEET"
TYPE="text/css"
HREF="mnogo.css"><META
NAME="Description"
CONTENT="mnoGoSearch - full-featured open source search engine software for Internet and intranet web sites, based on an SQL database. It is free software covered by the GNU license."><META
NAME="Keywords"
CONTENT="shareware, freeware, download, internet, unix, utilities, search engine, text retrieval, knowledge retrieval, text search, information retrieval, database search, mining, intranet, webserver, index, spider, filesearch, meta, free, open source, full-text, udmsearch, website, find, opensource, search, searching, software, udmsearch, engine, indexing, system, web, ftp, http, cgi, php, SQL, MySQL, database, php3, FreeBSD, Linux, Unix, mnoGoSearch, MacOS X, Mac OS X, Windows, 2000, NT, 95, 98, GNU, GPL, url, grabbing"></HEAD
><BODY
CLASS="sect1"
BGCOLOR="#EEEEEE"
TEXT="#000000"
LINK="#000080"
VLINK="#800080"
ALINK="#FF0000"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
>mnoGoSearch 3.2 reference manual: Full-featured search engine software</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="msearch-multilang.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
>Chapter 7. Languages support</TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="msearch-doingsearch.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="sect1"
><H1
CLASS="sect1"
><A
NAME="cjk"
>Segmenters for Chinese and Japanese languages</A
></H1
><P
>Unlike Western languages, written Chinese and Japanese do not separate the words of a phrase with spaces.
Therefore, when documents in these languages are indexed, their phrases must additionally be segmented into words.
</P
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="ja-segment"
>Japanese language phrase segmenter
<A
NAME="AEN2911"
></A
></A
></H2
><P
>Japanese phrase segmentation is performed by <SPAN
CLASS="application"
><A
HREF="http://chasen.aist-nara.ac.jp/"
TARGET="_top"
>ChaSen</A
></SPAN
>,
a morphological analysis system for Japanese. ChaSen must therefore be installed before
you configure and build <SPAN
CLASS="application"
>mnoGoSearch</SPAN
>.
</P
><P
>To enable Japanese phrase segmentation, pass the <TT
CLASS="option"
>--enable-chasen</TT
> option to <B
CLASS="command"
>configure</B
>.
</P
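><P
>As a minimal sketch, a build with the Japanese segmenter enabled might look like the following. It assumes that ChaSen is already installed where <B
CLASS="command"
>configure</B
> can find it and that the commands are run from the top of the <SPAN
CLASS="application"
>mnoGoSearch</SPAN
> source tree; any other options you normally pass to <B
CLASS="command"
>configure</B
> are omitted here.</P
>
```shell
# Enable the ChaSen-based Japanese phrase segmenter, then build and install.
./configure --enable-chasen
make
make install
```
<P
>If <TT
CLASS="option"
>--enable-chasen</TT
> is left out, the resulting build does not include the Japanese phrase segmenter.</P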
></DIV
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="zh-segment"
>Chinese language phrase segmenter
<A
NAME="AEN2923"
></A
></A
></H2
><P
>Chinese phrase segmentation relies on a frequency dictionary of Chinese words.
The segmentation itself uses dynamic programming to choose the split of a phrase that maximizes the cumulative frequency of the resulting words.
</P
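><P
>The Python function below is a minimal sketch of this technique, not the actual C implementation used by <SPAN
CLASS="application"
>mnoGoSearch</SPAN
>. It assumes the dictionary is an in-memory mapping from word to frequency, that no word is longer than a hypothetical <TT
CLASS="literal"
>max_word_len</TT
> characters, and that characters not covered by any dictionary word are emitted as single-character words with frequency 0. For readability, Latin letters stand in for Chinese characters in the usage example.</P
>
```python
from typing import Dict, List


def segment(phrase: str, freq: Dict[str, int], max_word_len: int = 8) -> List[str]:
    """Split `phrase` into words so that the summed dictionary frequency is maximal.

    Sketch only: a character that starts no dictionary word is emitted as a
    one-character word with frequency 0 (an assumption, not mnoGoSearch's rule).
    """
    n = len(phrase)
    # best[i] = highest cumulative frequency achievable for phrase[:i]
    best = [0] * (n + 1)
    # back[i] = start index of the last word in the best split of phrase[:i]
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        # Fallback: treat the last character as a one-character word.
        best[end] = best[end - 1] + freq.get(phrase[end - 1], 0)
        back[end] = end - 1
        # Try every multi-character dictionary word that ends at `end`.
        for start in range(max(0, end - max_word_len), end - 1):
            word = phrase[start:end]
            if word in freq:
                score = best[start] + freq[word]
                if score > best[end]:
                    best[end] = score
                    back[end] = start
    # Follow the back-pointers from the end to recover the chosen words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(phrase[start:end])
        end = start
    words.reverse()
    return words


if __name__ == "__main__":
    # Latin letters stand in for Chinese characters.
    toy_freq = {"ab": 50, "a": 10, "b": 10, "c": 20, "abc": 30, "bc": 5}
    print(segment("abc", toy_freq))  # ['ab', 'c']
```
<P
>Here the split ab + c scores 50 + 20 = 70, which beats abc (30), a + b + c (10 + 10 + 20 = 40) and a + bc (10 + 5 = 15), so maximizing the cumulative frequency selects it.</P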
><P
>To enable Chinese phrase segmentation, you must enable <TT
CLASS="literal"
>GB2312</TT
> character set support when configuring
<SPAN
CLASS="application"
>mnoGoSearch</SPAN
> and specify the frequency dictionary of Chinese words with the
<A
NAME="AEN2930"
></A
>
<B
CLASS="command"
>LoadChineseList</B
> command in the <TT
CLASS="filename"
>indexer.conf</TT
> file.
</P
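><P
>For illustration only, the directive in <TT
CLASS="filename"
>indexer.conf</TT
> might look like the line below. The argument order (character set, then dictionary file) and the file name <TT
CLASS="filename"
>mandarin.freq</TT
> are assumptions made for this example; check the documentation of <B
CLASS="command"
>LoadChineseList</B
> for the exact syntax supported by your version.</P
>
```
# indexer.conf: load a GB2312-encoded Chinese word frequency dictionary.
# Assumed syntax: LoadChineseList CHARSET DICTIONARY_FILE
LoadChineseList GB2312 mandarin.freq
```
<P
>Replace <TT
CLASS="filename"
>mandarin.freq</TT
> with the path of the frequency dictionary you actually use.</P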
></DIV
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="msearch-multilang.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="msearch-doingsearch.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Making multi-language search pages
<A
NAME="AEN2757"
></A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="msearch-international.html"
ACCESSKEY="U"
>Up</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Searching documents</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>