<HTML ><HEAD ><TITLE >Segmenters for Chinese and Japanese languages </TITLE ><META NAME="GENERATOR" CONTENT="Modular DocBook HTML Stylesheet Version 1.73 "><LINK REL="HOME" TITLE="mnoGoSearch 3.2 reference manual" HREF="index.html"><LINK REL="UP" TITLE="Languages support" HREF="msearch-international.html"><LINK REL="PREVIOUS" TITLE="Making multi-language search pages " HREF="msearch-multilang.html"><LINK REL="NEXT" TITLE="Searching documents" HREF="msearch-doingsearch.html"><LINK REL="STYLESHEET" TYPE="text/css" HREF="mnogo.css"><META NAME="Description" CONTENT="mnoGoSearch - Full Featured Web site Open Source Search Engine Software over the Internet and Intranet Web Sites Based on SQL Database. It is a Free search software covered by GNU license."><META NAME="Keywords" CONTENT="shareware, freeware, download, internet, unix, utilities, search engine, text retrieval, knowledge retrieval, text search, information retrieval, database search, mining, intranet, webserver, index, spider, filesearch, meta, free, open source, full-text, udmsearch, website, find, opensource, search, searching, software, udmsearch, engine, indexing, system, web, ftp, http, cgi, php, SQL, MySQL, database, php3, FreeBSD, Linux, Unix, mnoGoSearch, MacOS X, Mac OS X, Windows, 2000, NT, 95, 98, GNU, GPL, url, grabbing"></HEAD ><BODY CLASS="sect1" BGCOLOR="#EEEEEE" TEXT="#000000" LINK="#000080" VLINK="#800080" ALINK="#FF0000" ><DIV CLASS="NAVHEADER" ><TABLE SUMMARY="Header navigation table" WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0" ><TR ><TH COLSPAN="3" ALIGN="center" >mnoGoSearch 3.2 reference manual: Full-featured search engine software</TH ></TR ><TR ><TD WIDTH="10%" ALIGN="left" VALIGN="bottom" ><A HREF="msearch-multilang.html" ACCESSKEY="P" >Prev</A ></TD ><TD WIDTH="80%" ALIGN="center" VALIGN="bottom" >Chapter 7. 
Languages support</TD ><TD WIDTH="10%" ALIGN="right" VALIGN="bottom" ><A HREF="msearch-doingsearch.html" ACCESSKEY="N" >Next</A ></TD ></TR ></TABLE ><HR ALIGN="LEFT" WIDTH="100%"></DIV ><DIV CLASS="sect1" ><H1 CLASS="sect1" ><A NAME="cjk" >Segmenters for Chinese and Japanese languages</A ></H1 ><P >Traditional Chinese and Japanese writing, unlike Western languages, does not put spaces between the words of a phrase. When indexing documents in these languages, phrases must therefore additionally be segmented into words. </P ><DIV CLASS="sect2" ><H2 CLASS="sect2" ><A NAME="ja-segment" >Japanese language phrase segmenter <A NAME="AEN2911" ></A ></A ></H2 ><P >Japanese phrases are segmented with <SPAN CLASS="application" ><A HREF="http://chasen.aist-nara.ac.jp/" TARGET="_top" >ChaSen</A ></SPAN >, a morphological analysis system for the Japanese language. This system must therefore be installed before <SPAN CLASS="application" >mnoGoSearch</SPAN > is configured and built. </P ><P >To enable Japanese phrase segmenting, pass the <TT CLASS="option" >--enable-chasen</TT > switch to <B CLASS="command" >configure</B >. </P ></DIV ><DIV CLASS="sect2" ><H2 CLASS="sect2" ><A NAME="zh-segment" >Chinese language phrase segmenter <A NAME="AEN2923" ></A ></A ></H2 ><P >Chinese phrases are segmented using a frequency dictionary of Chinese words. The segmentation itself is done by a dynamic programming method that maximizes the cumulative frequency of the produced words. </P ><P > To enable Chinese phrase segmenting, enable <TT CLASS="literal" >GB2312</TT > charset support when configuring <SPAN CLASS="application" >mnoGoSearch</SPAN > and specify the frequency dictionary of Chinese words with the <A NAME="AEN2930" ></A > <B CLASS="command" >LoadChineseList</B > command in the <TT CLASS="filename" >indexer.conf</TT > file. 
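</P ><P >The dynamic programming idea used by the Chinese segmenter can be illustrated with a minimal Python sketch. It is not <SPAN CLASS="application" >mnoGoSearch</SPAN >'s actual code: the function name <TT CLASS="literal" >segment</TT >, the <TT CLASS="literal" >max_len</TT > limit, the zero-frequency fallback for unknown single characters and the toy dictionary are all assumptions made for illustration, and "cumulative frequency" is read here as the plain sum of the word frequencies. </P ><PRE CLASS="programlisting" >
```python
def segment(text: str, freq: dict, max_len: int = 4) -> list:
    """Split text into words so that the sum of word frequencies is maximal.

    freq maps known words to their frequency (hypothetical toy data here,
    not the format of a real mnoGoSearch dictionary file).
    """
    n = len(text)
    # best[i] holds (score, start of last word) for the best split of text[:i].
    best = [(0, 0)] + [None] * n
    for end in range(1, n + 1):
        candidates = []
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            # Assumption: an unknown single character counts as a word with
            # frequency 0, so every input can be segmented. Unknown longer
            # strings are not considered words.
            if word in freq or end - start == 1:
                score = best[start][0] + freq.get(word, 0)
                candidates.append((score, start))
        best[end] = max(candidates)

    # Walk the stored back-pointers from the end of the text to recover words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    words.reverse()
    return words


# Latin letters stand in for Chinese characters to keep the example
# independent of the page encoding.
toy_freq = {"ab": 50, "a": 10, "b": 8, "cd": 40, "c": 12, "d": 3, "bc": 2}
print(segment("abcd", toy_freq))  # prints ['ab', 'cd']
```
</PRE ><P >In this sketch each text position is scored once against at most <TT CLASS="literal" >max_len</TT > candidate words, so the running time grows linearly with the length of the phrase. 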
</P ></DIV ></DIV ><DIV CLASS="NAVFOOTER" ><HR ALIGN="LEFT" WIDTH="100%"><TABLE SUMMARY="Footer navigation table" WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0" ><TR ><TD WIDTH="33%" ALIGN="left" VALIGN="top" ><A HREF="msearch-multilang.html" ACCESSKEY="P" >Prev</A ></TD ><TD WIDTH="34%" ALIGN="center" VALIGN="top" ><A HREF="index.html" ACCESSKEY="H" >Home</A ></TD ><TD WIDTH="33%" ALIGN="right" VALIGN="top" ><A HREF="msearch-doingsearch.html" ACCESSKEY="N" >Next</A ></TD ></TR ><TR ><TD WIDTH="33%" ALIGN="left" VALIGN="top" >Making multi-language search pages <A NAME="AEN2757" ></A ></TD ><TD WIDTH="34%" ALIGN="center" VALIGN="top" ><A HREF="msearch-international.html" ACCESSKEY="U" >Up</A ></TD ><TD WIDTH="33%" ALIGN="right" VALIGN="top" >Searching documents</TD ></TR ></TABLE ></DIV ></BODY ></HTML >