<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>MeCab: Yet Another Japanese Dependency Structure Analyzer</title>
<link rel=stylesheet href="mecab.css">
</head>
<body>
<h1>Scripting Language Bindings</h1>
<p>$Id: bindings.html 65 2007-01-30 00:52:53Z taku-ku $;</p>

<h2>Overview</h2>
<p>
The morphological analysis provided by MeCab can be used from several scripting languages
(<a href="http://www.perl.com">perl</a>,
<a href="http://www.ruby-lang.org">ruby</a>,
<a href="http://www.python.org">python</a>,
<a href="http://java.sun.com">Java</a>).
Each binding is generated automatically with a program called
<a href="http://www.swig.org">SWIG</a>.
Bindings for the other languages supported by <a href="http://www.swig.org">SWIG</a>
can probably be generated as well, but at present only the four languages above are provided,
since that is as much as the author is able to maintain.
</p>

<h2>Installation</h2>
<p>
For instructions on installing each language binding, see
perl/README, ruby/README, python/README, and java/README.
</p>

<h2>A first parse</h2>
<p>
Create an instance of the MeCab::Tagger class and call its
parse (or parseToString) method to obtain the analysis result as a string.
The constructor arguments of MeCab::Tagger are basically the same as the parameters
given to the mecab executable, and they are passed as a string.
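</p>
<p>
The string returned by parse can be post-processed in the calling language.
The following is a minimal python sketch, assuming the default output format of mecab
(one morpheme per line, with the surface form and the feature string separated by a tab,
followed by an EOS line). The helper name split_result and the sample text are hypothetical
illustrations written by hand; they are not part of MeCab and not real analyzer output.
</p>
<pre>
def split_result(result):
    """Split MeCab-style text output into (surface, feature) pairs."""
    rows = []
    for line in result.splitlines():
        if line == "EOS":   # end-of-sentence marker line
            break
        if not line:        # skip blank lines
            continue
        # Everything before the first tab is the surface form,
        # everything after it is the CSV feature string.
        surface, _, feature = line.partition("\t")
        rows.append((surface, feature))
    return rows

# Hand-written sample in the assumed format (not real analyzer output)
sample = "word1\tfeatureA,featureB\nword2\tfeatureC\nEOS\n"
for surface, feature in split_result(sample):
    print(surface, feature.split(","))
</pre>
<p>
With a real Tagger, the value returned by parse would take the place of sample.
Other output formats such as -Ochasen use different columns, so the split would need to be adjusted.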
</p>

<h3>perl</h3>
<pre>
use MeCab;
my $m = new MeCab::Tagger ("-Ochasen");
print $m->parse ("今日もしないとね");
</pre>

<h3>ruby</h3>
<pre>
require 'MeCab'
m = MeCab::Tagger.new("-Ochasen")
print m.parse("今日もしないとね")
</pre>

<h3>python</h3>
<pre>
import MeCab
m = MeCab.Tagger("-Ochasen")
print(m.parse("今日もしないとね"))
</pre>

<h3>Java</h3>
<pre>
import org.chasen.mecab.Tagger;
import org.chasen.mecab.Node;

public class Example {
  static {
    // Load the native library built from the java/ directory
    System.loadLibrary("MeCab");
  }

  public static void main(String[] argv) {
    Tagger tagger = new Tagger("-Ochasen");
    System.out.println(tagger.parse("太郎は二郎にこの本を渡した."));
  }
}
</pre>

<h2>Getting detailed information for each morpheme</h2>
<p>
Calling the parseToNode method of the MeCab::Tagger class returns
a special "beginning of sentence" morpheme as an instance of the MeCab::Node class.
</p>

<p>
MeCab::Node is represented as a doubly linked list and has the member variables
next and prev. They return the next morpheme and the previous morpheme, respectively,
as instances of the MeCab::Node class.
Every morpheme can be reached by following next repeatedly.
</p>

<p>MeCab::Node is a class that wraps mecab_node_t, which is provided by the C
interface. Almost all member variables of mecab_node_t are accessible.
The one exception is surface, which has been changed so that it returns the word itself.</p>

<p>
A <a href="http://www.perl.com">perl</a> example follows.
It visits each morpheme in turn and prints the surface string of the morpheme,
its part of speech, and the cost up to that morpheme.
</p>

<pre>
use MeCab;
my $m = new MeCab::Tagger ("");

for (my $n = $m->parseToNode ("今日もしないとね"); $n ; $n = $n->{next}) {
   printf ("%s\t%s\t%d\n",
      $n->{surface},   # surface form
      $n->{feature},   # part of speech of this morpheme
      $n->{cost}       # cost up to this morpheme
   );
}
</pre>

<h2>Error handling</h2>
<p>
If an error occurs in the constructor or during analysis,
a RuntimeError exception is raised.
For how to handle exceptions, see the reference manual of each language.
The following is a <a href="http://www.python.org">python</a> example.
</p>

<pre>
import MeCab

try:
    m = MeCab.Tagger("-d .")
    print(m.parse("今日もしないとね"))
except RuntimeError as e:
    print("RuntimeError: %s" % e)
</pre>

<h2>Notes</h2>
<h3>Beginning-of-sentence and end-of-sentence morphemes</h3>
<p>
The return value of parseToNode is a MeCab::Node instance
that represents a special "beginning of sentence" morpheme.
Note that a special "end of sentence" morpheme exists as well.
To ignore both of them, skip each one by using next, as shown below.
</p>
<pre>
my $n = $m->parseToNode ("今日もしないとね");
$n = $n->{next};           # skip the "beginning of sentence" morpheme

while ($n->{next}) {       # check next, so the "end of sentence" morpheme is not printed
   printf ("%s\n", $n->{surface});
   $n = $n->{next};        # move to the next morpheme
}
</pre>

<h3>Behavior of MeCab::Node</h3>
<p>
The actual data behind a MeCab::Node (the morpheme information held in memory) is managed by the
MeCab::Tagger instance. A MeCab::Node is nothing more than a
<b>reference</b> to that data. As a result, the data itself is overwritten
every time parseToNode is called.
An example like the following does not behave the way the source code suggests.
</p>

<pre>
import MeCab

m = MeCab.Tagger("")
n1 = m.parseToNode("今日もしないとね")
n2 = m.parseToNode("さくさくさくら")

# The contents of n1 are no longer valid here
while n1:
    print(n1.surface)
    n1 = n1.next
</pre>

<p>
In the example above, the data that n1 refers to was overwritten
when "さくさくさくら" was analyzed, so n1 can no longer be used.
</p>

<p>
To access several Node lists at the same time,
create several MeCab::Tagger instances.
</p>

<h2>All methods</h2>
<p>
Part of the interface file for <a href="http://www.swig.org">SWIG</a> is shown below.
Because of the language the bindings are implemented in, it is written in C++ syntax;
read it as the equivalent in your own language.
The behavior of each method is also described, for reference.
</p>

<pre>
namespace MeCab {

  class Tagger {

     // Parses str and returns the result as a string. len is the length of str (optional)
     string parse(string str, int len);

     // Same as parse
     string parseToString(string str, int len);

     // Parses str and returns a morpheme of type MeCab::Node.
     // This morpheme marks the beginning of the sentence; following next in turn gives access to every morpheme
     Node parseToNode(string str, int len);

     // N-best version of parse. N specifies the number of results.
     // To use this feature, -l 1 must be given as a startup option
     string parseNBest(int N, string str, int len);

     // Call this function to initialize when the analysis results are to be obtained in order of likelihood.
     bool  parseNBestInit(string str, int len);

     // After parseNBestInit(), calling this function repeatedly returns the analysis results in order of likelihood.
     string next();

     // Same as next(), but returns a MeCab::Node.
     Node  nextNode();
  };

  #define MECAB_NOR_NODE  0
  #define MECAB_UNK_NODE  1
  #define MECAB_BOS_NODE  2
  #define MECAB_EOS_NODE  3

  struct Node {

    struct Node  prev;  // pointer to the previous morpheme
    struct Node  next;  // pointer to the next morpheme

    struct Node  enext; // pointer to a morpheme that ends at the same position
    struct Node  bnext; // pointer to a morpheme that begins at the same position

    string surface;             // string of the morpheme
    string feature;             // feature information written as CSV
    unsigned int   length;      // length of the morpheme
    unsigned int   rlength;     // length of the morpheme (including leading spaces)
    unsigned int   id;          // unique ID assigned to the morpheme
    unsigned short rcAttr;      // right-context id
    unsigned short lcAttr;      // left-context id
    unsigned short posid;       // morpheme ID (unused)
    unsigned char  char_type;   // character type information
    unsigned char  stat;        // kind of morpheme: one of the macro values below
                                // #define MECAB_NOR_NODE  0
                                // #define MECAB_UNK_NODE  1
                                // #define MECAB_BOS_NODE  2
                                // #define MECAB_EOS_NODE  3
    unsigned char  isbest;      // 1 if part of the best solution, 0 otherwise

    float          alpha;       // forward log probability (forward-backward)
    float          beta;        // backward log probability (forward-backward)
    float          prob;        // marginal probability
                                // alpha, beta, and prob are defined only when the -l 2 option is given

    short          wcost;       // word occurrence cost
    long           cost;        // cumulative cost
  };
}
</pre>

<h2>Sample programs</h2>
<p>
Sample programs for each language are available in
perl/test.pl, ruby/test.rb, python/test.py, and java/test.java.
</p>

</body>
</html>