<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=EUC-JP">
<title>MeCab: Estimating parameters from an original dictionary/corpus</title>
<link type="text/css" rel="stylesheet" href="mecab.css">
</head>
<body>
<h1>Estimating parameters from an original dictionary/corpus</h1>
<p>$Id: learn.html 131 2007-06-09 16:18:15Z taku-ku $;</p>
<h2>Overview</h2>
<p>MeCab can estimate its parameters (cost values) from a training corpus. Because MeCab itself is designed to be independent of any particular part-of-speech system, you can build an analyzer based on your own part-of-speech system, dictionary, and corpus. Parameter estimation uses Conditional Random Fields (<a href="http://www.cis.upenn.edu/~pereira/papers/crf.pdf">CRF</a>). </p>
<p> <h2>Processing flow</h2>
<p>The data flow is shown in the following diagram. </p>
<img src="flow.png">
<p>Parameter estimation consists of the following subtasks. </p>
<ul>
<li><a href="#seed">Preparing the seed dictionary</a>
<li><a href="#config">Preparing the configuration files</a>
<ul>
<li>dicrc
<li>char.def
<li>unk.def
<li>rewrite.def
<li>feature.def
</ul>
<li><a href="#corpus">Preparing the training corpus</a>
<li><a href="#binary">Building the binary dictionary for training</a>
<li><a href="#crf"><a href="http://www.cis.upenn.edu/~pereira/papers/crf.pdf">CRF</a> parameter training</a>
<li><a href="#dist">Building the dictionary for distribution</a>
<li><a href="#test">Building the binary dictionary for analysis</a>
<li><a href="#eval">Evaluation</a>
</ul>
<p>Each step is explained in turn below. </p>
<h2><a name="seed">Preparing the seed dictionary</a></h2>
<p>MeCab dictionaries are written in CSV. The seed dictionary and the distribution dictionary use essentially the same format. </p>
<p>The following are example dictionary entries. </p>
<pre>
進学校,0,0,0,名詞,一般,*,*,*,*,進学校,シンガクコウ,シンガクコー
梅暦,0,0,0,名詞,一般,*,*,*,*,梅暦,ウメゴヨミ,ウメゴヨミ
気圧,0,0,0,名詞,一般,*,*,*,*,気圧,キアツ,キアツ
水中翼船,0,0,0,名詞,一般,*,*,*,*,水中翼船,スイチュウヨクセン,スイチューヨクセン
</pre>
<p>The first four columns are required. They are: </p>
<ul>
<li>Surface form (the word itself)
<li>Left context ID
<li>Right context ID
<li>Cost
</ul>
<p>These four columns must always be present. 
The left context ID, right context ID, and cost are not used in the seed dictionary, so set them to 0.</p>
<p>The fifth and later columns are called "features". To keep the system general, MeCab does not distinguish between kinds of information attached to a word, such as part of speech, conjugation, reading, and pronunciation; it treats all of them as features. You can attach as many features as CSV allows. However, the meaning of each column must be consistent across entries (for example, column 5 is the part of speech and column 6 is the POS subcategory). Features are normally listed from the most general to the most specific, starting at the lowest feature number. (Example: part of speech, POS subcategory, conjugation type, conjugation form, base form, reading, pronunciation) </p>
<p>Internally, features are handled as an array, and they are sometimes referred to as the 0th feature, the 1st feature, and so on. Keeping track of which feature number corresponds to which meaning (part of speech, reading, etc.) is the user's responsibility. </p>
<p>The example above is taken from ipadic, where the feature sequence</p>
<ul>
<li>Part of speech
<li>POS subcategory 1
<li>POS subcategory 2
<li>POS subcategory 3
<li>Conjugation type
<li>Conjugation form
<li>Base form
<li>Reading
<li>Pronunciation
</ul>
<p>is defined. </p>
<p>MeCab does not perform any conjugation processing. For words that conjugate, the user must expand every conjugated form in advance.
<pre>
連れ出す,0,0,0,動詞,自立,*,*,五段・サ行,基本形,連れ出す,ツレダス,ツレダス
連れ出さ,0,0,0,動詞,自立,*,*,五段・サ行,未然形,連れ出す,ツレダサ,ツレダサ
連れ出そ,0,0,0,動詞,自立,*,*,五段・サ行,未然ウ接続,連れ出す,ツレダソ,ツレダソ
連れ出し,0,0,0,動詞,自立,*,*,五段・サ行,連用形,連れ出す,ツレダシ,ツレダシ
連れ出せ,0,0,0,動詞,自立,*,*,五段・サ行,仮定形,連れ出す,ツレダセ,ツレダセ
連れ出せ,0,0,0,動詞,自立,*,*,五段・サ行,命令ｅ,連れ出す,ツレダセ,ツレダセ
連れ出しゃ,0,0,0,動詞,自立,*,*,五段・サ行,仮定縮約１,連れ出す,ツレダシャ,ツレダシャ
</pre></p>
<h2><a name="config">Preparing the configuration files</a></h2>
<h3>dicrc</h3>
<p> This file controls various aspects of the dictionary's behavior. The following is a minimal configuration. <p>
<pre>
cost-factor = 800
bos-feature = BOS/EOS,*,*,*,*,*,*,*,*
eval-size = 6
unk-eval-size = 4
config-charset = EUC-JP
</pre>
<ul>
<li>cost-factor: The scaling factor used when converting to cost values. A value between 700 and 800 is fine.
<li>bos-feature: The feature for the beginning and end of a sentence, written in CSV. 
<li>eval-size: For known words, specifies how many features, counted from the first, must match for an answer to be judged correct. For known words it is normally enough that information such as part of speech and conjugation is correct, so features such as reading and pronunciation are ignored. In the example above the value is 6, so the six features of the IPA part-of-speech system (part of speech, POS subcategories 1, 2, and 3, conjugation type, and conjugation form) are evaluated.
<li>unk-eval-size: For unknown words, specifies how many features, counted from the first, must match for an answer to be judged correct.
<li>config-charset: The character encoding of the dicrc, char.def, unk.def, and pos-id.def files.
</ul>
<h3>char.def</h3>
<p> This file defines unknown word processing. Japanese morphological analysis normally handles unknown words on the basis of character types. MeCab lets you specify in detail which characters belong to which character type, and also which kind of unknown word processing is applied to each type. </p>
<p> The first part of the file defines the category names and the unknown word processing behavior of each category.
<pre>
category_name invoke(0/1) group(0/1) length(0,1,2...n)
</pre>
<ul>
<li>category_name: The name of the category. <br> Define categories such as HIRAGANA, KATAKANA, and so on. DEFAULT and SPACE are mandatory categories.
<li>invoke: <br> Defines when unknown word processing is run for the category.
<ul>
<li>0: Unknown word processing is not run if a known word exists.
<li>1: Unknown word processing is always run.
</ul>
<li>group: How unknown word candidates are generated.
<ul>
<li>0: Consecutive characters of the same type are not grouped together.
<li>1: Consecutive characters of the same type are grouped together.
</ul>
<li>length: How unknown word candidates are generated.
<ul>
<li>1: Strings of up to 1 character are treated as unknown words.
<li>2: Strings of up to 2 characters are treated as unknown words. <br> ...
<li>n: Strings of up to n characters are treated as unknown words. <br>
</ul>
group and length can be specified at the same time.
</ul>
<p>Example</p>
<pre>
KANJI 0 0 2
SYMBOL 1 1 0
NUMERIC 1 1 0
ALPHA 1 1 0
HIRAGANA 0 1 2
</pre>
</p>
<p>Next, define which UCS2 code points belong to each category. </p>
<pre>
codepoint default_category compatible_category1 compatible_category2 ..
</pre>
<p>or,</p>
<pre>
low_codepoint..high_codepoint default_category compatible_category1 compatible_category2 ..
</pre>
<p>Example</p>
<pre>
0x0009 SPACE
0x30A1..0x30FF KATAKANA
0x30FC KATAKANA HIRAGANA # ー
</pre>
<p>Code points are written as UCS2 (Unicode) values in hexadecimal, starting with 0x.</p>
<p> The first category is the default category of the code point. Compatible categories can be listed after it. In the example above, the long vowel mark "ー" is katakana by default but also has hiragana as a compatible category. During grouping, characters in a compatible category are treated as belonging to the same group. </p>
<p>The following is a concrete example of char.def.</p>
<pre>
DEFAULT 0 1 0 # DEFAULT is a mandatory category!
SPACE 0 1 0
KANJI 0 0 2
SYMBOL 1 1 0
NUMERIC 1 1 0
ALPHA 1 1 0
HIRAGANA 0 1 2
KATAKANA 1 1 0
KANJINUMERIC 1 1 0
GREEK 1 1 0
CYRILLIC 1 1 0

# SPACE
0x0020 SPACE # DO NOT REMOVE THIS LINE, 0x0020 is reserved for SPACE
0x00D0 SPACE
0x0009 SPACE
0x000B SPACE
0x000A SPACE

# ASCII
0x0021..0x002F SYMBOL
0x0030..0x0039 NUMERIC
...

# KATAKANA
0x30A1..0x30FF KATAKANA
0x31F0..0x31FF KATAKANA # Small KU .. Small RO
0x30FC KATAKANA HIRAGANA # ー
</pre>
<h3>unk.def</h3>
<p> This is the dictionary for unknown words. </p>
<pre>
DEFAULT,0,0,0,記号,一般,*,*,*,*,*
SPACE,0,0,0,記号,空白,*,*,*,*,*
KANJI,0,0,0,名詞,一般,*,*,*,*,*
KANJI,0,0,0,名詞,サ変接続,*,*,*,*,*
HIRAGANA,0,0,0,名詞,一般,*,*,*,*,*
HIRAGANA,0,0,0,名詞,サ変接続,*,*,*,*,*
HIRAGANA,0,0,0,名詞,固有名詞,地域,一般,*,*,*
...
</pre>
<p> This is a dictionary file in which the surface form column holds a category name defined in char.def. It defines which feature sequences are assigned to each category. A single category may have several feature sequences. Appropriate cost values are assigned automatically after training. </p>
<h3>rewrite.def</h3>
<p> This file defines the mapping from a feature sequence to the internal-state feature sequences. </p>
<p> <a href="http://www.cis.upenn.edu/~pereira/papers/crf.pdf">CRF</a> computes its statistics from three kinds of information: unigram, left-context bigram, and right-context bigram. In the example "美しい川" illustrated below, three feature sequences are derived from the features defined in the dictionary: the unigram features, the left-context features (the features of the morpheme as seen from its left side), and the right-context features (the features of the morpheme as seen from its right side). rewrite.def defines the mapping from the dictionary features to each of these internal feature sequences. 
<img src="feature.png"> </p>
<p>Concretely, defining the mapping functions appropriately makes things like the following possible. </p>
<ul>
<li>Merging the two spellings "来る" and "くる" into "来る" when computing statistics.
<li>Controlling in detail which parts of the features are used, for example using only the part of speech, or lexicalizing, when computing connection costs.
</ul>
</p>
<p> rewrite.def has three sections.
<ul>
<li>[unigram rewrite]: Mapping to the unigram internal state
<li>[left rewrite]: Mapping to the left-context bigram
<li>[right rewrite]: Mapping to the right-context bigram
</ul>
<p> Each section header is followed by mapping rules, one per line. A mapping rule is written in the form
<pre>
match_pattern destination
</pre>
Rules are scanned from the top, and the first rule that matches is used.
<p> A simple form of regular expression can be used in match patterns.
<ul>
<li>*: Matches any string
<li>(AB|CD|EF): Matches AB, CD, or EF
<li>AB: Matches only the exact string AB
</ul>
</p>
<p> In the destination, the macros $1, $2, $3, ... refer to the contents of the individual feature elements (the elements written in the CSV). </p>
<p> Example
<pre>
[unigram rewrite]
# Drop the pronunciation; use POS 1,2,3,4, conjugation type, conjugation form, base form, reading
*,*,*,*,*,*,*,* $1,$2,$3,$4,$5,$6,$7,$8
# If there is no reading, ignore it
*,*,*,*,*,*,* $1,$2,$3,$4,$5,$6,$7,*

[left rewrite]
(助詞|助動詞),*,*,*,*,*,(ない|無い) $1,$2,$3,$4,$5,$6,無い
(助詞|助動詞),終助詞,*,*,*,*,(よ|ヨ) $1,$2,$3,$4,$5,$6,よ
...

[right rewrite]
(助詞|助動詞),*,*,*,*,*,(ない|無い) $1,$2,$3,$4,$5,$6,無い
(助詞|助動詞),終助詞,*,*,*,*,(よ|ヨ) $1,$2,$3,$4,$5,$6,よ
..
</pre>
</p>
<h3>feature.def</h3>
<p> This file defines the templates used to extract <a href="http://www.cis.upenn.edu/~pereira/papers/crf.pdf">CRF</a> feature sequences from the internal-state feature sequences. </p>
<p>Each line corresponds to one template. Lines beginning with UNIGRAM are unigram templates, and lines beginning with BIGRAM are connection (bigram) templates. </p>
<p> The following macros can be used in a template.
<ul>
<li>%F[n]: Expands to the n-th unigram feature.
<li>%F?[n]: Expands to the n-th unigram feature. 
However, if that feature is undefined, the template itself is not used.
<li>%t: Expands to the character type. The character types are those defined in char.def. (%t is valid only in unigram features.)
<li>%L[n]: Expands to the n-th left-context feature.
<li>%L?[n]: Expands to the n-th left-context feature. However, if that feature is undefined, the template itself is not used.
<li>%R[n]: Expands to the n-th right-context feature.
<li>%R?[n]: Expands to the n-th right-context feature. However, if that feature is undefined, the template itself is not used.
</ul>
</p>
<p> Example
<pre>
UNIGRAM W0:%F[6]
UNIGRAM W1:%F[0]/%F[6]
UNIGRAM W2:%F[0],%F?[1]/%F[6]
UNIGRAM W3:%F[0],%F[1],%F?[2]/%F[6]
UNIGRAM W4:%F[0],%F[1],%F[2],%F?[3]/%F[6]

UNIGRAM T0:%t
UNIGRAM T1:%F[0]/%t
UNIGRAM T2:%F[0],%F?[1]/%t
UNIGRAM T3:%F[0],%F[1],%F?[2]/%t
UNIGRAM T4:%F[0],%F[1],%F[2],%F?[3]/%t

BIGRAM B00:%L[0]/%R[0]
BIGRAM B01:%L[0],%L?[1]/%R[0]
BIGRAM B02:%L[0]/%R[0],%R?[1]
BIGRAM B03:%L[0]/%R[0],%R[1],%R?[2]
BIGRAM B04:%L[0],%L?[1]/%R[0],%R[1],%R?[2]
BIGRAM B05:%L[0]/%R[0],%R[1],%R[2],%R?[3]
BIGRAM B06:%L[0],%L?[1]/%R[0],%R[1],%R[2],%R?[3]
...
</pre>
<h2><a name="corpus">Preparing the training corpus</a></h2>
<p>The training data is written in the same format as MeCab's default output. </p>
<pre>
太郎	名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー
は	助詞,係助詞,*,*,*,*,は,ハ,ワ
花子	名詞,固有名詞,人名,名,*,*,花子,ハナコ,ハナコ
が	助詞,格助詞,一般,*,*,*,が,ガ,ガ
好き	名詞,形容動詞語幹,*,*,*,*,好き,スキ,スキ
だ	助動詞,*,*,*,特殊・ダ,基本形,だ,ダ,ダ
.	記号,句点,*,*,*,*,.,.,.
EOS
焼酎	名詞,一般,*,*,*,*,焼酎,ショウチュウ,ショーチュー
好き	名詞,形容動詞語幹,*,*,*,*,好き,スキ,スキ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
親父	名詞,一般,*,*,*,*,親父,オヤジ,オヤジ
.	記号,句点,*,*,*,*,.,.,.
EOS
...
</pre>
<p> The first field, delimited by a tab, is the surface string. It is followed by the feature array written as a CSV string. Sentences are separated by a line containing only EOS.</p>
<h2><a name="binary">Building the binary dictionary for training</a></h2>
<p>Let WORK be the current working directory. Create two directories, seed and final, under WORK. 
</p>
<pre>
cd $WORK
mkdir seed final
</pre>
<p>Copy the following files, described earlier, into the seed directory.
<ul>
<li>The seed dictionary (a set of CSV files)
<li>All configuration files (dicrc, char.def, unk.def, rewrite.def, feature.def)
<li>The training data (file name: corpus)
</ul>
<p> Example
<pre>
% cd $WORK/seed
% ls
Adj.csv          Interjection.csv  Noun.name.csv    Noun.verbal.csv  Symbol.csv   rewrite.def
Adnominal.csv    Noun.adjv.csv     Noun.number.csv  Others.csv       Verb.csv     unk.def
Adverb.csv       Noun.adverbal.csv Noun.org.csv     Postp-col.csv    char.def
Auxil.csv        Noun.csv          Noun.others.csv  Postp.csv        corpus
Conjunction.csv  Noun.demonst.csv  Noun.place.csv   Prefix.csv       dicrc
Filler.csv       Noun.nai.csv      Noun.proper.csv  Suffix.csv       feature.def
</pre>
</p>
<p>Run the following command to build the binary dictionary for training.
<pre>
% cd $WORK/seed
% /usr/local/libexec/mecab/mecab-dict-index

The -d and -o options can also be used, as follows.

% /usr/local/libexec/mecab/mecab-dict-index -d $WORK/seed -o $WORK/seed
</pre>
<ul>
<li>-d: Directory containing the seed dictionary and configuration files (default: current directory)
<li>-o: Directory to which the binary dictionary for training is written (default: current directory)
</ul>
</p>
<h2><a name="crf"><a href="http://www.cis.upenn.edu/~pereira/papers/crf.pdf">CRF</a> parameter training</a></h2>
<p>
<pre>
% cd $WORK/seed
% /usr/local/libexec/mecab/mecab-cost-train -c 1.0 corpus model

The dictionary can also be specified with -d, as follows.

% /usr/local/libexec/mecab/mecab-cost-train -d $WORK/seed -c 1.0 $WORK/seed/corpus $WORK/seed/model
</pre>
</p>
<ul>
<li>-d: Directory containing the binary dictionary for training (default: current directory)
<li>-c: <a href="http://www.cis.upenn.edu/~pereira/papers/crf.pdf">CRF</a> hyperparameter
<li>-f: Feature frequency threshold
<li>-p NUM: Run training with NUM parallel workers (default: 1)
<li>corpus: File name of the training data
<li>model: File name of the <a href="http://www.cis.upenn.edu/~pereira/papers/crf.pdf">CRF</a> parameter file to be written
</ul>
<p> mecab-cost-train consumes a large amount of memory while building the binary model. 
Memory consumption can be reduced by building the binary model in a separate process, as follows.
<pre>
% /usr/local/libexec/mecab/mecab-cost-train -y -c 1.0 corpus model
% /usr/local/libexec/mecab/mecab-cost-train -b model.txt model
</pre>
</p>
<p> The hyperparameter C determines how strongly the model is fitted to the training data. A larger C makes training fit the training data as closely as possible, at the risk of overfitting. A smaller C avoids overfitting, at the risk of not learning enough. A suitable C can only be found empirically, using a model selection method such as cross-validation. The default value is 1.0. </p>
<p> The -f option sets the feature frequency threshold. For example, with -f 3 only features that occur three or more times in the training data are used. A suitable threshold can only be found empirically, using a model selection method such as cross-validation. </p>
<p> During training, information like the following is printed.
<pre>
reading corpus ...
adding virtual node: 名詞,固有名詞,地域,一般,*,*,東日,トウニチ,トウニチ
adding virtual node: 副詞,助詞類接続,*,*,*,*,かなり,カナリ,カナリ
Number of sentences: 32
Number of features: 47547
eta: 0.00010
freq: 1
C(sigma^2): 1.00000

iter=0 err=1.00000 F=0.41186 target=1691.68869 diff=1.00000
iter=1 err=1.00000 F=0.68727 target=1077.14848 diff=0.36327
iter=2 err=0.87500 F=0.81904 target=621.20311 diff=0.42329
iter=3 err=0.81250 F=0.86354 target=384.72432 diff=0.38068
iter=4 err=0.68750 F=0.93685 target=233.72722 diff=0.39248
..
</pre>
<ul>
<li>adding virtual node: A morpheme that could not be handled even by unknown word processing and is therefore added provisionally for training.
<li>iter: Number of training iterations
<li>err: Sentence-level error rate
<li>F: F-measure (harmonic mean of precision and recall)
<li>target: Value of the objective function. Training ends when this value converges.
<li>diff: Relative change in the objective function. Training ends when this value reaches 0.0001. 
</ul>
</p>
<p> <h2><a name="dist">Building the dictionary for distribution</a></h2>
<pre>
% cd $WORK/seed
% /usr/local/libexec/mecab/mecab-dict-gen -o ../final -m model

The dictionaries can also be specified with -d and -o, as follows.

% /usr/local/libexec/mecab/mecab-dict-gen -o $WORK/final -d $WORK/seed -m $WORK/seed/model
</pre>
<ul>
<li>-d: Directory containing the seed dictionary (default: current directory)
<li>-o: Output directory for the distribution dictionary
<li>-m: <a href="http://www.cis.upenn.edu/~pereira/papers/crf.pdf">CRF</a> parameter file
</ul>
<p>The distribution dictionary must be written to a directory different from that of the seed dictionary. Normally, the distribution dictionary directory final is archived and distributed to users. </p>
<h2><a name="test">Building the binary dictionary for analysis</a></h2>
<pre>
% cd $WORK/final
% /usr/local/libexec/mecab/mecab-dict-index

The -d and -o options can also be used, as follows.

% /usr/local/libexec/mecab/mecab-dict-index -d $WORK/final -o $WORK/final
</pre>
<p>
<ul>
<li>-d: Directory containing the distribution dictionary and configuration files (default: current directory)
<li>-o: Directory to which the binary dictionary for analysis is written (default: current directory)
</ul>
<p>Now try analyzing some text with the dictionary you have just built. </p>
<pre>
% mecab -d $WORK/final
焼酎好きの親父.
焼酎	名詞,一般,*,*,*,*,焼酎,ショウチュウ,ショーチュー
好き	名詞,形容動詞語幹,*,*,*,*,好き,スキ,スキ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
親父	名詞,一般,*,*,*,*,親父,オヤジ,オヤジ
.	記号,句点,*,*,*,*,.,.,.
EOS
</pre>
</p>
<h2><a name="eval">Evaluation</a></h2>
<p> Prepare the test data. The test data is written in the same format as MeCab's default output. </p>
<p> First, use mecab-test-gen to extract only the sentences (test.sen) from the test corpus (test).
<pre>
% /usr/local/libexec/mecab/mecab-test-gen &lt; test &gt; test.sen
</pre>
</p>
<p>Analyze test.sen with the dictionary built earlier. </p>
<pre>
% mecab -d $WORK/final test.sen &gt; test.result
</pre>
</p>
<p> Run the evaluation script mecab-system-eval. The first argument is the system output, and the second argument is the file containing the correct answers. 
<pre>
% /usr/local/libexec/mecab/mecab-system-eval test.result test
              precision               recall                  F
LEVEL 0:      98.6887(647112/655710)  98.9793(647112/653785)  98.8338
LEVEL 1:      98.2163(644014/655710)  98.5055(644014/653785)  98.3607
LEVEL 2:      97.2230(637501/655710)  97.5093(637501/653785)  97.3659
LEVEL 4:      96.8367(634968/655710)  97.1218(634968/653785)  96.9791
</pre>
</p>
<p>The -l option specifies which feature levels are used for evaluation.
<ul>
<li>-l 0: Evaluates using only feature 0.
<li>-l 4: Evaluates using features 0 to 4.
<li>-l -1: Evaluates using the features at all levels.
<li>-l "0 1 2": Shows three evaluations: feature 0, features 0 to 1, and features 0 to 2.
<li>-l "0 1 -1": Shows three evaluations: feature 0, features 0 to 1, and all levels.
</ul>
<hr>
<p>$Id: learn.html 131 2007-06-09 16:18:15Z taku-ku $;</p>
</body>
</html>