Sophie

Sophie

distrib > Fedora > 18 > x86_64 > media > updates > by-pkgid > e450e7f3d6075c4a54de19e68d38177f > files > 331

groonga-doc-3.0.5-1.fc18.x86_64.rpm


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
    <title>5.4. Correction &mdash; groonga v3.0.5 documentation</title>
    
    <link rel="stylesheet" href="../_static/groonga.css" type="text/css" />
    <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
    
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    '../',
        VERSION:     '3.0.5',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true
      };
    </script>
    <script type="text/javascript" src="../_static/jquery.js"></script>
    <script type="text/javascript" src="../_static/underscore.js"></script>
    <script type="text/javascript" src="../_static/doctools.js"></script>
    <link rel="shortcut icon" href="../_static/favicon.ico"/>
    <link rel="top" title="groonga v3.0.5 documentation" href="../index.html" />
    <link rel="up" title="5. Suggest" href="../suggest.html" />
    <link rel="next" title="5.5. Suggestion" href="suggestion.html" />
    <link rel="prev" title="5.3. Completion" href="completion.html" /> 
  </head>
  <body>
<div class="header">
  <h1 class="title">
    <a id="top-link" href="../index.html">
      <span class="project">groonga</span>
      <span class="separator">-</span>
      <span class="description">An open-source fulltext search engine and column store.</span>
    </a>
  </h1>

  <div class="other-language-links">
    <ul>
      <li><a href="../../../ja/html/suggest/correction.html"><img src="../_static/jp.png" alt="日本語">日本語版はこちら</a></li>
    </ul>
  </div>
</div>
  

    <div class="related">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="../genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="right" >
          <a href="suggestion.html" title="5.5. Suggestion"
             accesskey="N">next</a> |</li>
        <li class="right" >
          <a href="completion.html" title="5.3. Completion"
             accesskey="P">previous</a> |</li>
        <li><a href="../index.html">groonga v3.0.5 documentation</a> &raquo;</li>
          <li><a href="../suggest.html" accesskey="U">5. Suggest</a> &raquo;</li> 
      </ul>
    </div>  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body">
            
  <div class="section" id="correction">
<h1>5.4. Correction<a class="headerlink" href="#correction" title="Permalink to this headline">¶</a></h1>
<p>This section describes about the following correction
features:</p>
<ul class="simple">
<li>How it works</li>
<li>How to use</li>
<li>How to learn</li>
</ul>
<div class="section" id="how-it-works">
<h2>5.4.1. How it works<a class="headerlink" href="#how-it-works" title="Permalink to this headline">¶</a></h2>
<p>The correction feature uses three searches to compute corrected
words:</p>
<blockquote>
<div><ol class="arabic simple">
<li>Cooccurrence search against learned data.</li>
<li>Similar search against registered words. (optional)</li>
</ol>
</div></blockquote>
<div class="section" id="cooccurrence-search">
<h3>5.4.1.1. Cooccurrence search<a class="headerlink" href="#cooccurrence-search" title="Permalink to this headline">¶</a></h3>
<p>Cooccurrence search can find registered words from user's
wrong input. It uses user submit sequences that will be
learned from query logs, access logs and so on.</p>
<p>For example, there are the following user submissions:</p>
<table border="1" class="docutils">
<colgroup>
<col width="41%" />
<col width="59%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">query</th>
<th class="head">time</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>serach (typo!)</td>
<td>2011-08-10T22:20:50+09:00</td>
</tr>
<tr class="row-odd"><td>search (fixed!)</td>
<td>2011-08-10T22:20:52+09:00</td>
</tr>
</tbody>
</table>
<p>Groonga creates the following correction pair from the above
submissions:</p>
<table border="1" class="docutils">
<colgroup>
<col width="33%" />
<col width="67%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">input</th>
<th class="head">corrected word</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>serach</td>
<td>search</td>
</tr>
</tbody>
</table>
<p>Groonga treats continuous submissions within a minute as
input correction by user. Not submitted user input sequence
between two submissions isn't used as learned data for
correction.</p>
<p>If an user inputs &quot;serach&quot; and cooccurrence search returns
&quot;search&quot; because &quot;serach&quot; is in input column and
corresponding corrected word column value is &quot;search&quot;.</p>
</div>
<div class="section" id="similar-search">
<h3>5.4.1.2. Similar search<a class="headerlink" href="#similar-search" title="Permalink to this headline">¶</a></h3>
<p>Similar search can find registered words that has one or
more the same tokens as user input. TokenBigram tokenizer is
used for tokenization because suggest dataset schema
created by <a class="reference internal" href="../reference/executables/groonga-suggest-create-dataset.html"><em>groonga-suggest-create-dataset</em></a>
uses TokenBigram tokenizer as the default tokenizer.</p>
<p>For example, there is a registered query &quot;search engine&quot;. An
user can find &quot;search engine&quot; by &quot;web search service&quot;,
&quot;sound engine&quot; and so on. Because &quot;search engine&quot; and &quot;web
search engine&quot; have the same token &quot;search&quot; and &quot;search
engine&quot; and &quot;sound engine&quot; have the same token &quot;engine&quot;.</p>
<p>&quot;search engine&quot; is tokenized to &quot;search&quot; and &quot;engine&quot;
tokens. (Groonga's TokenBigram tokenizer doesn't tokenize
two characters for continuous alphabets and continuous
digits for reducing search
noise. TokenBigramSplitSymbolAlphaDigit tokenizer should be
used to ensure tokenizing to two characters.) &quot;web search
service&quot; is tokenized to &quot;web&quot;, &quot;search&quot; and
&quot;service&quot;. &quot;sound engine&quot; is tokenized to &quot;sound&quot; and
&quot;engine&quot;.</p>
</div>
</div>
<div class="section" id="how-to-use">
<h2>5.4.2. How to use<a class="headerlink" href="#how-to-use" title="Permalink to this headline">¶</a></h2>
<p>Groonga provides <a class="reference internal" href="../reference/commands/suggest.html"><em>suggest</em></a> command to use
correction. <cite>--type correct</cite> option requests corrections.</p>
<p>For example, here is an command to get correction results by
&quot;saerch&quot;:</p>
<p>Execution example:</p>
<div class="highlight-none"><div class="highlight"><pre>suggest --table item_query --column kana --types correction --frequency_threshold 1 --query saerch
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     &quot;correct&quot;: [
#       [
#         1
#       ],
#       [
#         [
#           &quot;_key&quot;,
#           &quot;ShortText&quot;
#         ],
#         [
#           &quot;_score&quot;,
#           &quot;Int32&quot;
#         ]
#       ],
#       [
#         &quot;search&quot;,
#         1
#       ]
#     ]
#   }
# ]
</pre></div>
</div>
</div>
<div class="section" id="how-it-learns">
<h2>5.4.3. How it learns<a class="headerlink" href="#how-it-learns" title="Permalink to this headline">¶</a></h2>
<p>Cooccurrence search uses learned data. They are based on
query logs, access logs and so on. To create learned data,
groonga needs user submit inputs with time stamp.</p>
<p>For example, an user wants to search by &quot;search&quot; but the
user has typo &quot;saerch&quot; before inputs the correct query. The
user inputs the query with the following sequence:</p>
<blockquote>
<div><ol class="arabic simple">
<li>2011-08-10T13:33:23+09:00: s</li>
<li>2011-08-10T13:33:23+09:00: sa</li>
<li>2011-08-10T13:33:24+09:00: sae</li>
<li>2011-08-10T13:33:24+09:00: saer</li>
<li>2011-08-10T13:33:24+09:00: saerc</li>
<li>2011-08-10T13:33:25+09:00: saerch (submit!)</li>
<li>2011-08-10T13:33:29+09:00: serch (correcting...)</li>
<li>2011-08-10T13:33:30+09:00: search (submit!)</li>
</ol>
</div></blockquote>
<p>Groonga can be learned from the input sequence by the
following command:</p>
<div class="highlight-none"><div class="highlight"><pre>load --table event_query --each &#39;suggest_preparer(_id, type, item, sequence, time, pair_query)&#39;
[
{&quot;sequence&quot;: &quot;1&quot;, &quot;time&quot;: 1312950803.86057, &quot;item&quot;: &quot;s&quot;},
{&quot;sequence&quot;: &quot;1&quot;, &quot;time&quot;: 1312950803.96857, &quot;item&quot;: &quot;sa&quot;},
{&quot;sequence&quot;: &quot;1&quot;, &quot;time&quot;: 1312950804.26057, &quot;item&quot;: &quot;sae&quot;},
{&quot;sequence&quot;: &quot;1&quot;, &quot;time&quot;: 1312950804.56057, &quot;item&quot;: &quot;saer&quot;},
{&quot;sequence&quot;: &quot;1&quot;, &quot;time&quot;: 1312950804.76057, &quot;item&quot;: &quot;saerc&quot;},
{&quot;sequence&quot;: &quot;1&quot;, &quot;time&quot;: 1312950805.76057, &quot;item&quot;: &quot;saerch&quot;, &quot;type&quot;: &quot;submit&quot;},
{&quot;sequence&quot;: &quot;1&quot;, &quot;time&quot;: 1312950809.76057, &quot;item&quot;: &quot;serch&quot;},
{&quot;sequence&quot;: &quot;1&quot;, &quot;time&quot;: 1312950810.86057, &quot;item&quot;: &quot;search&quot;, &quot;type&quot;: &quot;submit&quot;}
]
</pre></div>
</div>
</div>
</div>


          </div>
        </div>
      </div>
      <div class="sphinxsidebar">
        <div class="sphinxsidebarwrapper">
  <h3><a href="../index.html">Table Of Contents</a></h3>
  <ul>
<li><a class="reference internal" href="#">5.4. Correction</a><ul>
<li><a class="reference internal" href="#how-it-works">5.4.1. How it works</a><ul>
<li><a class="reference internal" href="#cooccurrence-search">5.4.1.1. Cooccurrence search</a></li>
<li><a class="reference internal" href="#similar-search">5.4.1.2. Similar search</a></li>
</ul>
</li>
<li><a class="reference internal" href="#how-to-use">5.4.2. How to use</a></li>
<li><a class="reference internal" href="#how-it-learns">5.4.3. How it learns</a></li>
</ul>
</li>
</ul>

  <h4>Previous topic</h4>
  <p class="topless"><a href="completion.html"
                        title="previous chapter">5.3. Completion</a></p>
  <h4>Next topic</h4>
  <p class="topless"><a href="suggestion.html"
                        title="next chapter">5.5. Suggestion</a></p>
  <h3>This Page</h3>
  <ul class="this-page-menu">
    <li><a href="../_sources/suggest/correction.txt"
           rel="nofollow">Show Source</a></li>
  </ul>
<div id="searchbox" style="display: none">
  <h3>Quick search</h3>
    <form class="search" action="../search.html" method="get">
      <input type="text" name="q" />
      <input type="submit" value="Go" />
      <input type="hidden" name="check_keywords" value="yes" />
      <input type="hidden" name="area" value="default" />
    </form>
    <p class="searchtip" style="font-size: 90%">
    Enter search terms or a module, class or function name.
    </p>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="related">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="../genindex.html" title="General Index"
             >index</a></li>
        <li class="right" >
          <a href="suggestion.html" title="5.5. Suggestion"
             >next</a> |</li>
        <li class="right" >
          <a href="completion.html" title="5.3. Completion"
             >previous</a> |</li>
        <li><a href="../index.html">groonga v3.0.5 documentation</a> &raquo;</li>
          <li><a href="../suggest.html" >5. Suggest</a> &raquo;</li> 
      </ul>
    </div>
    <div class="footer">
        &copy; Copyright 2009-2013, Brazil, Inc.
    </div>
  </body>
</html>