    5.4. Correction — groonga v3.0.5 documentation
  <div class="section" id="correction">
<h1>5.4. Correction<a class="headerlink" href="#correction" title="Permalink to this headline">¶</a></h1>
<p>This section describes the following topics about the correction feature:</p>
<ul class="simple">
<li>How it works</li>
<li>How to use</li>
<li>How it learns</li>
</ul>
<div class="section" id="how-it-works">
<h2>5.4.1. How it works<a class="headerlink" href="#how-it-works" title="Permalink to this headline">¶</a></h2>
<p>The correction feature uses two searches to compute corrected words:</p>
<div><ol class="arabic simple">
<li>Cooccurrence search against learned data.</li>
<li>Similar search against registered words. (optional)</li>
</ol></div>
<div class="section" id="cooccurrence-search">
<h3> Cooccurrence search<a class="headerlink" href="#cooccurrence-search" title="Permalink to this headline">¶</a></h3>
<p>Cooccurrence search can find registered words from a user's
mistyped input. It uses sequences of user submissions learned
from query logs, access logs and so on.</p>
<p>For example, there are the following user submissions:</p>
<table border="1" class="docutils">
<col width="41%" />
<col width="59%" />
<thead valign="bottom">
<tr class="row-odd"><th class="head">query</th>
<th class="head">time</th>
<tbody valign="top">
<tr class="row-even"><td>serach (typo!)</td>
</tr>
<tr class="row-odd"><td>search (fixed!)</td>
</tr>
</tbody>
</table>
<p>Groonga creates the following correction pair from the above submissions:</p>
<table border="1" class="docutils">
<col width="33%" />
<col width="67%" />
<thead valign="bottom">
<tr class="row-odd"><th class="head">input</th>
<th class="head">corrected word</th>
<tbody valign="top">
<tr class="row-even"><td>serach</td>
<td>search</td></tr>
</tbody>
</table>
<p>Groonga treats consecutive submissions made within one minute as
an input correction by the user. Inputs that the user typed between
the two submissions but did not submit are not used as learned data
for correction.</p>
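<p>The one-minute rule above can be sketched in Python as follows. This is a minimal illustration, not groonga's implementation: the event tuple layout, the function name <cite>correction_pairs</cite>, the inclusive one-minute boundary and the timestamps are all assumptions made for the example.</p>

```python
from datetime import datetime


def correction_pairs(events, window_seconds=60):
    """Derive (input, corrected word) pairs from one user's input events.

    Hypothetical sketch, not groonga's implementation: `events` is a list of
    (query, ISO 8601 timestamp, submitted) tuples in time order.
    """
    # Inputs that were typed but never submitted are dropped here,
    # so they never become learned data.
    submissions = [
        (query, datetime.fromisoformat(timestamp))
        for query, timestamp, submitted in events
        if submitted
    ]
    pairs = []
    for (before, t0), (after, t1) in zip(submissions, submissions[1:]):
        elapsed = (t1 - t0).total_seconds()
        # Consecutive submissions within the window (boundary assumed
        # inclusive) are read as the user correcting the earlier one.
        # Resubmitting the identical query is not treated as a correction.
        if before != after and 0 <= elapsed <= window_seconds:
            pairs.append((before, after))
    return pairs


# Hypothetical timestamps; the intermediate "serch" was never submitted.
events = [
    ("serach", "2011-08-10T12:00:00+09:00", True),
    ("serch", "2011-08-10T12:00:01+09:00", False),
    ("search", "2011-08-10T12:00:03+09:00", True),
]
print(correction_pairs(events))  # [('serach', 'search')]
```

<p>Filtering out unsubmitted inputs before pairing is what keeps intermediate keystrokes such as &quot;serch&quot; out of the learned data.</p>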
<p>If a user inputs &quot;serach&quot;, cooccurrence search returns
&quot;search&quot;, because &quot;serach&quot; is in the input column and the
corresponding corrected word column value is &quot;search&quot;.</p>
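<p>Assuming the learned pairs are simply counted, the lookup described above behaves like the following sketch. The names <cite>build_cooccurrence</cite>, <cite>cooccurrence_search</cite> and <cite>min_count</cite>, and the sample pairs, are hypothetical; groonga keeps this data in its own tables rather than in a Python dictionary.</p>

```python
from collections import Counter, defaultdict


def build_cooccurrence(pairs):
    """Count how often each input was followed by each corrected word."""
    table = defaultdict(Counter)
    for input_word, corrected in pairs:
        table[input_word][corrected] += 1
    return table


def cooccurrence_search(table, query, min_count=1):
    """Return corrected words learned for `query`, most frequent first.

    `min_count` is a hypothetical cutoff for rarely seen pairs.
    """
    # .get() avoids inserting an empty entry for unknown queries.
    counts = table.get(query, Counter())
    return [word for word, count in counts.most_common() if count >= min_count]


# Hypothetical learned pairs, e.g. derived from submission logs.
pairs = [("serach", "search"), ("serach", "search"), ("saerch", "search")]
table = build_cooccurrence(pairs)
print(cooccurrence_search(table, "serach"))  # ['search']
```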
<div class="section" id="similar-search">
<h3> Similar search<a class="headerlink" href="#similar-search" title="Permalink to this headline">¶</a></h3>
<p>Similar search can find registered words that have one or
more tokens in common with the user input. The TokenBigram tokenizer
is used for tokenization because the suggest dataset schema
created by <a class="reference internal" href="../reference/executables/groonga-suggest-create-dataset.html"><em>groonga-suggest-create-dataset</em></a>
uses TokenBigram as its default tokenizer.</p>
<p>For example, suppose there is a registered query &quot;search engine&quot;.
A user can find &quot;search engine&quot; by inputting &quot;web search service&quot;,
&quot;sound engine&quot; and so on, because &quot;search engine&quot; and &quot;web
search service&quot; share the token &quot;search&quot;, and &quot;search
engine&quot; and &quot;sound engine&quot; share the token &quot;engine&quot;.</p>
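<p>The token-sharing rule can be illustrated with a short sketch. It approximates TokenBigram for plain ASCII words by treating each space-separated word as one token. The helper names and the registered words are hypothetical, and the sketch does no ranking.</p>

```python
def tokenize(text):
    """Rough stand-in for TokenBigram on ASCII words.

    Assumption: each space-separated word is kept whole as one token,
    in line with how alphabetic runs are described in this section.
    """
    return set(text.lower().split())


def similar_search(registered, query):
    """Return registered words sharing at least one token with `query`."""
    query_tokens = tokenize(query)
    return [word for word in registered if tokenize(word) & query_tokens]


registered = ["search engine", "full text index"]  # hypothetical registered words
print(similar_search(registered, "web search service"))  # ['search engine']
print(similar_search(registered, "sound engine"))        # ['search engine']
```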
<p>&quot;search engine&quot; is tokenized into the tokens &quot;search&quot; and
&quot;engine&quot;. (To reduce search noise, groonga's TokenBigram tokenizer
does not split runs of consecutive alphabetic characters or consecutive
digits into two-character tokens. Use the TokenBigramSplitSymbolAlphaDigit
tokenizer if you need such runs split into two-character tokens.)
&quot;web search service&quot; is tokenized into &quot;web&quot;, &quot;search&quot; and
&quot;service&quot;. &quot;sound engine&quot; is tokenized into &quot;sound&quot; and
&quot;engine&quot;.</p>
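<p>The difference between the two tokenizers can be sketched as follows. Both functions are simplified approximations written for this example, not groonga's tokenizer code; in particular they ignore symbols, non-ASCII text and normalization. Comparing &quot;research&quot; with &quot;search&quot; shows why splitting alphabetic runs into two-character tokens produces more, noisier matches.</p>

```python
import re


def run_tokens(text):
    """Keep each run of letters or of digits whole (simplified TokenBigram view)."""
    return re.findall(r"[a-z]+|[0-9]+", text.lower())


def two_char_tokens(text):
    """Split each run into overlapping two-character tokens
    (simplified TokenBigramSplitSymbolAlphaDigit view)."""
    tokens = []
    for run in run_tokens(text):
        if len(run) == 1:
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


print(run_tokens("search engine"))       # ['search', 'engine']
print(two_char_tokens("search engine"))  # ['se', 'ea', 'ar', 'rc', 'ch', 'en', 'ng', 'gi', 'in', 'ne']

# "research" shares no whole-run token with "search",
# but it shares five two-character tokens.
print(set(run_tokens("research")) & set(run_tokens("search")))  # set()
print(sorted(set(two_char_tokens("research")) & set(two_char_tokens("search"))))  # ['ar', 'ch', 'ea', 'rc', 'se']
```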
<div class="section" id="how-to-use">
<h2>5.4.2. How to use<a class="headerlink" href="#how-to-use" title="Permalink to this headline">¶</a></h2>
<p>Groonga provides the <a class="reference internal" href="../reference/commands/suggest.html"><em>suggest</em></a> command for
correction. The <cite>--types correct</cite> option requests corrections.</p>
<p>For example, here is a command to get correction results for the query &quot;saerch&quot;:</p>
<p>Execution example:</p>
<div class="highlight-none"><div class="highlight"><pre>suggest --table item_query --column kana --types correct --frequency_threshold 1 --query saerch
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     &quot;correct&quot;: [
#       [
#         1
#       ],
#       [
#         [
#           &quot;_key&quot;,
#           &quot;ShortText&quot;
#         ],
#         [
#           &quot;_score&quot;,
#           &quot;Int32&quot;
#         ]
#       ],
#       [
#         &quot;search&quot;,
#         1
#       ]
#     ]
#   }
# ]
</pre></div>
</div>
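<p>The response above is a JSON array: a header whose first element is the return code (0 for success), followed by an object whose <cite>correct</cite> entry holds the hit count, the column definitions and the rows. A client can turn it into records as sketched below. The helper names are hypothetical, and the URL builder assumes a groonga HTTP server listening at <cite>http://localhost:10041</cite> that accepts commands under <cite>/d/suggest</cite>; adjust it to your deployment.</p>

```python
import json
from urllib.parse import urlencode


def suggest_url(base, query):
    """Build a suggest request URL (assumed HTTP endpoint layout)."""
    params = {
        "table": "item_query",
        "column": "kana",
        "types": "correct",
        "frequency_threshold": 1,
        "query": query,
    }
    return f"{base}/d/suggest?{urlencode(params)}"


def parse_corrections(response_text):
    """Turn a suggest response into a list of {column: value} dicts."""
    response = json.loads(response_text)
    header = response[0]
    if header[0] != 0:
        message = header[3] if len(header) > 3 else ""
        raise RuntimeError(f"suggest failed ({header[0]}): {message}")
    result = response[1].get("correct", [])
    if not result:
        return []
    # result[0] is [hit count], result[1] lists [name, type] pairs,
    # and the remaining entries are rows.
    columns = [name for name, _type in result[1]]
    return [dict(zip(columns, row)) for row in result[2:]]


SAMPLE = ('[[0, 1337566253.89858, 0.000355720520019531], {"correct": [[1], '
          '[["_key", "ShortText"], ["_score", "Int32"]], ["search", 1]]}]')
print(parse_corrections(SAMPLE))  # [{'_key': 'search', '_score': 1}]
```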
<div class="section" id="how-it-learns">
<h2>5.4.3. How it learns<a class="headerlink" href="#how-it-learns" title="Permalink to this headline">¶</a></h2>
<p>Cooccurrence search uses learned data, which is built from
query logs, access logs and so on. To create learned data,
groonga needs the user's input sequence, including submissions,
with a timestamp for each input.</p>
<p>For example, suppose a user wants to search for &quot;search&quot; but
mistypes it as &quot;saerch&quot; before entering the correct query. The
user's input sequence is as follows:</p>
<div><ol class="arabic simple">
<li>2011-08-10T13:33:23+09:00: s</li>
<li>2011-08-10T13:33:23+09:00: sa</li>
<li>2011-08-10T13:33:24+09:00: sae</li>
<li>2011-08-10T13:33:24+09:00: saer</li>
<li>2011-08-10T13:33:24+09:00: saerc</li>
<li>2011-08-10T13:33:25+09:00: saerch (submit!)</li>
<li>2011-08-10T13:33:29+09:00: serch (correcting...)</li>
<li>2011-08-10T13:33:30+09:00: search (submit!)</li>
</ol></div>
<p>Groonga can learn from this input sequence with the
following command:</p>
<div class="highlight-none"><div class="highlight"><pre>load --table event_query --each &#39;suggest_preparer(_id, type, item, sequence, time, pair_query)&#39;
[
{&quot;sequence&quot;: &quot;1&quot;, &quot;time&quot;: 1312950803.86057, &quot;item&quot;: &quot;s&quot;},
{&quot;sequence&quot;: &quot;1&quot;, &quot;time&quot;: 1312950803.96857, &quot;item&quot;: &quot;sa&quot;},
{&quot;sequence&quot;: &quot;1&quot;, &quot;time&quot;: 1312950804.26057, &quot;item&quot;: &quot;sae&quot;},
{&quot;sequence&quot;: &quot;1&quot;, &quot;time&quot;: 1312950804.56057, &quot;item&quot;: &quot;saer&quot;},
{&quot;sequence&quot;: &quot;1&quot;, &quot;time&quot;: 1312950804.76057, &quot;item&quot;: &quot;saerc&quot;},
{&quot;sequence&quot;: &quot;1&quot;, &quot;time&quot;: 1312950805.76057, &quot;item&quot;: &quot;saerch&quot;, &quot;type&quot;: &quot;submit&quot;},
{&quot;sequence&quot;: &quot;1&quot;, &quot;time&quot;: 1312950809.76057, &quot;item&quot;: &quot;serch&quot;},
{&quot;sequence&quot;: &quot;1&quot;, &quot;time&quot;: 1312950810.86057, &quot;item&quot;: &quot;search&quot;, &quot;type&quot;: &quot;submit&quot;}
]
</pre></div>
</div>
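<p>If your logs store ISO 8601 timestamps like the sequence above, the <cite>load</cite> input can be generated with a sketch like the following. The helper names and the input tuple layout are hypothetical. Second-resolution timestamps give whole-second <cite>time</cite> values, so they will not reproduce the fractional parts shown above.</p>

```python
import json
from datetime import datetime


def load_records(sequence_id, inputs):
    """Build event_query records shaped like the ones loaded above.

    `inputs` is a list of (ISO 8601 timestamp, item, submitted) tuples.
    """
    records = []
    for timestamp, item, submitted in inputs:
        record = {
            "sequence": sequence_id,
            # UNIX time in seconds, like the "time" values above.
            "time": datetime.fromisoformat(timestamp).timestamp(),
            "item": item,
        }
        if submitted:
            record["type"] = "submit"
        records.append(record)
    return records


def load_command(records):
    """Render the records as a groonga load command."""
    body = ",\n".join(json.dumps(record) for record in records)
    return (
        "load --table event_query --each "
        "'suggest_preparer(_id, type, item, sequence, time, pair_query)'\n"
        f"[\n{body}\n]"
    )


inputs = [
    ("2011-08-10T13:33:25+09:00", "saerch", True),
    ("2011-08-10T13:33:29+09:00", "serch", False),
    ("2011-08-10T13:33:30+09:00", "search", True),
]
print(load_command(load_records("1", inputs)))
```

<p>All inputs from one session share the same <cite>sequence</cite> value, as in the command above.</p>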

