<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
    <title>Query expansion and Key word extraction &mdash; Whoosh 2.5.7 documentation</title>
    
    <link rel="stylesheet" href="_static/default.css" type="text/css" />
    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
    
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    '',
        VERSION:     '2.5.7',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true
      };
    </script>
    <script type="text/javascript" src="_static/jquery.js"></script>
    <script type="text/javascript" src="_static/underscore.js"></script>
    <script type="text/javascript" src="_static/doctools.js"></script>
    <link rel="top" title="Whoosh 2.5.7 documentation" href="index.html" />
    <link rel="next" title="“Did you mean... ?” Correcting errors in user queries" href="spelling.html" />
    <link rel="prev" title="How to create highlighted search result excerpts" href="highlight.html" /> 
  </head>
  <body>
    <div class="related">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="right" >
          <a href="py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="right" >
          <a href="spelling.html" title="“Did you mean... ?” Correcting errors in user queries"
             accesskey="N">next</a> |</li>
        <li class="right" >
          <a href="highlight.html" title="How to create highlighted search result excerpts"
             accesskey="P">previous</a> |</li>
        <li><a href="index.html">Whoosh 2.5.7 documentation</a> &raquo;</li> 
      </ul>
    </div>  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body">
            
  <div class="section" id="query-expansion-and-key-word-extraction">
<h1>Query expansion and Key word extraction<a class="headerlink" href="#query-expansion-and-key-word-extraction" title="Permalink to this headline">¶</a></h1>
<div class="section" id="overview">
<h2>Overview<a class="headerlink" href="#overview" title="Permalink to this headline">¶</a></h2>
<p>Whoosh provides methods for computing the &#8220;key terms&#8221; of a set of documents. For
these methods, &#8220;key terms&#8221; are terms that are frequent in the given
documents but relatively infrequent in the indexed collection as a whole.</p>
<p>Because this is a purely statistical operation, not a natural language
processing or AI function, the quality of the results will vary based on the
content, the size of the document collection, and the number of documents for
which you extract keywords.</p>
<p>These methods can be useful for providing the following features to users:</p>
<ul class="simple">
<li>Search term expansion. You can extract key terms for the top N results from a
query and suggest them to the user as additional/alternate query terms to try.</li>
<li>Tag suggestion. Extracting the key terms for a single document may yield
useful suggestions for tagging the document.</li>
<li>&#8220;More like this&#8221;. You can extract key terms for the top ten or so results from
a query, remove the original query terms, and use the remaining key words as
the basis for another query that may find more documents using terms the user
didn&#8217;t think of.</li>
</ul>
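<p>The statistical idea can be sketched in plain Python without Whoosh. This is
only an illustration: the function name <tt class="docutils literal"><span class="pre">rank_key_terms</span></tt>, the whitespace
tokenizer and the scoring formula are assumptions made for this example, not
the weighting that Whoosh itself uses.</p>
<div class="highlight-python"><pre>from collections import Counter
from math import log


def rank_key_terms(selected_docs, all_docs, numterms=5):
    """Rank terms that are frequent in selected_docs but rare in all_docs.

    Illustrative scoring only; this is not Whoosh's weighting function.
    """
    selected = Counter(w for doc in selected_docs for w in doc.lower().split())
    collection = Counter(w for doc in all_docs for w in doc.lower().split())
    selected_total = sum(selected.values())
    collection_total = sum(collection.values())

    scores = {}
    for term, count in selected.items():
        if term not in collection:
            # Skip terms the collection has never seen.
            continue
        p_selected = count / selected_total
        p_collection = collection[term] / collection_total
        # High when the term is common in the selection and rare overall.
        scores[term] = p_selected * log(p_selected / p_collection)

    # Best score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:numterms]


all_docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the render engine draws the scene",
    "the render pass shades the scene",
]
top_terms = [term for term, score in rank_key_terms(all_docs[2:], all_docs, 2)]
print(top_terms)</pre>
</div>
<p>In this toy collection, a word such as &#8220;the&#8221; is as common in the selected
documents as in the collection as a whole, so it scores zero, while words
concentrated in the selected documents rank first.</p>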
</div>
<div class="section" id="usage">
<h2>Usage<a class="headerlink" href="#usage" title="Permalink to this headline">¶</a></h2>
<ul>
<li><p class="first">Get more documents like a certain search hit. <em>This requires that the field
you want to match on is vectored or stored, or that you have access to the
original text (such as from a database)</em>.</p>
<p>Use <a class="reference internal" href="api/searching.html#whoosh.searching.Hit.more_like_this" title="whoosh.searching.Hit.more_like_this"><tt class="xref py py-meth docutils literal"><span class="pre">more_like_this()</span></tt></a>:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="n">results</span> <span class="o">=</span> <span class="n">mysearcher</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="n">myquery</span><span class="p">)</span>
<span class="n">first_hit</span> <span class="o">=</span> <span class="n">results</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">more_results</span> <span class="o">=</span> <span class="n">first_hit</span><span class="o">.</span><span class="n">more_like_this</span><span class="p">(</span><span class="s">&quot;content&quot;</span><span class="p">)</span>
</pre></div>
</div>
</li>
<li><p class="first">Extract keywords for the top N documents in a
<a class="reference internal" href="api/searching.html#whoosh.searching.Results" title="whoosh.searching.Results"><tt class="xref py py-class docutils literal"><span class="pre">whoosh.searching.Results</span></tt></a> object. <em>This requires that the field is
either vectored or stored</em>.</p>
<p>Use the <a class="reference internal" href="api/searching.html#whoosh.searching.Results.key_terms" title="whoosh.searching.Results.key_terms"><tt class="xref py py-meth docutils literal"><span class="pre">key_terms()</span></tt></a> method of the
<a class="reference internal" href="api/searching.html#whoosh.searching.Results" title="whoosh.searching.Results"><tt class="xref py py-class docutils literal"><span class="pre">whoosh.searching.Results</span></tt></a> object to extract keywords from the top N
documents of the result set.</p>
<p>For example, to extract <em>five</em> key terms from the <tt class="docutils literal"><span class="pre">content</span></tt> field of the top
<em>ten</em> documents of a results object:</p>
<div class="highlight-python"><pre>keywords = [keyword for keyword, score
            in results.key_terms("content", docs=10, numterms=5)]</pre>
</div>
</li>
<li><p class="first">Extract keywords for an arbitrary set of documents. <em>This requires that the
field is either vectored or stored</em>.</p>
<p>Use the <a class="reference internal" href="api/searching.html#whoosh.searching.Searcher.document_number" title="whoosh.searching.Searcher.document_number"><tt class="xref py py-meth docutils literal"><span class="pre">document_number()</span></tt></a> or
<a class="reference internal" href="api/searching.html#whoosh.searching.Searcher.document_numbers" title="whoosh.searching.Searcher.document_numbers"><tt class="xref py py-meth docutils literal"><span class="pre">document_numbers()</span></tt></a> methods of the
<a class="reference internal" href="api/searching.html#whoosh.searching.Searcher" title="whoosh.searching.Searcher"><tt class="xref py py-class docutils literal"><span class="pre">whoosh.searching.Searcher</span></tt></a> object to get the document numbers for the
document(s) you want to extract keywords from.</p>
<p>Use the <a class="reference internal" href="api/searching.html#whoosh.searching.Searcher.key_terms" title="whoosh.searching.Searcher.key_terms"><tt class="xref py py-meth docutils literal"><span class="pre">key_terms()</span></tt></a> method of a
<a class="reference internal" href="api/searching.html#whoosh.searching.Searcher" title="whoosh.searching.Searcher"><tt class="xref py py-class docutils literal"><span class="pre">whoosh.searching.Searcher</span></tt></a> to extract the keywords, given the list of
document numbers.</p>
<p>For example, let&#8217;s say you have an index of emails. To extract key terms from
the <tt class="docutils literal"><span class="pre">body</span></tt> field of emails whose <tt class="docutils literal"><span class="pre">emailto</span></tt> field contains
<tt class="docutils literal"><span class="pre">matt&#64;whoosh.ca</span></tt>:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="k">with</span> <span class="n">email_index</span><span class="o">.</span><span class="n">searcher</span><span class="p">()</span> <span class="k">as</span> <span class="n">s</span><span class="p">:</span>
    <span class="n">docnums</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">document_numbers</span><span class="p">(</span><span class="n">emailto</span><span class="o">=</span><span class="s">u&quot;matt@whoosh.ca&quot;</span><span class="p">)</span>
    <span class="n">keywords</span> <span class="o">=</span> <span class="p">[</span><span class="n">keyword</span> <span class="k">for</span> <span class="n">keyword</span><span class="p">,</span> <span class="n">score</span>
                <span class="ow">in</span> <span class="n">s</span><span class="o">.</span><span class="n">key_terms</span><span class="p">(</span><span class="n">docnums</span><span class="p">,</span> <span class="s">&quot;body&quot;</span><span class="p">)]</span>
</pre></div>
</div>
</li>
<li><p class="first">Extract keywords from arbitrary text not in the index.</p>
<p>Use the <a class="reference internal" href="api/searching.html#whoosh.searching.Searcher.key_terms_from_text" title="whoosh.searching.Searcher.key_terms_from_text"><tt class="xref py py-meth docutils literal"><span class="pre">key_terms_from_text()</span></tt></a> method of a
<a class="reference internal" href="api/searching.html#whoosh.searching.Searcher" title="whoosh.searching.Searcher"><tt class="xref py py-class docutils literal"><span class="pre">whoosh.searching.Searcher</span></tt></a> to extract the keywords, given the text:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="k">with</span> <span class="n">email_index</span><span class="o">.</span><span class="n">searcher</span><span class="p">()</span> <span class="k">as</span> <span class="n">s</span><span class="p">:</span>
    <span class="n">keywords</span> <span class="o">=</span> <span class="p">[</span><span class="n">keyword</span> <span class="k">for</span> <span class="n">keyword</span><span class="p">,</span> <span class="n">score</span>
                <span class="ow">in</span> <span class="n">s</span><span class="o">.</span><span class="n">key_terms_from_text</span><span class="p">(</span><span class="s">&quot;body&quot;</span><span class="p">,</span> <span class="n">mytext</span><span class="p">)]</span>
</pre></div>
</div>
</li>
</ul>
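<p>The snippets above assume an existing index and searcher. A minimal end-to-end
sketch, assuming Whoosh is installed and using an in-memory index, might look
like the following. The schema, field names and sample documents are
hypothetical, and the terms returned will depend on the content you index.</p>
<div class="highlight-python"><pre>from whoosh.fields import ID, TEXT, Schema
from whoosh.filedb.filestore import RamStorage
from whoosh.qparser import QueryParser

# Hypothetical schema and documents, for illustration only.
# The field is stored so that key terms can be extracted from it.
schema = Schema(path=ID(stored=True), content=TEXT(stored=True))
ix = RamStorage().create_index(schema)

writer = ix.writer()
writer.add_document(path=u"/a", content=u"Rendering a scene with ray tracing")
writer.add_document(path=u"/b", content=u"Ray tracing renders shadows and reflections")
writer.add_document(path=u"/c", content=u"Indexing documents for full text search")
writer.commit()

with ix.searcher() as s:
    query = QueryParser("content", ix.schema).parse(u"tracing")
    results = s.search(query)

    # Key terms from the top hits (at most ten; this index is tiny).
    keywords = [keyword for keyword, score
                in results.key_terms("content", docs=10, numterms=5)]

    # Key terms from text that is not in the index.
    text_keywords = [keyword for keyword, score
                     in s.key_terms_from_text("content", u"ray tracing shadows")]

print(keywords)
print(text_keywords)</pre>
</div>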
</div>
<div class="section" id="expansion-models">
<h2>Expansion models<a class="headerlink" href="#expansion-models" title="Permalink to this headline">¶</a></h2>
<p>The <tt class="docutils literal"><span class="pre">ExpansionModel</span></tt> subclasses in the <tt class="xref py py-mod docutils literal"><span class="pre">whoosh.classify</span></tt> module implement
different weighting functions for key words. These models are translated into
Python from original Java implementations in Terrier.</p>
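<p>As an example of what such a weighting function computes, the Bo1 model in
Terrier is usually described as a Bose-Einstein weighting. The sketch below
follows that commonly cited formula. It is an assumption-laden illustration,
not a copy of the code in <tt class="docutils literal"><span class="pre">whoosh.classify</span></tt>, and the function and parameter
names are hypothetical.</p>
<div class="highlight-python"><pre>from math import log2


def bo1_score(weight_in_top, weight_in_collection, doc_count):
    """Bo1-style (Bose-Einstein) weight for one candidate term.

    weight_in_top: frequency of the term in the selected (top) documents
    weight_in_collection: frequency of the term in the whole collection
    doc_count: number of documents in the collection
    """
    # Average frequency of the term per document in the collection.
    f = weight_in_collection / doc_count
    return weight_in_top * log2((1.0 + f) / f) + log2(1.0 + f)


# Same frequency in the top documents, but the second term is rarer in
# the collection as a whole, so it receives the larger weight.
common = bo1_score(2, 4, 4)
rare = bo1_score(2, 1, 4)
print(common, rare)</pre>
</div>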
</div>
</div>


          </div>
        </div>
      </div>
      <div class="sphinxsidebar">
        <div class="sphinxsidebarwrapper">
  <h3><a href="index.html">Table Of Contents</a></h3>
  <ul>
<li><a class="reference internal" href="#">Query expansion and Key word extraction</a><ul>
<li><a class="reference internal" href="#overview">Overview</a></li>
<li><a class="reference internal" href="#usage">Usage</a></li>
<li><a class="reference internal" href="#expansion-models">Expansion models</a></li>
</ul>
</li>
</ul>

  <h4>Previous topic</h4>
  <p class="topless"><a href="highlight.html"
                        title="previous chapter">How to create highlighted search result excerpts</a></p>
  <h4>Next topic</h4>
  <p class="topless"><a href="spelling.html"
                        title="next chapter">&#8220;Did you mean... ?&#8221; Correcting errors in user queries</a></p>
  <h3>This Page</h3>
  <ul class="this-page-menu">
    <li><a href="_sources/keywords.txt"
           rel="nofollow">Show Source</a></li>
  </ul>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="footer">
        &copy; Copyright 2007-2012 Matt Chaput.
      Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.1.3.
    </div>
  </body>
</html>