<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <title>Indexing and searching document hierarchies — Whoosh 2.5.1 documentation</title> <link rel="stylesheet" href="_static/default.css" type="text/css" /> <link rel="stylesheet" href="_static/pygments.css" type="text/css" /> <script type="text/javascript"> var DOCUMENTATION_OPTIONS = { URL_ROOT: '', VERSION: '2.5.1', COLLAPSE_INDEX: false, FILE_SUFFIX: '.html', HAS_SOURCE: true }; </script> <script type="text/javascript" src="_static/jquery.js"></script> <script type="text/javascript" src="_static/underscore.js"></script> <script type="text/javascript" src="_static/doctools.js"></script> <link rel="top" title="Whoosh 2.5.1 documentation" href="index.html" /> <link rel="next" title="Whoosh recipes" href="recipes.html" /> <link rel="prev" title="Concurrency, locking, and versioning" href="threads.html" /> </head> <body> <div class="related"> <h3>Navigation</h3> <ul> <li class="right" style="margin-right: 10px"> <a href="genindex.html" title="General Index" accesskey="I">index</a></li> <li class="right" > <a href="py-modindex.html" title="Python Module Index" >modules</a> |</li> <li class="right" > <a href="recipes.html" title="Whoosh recipes" accesskey="N">next</a> |</li> <li class="right" > <a href="threads.html" title="Concurrency, locking, and versioning" accesskey="P">previous</a> |</li> <li><a href="index.html">Whoosh 2.5.1 documentation</a> »</li> </ul> </div> <div class="document"> <div class="documentwrapper"> <div class="bodywrapper"> <div class="body"> <div class="section" id="indexing-and-searching-document-hierarchies"> <h1>Indexing and searching document hierarchies<a class="headerlink" href="#indexing-and-searching-document-hierarchies" title="Permalink to this headline">¶</a></h1> <div class="section" id="overview"> <h2>Overview<a class="headerlink" href="#overview" title="Permalink to this headline">¶</a></h2> <p>Whoosh’s full-text index is essentially a flat database of documents. However, Whoosh supports two techniques for simulating the indexing and querying of hierarchical documents, that is, sets of documents that form a parent-child hierarchy, such as “Chapter - Section - Paragraph” or “Module - Class - Method”.</p> <p>You can specify parent-child relationships <em>at indexing time</em>, by grouping documents in the same hierarchy, and then use the <a class="reference internal" href="api/query.html#whoosh.query.NestedParent" title="whoosh.query.NestedParent"><tt class="xref py py-class docutils literal"><span class="pre">whoosh.query.NestedParent</span></tt></a> and/or <a class="reference internal" href="api/query.html#whoosh.query.NestedChildren" title="whoosh.query.NestedChildren"><tt class="xref py py-class docutils literal"><span class="pre">whoosh.query.NestedChildren</span></tt></a> to find parents based on their children or vice-versa.</p> <p>Alternatively, you can use <em>query time joins</em>, essentially like external key joins in a database, where you perform one search to find a relevant document, then use a stored value on that document (for example, a <tt class="docutils literal"><span class="pre">parent</span></tt> field) to look up another document.</p> <p>Both methods have pros and cons.</p> </div> <div class="section" id="using-nested-document-indexing"> <h2>Using nested document indexing<a class="headerlink" href="#using-nested-document-indexing" title="Permalink to this headline">¶</a></h2> <div class="section" id="indexing"> <h3>Indexing<a class="headerlink" href="#indexing" title="Permalink to this headline">¶</a></h3> <p>This method works by indexing a “parent” document and all its “child” documents <em>as a “group”</em> so they are guaranteed to end up in the same segment. You can use the context manager returned by <tt class="docutils literal"><span class="pre">IndexWriter.group()</span></tt> to group documents:</p> <div class="highlight-python"><div class="highlight"><pre><span class="k">with</span> <span class="n">ix</span><span class="o">.</span><span class="n">writer</span><span class="p">()</span> <span class="k">as</span> <span class="n">w</span><span class="p">:</span> <span class="k">with</span> <span class="n">w</span><span class="o">.</span><span class="n">group</span><span class="p">():</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"class"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"Index"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"add document"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"add reader"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"close"</span><span class="p">)</span> <span class="k">with</span> <span class="n">w</span><span class="o">.</span><span class="n">group</span><span class="p">():</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"class"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"Accumulator"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"add"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"get result"</span><span class="p">)</span> <span class="k">with</span> <span class="n">w</span><span class="o">.</span><span class="n">group</span><span class="p">():</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"class"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"Calculator"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"add"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"add all"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"add some"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"multiply"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"close"</span><span class="p">)</span> <span class="k">with</span> <span class="n">w</span><span class="o">.</span><span class="n">group</span><span class="p">():</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"class"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"Deleter"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"add"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"delete"</span><span class="p">)</span> </pre></div> </div> <p>Alternatively you can use the <tt class="docutils literal"><span class="pre">start_group()</span></tt> and <tt class="docutils literal"><span class="pre">end_group()</span></tt> methods:</p> <div class="highlight-python"><div class="highlight"><pre><span class="k">with</span> <span class="n">ix</span><span class="o">.</span><span class="n">writer</span><span class="p">()</span> <span class="k">as</span> <span class="n">w</span><span class="p">:</span> <span class="n">w</span><span class="o">.</span><span class="n">start_group</span><span class="p">()</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"class"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"Index"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"add document"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"add reader"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"close"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">end_group</span><span class="p">()</span> </pre></div> </div> <p>Each level of the hierarchy should have a query that distinguishes it from other levels (for example, in the above index, you can use <tt class="docutils literal"><span class="pre">kind:class</span></tt> or <tt class="docutils literal"><span class="pre">kind:method</span></tt> to match different levels of the hierarchy).</p> <p>Once you’ve indexed the hierarchy of documents, you can use two query types to find parents based on children or vice-versa.</p> <p>(There is currently no support in the default query parser for nested queries.)</p> </div> <div class="section" id="nestedparent-query"> <h3>NestedParent query<a class="headerlink" href="#nestedparent-query" title="Permalink to this headline">¶</a></h3> <p>The <a class="reference internal" href="api/query.html#whoosh.query.NestedParent" title="whoosh.query.NestedParent"><tt class="xref py py-class docutils literal"><span class="pre">whoosh.query.NestedParent</span></tt></a> query type lets you specify a query for child documents, but have the query return an “ancestor” document from higher in the hierarchy:</p> <div class="highlight-python"><div class="highlight"><pre><span class="c"># First, we need a query that matches all the documents in the "parent"</span> <span class="c"># level we want of the hierarchy</span> <span class="n">all_parents</span> <span class="o">=</span> <span class="n">query</span><span class="o">.</span><span class="n">Term</span><span class="p">(</span><span class="s">"kind"</span><span class="p">,</span> <span class="s">"class"</span><span class="p">)</span> <span class="c"># Then, we need a query that matches the children we want to find</span> <span class="n">wanted_kids</span> <span class="o">=</span> <span class="n">query</span><span class="o">.</span><span class="n">Term</span><span class="p">(</span><span class="s">"name"</span><span class="p">,</span> <span class="s">"close"</span><span class="p">)</span> <span class="c"># Now we can make a query that will match documents where "name" is</span> <span class="c"># "close", but the query will return the "parent" documents of the matching</span> <span class="c"># children</span> <span class="n">q</span> <span class="o">=</span> <span class="n">query</span><span class="o">.</span><span class="n">NestedParent</span><span class="p">(</span><span class="n">all_parents</span><span class="p">,</span> <span class="n">wanted_kids</span><span class="p">)</span> <span class="c"># results = Index, Calculator</span> </pre></div> </div> <p>Note that in a hierarchy with more than two levels, you can specify a “parents” query that matches any level of the hierarchy, so you can return the top-level ancestors of the matching children, or the second level, third level, etc.</p> <p>The query works by first building a bit vector representing which documents are “parents”:</p> <div class="highlight-python"><pre>Index | Calculator | | 1000100100000100 | | | Deleter Accumulator</pre> </div> <p>Then for each match of the “child” query, it calculates the previous parent from the bit vector and returns it as a match (it only returns each parent once no matter how many children match). This parent lookup is very efficient:</p> <div class="highlight-python"><pre>1000100100000100 | |<-+ close</pre> </div> </div> <div class="section" id="nestedchildren-query"> <h3>NestedChildren query<a class="headerlink" href="#nestedchildren-query" title="Permalink to this headline">¶</a></h3> <p>The opposite of <tt class="docutils literal"><span class="pre">NestedParent</span></tt> is <a class="reference internal" href="api/query.html#whoosh.query.NestedChildren" title="whoosh.query.NestedChildren"><tt class="xref py py-class docutils literal"><span class="pre">whoosh.query.NestedChildren</span></tt></a>. This query lets you match parents but return their children. This is useful, for example, to search for an album title and return the songs in the album:</p> <div class="highlight-python"><div class="highlight"><pre><span class="c"># Query that matches all documents in the "parent" level we want to match</span> <span class="c"># at</span> <span class="n">all_parents</span> <span class="o">=</span> <span class="n">query</span><span class="o">.</span><span class="n">Term</span><span class="p">(</span><span class="s">"kind"</span><span class="p">,</span> <span class="s">"album"</span><span class="p">)</span> <span class="c"># Parent documents we want to match</span> <span class="n">wanted_parents</span> <span class="o">=</span> <span class="n">query</span><span class="o">.</span><span class="n">Term</span><span class="p">(</span><span class="s">"album_title"</span><span class="p">,</span> <span class="s">"heaven"</span><span class="p">)</span> <span class="c"># Now we can make a query that will match parent documents where "album_title"</span> <span class="c"># contains "heaven", but the query will return the "child" documents of the</span> <span class="c"># matching parents</span> <span class="n">q1</span> <span class="o">=</span> <span class="n">query</span><span class="o">.</span><span class="n">NestedChildren</span><span class="p">(</span><span class="n">all_parents</span><span class="p">,</span> <span class="n">wanted_parents</span><span class="p">)</span> </pre></div> </div> <p>You can then combine that query with an <tt class="docutils literal"><span class="pre">AND</span></tt> clause, for example to find songs with “hell” in the song title that occur on albums with “heaven” in the album title:</p> <div class="highlight-python"><div class="highlight"><pre><span class="n">q2</span> <span class="o">=</span> <span class="n">query</span><span class="o">.</span><span class="n">And</span><span class="p">([</span><span class="n">q1</span><span class="p">,</span> <span class="n">query</span><span class="o">.</span><span class="n">Term</span><span class="p">(</span><span class="s">"song_title"</span><span class="p">,</span> <span class="s">"hell"</span><span class="p">)])</span> </pre></div> </div> </div> <div class="section" id="deleting-and-updating-hierarchical-documents"> <h3>Deleting and updating hierarchical documents<a class="headerlink" href="#deleting-and-updating-hierarchical-documents" title="Permalink to this headline">¶</a></h3> <p>The drawback of the index-time method is <em>updating and deleting</em>. Because the implementation of the queries depends on the parent and child documents being contiguous in the segment, you can’t update/delete just one child document. You can only update/delete an entire top-level document at once (for example, if your hierarchy is “Chapter - Section - Paragraph”, you can only update or delete entire chapters, not a section or paragraph). If the top-level of the hierarchy represents very large blocks of text, this can involve a lot of deleting and reindexing.</p> <p>Currently <tt class="docutils literal"><span class="pre">Writer.update_document()</span></tt> does not automatically work with nested documents. You must manually delete and re-add document groups to update them.</p> <p>To delete nested document groups, use the <tt class="docutils literal"><span class="pre">Writer.delete_by_query()</span></tt> method with a <tt class="docutils literal"><span class="pre">NestedParent</span></tt> query:</p> <div class="highlight-python"><div class="highlight"><pre><span class="c"># Delete the "Accumulator" class</span> <span class="n">all_parents</span> <span class="o">=</span> <span class="n">query</span><span class="o">.</span><span class="n">Term</span><span class="p">(</span><span class="s">"kind"</span><span class="p">,</span> <span class="s">"class"</span><span class="p">)</span> <span class="n">to_delete</span> <span class="o">=</span> <span class="n">query</span><span class="o">.</span><span class="n">Term</span><span class="p">(</span><span class="s">"name"</span><span class="p">,</span> <span class="s">"Accumulator"</span><span class="p">)</span> <span class="n">q</span> <span class="o">=</span> <span class="n">query</span><span class="o">.</span><span class="n">NestedParent</span><span class="p">(</span><span class="n">all_parents</span><span class="p">,</span> <span class="n">to_delete</span><span class="p">)</span> <span class="k">with</span> <span class="n">myindex</span><span class="o">.</span><span class="n">writer</span><span class="p">()</span> <span class="k">as</span> <span class="n">w</span><span class="p">:</span> <span class="n">w</span><span class="o">.</span><span class="n">delete_by_query</span><span class="p">(</span><span class="n">q</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="section" id="using-query-time-joins"> <h2>Using query-time joins<a class="headerlink" href="#using-query-time-joins" title="Permalink to this headline">¶</a></h2> <p>A second technique for simulating hierarchical documents in Whoosh involves using a stored field on each document to point to its parent, and then using the value of that field at query time to find parents and children.</p> <p>For example, if we index a hierarchy of classes and methods using pointers to parents instead of nesting:</p> <div class="highlight-python"><div class="highlight"><pre><span class="c"># Store a pointer to the parent on each "method" document</span> <span class="k">with</span> <span class="n">ix</span><span class="o">.</span><span class="n">writer</span><span class="p">()</span> <span class="k">as</span> <span class="n">w</span><span class="p">:</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"class"</span><span class="p">,</span> <span class="n">c_name</span><span class="o">=</span><span class="s">"Index"</span><span class="p">,</span> <span class="n">docstring</span><span class="o">=</span><span class="s">"..."</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">m_name</span><span class="o">=</span><span class="s">"add document"</span><span class="p">,</span> <span class="n">parent</span><span class="o">=</span><span class="s">"Index"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">m_name</span><span class="o">=</span><span class="s">"add reader"</span><span class="p">,</span> <span class="n">parent</span><span class="o">=</span><span class="s">"Index"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">m_name</span><span class="o">=</span><span class="s">"close"</span><span class="p">,</span> <span class="n">parent</span><span class="o">=</span><span class="s">"Index"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"class"</span><span class="p">,</span> <span class="n">c_name</span><span class="o">=</span><span class="s">"Accumulator"</span><span class="p">,</span> <span class="n">docstring</span><span class="o">=</span><span class="s">"..."</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">m_name</span><span class="o">=</span><span class="s">"add"</span><span class="p">,</span> <span class="n">parent</span><span class="o">=</span><span class="s">"Accumulator"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">m_name</span><span class="o">=</span><span class="s">"get result"</span><span class="p">,</span> <span class="n">parent</span><span class="o">=</span><span class="s">"Accumulator"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"class"</span><span class="p">,</span> <span class="n">c_name</span><span class="o">=</span><span class="s">"Calculator"</span><span class="p">,</span> <span class="n">docstring</span><span class="o">=</span><span class="s">"..."</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">m_name</span><span class="o">=</span><span class="s">"add"</span><span class="p">,</span> <span class="n">parent</span><span class="o">=</span><span class="s">"Calculator"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">m_name</span><span class="o">=</span><span class="s">"add all"</span><span class="p">,</span> <span class="n">parent</span><span class="o">=</span><span class="s">"Calculator"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">m_name</span><span class="o">=</span><span class="s">"add some"</span><span class="p">,</span> <span class="n">parent</span><span class="o">=</span><span class="s">"Calculator"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">m_name</span><span class="o">=</span><span class="s">"multiply"</span><span class="p">,</span> <span class="n">parent</span><span class="o">=</span><span class="s">"Calculator"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">m_name</span><span class="o">=</span><span class="s">"close"</span><span class="p">,</span> <span class="n">parent</span><span class="o">=</span><span class="s">"Calculator"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"class"</span><span class="p">,</span> <span class="n">c_name</span><span class="o">=</span><span class="s">"Deleter"</span><span class="p">,</span> <span class="n">docstring</span><span class="o">=</span><span class="s">"..."</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">m_name</span><span class="o">=</span><span class="s">"add"</span><span class="p">,</span> <span class="n">parent</span><span class="o">=</span><span class="s">"Deleter"</span><span class="p">)</span> <span class="n">w</span><span class="o">.</span><span class="n">add_document</span><span class="p">(</span><span class="n">kind</span><span class="o">=</span><span class="s">"method"</span><span class="p">,</span> <span class="n">m_name</span><span class="o">=</span><span class="s">"delete"</span><span class="p">,</span> <span class="n">parent</span><span class="o">=</span><span class="s">"Deleter"</span><span class="p">)</span> <span class="c"># Now do manual joins at query time</span> <span class="k">with</span> <span class="n">ix</span><span class="o">.</span><span class="n">searcher</span><span class="p">()</span> <span class="k">as</span> <span class="n">s</span><span class="p">:</span> <span class="c"># Tip: Searcher.document() and Searcher.documents() let you look up</span> <span class="c"># documents by field values more easily than using Searcher.search()</span> <span class="c"># Children to parents:</span> <span class="c"># Print the docstrings of classes on which "close" methods occur</span> <span class="k">for</span> <span class="n">child_doc</span> <span class="ow">in</span> <span class="n">s</span><span class="o">.</span><span class="n">documents</span><span class="p">(</span><span class="n">m_name</span><span class="o">=</span><span class="s">"close"</span><span class="p">):</span> <span class="c"># Use the stored value of the "parent" field to look up the parent</span> <span class="c"># document</span> <span class="n">parent_doc</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">document</span><span class="p">(</span><span class="n">c_name</span><span class="o">=</span><span class="n">child_doc</span><span class="p">[</span><span class="s">"parent"</span><span class="p">])</span> <span class="c"># Print the parent document's stored docstring field</span> <span class="k">print</span><span class="p">(</span><span class="n">parent_doc</span><span class="p">[</span><span class="s">"docstring"</span><span class="p">])</span> <span class="c"># Parents to children:</span> <span class="c"># Find classes with "big" in the docstring and print their methods</span> <span class="n">q</span> <span class="o">=</span> <span class="n">query</span><span class="o">.</span><span class="n">Term</span><span class="p">(</span><span class="s">"kind"</span><span class="p">,</span> <span class="s">"class"</span><span class="p">)</span> <span class="o">&</span> <span class="n">query</span><span class="o">.</span><span class="n">Term</span><span class="p">(</span><span class="s">"docstring"</span><span class="p">,</span> <span class="s">"big"</span><span class="p">)</span> <span class="k">for</span> <span class="n">hit</span> <span class="ow">in</span> <span class="n">s</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">limit</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span> <span class="k">print</span><span class="p">(</span><span class="s">"Class name="</span><span class="p">,</span> <span class="n">hit</span><span class="p">[</span><span class="s">"c_name"</span><span class="p">],</span> <span class="s">"methods:"</span><span class="p">)</span> <span class="k">for</span> <span class="n">child_doc</span> <span class="ow">in</span> <span class="n">s</span><span class="o">.</span><span class="n">documents</span><span class="p">(</span><span class="n">parent</span><span class="o">=</span><span class="n">hit</span><span class="p">[</span><span class="s">"c_name"</span><span class="p">]):</span> <span class="k">print</span><span class="p">(</span><span class="s">" Method name="</span><span class="p">,</span> <span class="n">child_doc</span><span class="p">[</span><span class="s">"m_name"</span><span class="p">])</span> </pre></div> </div> <p>This technique is more flexible than index-time nesting in that you can delete/update individual documents in the hierarchy piece by piece, although it doesn’t support finding different parent levels as easily. It is also slower than index-time nesting (potentially much slower), since you must perform additional searches for each found document.</p> <p>Future versions of Whoosh may include “join” queries to make this process more efficient (or at least more automatic).</p> </div> </div> </div> </div> </div> <div class="sphinxsidebar"> <div class="sphinxsidebarwrapper"> <h3><a href="index.html">Table Of Contents</a></h3> <ul> <li><a class="reference internal" href="#">Indexing and searching document hierarchies</a><ul> <li><a class="reference internal" href="#overview">Overview</a></li> <li><a class="reference internal" href="#using-nested-document-indexing">Using nested document indexing</a><ul> <li><a class="reference internal" href="#indexing">Indexing</a></li> <li><a class="reference internal" href="#nestedparent-query">NestedParent query</a></li> <li><a class="reference internal" href="#nestedchildren-query">NestedChildren query</a></li> <li><a class="reference internal" href="#deleting-and-updating-hierarchical-documents">Deleting and updating hierarchical documents</a></li> </ul> </li> <li><a class="reference internal" href="#using-query-time-joins">Using query-time joins</a></li> </ul> </li> </ul> <h4>Previous topic</h4> <p class="topless"><a href="threads.html" title="previous chapter">Concurrency, locking, and versioning</a></p> <h4>Next topic</h4> <p class="topless"><a href="recipes.html" title="next chapter">Whoosh recipes</a></p> <h3>This Page</h3> <ul class="this-page-menu"> <li><a href="_sources/nested.txt" rel="nofollow">Show Source</a></li> </ul> <div id="searchbox" style="display: none"> <h3>Quick search</h3> <form class="search" action="search.html" method="get"> <input type="text" name="q" /> <input type="submit" value="Go" /> <input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="area" value="default" /> </form> <p class="searchtip" style="font-size: 90%"> Enter search terms or a module, class or function name. </p> </div> <script type="text/javascript">$('#searchbox').show(0);</script> </div> </div> <div class="clearer"></div> </div> <div class="related"> <h3>Navigation</h3> <ul> <li class="right" style="margin-right: 10px"> <a href="genindex.html" title="General Index" >index</a></li> <li class="right" > <a href="py-modindex.html" title="Python Module Index" >modules</a> |</li> <li class="right" > <a href="recipes.html" title="Whoosh recipes" >next</a> |</li> <li class="right" > <a href="threads.html" title="Concurrency, locking, and versioning" >previous</a> |</li> <li><a href="index.html">Whoosh 2.5.1 documentation</a> »</li> </ul> </div> <div class="footer"> © Copyright 2007-2012 Matt Chaput. Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.1.3. </div> </body> </html>