Sophie

Sophie

distrib > Fedora > 18 > x86_64 > media > updates > by-pkgid > e450e7f3d6075c4a54de19e68d38177f > files > 465

groonga-doc-3.0.5-1.fc18.x86_64.rpm

.. -*- rst -*-

.. highlightlang:: none

.. groonga-command
.. database: indexing

Indexing
========

Groonga supports both online index construction and offline
index construction since 2.0.0.

Online index construction
-------------------------

In online index construction, registered documents can be
searchable quickly while indexing. But indexing requires
more cost rather than indexing by offline index
construction.

Online index construction is suitable for a search system
that values freshness. For example, a search system for
tweets, news, blog posts and so on will value
freshness. Online index construction can make fresh
documents searchable and keep searchable while indexing.

.. _offline-index-construction:

Offline index construction
--------------------------

In offline index construction, indexing cost is less than
indexing cost by online index construction. Indexing time
will be shorter. Index will be smaller. Resources required
for indexing will be smaller. But a registering document
cannot be searchable until all registered documents are
indexed.

Offline index construction is suitable for a search system
that values less required resources. If a search system
doesn't value freshness, offline index construction will be
suitable. For example, a reference manual search system
doesn't value freshness because a reference manual will be
updated only at a release.

How to use
----------

Groonga uses online index construction by default. We
register a document, we can search it quickly.

Groonga uses offline index construction by adding an index
to a column that already has data.

We define a schema:

.. groonga-command
.. include:: ../example/reference/indexing-schema.log
.. table_create Tweets TABLE_NO_KEY
.. column_create Tweets content COLUMN_SCALAR ShortText
.. table_create Lexicon TABLE_HASH_KEY|KEY_NORMALIZE ShortText --default_tokenizer TokenBigram

We register data:

.. groonga-command
.. include:: ../example/reference/indexing-data.log
.. load --table Tweets
.. [
.. {"content":"Hello!"},
.. {"content":"I just start it!"},
.. {"content":"I'm sleepy... Have a nice day... Good night..."}
.. ]

We search without index. We get no result:

.. groonga-command
.. include:: ../example/reference/indexing-search-without-index.log
.. select Tweets --match_columns content --query 'good nice'

We create index for ``Tweets.content``. Already registered
data in ``Tweets.content`` are indexed by offline index
construction:

.. groonga-command
.. include:: ../example/reference/indexing-offline-index-construction.log
.. column_create Lexicon tweet COLUMN_INDEX|WITH_POSITION Tweets content

We search with index. We get a matched record:

.. groonga-command
.. include:: ../example/reference/indexing-search-after-offline-index-construction.log
.. select Tweets --match_columns content --query 'good nice'

We register data again. They are indexed by online index
construction:

.. groonga-command
.. include:: ../example/reference/indexing-online-index-construction.log
.. load --table Tweets
.. [
.. {"content":"Good morning! Nice day."},
.. {"content":"Let's go shopping."}
.. ]

We can also get newly registered records by searching:

.. groonga-command
.. include:: ../example/reference/indexing-search-after-online-index-construction.log
.. select Tweets --match_columns content --query 'good nice'