swish-e-2.0.5-2mdk.i586.rpm


First, the good news:

New general features:
- Faster indexing and retrieval of documents (wildcard search
outperforms the old one). A hash approach has been added to speed up
searches. This reduces disk I/O. Now you can search for things like
 "a* or b* or c* or d* or e* ..." without the penalty of reading the linked list
 for each word of the expanded list.

- Better use of memory. Lots of calls to free memory have been added.

- Phrase search. Example:
swish-e -w '"John Smith"' -f index.file
(Use " to delimit the phrase.)

- XML MetaNames style. Example:
<metaname1>SomeText</metaname1>
Nested XML Metanames are allowed:
<metaname1>
SomeText
<metaname2>
MoreText
</metaname2>
SomeText
</metaname1>

- Other options, such as filtering, and some patches from different
people have been added (see previous messages). Filtering is
a contribution from Rainer Scherg.

- Better compression of numbers.

- Portable index file.
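
To illustrate why phrase search needs word positions, here is a minimal
Python sketch of a positional index. It is not swish-e's actual code or
index format: build_positions and phrase_search are hypothetical names,
and splitting words on whitespace is a simplifying assumption.

```python
from collections import defaultdict

def build_positions(docs):
    """Toy positional index: word -> {doc_id: [positions of the word]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    """Return the sorted doc ids in which the phrase words are consecutive."""
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return []
    hits = []
    for doc_id, starts in index[words[0]].items():
        for start in starts:
            # Word number i of the phrase must sit at position start + i.
            if all(start + i in index[w].get(doc_id, ())
                   for i, w in enumerate(words)):
                hits.append(doc_id)
                break
    return sorted(hits)

docs = {
    "a.html": "John Smith wrote this",
    "b.html": "Smith and John disagree",
}
index = build_positions(docs)
print(phrase_search(index, "John Smith"))
```

Both files contain "John" and "Smith", but only a.html has them next to
each other, which cannot be decided without the stored positions.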


----------------------------------------
New features in config file:
- New directive TranslateCharacters to translate some characters in
the words. It takes two strings: The original characters and 
the translated characters.
Example:

TranslateCharacters Áá- aa/

This makes the word "área" indexed as "area" and "9-1" as "9/1".
Remember that all the characters in these strings must also be in
WordCharacters.
This option is useful for non-English languages.

- Special word in MetaNames. If you specify "automatic" in the
MetaNames directive, the indexer will try to extract all the MetaNames
dynamically. This option only works with these types of MetaNames:

<metaname>someContent</metaname>

and

<!-- META START NAME="keyName" --> someContent <!-- META END -->

(Nested MetaNames are allowed)

It does not support:
<META NAME="keyName" CONTENT="someContent">

- New option in IgnoreWords (thanks to Rainer Scherg).
IgnoreWords File:path-to-file
e.g.: IgnoreWords File:/path/german.txt
You can find a German stopwords file in conf/stopwords.
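
The character mapping described for TranslateCharacters above can be
pictured with a short Python sketch. This only models the documented
behaviour (the first string is mapped character by character onto the
second); make_translator and index_form are hypothetical helper names,
not part of swish-e.

```python
def make_translator(original, translated):
    """Build a one-to-one character map from the two directive strings."""
    if len(original) != len(translated):
        # Assumption: both strings must have the same length.
        raise ValueError("both strings must have the same length")
    return str.maketrans(original, translated)

# Same strings as in the example: TranslateCharacters Áá- aa/
table = make_translator("Áá-", "aa/")

def index_form(word):
    """Return the word as it would be stored after translation."""
    return word.translate(table)

print(index_form("área"))
print(index_form("9-1"))
```

Characters that do not appear in the first string are left unchanged.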

----------------------------------------
New search options:
- Option -s to sort results by one or more document properties
(those specified in PropertyNames in the config file).
The order is always descending.
Example:

swish-e -w test -f index.file -s cod aut

This will sort results by the properties cod and aut.

- Option -b to set the position of the first result to display. The
number of results shown is the one specified in -m.
Example:

swish-e -b 10 -m 5 -w test -f index.file 

This will show 5 results starting at the 10th position.
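
The effect of -s, -b and -m on a result list can be sketched in Python as
follows. This is an illustration under two assumptions that the notes
above do not spell out: that several -s properties act as successive sort
keys (first property first, ties broken by the next one), and that -b
counts positions from 1. sort_results and window are hypothetical names.

```python
def sort_results(results, properties):
    """Sort by the given properties, always in descending order."""
    return sorted(results,
                  key=lambda r: tuple(r[p] for p in properties),
                  reverse=True)

def window(results, begin, max_results):
    """Keep max_results entries starting at 1-based position begin."""
    start = begin - 1
    return results[start:start + max_results]

results = [
    {"cod": 2, "aut": "b"},
    {"cod": 3, "aut": "a"},
    {"cod": 2, "aut": "c"},
]
ordered = sort_results(results, ["cod", "aut"])   # like -s cod aut
print([(r["cod"], r["aut"]) for r in ordered])

ranked = list(range(1, 21))                       # 20 ranked results
print(window(ranked, 10, 5))                      # like -b 10 -m 5
```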
-----------------------------------------

New decompress option:
- Option -D shows more information


And now, the bad news:

- This version uses more memory than the old swish-e. Like swish-e-1.3.2,
it stores all the data (words, files, properties, metanames) in memory
during the index process. But now it also stores all the word positions
in memory during the index process (positions are required for phrase
search).
- Be careful when using the IgnoreLimit directive in the config file. With
this option you can get "automatic" stopwords and remove them from the
index file. The problem is that this feature is executed at the end of the
index process. So, if an automatic stopword is found, all the word
positions must be recomputed, which increases the index time (this is a
pure memory and CPU process). It is better to add these words to the
IgnoreWords directive.
To get the best performance you can do the following:
1. Run the index process once with IgnoreLimit enabled.
2. Disable IgnoreLimit and add the stopwords found to the IgnoreWords
directive.
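
The idea behind the two steps can be sketched in Python: find the words
that occur in too many files once, then write them into a fixed
IgnoreWords line so that later runs do not have to recompute positions.
This is only an illustration. The thresholds (a percentage of files plus
a minimum number of files) are an assumption about how the limit is
expressed, so check the documentation of your version, and
automatic_stopwords and ignore_words_line are hypothetical names.

```python
from collections import Counter

def automatic_stopwords(docs, percent, min_files):
    """Words found in more than percent% of the files and in more than
    min_files files (assumed meaning of the limit)."""
    doc_freq = Counter()
    for text in docs:
        # Count each word once per file, not once per occurrence.
        doc_freq.update(set(text.lower().split()))
    total = len(docs)
    return sorted(word for word, n in doc_freq.items()
                  if n > min_files and n * 100 > percent * total)

def ignore_words_line(words):
    """Format the words found as a single IgnoreWords directive line."""
    return "IgnoreWords " + " ".join(words)

docs = ["the cat", "the dog", "the bird", "a cat"]
found = automatic_stopwords(docs, percent=70, min_files=2)
print(ignore_words_line(found))
```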