First, the good news:

New general features:

- Faster indexing and retrieval of documents (the new wildcard search
  outperforms the old one). A hash approach has been added to speed up
  searches, which reduces disk I/O. You can now search for things like
  "a* or b* or c* or d* or e* ..." without the penalty of reading the
  linked list for each word of the expanded list.

- Better use of memory. Many calls to free memory have been added.

- Phrase search. Example:
    swish-e -w '"John Smith"' -f index.file
  (Use " to delimit the phrase.)

- XML-style MetaNames. Example:
    <metaname1>SomeText</metaname1>
  Nested XML MetaNames are allowed:
    <metaname1> SomeText <metaname2> MoreText </metaname2> SomeText </metaname1>

- Other options, such as filtering, and some patches from different
  people have been added (see previous messages). Filtering is an
  addition from Rainer Scherg.

- Better compression of numbers.

- Portable index file.

----------------------------------------
New features in config file:

- New directive TranslateCharacters to translate some characters in the
  words. It takes two strings: the original characters and the
  translated characters. Example:
    TranslateCharacters Áá- aa/
  This makes the word "área" be indexed as "area" and "9-1" as "9/1".
  Remember that all the characters in these strings must also be in
  WordCharacters. This option is useful for non-English languages.

- Special word in MetaNames. If you specify "automatic" in the MetaNames
  directive, the indexer will try to extract all the MetaNames
  dynamically. This option only works with these types of MetaNames:
    <metaname>someContent</metaname>
  and
    <!-- META START NAME="keyName" --> someContent <!-- META END -->
  (Nested MetaNames are allowed.) It does not support:
    <META NAME="keyName" CONTENT="someContent">

- New option in IgnoreWords (thanks to Rainer Scherg).
    IgnoreWords File:path-to-file
  e.g.:
    IgnoreWords File:/path/german.txt
  You can find a German stopwords file in conf/stopwords.

----------------------------------------
New search options:

- Option -s to sort results by one or more document properties (those
  specified in PropertyNames in the config file). Sorting is always
  descending. Example:
    swish-e -w test -f index.file -s cod aut
  This will sort results by the properties cod and aut.

- Option -b to display results starting at the given position. The
  number of results shown is the one given in -m. Example:
    swish-e -b 10 -m 5 -w test -f index.file
  This will show 5 results starting at the 10th position.

-----------------------------------------
New decompress option:

- Option -D shows more information.

And now, the bad news:

- This version uses more memory than the old swish-e. Like
  swish-e-1.3.2, it stores all the data (words, files, properties,
  metanames) in memory during the index process. But now it also stores
  all the word positions in memory during the index process (positions
  are required for phrase search).

- Be careful using the IgnoreLimit directive in the config file. With
  this option you can get "automatic" stopwords and remove them from the
  index file. The problem is that this feature is executed at the end of
  the index process. So, if an automatic stopword is found, all the word
  positions must be recomputed, which increases the index time (this is
  a pure memory/CPU process). It is better to add these words to the
  IgnoreWords directive. To get the best performance you can do the
  following:
    1. Run the index process once with IgnoreLimit enabled.
    2. Disable IgnoreLimit and add the stopwords found to the
       IgnoreWords directive.
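
The idea behind automatic stopwords can be sketched in a few lines of Python. This is not swish-e code: the function name find_automatic_stopwords, the percentage threshold and the toy documents are all assumptions made for illustration, and the real IgnoreLimit criteria may differ. It only shows the kind of words the first indexing run would report, so that they can be moved into IgnoreWords for later runs. The final line assumes the directive also accepts a space-separated list of words.

```python
# Illustrative sketch only, not swish-e code.
# Assumption: a word counts as an "automatic" stopword when it occurs in
# more than `percent` percent of the documents. The real IgnoreLimit rule
# may use different criteria.

def find_automatic_stopwords(documents, percent):
    """Return the sorted words found in more than `percent` % of documents."""
    doc_count = {}
    for text in documents:
        # Count each word once per document (document frequency).
        for word in set(text.lower().split()):
            doc_count[word] = doc_count.get(word, 0) + 1
    limit = len(documents) * percent / 100.0
    return sorted(w for w, n in doc_count.items() if n > limit)


# Toy documents, invented for this example.
docs = [
    "the index is portable",
    "the phrase search needs word positions",
    "the hash speeds up wildcard search",
    "memory use is higher in the new version",
]

stopwords = find_automatic_stopwords(docs, 70)

# Hypothetical follow-up: list the words found in an IgnoreWords line.
print("IgnoreWords " + " ".join(stopwords))
```

Doing this computation once and keeping the result in the config file avoids recomputing word positions at the end of every indexing run.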