mnogosearch-3.2.8-1mdk.ppc.rpm

TODO
----

 General development directions

* Support for more databases.
* Support for more transport protocols.
* More APIs, e.g. write a Java class with libudmsearch support.
* Support for huge databases with hundreds or thousands of millions of documents.
* Make database distribution between several machines more flexible.
* Make it more manageable, i.e. administration tools, etc.


  Below are things that will be implemented in the future. They
are given in no particular order. If you want to change the order of
their development, please ask on general@mnogosearch.org.



Search quality and results presentation
---------------------------------------
* Click rank
* Administrator-defined dynamic site priority:
	- approved sites which should be displayed in the top of results;
	- disapproved sites (e.g. for abuse) which should not be displayed.
* Take word context into account: <b>, <font size="xx">, <big> and so on.
* Implement advanced search syntax as in the big search engines:
	plus and minus signs, etc.
* Optional automatic URL limit by SERVER_NAME variable.
* "Exclude" limits, for example "search through everything except
  a given site": ue=http://esite/
* Content negotiation, e.g. process Accept-Language, Accept-Charset headers
  and serve the template in the proper language and charset.
* Fuzzy search for accented letters, for example Cyrillic "io" and "ie".
* Regex search
* Rank URLs with long pathnames lower than direct hits on, say, a domain
name with no directory path.
* Various search results ordering, e.g. by date.
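
The pathname ranking item above could be sketched roughly as follows. This is
only an illustration in Python, not mnoGoSearch code: the function names, the
penalty value and the formula itself are assumptions made for the example.

```python
from urllib.parse import urlsplit


def path_depth(url):
    """Count the non-empty path segments of a URL (0 for a bare domain)."""
    path = urlsplit(url).path
    return len([segment for segment in path.split("/") if segment])


def adjusted_score(base_score, url, penalty=0.1):
    """Lower the score as the path gets deeper.

    The formula and the default penalty are hypothetical; any function
    that decreases with path depth would serve the same purpose.
    """
    return base_score / (1.0 + penalty * path_depth(url))


if __name__ == "__main__":
    for u in ("http://example.com/", "http://example.com/a/b/c.html"):
        print(u, path_depth(u), adjusted_score(1.0, u))
```

With such a scheme a hit on a bare domain keeps its base score, while
documents deep in the directory tree are moved down.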


Indexing related stuff
----------------------
* Detect clones at the site level. Currently this is implemented at the page
level only. The idea is to detect that the site being indexed is a mirror of
another site after indexing only several pages, without having to index all pages.
* SPAM clearance.
* Cookies support.
* Fix the indexer becoming slow when ServerTable is big. This is caused by
a full sequential scan. Make an in-memory cache for part of ServerTable.
* Fix that "posgreSQL.org" and "posgresql.org" are considered
different sites.
* FTP digest ls-lR.gz support. For example, ftp://ftp.chg.ru/ls-lR.gz
* Make it possible for external parsers to return converted content 
together with headers like Content-Type, Title and so on.
* Dynamic decision whether to index a document by its content:
	- by language;
	- by "allowed words" list, i.e. document will be stored into
	  database only if it has words from this list.
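
The host name case problem listed above comes down to comparing sites by a
normalized key. A minimal sketch in Python, not the indexer's actual code;
site_key is a hypothetical helper name:

```python
from urllib.parse import urlsplit


def site_key(url):
    """Return a (scheme, host, port) key with scheme and host lowercased.

    Host names are case-insensitive, so the host is folded to lower
    case. The path is left out of the key on purpose, because paths
    may be case-sensitive on the server.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return (parts.scheme.lower(), host, parts.port)


if __name__ == "__main__":
    a = site_key("http://posgreSQL.org/docs/")
    b = site_key("http://posgresql.org/docs/")
    print(a == b)
```

Two URLs whose hosts differ only in letter case then map to the same key and
can be treated as one site.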

Charset related stuff
---------------------
* Remove the "ForceIISCharset1251 yes/no" command. Replace it with an
enhanced "CharsetByServer <charset> <regexp> [<regexp>...]"
command.
* Stateful character sets support: UTF-7, Asian ISO-2022-XX
and others. They will not be used as a LocalCharset because
they take too much space, however the indexer should be able to
index them, and the search frontend should be able to use them as
a BrowserCharset.
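
One possible reading of the proposed CharsetByServer command, sketched in
Python. The command is not implemented yet, so everything here is an
assumption: that the regexps are matched against the HTTP "Server" response
header, that the first matching rule wins, and the names RULES and
charset_by_server.

```python
import re

# Hypothetical rules, as they might be parsed from a config line such as
#   CharsetByServer windows-1251 Microsoft-IIS
# Each rule is (charset, [compiled regexps]).
RULES = [
    ("windows-1251", [re.compile(r"Microsoft-IIS", re.IGNORECASE)]),
]


def charset_by_server(server_header, rules, default=None):
    """Return the charset of the first rule with a regexp that matches.

    Assumes matching against the "Server" header value; returns
    `default` when no rule matches.
    """
    for charset, patterns in rules:
        if any(p.search(server_header) for p in patterns):
            return charset
    return default


if __name__ == "__main__":
    print(charset_by_server("Microsoft-IIS/5.0", RULES))
    print(charset_by_server("Apache/1.3.27", RULES))
```

Under these assumptions a single rule like the one above would behave like
"ForceIISCharset1251 yes", while further rules could cover other servers.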


Misc
----
* Smart search results cache cleaning after reindexing.
* Make it possible to set table names in indexer.conf and search.htm
* There was a discussion about word separators back in January; see 
http://www.mail-archive.com/udmsearch%40web.izhcom.ru/msg00200.html.
* Learn about Dublin Core, a simple set of standard metadata for web pages.
  http://www.searchtools.com/related/metadata.html#dc
* Add curl library support.
* Rewrite mirroring functions. Make it possible to optionally store the whole
document, not only up to MaxDocSize.


Portability and code quality
----------------------------
Remove warnings on various platforms. Currently it builds without
warnings on Linux and FreeBSD with these CFLAGS:

-Wall 
-Wconversion 
-Wshadow  
-Wpointer-arith 
-Wcast-qual 
-Wcast-align 
-Wwrite-strings  
-Waggregate-return  
-Wstrict-prototypes  
-Wmissing-prototypes 
-Wmissing-declarations 
-Wredundant-decls 
-Wnested-externs 
-Wlong-long 
-Winline

However, compilers on some other platforms do produce warnings.
For example, mixed signed/unsigned chars with the NetBSD Alpha compiler.
Please report those warnings to general@mnogosearch.org!


Documentation
-------------
* Improve it!