Sophie: gnucash-sql-2.2.9-8.1mdv2010.1 i586

gnucash-sql-2.2.9-8.1mdv2010.1.i586.rpm


         GnuCash SQL/Postgres Support/HOWTO
         ----------------------------------

This directory contains code for SQL/Postgres support.  

The SQL backend allows multiple users to view/edit the same data
at the same time, as well as allowing a single user to keep data 
in an SQL database.  


Status
------
The SQL code has been "finished" for many years, and hasn't been 
in recent active development.  This means that the current SQL 
backend does not support some of the newer GnuCash features; 
in particular, it doesn't support the business objects needed for
the business accounting. It also doesn't currently support 
scheduled/recurring transactions.  Although the development is
currently dormant, we believe that it may heat up real soon, 
since there is a core group of developers who are interested 
in making SQL the default for all future versions of GnuCash.

As long as development is dormant, note that you may not get
prompt support on the mailing list, and, if you find bugs, they
might not be fixed quickly.

Currently, only Postgres is supported, but other databases are 
planned.  See also the 'to be done' list at the bottom for more 
detail.

Table of Contents
-----------------
-- Postgres Install Instructions
-- GnuCash Build Instructions
-- How To Use GnuCash with Postgres
-- Remote Access
-- Access Modes
-- Performance
-- ToDo List


Postgres Install Instructions
-----------------------------
1) Install PostgresSQL server, client and devel packages.
   (Both the 6.x and the 7.x versions of Postgres should work).

2) If installed from RedHat, then running /etc/rc.d/init.d/postgresql 
   will setup and initialize basic Postgres first-time setup & config.
3) As root, 'su - postgres' then run 'createuser' to add your user id
   (don't set a password on your postgres db name, yet, GnuCash doesn't 
   have a GUI to ask for your password yet)
   If you've forgotten what users are on the system, you can check
   by starting the 'psql' command shell and typing the command.
   'SELECT * FROM pg_shadow;'
   Note this only works for the postgres user, and other users that 
   have createuser permissions.

4) (optional) enable TCPIP connections to remote hosts. To do this:
   edit the file pg_hba.conf on the remote host to configure and allow 
   access from other hosts.  See 'man pg_hba.conf' for details.  
        RedHat:   /var/lib/pgsql/pg_hba.conf  
	Debian:   /etc/postgresql/pg_hba.conf 
   Also create users.  Be sure to set a password.
   (there appears to be no way to pass an empty password to postgres)
   Note also the user must have 'createuser' permissions in order to 
   lock tables (this is a bug in postgres 6.5 and maybe later ???)

5) Hints and Tips:  
   If you've forgotten what databases you've created in the past,
   you can look the filesystem: 'ls -la /var/lib/postgres/data/base'
   to view the existing databases.  Alternately, if you connect as 
   user postgres, you can 'SELECT * FROM pg_database;'


Alternatively, you can install into a brand new database without
using root privileges. Perform the following:

1) Install postgresql server, client and devel packages.
2) initdb -D ~/gnucash
   This creates a directory structure and supporting files
   under ~/gnucash. The gnucash directory is automatically 
   created. The directories and files are owned by the user
   running 'initdb' and have only user access; group and
   other are cleared.

   By default, the user running 'initdb' is setup as the
   database superuser for the new database.

   You don't have to use the name 'gnucash'; you can choose
   whatever you want.

3) pg_ctl -D ~/gnucash -o "-p 6543" start

   This starts a postmaster process for the new database
   listening on port 6543. If your system doesn't already
   have a postmaster processing running for the system
   database, you can leave out '-o "-p 6543"' and it will
   use the default port 5432. You can use any open port.

   By default, the postmaster process will only accept
   connections from processes on the local host.


GnuCash Build Instructions
--------------------------
Same as usual, but you must specify the flag '--enable-sql' in order
to build in Postgres support.  
i.e. 
./configure --enable-sql
and then 'make' and then 'make install'


How To Use GnuCash with Postgres
--------------------------------
a) Open your favorite datafile in the usual fashion.
b) Click on 'Save As'
c) Enter the following URL instead of a filename in the file picker:
    postgres://localhost/some_dbname_you_pick

The above steps will copy your data into that database.  You can 
then restart gnucash (or keep working) and type in the same URL
in the file open dialogs.  Or try it on the command line:

home:~ $ gnucash postgres://localhost/dbname_whatever

Note: GnuCash will automatically create the database if it does
not already exist. Do *not* try to create the database by hand,
or to specify a database that wasn't created by GnuCash.  

You can also "copy" from the SQL database to an XML file. 
If you don't trust the SQL backend, this might be a good way to
backup your data.  To do this:

a) Open the SQL database in "single user mode":
   postgres://localhost/dbname_whatever?mode=single-update
   It is important to specify the single-user mode
   (if you don't, not all of your data will be saved to the file)
b) Click on 'Save As' and enter an ordinary file name.

We promise that future versions of GnuCash will be backwards
compatible with the current SQL database layout.  There is a 
specific upgrade module designed into the system that handles
version upgrades.  


Remote Access
-------------
In principle, you can use URL's such as:
postgres://some.where.com/dbname
However, these URL's require that the host 'some.where.com' have
TCPIP access enabled (by following the postgres install instruction (6) 
above).  This is true even if the hostname is your local machine.  Thus,
unless you've setup Postgres TCPIP connections, you *MUST* use
URL's of the form postgres://localhost/dbname

You can specify usernames and passwords in the URL as follows:

postgres://localhost/dbname?user=whomever&password=asdf

Note that the username and password are the pg_shadow username and
password, *NOT* your unix login name and password.

(A GUI dialog to prompt for your password hasn't been implemented yet.)

Note that you (or your SQL database admin) will have to make sure
that access permissions on the various GnuCash tables have been set 
appropriately.  Typically, this will be by starting the 'psql'
shell, listing all relations with \z, and then issuing
a 'GRANT ALL on gncAccount TO someuser;', and so on, for each 
relation.   Failing to do this will result in lots of unhappiness.
The sysadmin will also need to make sure that TCP/IP connections
are properly enabled in the postgres server hba.conf file.


Access Modes
------------
There are four different ways or 'modes' in which to access your 
database.  These are 

mode=single-file
mode=single-update
mode=multi-user-poll
mode=multi-user

The first two are single-user access modes.  These are the
'safest' modes to use if you don't need multi-user capability.
The single-update mode is strongly preferred over single-file.
(The single-file mode is strongly deprecated, and can lead to 
to massive data loss under certain circumstances: for example,
if your sql connection dies after all the old data has been 
deleted, but before the new data has been written out.  Bluntly: 
DO NOT USE mode=single-file.  You won't be sorry.).

The multi-user modes are intended for use when multiple users
need to view and make changes to the data.  The default mode
is multi-user.  

If you have many users (more than a half-dozen or so), you may 
want to run in multi-user-poll mode, instead of the default 
multi-user.  This will stop the automatic updates of the 
local instance of gnucash, and thus reduce the amount of
GUI redraws that happen automatically as other users
edit data.  It should also reduce network traffic
slightly, and present a slightly lower load on the 
sql server.  Different users can mix-n-match the
two multi-user modes. 

See the file 'design.txt' for an in-depth explanation of these 
modes.

You can specify a particular access mode by specifying the URL

postgres://localhost/dbname?mode=whatever

You can alternate between multi-user and single-user modes for
the same database.  Just be sure that all users have logged off,
otherwise gnucash won't let you log on in single-user mode. 
Users are 'logged off' automatically when they exist gnucash.
In case they have crashed, (and thus appear to still be logged
in), you can log them off manually by issuing  the following:

echo "UPDATE gncsession SET time_off='NOW' WHERE time_off = 'infinity';" | psql dbname

Hopefully, you do not need to be warned that this dangerous
if there really are other users logged in.  Accessing the
database in single-user mode when there are other users logged
in will result in their work being clobbered.

Other Options
-------------
The 'options=' and 'tty=' postgres keywords are supported. See the
postgres documentation for more info.


Mangled Balances
----------------
If you've been working in multi-user mode for a while, and the 
balances seem to be mangled, log off all the users, log on the the 
same database in single-update mode, and log off again.  This
will cause balance checkpoints to be recomputed, and should 
probably fix any weird account balances you are seeing.

(Mangled account balances shouldn't happen, and if you have
a test case that can reliably recreate these, you should
submit a bug report).

(Note to developers: think of this as a safe fsck: balances
shouldn't get wacky, but if you've been running a long time
and something weird has happened, then a periodic cleanup 
is good to have.  Maybe this should be automated ...)

Performance
-----------
In 'single-user-update' mode,  data loads from the sql database 
should be 1.5x faster than comparable loads from the XML flat file, 
at least for medium datasets (measured at 3.5 seconds on a 700MHz 
Athalon a dataset with 3K transactions and 150 accounts, vs. 4.8 
seconds loading from file; postgres version 7.1.2).  

Hitting the 'save' button is a no-op and takes no CPU cycles.
(date is saved as its modified, so a global save is not needed).

'exit' will stall for a few seconds, while a 'vacuum analyze' is 
issued.  The 'vacuum' reclaims storage, and 'analyze' does some 
performance tuning.  Doing this regularly improves performance 
about 20% in the cases I looked at.

---------
Accounts are restored roughly at the rate of 50-75 per second
(for above hardware/software config).

---------
Performance in multi-user mode is still a can of worms, and will 
be a good bit slower for now. The working assumptions are that
there are millions of transactions, so you can't load all of them.  
But to load only a subset is hard, in several ways. First, its
technically hard to figure out what subset to load.  Then, once
this is done, it takes about 12 millisecs to load one transaction 
when loading them one at a time (on an Athalon 700MHz). This is
about 75 transactions a second. Of this, 90% of the CPU cycles 
are burned in the postgres server, so 'basic' tuning won't do 
the trick ... some different algorithm is needed.

Handy for raw-file-loading performance measurement is the the 
script 'scan-acct.pl' in src/optional/swig/examples'

Weird shit: loading the same data, but in slightly different
order, can make a *huge* difference in the speed of balance
subtotal calculations (used only in multi-user mode.)  Merely
rearranging the order of the splits in a transaction can cause
a factor of 20 (twenty) difference in performance. This can turn 
a 4.5 second load into a minute & a half load !!!! Yow! 
Investigating ...

To Be Done
----------
Core bugs/features that still need work:

High Priority:
--------------
Currently, the database backend is completely missing support for
"lots", and thus cannot support capital gains computations.  It is 
also missing support for business objects.

-- remove the bogus 'price query' call in the backend, change it to 
   use the qof query infrastructure.

-- port to use SQLLite as a possible backend.  Will need to  abandon
   balance checkpoints to do this.


Medium/Low Priority Work:
-------------------------
Nothing in this list is truly critical at this point, except possibly
the username/password dialog, and its effect with regards to initial db
access.

-- port to use libdbi bindings, instead of the postgres native
   call interface.  This should be easy.

-- Wire in the GUI to ask user for username/password to log onto the
   server.  (GUI already implemented, just not yet used).  

-- The correct URL format for username-password should be:
   postgres://username:passwd@host:port/dbname?key=value&key2=val2
   However, the username-password part of the URI is not correctly 
   parsed at the moment.

-- distinguish between 'save' and 'save-as' in gnc-book & backend.
   If user hits 'save' in the single-user or multi-user mode, it 
   should be a no-op (since the saves have already occurred).  Only 
   a 'save-as' requires a from-scratch sync.
   ?? Maybe this has already been fixed in the redesigned session 
   & backend  ???

-- document how to use the version/extension tables to allow
   seamless upwards compatibility in the face of database schema
   changes.

-- allow user to enter URL in GUI dialog.  User can currently type URL
   into the file dialog window; it would be nice to have something 
   slightly nicer tan this.

-- error code should include strings passed back, to be shown in 
   GUI dialogs. This is because the backend needs to return things
   like usernames, etc. in the dialogs, and the backend doesn't 
   have the interfaces for passing this kind of info.
   (actually, the PERR/PINFO strings might do ??)
   In some cases, we have PERR without setting a backend error...
   grep for all PERR's that don't set backend error.

-- note that transaction commit errors may in fact be i/o errors.
   If an i/o error occurred during commit, there would be some
   aberrant rollback behavior. Ouch.

-- the transaction rollback code needs to be a GUI popup...

-- review (actually, design & architect) the communications error
   handling policy.  For example, CopyToEngine() will leave
   the backend in a disabled state if a communication error occurs;
   there will be other debilitating conditions if the backend disappears,
   leaving the engine in a possibly confused state.

-- during open of register window, a query of splits is typically run
   twice (once during open -- LedgerDisplayInternal(), and again
   during LedgerDisplayRefresh()) this results in twice as much data 
   to the sql backend as needed.  This is a performance issue,
   should someday be fixed.

-- during transaction edit, query of splits is run twice --
   once during edit recordCB()->xaccSRSaveRegEntry(), and again
   recordCB()->xaccSRRedrawReg().  As above, this is performance 
   problem waiting to aggravate.

-- minor optimization: KVP's for null strings store null strings, 
   this is a waste of time. This occurs primarily for transaction 
   memos, I think.

-- minor optimization: keeping an iguid cache will avoid sql queries.
   also caching the count for single-user mode would be good

To Be Done, Part II
-------------------
This list only affects the multi-user and advanced/optional features.

Most of the items on this list are 'good-to-fix' but are not
truly critical.  They may result in aberrations, but general
operation is probably OK.

-- when an account is edited, price lookups happen *twice* !!
   once when opened for edit, and again when committed !!
   (the first one is due to gnc_account_tree_refresh ()
   called by gnc_account_window_create() near dialog-account.c:1642)

-- nice-to-have performance improvement:
   many queries are needlessly duplicated, even in multi-user mode.
   They are duplicated just in case some other user has modified
   the data.  But if we last checked just a few seconds ago,
   we don't really need to update, not yet.  We can wait a few 
   seconds ...  doing this uniformly can cut down on sql traffic.
   (This is particularly egregious for price queries).  Implement this 
   by putting a datestamp in along with the version number.  Let
   the backend check the datestamp, and if its aged less than 
   15 seconds, skip doing the sql query.)

   DONE --- but we need to detect redundant price queries .. ughh

-- if always in multi-mode, then initial checkpoints not set up.
   They're not needed until there are a lot of transactions in the 
   system.  Need to recompute checkpoints on some periodic basis.

-- NOTIFY is not updating main window balances ...

-- implement price change events

-- implement a version table so that we can be upwardly compatible 
   with future database changes.

-- use FOREIGN KEY to ensure that table integrity is maintained.

-- use table constraint ON DELETE to make sure we have no
   dangling splits when a transaction is deleted.  As added 
   benefit, this might simplify some logic.

-- during sync, detect and report conflicting edits to accounts 
   and transactions.  See the notes for pgendSync() for details
   as to what this is about.  For the first pass, this is not a 
   serious issue; its a 'nice to have' thing.  (sync is called
   when user hits the 'save' button, and should be disabled for 
   multi-user modes)

-- implement account rollback (i.e. if other user has modified the
   account, we need to do something to merge their work into ours...)
   ditto for prices ...

-- bug: if another user deletes an account, we need to look at the 
   audit trail to see if the thing has been deleted.
   Otherwise, any edit of this account will incorrectly
   add the deleted account back in. (Note that from the user
   perspective, deleting accounts is a bad idea ...)
   (this is handled with an event, but is not handled via poll)

-- fix caching in the face of lost contact to the backend.  If the
   backend can't contact its server, then we should just save up caches,
   and then when contact with backend re-established, we should spit
   them out.  The pgendSync routine should more or less do the trick;
   note, however, that the XML file format needs to save the version 
   number ...

-- Implement various advanced database features, such as checking the
   user's permission to view/edit account by account ... (hmmm this
   done by the dbadmin... using SQL commands... which means if user
   tries to write to something they're not allowed to write to,
   then they should be bounced back.)  Does some user have the permission
   to create new accounts ??

-- Review versioning verification in backend. The desired semantic for 
   updates should be like CVS: multiple nearly-simultaneous writers 
   are allowed; however, only one wins, and others are kicked back. 
   The losers know themselves because they are trying to update info 
   of the wrong version. 
   -- pgend_transaction_commit does it correctly; but the GUI doesn't
      report a rollback.  (need to get err message out of engine, into 
      GUI).
   -- pgTransactionSync() is broken, but its not used anywhere.
   -- pgend_account_commit checks version but doesn't rollback.
      (nor does the GUI report a rollback.)
   -- pgendSync does the right thing, except that it doesn't
      detect any version conflicts, nor does it notify the user
      of such conflicts.  (actually, it also screws checkpoints).

   I'm not sure how critical this all is; with a small number of users
   it shouldn't be a problem.  With a huge number of users, each editing
   the same transaction (unlikely!?) then there is risk of clobbered
   data, but so what?  versioning is at least partly a people-management
   problem.  Anyway, what's there now should be pretty good & should
   work for now.  Except its mostly untested.

-- use version numbers for commodities.  Right now, multi-user
   updates of commodities are not detected (this seem OK for
   now, since this is a rare occurrence, right ???)

-- multi-user 'save-as' has unexpected results:
   As it currently works, save-as (sync) is a mass-copy of all data 
   out of the engine into the named storage location.  If the indicated 
   storage location doesn't exist, its created, and all of the engine 
   data is copied into it.  If the storage location (i.e. database) 
   does exist, and contains data, then the engine contents are merged 
   into it.

   From the engine point of view, the above is the easiest thing to do.
   However, in the multi-user mode, it may not be what the user is 
   expecting.  In the multi-user mode, the engine would not contain 
   a full copy of the database data; instead, it contains only a subset. 
   A 'save as', if implemented as above, would save only that subset, 
   instead of all of the data.   If the user thought 'save as'
   means the same thing as 'copy', and hoped to copy all of the sql 
   data to a file, they would be sorely disappointed.  I don't 
   currently have an opinion on the 'best' way of dealing with this 
   situation.  A reasonable solution may well be 'don't do that'.

-- Implement PR_BALANCE query type in gncquery.c.  This query is 
   supposed to return unbalanced transactions from the database.
   I think it might be better to leave this unbalanced, and just
   plain force everything in the database to be balanced.  In this
   case, we should then modify the code to check for unbalanced 
   transactions before committing them, or otherwise force the backend
   to run in balanced mode.  We could set 'force_double_entry=2'
   but this is currently unimplemented in the engine.  Alternately, 
   we could call 'Scrub' at appropriate times.
    

============================= END OF FILE ========================