GnuCash SQL/Postgres Support/HOWTO ---------------------------------- This directory contains code for SQL/Postgres support. The SQL backend allows multiple users to view/edit the same data at the same time, as well as allowing a single user to keep data in an SQL database. Status ------ The SQL code has been "finished" for many years, and hasn't been in recent active development. This means that the current SQL backend does not support some of the newer GnuCash features; in particular, it doesn't support the business objects needed for the business accounting. It also doesn't currently support scheduled/recurring transactions. Although the development is currently dormant, we believe that it may heat up real soon, since there is a core group of developers who are interested in making SQL the default for all future versions of GnuCash. As long as development is dormant, note that you may not get prompt support on the mailing list, and, if you find bugs, they might not be fixed quickly. Currently, only Postgres is supported, but other databases are planned. See also the 'to be done' list at the bottom for more detail. Table of Contents ----------------- -- Postgres Install Instructions -- GnuCash Build Instructions -- How To Use GnuCash with Postgres -- Remote Access -- Access Modes -- Performance -- ToDo List Postgres Install Instructions ----------------------------- 1) Install PostgresSQL server, client and devel packages. (Both the 6.x and the 7.x versions of Postgres should work). 2) If installed from RedHat, then running /etc/rc.d/init.d/postgresql will setup and initialize basic Postgres first-time setup & config. 3) As root, 'su - postgres' then run 'createuser' to add your user id (don't set a password on your postgres db name, yet, GnuCash doesn't have a GUI to ask for your password yet) If you've forgotten what users are on the system, you can check by starting the 'psql' command shell and typing the command. 'SELECT * FROM pg_shadow;' Note this only works for the postgres user, and other users that have createuser permissions. 4) (optional) enable TCPIP connections to remote hosts. To do this: edit the file pg_hba.conf on the remote host to configure and allow access from other hosts. See 'man pg_hba.conf' for details. RedHat: /var/lib/pgsql/pg_hba.conf Debian: /etc/postgresql/pg_hba.conf Also create users. Be sure to set a password. (there appears to be no way to pass an empty password to postgres) Note also the user must have 'createuser' permissions in order to lock tables (this is a bug in postgres 6.5 and maybe later ???) 5) Hints and Tips: If you've forgotten what databases you've created in the past, you can look the filesystem: 'ls -la /var/lib/postgres/data/base' to view the existing databases. Alternately, if you connect as user postgres, you can 'SELECT * FROM pg_database;' Alternatively, you can install into a brand new database without using root privileges. Perform the following: 1) Install postgresql server, client and devel packages. 2) initdb -D ~/gnucash This creates a directory structure and supporting files under ~/gnucash. The gnucash directory is automatically created. The directories and files are owned by the user running 'initdb' and have only user access; group and other are cleared. By default, the user running 'initdb' is setup as the database superuser for the new database. You don't have to use the name 'gnucash'; you can choose whatever you want. 3) pg_ctl -D ~/gnucash -o "-p 6543" start This starts a postmaster process for the new database listening on port 6543. If your system doesn't already have a postmaster processing running for the system database, you can leave out '-o "-p 6543"' and it will use the default port 5432. You can use any open port. By default, the postmaster process will only accept connections from processes on the local host. GnuCash Build Instructions -------------------------- Same as usual, but you must specify the flag '--enable-sql' in order to build in Postgres support. i.e. ./configure --enable-sql and then 'make' and then 'make install' How To Use GnuCash with Postgres -------------------------------- a) Open your favorite datafile in the usual fashion. b) Click on 'Save As' c) Enter the following URL instead of a filename in the file picker: postgres://localhost/some_dbname_you_pick The above steps will copy your data into that database. You can then restart gnucash (or keep working) and type in the same URL in the file open dialogs. Or try it on the command line: home:~ $ gnucash postgres://localhost/dbname_whatever Note: GnuCash will automatically create the database if it does not already exist. Do *not* try to create the database by hand, or to specify a database that wasn't created by GnuCash. You can also "copy" from the SQL database to an XML file. If you don't trust the SQL backend, this might be a good way to backup your data. To do this: a) Open the SQL database in "single user mode": postgres://localhost/dbname_whatever?mode=single-update It is important to specify the single-user mode (if you don't, not all of your data will be saved to the file) b) Click on 'Save As' and enter an ordinary file name. We promise that future versions of GnuCash will be backwards compatible with the current SQL database layout. There is a specific upgrade module designed into the system that handles version upgrades. Remote Access ------------- In principle, you can use URL's such as: postgres://some.where.com/dbname However, these URL's require that the host 'some.where.com' have TCPIP access enabled (by following the postgres install instruction (6) above). This is true even if the hostname is your local machine. Thus, unless you've setup Postgres TCPIP connections, you *MUST* use URL's of the form postgres://localhost/dbname You can specify usernames and passwords in the URL as follows: postgres://localhost/dbname?user=whomever&password=asdf Note that the username and password are the pg_shadow username and password, *NOT* your unix login name and password. (A GUI dialog to prompt for your password hasn't been implemented yet.) Note that you (or your SQL database admin) will have to make sure that access permissions on the various GnuCash tables have been set appropriately. Typically, this will be by starting the 'psql' shell, listing all relations with \z, and then issuing a 'GRANT ALL on gncAccount TO someuser;', and so on, for each relation. Failing to do this will result in lots of unhappiness. The sysadmin will also need to make sure that TCP/IP connections are properly enabled in the postgres server hba.conf file. Access Modes ------------ There are four different ways or 'modes' in which to access your database. These are mode=single-file mode=single-update mode=multi-user-poll mode=multi-user The first two are single-user access modes. These are the 'safest' modes to use if you don't need multi-user capability. The single-update mode is strongly preferred over single-file. (The single-file mode is strongly deprecated, and can lead to to massive data loss under certain circumstances: for example, if your sql connection dies after all the old data has been deleted, but before the new data has been written out. Bluntly: DO NOT USE mode=single-file. You won't be sorry.). The multi-user modes are intended for use when multiple users need to view and make changes to the data. The default mode is multi-user. If you have many users (more than a half-dozen or so), you may want to run in multi-user-poll mode, instead of the default multi-user. This will stop the automatic updates of the local instance of gnucash, and thus reduce the amount of GUI redraws that happen automatically as other users edit data. It should also reduce network traffic slightly, and present a slightly lower load on the sql server. Different users can mix-n-match the two multi-user modes. See the file 'design.txt' for an in-depth explanation of these modes. You can specify a particular access mode by specifying the URL postgres://localhost/dbname?mode=whatever You can alternate between multi-user and single-user modes for the same database. Just be sure that all users have logged off, otherwise gnucash won't let you log on in single-user mode. Users are 'logged off' automatically when they exist gnucash. In case they have crashed, (and thus appear to still be logged in), you can log them off manually by issuing the following: echo "UPDATE gncsession SET time_off='NOW' WHERE time_off = 'infinity';" | psql dbname Hopefully, you do not need to be warned that this dangerous if there really are other users logged in. Accessing the database in single-user mode when there are other users logged in will result in their work being clobbered. Other Options ------------- The 'options=' and 'tty=' postgres keywords are supported. See the postgres documentation for more info. Mangled Balances ---------------- If you've been working in multi-user mode for a while, and the balances seem to be mangled, log off all the users, log on the the same database in single-update mode, and log off again. This will cause balance checkpoints to be recomputed, and should probably fix any weird account balances you are seeing. (Mangled account balances shouldn't happen, and if you have a test case that can reliably recreate these, you should submit a bug report). (Note to developers: think of this as a safe fsck: balances shouldn't get wacky, but if you've been running a long time and something weird has happened, then a periodic cleanup is good to have. Maybe this should be automated ...) Performance ----------- In 'single-user-update' mode, data loads from the sql database should be 1.5x faster than comparable loads from the XML flat file, at least for medium datasets (measured at 3.5 seconds on a 700MHz Athalon a dataset with 3K transactions and 150 accounts, vs. 4.8 seconds loading from file; postgres version 7.1.2). Hitting the 'save' button is a no-op and takes no CPU cycles. (date is saved as its modified, so a global save is not needed). 'exit' will stall for a few seconds, while a 'vacuum analyze' is issued. The 'vacuum' reclaims storage, and 'analyze' does some performance tuning. Doing this regularly improves performance about 20% in the cases I looked at. --------- Accounts are restored roughly at the rate of 50-75 per second (for above hardware/software config). --------- Performance in multi-user mode is still a can of worms, and will be a good bit slower for now. The working assumptions are that there are millions of transactions, so you can't load all of them. But to load only a subset is hard, in several ways. First, its technically hard to figure out what subset to load. Then, once this is done, it takes about 12 millisecs to load one transaction when loading them one at a time (on an Athalon 700MHz). This is about 75 transactions a second. Of this, 90% of the CPU cycles are burned in the postgres server, so 'basic' tuning won't do the trick ... some different algorithm is needed. Handy for raw-file-loading performance measurement is the the script 'scan-acct.pl' in src/optional/swig/examples' Weird shit: loading the same data, but in slightly different order, can make a *huge* difference in the speed of balance subtotal calculations (used only in multi-user mode.) Merely rearranging the order of the splits in a transaction can cause a factor of 20 (twenty) difference in performance. This can turn a 4.5 second load into a minute & a half load !!!! Yow! Investigating ... To Be Done ---------- Core bugs/features that still need work: High Priority: -------------- Currently, the database backend is completely missing support for "lots", and thus cannot support capital gains computations. It is also missing support for business objects. -- remove the bogus 'price query' call in the backend, change it to use the qof query infrastructure. -- port to use SQLLite as a possible backend. Will need to abandon balance checkpoints to do this. Medium/Low Priority Work: ------------------------- Nothing in this list is truly critical at this point, except possibly the username/password dialog, and its effect with regards to initial db access. -- port to use libdbi bindings, instead of the postgres native call interface. This should be easy. -- Wire in the GUI to ask user for username/password to log onto the server. (GUI already implemented, just not yet used). -- The correct URL format for username-password should be: postgres://username:passwd@host:port/dbname?key=value&key2=val2 However, the username-password part of the URI is not correctly parsed at the moment. -- distinguish between 'save' and 'save-as' in gnc-book & backend. If user hits 'save' in the single-user or multi-user mode, it should be a no-op (since the saves have already occurred). Only a 'save-as' requires a from-scratch sync. ?? Maybe this has already been fixed in the redesigned session & backend ??? -- document how to use the version/extension tables to allow seamless upwards compatibility in the face of database schema changes. -- allow user to enter URL in GUI dialog. User can currently type URL into the file dialog window; it would be nice to have something slightly nicer tan this. -- error code should include strings passed back, to be shown in GUI dialogs. This is because the backend needs to return things like usernames, etc. in the dialogs, and the backend doesn't have the interfaces for passing this kind of info. (actually, the PERR/PINFO strings might do ??) In some cases, we have PERR without setting a backend error... grep for all PERR's that don't set backend error. -- note that transaction commit errors may in fact be i/o errors. If an i/o error occurred during commit, there would be some aberrant rollback behavior. Ouch. -- the transaction rollback code needs to be a GUI popup... -- review (actually, design & architect) the communications error handling policy. For example, CopyToEngine() will leave the backend in a disabled state if a communication error occurs; there will be other debilitating conditions if the backend disappears, leaving the engine in a possibly confused state. -- during open of register window, a query of splits is typically run twice (once during open -- LedgerDisplayInternal(), and again during LedgerDisplayRefresh()) this results in twice as much data to the sql backend as needed. This is a performance issue, should someday be fixed. -- during transaction edit, query of splits is run twice -- once during edit recordCB()->xaccSRSaveRegEntry(), and again recordCB()->xaccSRRedrawReg(). As above, this is performance problem waiting to aggravate. -- minor optimization: KVP's for null strings store null strings, this is a waste of time. This occurs primarily for transaction memos, I think. -- minor optimization: keeping an iguid cache will avoid sql queries. also caching the count for single-user mode would be good To Be Done, Part II ------------------- This list only affects the multi-user and advanced/optional features. Most of the items on this list are 'good-to-fix' but are not truly critical. They may result in aberrations, but general operation is probably OK. -- when an account is edited, price lookups happen *twice* !! once when opened for edit, and again when committed !! (the first one is due to gnc_account_tree_refresh () called by gnc_account_window_create() near dialog-account.c:1642) -- nice-to-have performance improvement: many queries are needlessly duplicated, even in multi-user mode. They are duplicated just in case some other user has modified the data. But if we last checked just a few seconds ago, we don't really need to update, not yet. We can wait a few seconds ... doing this uniformly can cut down on sql traffic. (This is particularly egregious for price queries). Implement this by putting a datestamp in along with the version number. Let the backend check the datestamp, and if its aged less than 15 seconds, skip doing the sql query.) DONE --- but we need to detect redundant price queries .. ughh -- if always in multi-mode, then initial checkpoints not set up. They're not needed until there are a lot of transactions in the system. Need to recompute checkpoints on some periodic basis. -- NOTIFY is not updating main window balances ... -- implement price change events -- implement a version table so that we can be upwardly compatible with future database changes. -- use FOREIGN KEY to ensure that table integrity is maintained. -- use table constraint ON DELETE to make sure we have no dangling splits when a transaction is deleted. As added benefit, this might simplify some logic. -- during sync, detect and report conflicting edits to accounts and transactions. See the notes for pgendSync() for details as to what this is about. For the first pass, this is not a serious issue; its a 'nice to have' thing. (sync is called when user hits the 'save' button, and should be disabled for multi-user modes) -- implement account rollback (i.e. if other user has modified the account, we need to do something to merge their work into ours...) ditto for prices ... -- bug: if another user deletes an account, we need to look at the audit trail to see if the thing has been deleted. Otherwise, any edit of this account will incorrectly add the deleted account back in. (Note that from the user perspective, deleting accounts is a bad idea ...) (this is handled with an event, but is not handled via poll) -- fix caching in the face of lost contact to the backend. If the backend can't contact its server, then we should just save up caches, and then when contact with backend re-established, we should spit them out. The pgendSync routine should more or less do the trick; note, however, that the XML file format needs to save the version number ... -- Implement various advanced database features, such as checking the user's permission to view/edit account by account ... (hmmm this done by the dbadmin... using SQL commands... which means if user tries to write to something they're not allowed to write to, then they should be bounced back.) Does some user have the permission to create new accounts ?? -- Review versioning verification in backend. The desired semantic for updates should be like CVS: multiple nearly-simultaneous writers are allowed; however, only one wins, and others are kicked back. The losers know themselves because they are trying to update info of the wrong version. -- pgend_transaction_commit does it correctly; but the GUI doesn't report a rollback. (need to get err message out of engine, into GUI). -- pgTransactionSync() is broken, but its not used anywhere. -- pgend_account_commit checks version but doesn't rollback. (nor does the GUI report a rollback.) -- pgendSync does the right thing, except that it doesn't detect any version conflicts, nor does it notify the user of such conflicts. (actually, it also screws checkpoints). I'm not sure how critical this all is; with a small number of users it shouldn't be a problem. With a huge number of users, each editing the same transaction (unlikely!?) then there is risk of clobbered data, but so what? versioning is at least partly a people-management problem. Anyway, what's there now should be pretty good & should work for now. Except its mostly untested. -- use version numbers for commodities. Right now, multi-user updates of commodities are not detected (this seem OK for now, since this is a rare occurrence, right ???) -- multi-user 'save-as' has unexpected results: As it currently works, save-as (sync) is a mass-copy of all data out of the engine into the named storage location. If the indicated storage location doesn't exist, its created, and all of the engine data is copied into it. If the storage location (i.e. database) does exist, and contains data, then the engine contents are merged into it. From the engine point of view, the above is the easiest thing to do. However, in the multi-user mode, it may not be what the user is expecting. In the multi-user mode, the engine would not contain a full copy of the database data; instead, it contains only a subset. A 'save as', if implemented as above, would save only that subset, instead of all of the data. If the user thought 'save as' means the same thing as 'copy', and hoped to copy all of the sql data to a file, they would be sorely disappointed. I don't currently have an opinion on the 'best' way of dealing with this situation. A reasonable solution may well be 'don't do that'. -- Implement PR_BALANCE query type in gncquery.c. This query is supposed to return unbalanced transactions from the database. I think it might be better to leave this unbalanced, and just plain force everything in the database to be balanced. In this case, we should then modify the code to check for unbalanced transactions before committing them, or otherwise force the backend to run in balanced mode. We could set 'force_double_entry=2' but this is currently unimplemented in the engine. Alternately, we could call 'Scrub' at appropriate times. ============================= END OF FILE ========================