Sophie

Sophie

distrib > Mandriva > 2009.0 > x86_64 > media > contrib-testing > by-pkgid > 796a1f30f68de7ebe302be121282f901 > files > 446

postgresql8.2-devel-8.2.12-1mdv2009.0.x86_64.rpm

From pgsql-hackers-owner+M77861=pgman=candle.pha.pa.us@postgresql.org Fri Dec 23 05:19:20 2005
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
From: Simon Riggs <simon@2ndquadrant.com>
To: Stephen Frost <sfrost@snowman.net>
cc: Martijn van Oosterhout <kleptog@svana.org>, 
	   Jim C. Nasby <jnasby@pervasive.com>, 
	   bizgres-general <bizgres-general@pgfoundry.org>, 
	  pgsql-hackers@postgresql.org
In-Reply-To: <20051222223625.GC6026@ns.snowman.net>
References: <1135261893.2964.502.camel@localhost.localdomain>
	  <20051222183751.GG72143@pervasive.com>  <20051222201826.GH21783@svana.org>
	  <1135289583.2964.536.camel@localhost.localdomain>
	  <20051222223625.GC6026@ns.snowman.net>
Date: Fri, 23 Dec 2005 10:18:43 +0000
Message-ID: <1135333123.2964.589.camel@localhost.localdomain>
X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) 
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.1 required=5 tests=[AWL=0.100]
X-Spam-Score: 0.1
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:  4728

On Thu, 2005-12-22 at 17:36 -0500, Stephen Frost wrote:
> * Simon Riggs (simon@2ndquadrant.com) wrote:
> > On Thu, 2005-12-22 at 21:18 +0100, Martijn van Oosterhout wrote:
> > > Considering "WAL bypass" is code for "breaks PITR"
> > 
> > No it isn't. All of the WAL bypass logic does *not* operate when PITR is
> > active. The WAL bypass logic is aimed at Data Warehouses, which
> > typically never operate in PITR mode for performance reasons, however
> > the choice is yours.

OK, thanks for saying all of that; you probably speak for many in
raising these concerns. I'll answer each bit as we come to it. Suffice
to say, your concerns are good and so are the answers:

> Eh?  PITR mode is bad for performance?  Maybe I missed something but I
> wouldn't have thought PITR would degrade regular performance all that
> badly.  

PITR mode is *not* bad for performance. On a very heavily loaded
write-intensive test system, the general PITR overhead on regular
performance was around 1% - so almost negligible.

We have been discussing a number of optimizations to specific commands
that would allow them to avoid writing WAL and thus speed up their
performance. If archive_command is set then WAL will always be written;
if it is not set then these commands will (or could) go faster:

- CREATE TABLE AS SELECT (in 8.1)
- COPY LOCK (patch submitted)
- COPY in same transaction as CREATE TABLE (patch submitted)
- INSERT SELECT in same transaction as CREATE TABLE (this discussion)

(There are a number of other conditions also, such as there must be no
indexes on a table. All of which now documented with the patch)

> So long as it doesn't take 15 minutes or some such to move the
> WAL to somewhere else (and I'm not sure that'd even slow things down..).
> For a Data Warehouse, have you got a better way of doing backups such
> that you don't lose at minimum most of a day's work?  

Yes. Don't just use the backup facilities on their own. Think about how
the architecture of your systems will work and see if there is a better
way when you look at very large systems.

> I'm not exactly a
> big fan do doing a pg_dump every night either given that the database is
> 360GB.  Much nicer to take a weekly dump of the database and then do
> PITR for a week or two before taking another dump of the db.

e.g. Keep your reference data (low volume) in an Operational Data Store
(ODS) database, protected by archiving. Keep your main fact data (high
volume) in the Data Warehouse, but save the data in slices as you load
it, so that a recovery is simply a reload of the database: no PITR or
pg_dump required, so high performance data transformation and load work
is possible. This is a commonly used architectural design pattern.

> I like the idea of making COPY go faster, but please don't break my
> backup system while you're at it.  

On a personal note, I would only add that I spent a long time working on
PITR and I would never design anything that would intentionally break it
(nor would patches be accepted that did that). That probably gives me
the confidence to approach designs that might look like I'm doing that,
but without actually straying over the edge.

> I'm honestly kind of nervous about
> what you mean by checking it PITR is active- how is that done, exactly?
> Check if you have a script set to rotate the logs elsewhere?  Or is it
> checking if you're in the taking-a-full-database-backup stage?  Or what?

Internally, we use XLogArchivingActive(). Externally this will be set
when the admin sets archive_command to a particular value.

My original preference was for a parameter called archive_mode= ON | OFF
which would allow us to more easily discuss this, but this does not
currently exist.

> What's the performance decrease when using PITR, and what's it from?  Is
> it just that COPY isn't as fast?  Honestly, I could live with COPY being
> not as fast as it could be if my backups work. :)

These commands will not be optimized for speed when archive_command is set:
- CREATE TABLE AS SELECT (in 8.1)
- COPY LOCK (patch submitted)

> Sorry for sounding concerned but, well, backups are very important and
> so is performance and I'm afraid either I've not read all the
> documentation about the issues being discussed here or there isn't
> enough out there to make sense of it all yet. :)

If you choose PITR, then you are safe. If you do not, the crash recovery
of the database is not endangered by these optimizations.

Hope that covers all of your concerns?

I'm just writing a course that explains many of these techniques,
available in the New Year.

Best Regards, Simon Riggs


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

From pgsql-hackers-owner+M78004=pgman=candle.pha.pa.us@postgresql.org Wed Dec 28 20:59:03 2005
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
From: Bruce Momjian <pgman@candle.pha.pa.us>
Message-ID: <200512290158.jBT1wEK28785@candle.pha.pa.us>
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
In-Reply-To: <20051226122206.GA12934@svana.org>
To: Martijn van Oosterhout <kleptog@svana.org>
Date: Wed, 28 Dec 2005 20:58:14 -0500 (EST)
cc: Simon Riggs <simon@2ndquadrant.com>, Tom Lane <tgl@sss.pgh.pa.us>, 
	   Greg Stark <gsstark@mit.edu>, Rod Taylor <pg@rbt.ca>, 
	   Qingqing Zhou <zhouqq@cs.toronto.edu>, pgsql-hackers@postgresql.org
X-Mailer: ELM [version 2.4ME+ PL121 (25)]
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.122 required=5 tests=[AWL=0.122]
X-Spam-Score: 0.122
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:  3461


Having read through this thread, I would like to propose a
syntax/behavior.

I think we all now agree that the logging is more part of the table than
the command itself.  Right now we have a COPY LOCK patch, but people are
going to want to control logging for INSERT INTO ... SELECT, and UPDATE,
and all sorts of other things, so I think we are best adding an ALTER
TABLE capability.  I am thinking of this syntax:

	ALTER TABLE name RELIABILITY option

where "option" is:

	DROP [ TABLE ON CRASH ]
	DELETE [ ROWS ON CRASH ]
	EXCLUSIVE
	SHARE

Let me explain each option.  DROP would drop the table on a restart
after a non-clean shutdown.  It would do _no_ logging on the table and
allow concurrent access, plus index access.  DELETE is the same as DROP,
but it just truncates the table (perhaps TRUNCATE is a better word).

EXCLUSIVE would allow only a single session to modify the table, and
would do all changes by appending to the table, similar to COPY LOCK. 
EXCLUSIVE would also not allow indexes because those can not be isolated
like appending to the heap.  EXCLUSIVE would write all dirty shared
buffers for the table and fsync them before committing.  SHARE is the
functionality we have now, with full logging.

Does this get us any closer to a TODO item?  It isn't great, but I think
it is pretty clear, and I assume pg_dump would use ALTER to load each
table.  The advanage is that the COPY statements themselves are
unchanged so they would work in loading into older versions of
PostgreSQL.

---------------------------------------------------------------------------

Martijn van Oosterhout wrote:
-- Start of PGP signed section.
> On Mon, Dec 26, 2005 at 12:03:27PM +0000, Simon Riggs wrote:
> > I would not be against such a table-level switch, but the exact
> > behaviour would need to be specified more closely before this became a
> > TODO item, IMHO.
> 
> Well, I think at a per table level is the only sensible level. If a
> table isn't logged, neither are the indexes. After an unclean shutdown
> the data could be anywhere between OK and rubbish, with no way of
> finding out which way.
> 
> > If someone has a 100 GB table, they would not appreciate the table being
> > truncated if a transaction to load 1 GB of data aborts, forcing recovery
> > of the 100 GB table.
> 
> Ah, but wouldn't such a large table be partitioned in such a way that
> you could have the most recent partition having the loaded data.
> Personally, I think these "shared temp tables" have more applications
> than meet the eye. I've had systems with cache tables which could be
> wiped on boot. Though I think my preference would be to TRUNCATE rather
> than DROP on unclean shutdown.
> 
> Have a nice day,
> -- 
> Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> > tool for doing 5% of the work and then sitting around waiting for someone
> > else to do the other 95% so you can sue them.
-- End of PGP section, PGP failed!

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

From pgsql-hackers-owner+M78007=pgman=candle.pha.pa.us@postgresql.org Wed Dec 28 22:06:13 2005
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
Message-ID: <43B3527A.4040709@commandprompt.com>
Date: Wed, 28 Dec 2005 19:05:30 -0800
From: Joshua D. Drake <jd@commandprompt.com>
Organization: Command Prompt, Inc.
User-Agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)
X-Accept-Language: en-us, en
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Martijn van Oosterhout <kleptog@svana.org>, 
	   Simon Riggs <simon@2ndquadrant.com>, Tom Lane <tgl@sss.pgh.pa.us>, 
	   Greg Stark <gsstark@mit.edu>, Rod Taylor <pg@rbt.ca>, 
	   Qingqing Zhou <zhouqq@cs.toronto.edu>, pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
References: <200512290158.jBT1wEK28785@candle.pha.pa.us>
In-Reply-To: <200512290158.jBT1wEK28785@candle.pha.pa.us>
X-Greylist: Sender succeded SMTP AUTH authentication, not delayed by milter-greylist-1.6 (hosting.commandprompt.com [192.168.1.101]); Wed, 28 Dec 2005 18:57:25 -0800 (PST)
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.05 required=5 tests=[AWL=0.050, UPPERCASE_25_50=0]
X-Spam-Score: 0.05
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:   725

  now agree that the logging is more part of the table than
> the command itself.  Right now we have a COPY LOCK patch, but people are
> going to want to control logging for INSERT INTO ... SELECT, and UPDATE,
> and all sorts of other things, so I think we are best adding an ALTER
> TABLE capability.  I am thinking of this syntax:
> 
> 	ALTER TABLE name RELIABILITY option
> 
> where "option" is:
> 
> 	DROP [ TABLE ON CRASH ]
> 	DELETE [ ROWS ON CRASH ]
> 	EXCLUSIVE
> 	SHARE

I would say ON FAILURE (Crash just seems way to scary :))

Joshua D. Drake


---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

From pgsql-hackers-owner+M78008=pgman=candle.pha.pa.us@postgresql.org Wed Dec 28 23:09:58 2005
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
From: Bruce Momjian <pgman@candle.pha.pa.us>
Message-ID: <200512290409.jBT49LD13611@candle.pha.pa.us>
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
In-Reply-To: <43B3527A.4040709@commandprompt.com>
To: Joshua D. Drake <jd@commandprompt.com>
Date: Wed, 28 Dec 2005 23:09:21 -0500 (EST)
cc: Martijn van Oosterhout <kleptog@svana.org>, 
	   Simon Riggs <simon@2ndquadrant.com>, Tom Lane <tgl@sss.pgh.pa.us>, 
	   Greg Stark <gsstark@mit.edu>, Rod Taylor <pg@rbt.ca>, 
	   Qingqing Zhou <zhouqq@cs.toronto.edu>, pgsql-hackers@postgresql.org
X-Mailer: ELM [version 2.4ME+ PL121 (25)]
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.122 required=5 tests=[AWL=0.122, UPPERCASE_25_50=0]
X-Spam-Score: 0.122
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:  1111

Joshua D. Drake wrote:
>   now agree that the logging is more part of the table than
> > the command itself.  Right now we have a COPY LOCK patch, but people are
> > going to want to control logging for INSERT INTO ... SELECT, and UPDATE,
> > and all sorts of other things, so I think we are best adding an ALTER
> > TABLE capability.  I am thinking of this syntax:
> > 
> > 	ALTER TABLE name RELIABILITY option
> > 
> > where "option" is:
> > 
> > 	DROP [ TABLE ON CRASH ]
> > 	DELETE [ ROWS ON CRASH ]
> > 	EXCLUSIVE
> > 	SHARE
> 
> I would say ON FAILURE (Crash just seems way to scary :))

Agreed, maybe ON RECOVERY.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

From simon@2ndquadrant.com Thu Dec 29 08:19:47 2005
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
From: Simon Riggs <simon@2ndquadrant.com>
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Martijn van Oosterhout <kleptog@svana.org>, Tom Lane <tgl@sss.pgh.pa.us>, 
	   Greg Stark <gsstark@mit.edu>, Rod Taylor <pg@rbt.ca>, 
	   Qingqing Zhou <zhouqq@cs.toronto.edu>, pgsql-hackers@postgresql.org
In-Reply-To: <200512290158.jBT1wEK28785@candle.pha.pa.us>
References: <200512290158.jBT1wEK28785@candle.pha.pa.us>
Date: Thu, 29 Dec 2005 13:19:45 +0000
Message-ID: <1135862385.2964.804.camel@localhost.localdomain>
X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) 
Content-Length:  7026

On Wed, 2005-12-28 at 20:58 -0500, Bruce Momjian wrote:
> Having read through this thread, I would like to propose a
> syntax/behavior.
> 
> I think we all now agree that the logging is more part of the table than
> the command itself.  Right now we have a COPY LOCK patch, but people are
> going to want to control logging for INSERT INTO ... SELECT, and UPDATE,
> and all sorts of other things, so I think we are best adding an ALTER
> TABLE capability.  I am thinking of this syntax:
> 
> 	ALTER TABLE name RELIABILITY option
> 
> where "option" is:
> 
> 	DROP [ TABLE ON CRASH ]
> 	DELETE [ ROWS ON CRASH ]
> 	EXCLUSIVE
> 	SHARE
> 
> Let me explain each option.  DROP would drop the table on a restart
> after a non-clean shutdown.  It would do _no_ logging on the table and
> allow concurrent access, plus index access.  DELETE is the same as DROP,
> but it just truncates the table (perhaps TRUNCATE is a better word).
> 
> EXCLUSIVE would allow only a single session to modify the table, and
> would do all changes by appending to the table, similar to COPY LOCK. 
> EXCLUSIVE would also not allow indexes because those can not be isolated
> like appending to the heap.  EXCLUSIVE would write all dirty shared
> buffers for the table and fsync them before committing.  SHARE is the
> functionality we have now, with full logging.
> 
> Does this get us any closer to a TODO item?  It isn't great, but I think
> it is pretty clear, and I assume pg_dump would use ALTER to load each
> table.  The advanage is that the COPY statements themselves are
> unchanged so they would work in loading into older versions of
> PostgreSQL.

First off, thanks for summarising a complex thread.

My view would be that this thread has been complex because everybody has
expressed a somewhat different requirement, which could be broken down
as:
1. The need for a multi-user-accessible yet temporary table
2. Loading data into a table immediately after it is created (i.e. in
same transaction), including but not limited to a reload from pg_dump
3. How to load data quickly into an existing table (COPY)
4. How to add/modify data quickly in an existing table (INSERT SELECT,
UPDATE)

I can see the need for all of those individually; my existing patch
submission covers (2) and (3) only. I very much like your thought to
coalesce these various requirements into a single coherent model.

For requirement (1), table level options make sense. We would:
- CREATE TABLE ALLTHINGS
- ALTER TABLE ALLTHINGS RELIABILITY DELETE ROWS ON RECOVERY
- lots of SQL, all fast because not logged

(2) is catered for adequately by the existing COPY patch i.e. it will
detect whether a table has just been created and then avoid writing WAL.
In the patch, pg_dump has *not* been altered to use COPY LOCK, so a
pg_dump *will* work with any other version of PostgreSQL, which *would
not* be the case if we added ALTER TABLE ... RELIABILITY statements into
it. Also, a pg_dump created at an earlier version could also be loaded
faster using the patch. The only requirement is to issue all SQL as part
of the same transaction - which is catered for by the
--single-transaction option on pg_restore and psql. So (2) is catered
for fully without the need for an ALTER TABLE ... RELIABILITY statement
or COPY LOCK.

For requirement (3), I would use table level options like this:
(the table already exists and is reasonably big; we should not assume
that everybody can and does use partitioning)
- ALTER TABLE RELIABILITY ALLTHINGS2 EXCLUSIVE
- COPY
- ALTER TABLE RELIABILITY ALLTHINGS2 SHARE

For a load into an existing table I would always do all three actions
together. COPY LOCK does exactly that *and* does it atomically. 

The two ways of doing (3) have a few pros/cons either way:
Pro for ALTER TABLE:
- same syntax as req (1)
- doesn't need the keyword LOCK 
- allows INSERT SELECT, UPDATE operations also (req 4)
Cons:
- existing programs have to add additional statements to take advantage
of this; with COPY LOCK we would add just a single keyword
- operation is not atomic, which might lead to some operations waiting
for a lock to operate as unlogged, since they would execute before the
second ALTER TABLE gets there
- operation will be understood by some, but not others. They will forget
to switch the RELIABILITY back on and then lose their whole table when
the database crashes. (watch...)

...but would it be a problem to have both?


So, my thinking would be to separate things into two:
a) Add a TODO item "shared temp tables" that caters for (1) and (4)

	ALTER TABLE name RELIABILITY 
		{DELETE ROWS AT RECOVERY | FULL RECOVERY}
(syntax TBD)

which would 
- truncate all rows and remove all index entries during recovery
- use shared_buffers, not temp_buffers
- never write xlog records, even when in PITR mode
- would avoid writing WAL for both heap *and* index tuples

b) Leave the COPY patch as is, since it caters for reqs (2) and (3) as
*separate* optimizations (but using a common infrastructure in code).
[This work was based upon discussions on -hackers only 6 months ago, so
its not like its been snuck in or anything
http://archives.postgresql.org/pgsql-hackers/2005-06/msg00069.php
http://archives.postgresql.org/pgsql-hackers/2005-06/msg00075.php ]

These two thoughts are separable. There is no need to
have-both-or-neither within PostgreSQL.

Eventually, I'd like all of these options, as a database designer.

Best Regards, Simon Riggs

> --------------------------------------------------------------------------
> 
> Martijn van Oosterhout wrote:
> -- Start of PGP signed section.
> > On Mon, Dec 26, 2005 at 12:03:27PM +0000, Simon Riggs wrote:
> > > I would not be against such a table-level switch, but the exact
> > > behaviour would need to be specified more closely before this became a
> > > TODO item, IMHO.
> > 
> > Well, I think at a per table level is the only sensible level. If a
> > table isn't logged, neither are the indexes. After an unclean shutdown
> > the data could be anywhere between OK and rubbish, with no way of
> > finding out which way.
> > 
> > > If someone has a 100 GB table, they would not appreciate the table being
> > > truncated if a transaction to load 1 GB of data aborts, forcing recovery
> > > of the 100 GB table.
> > 
> > Ah, but wouldn't such a large table be partitioned in such a way that
> > you could have the most recent partition having the loaded data.
> > Personally, I think these "shared temp tables" have more applications
> > than meet the eye. I've had systems with cache tables which could be
> > wiped on boot. Though I think my preference would be to TRUNCATE rather
> > than DROP on unclean shutdown.
> > 
> > Have a nice day,
> > -- 
> > Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> > > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> > > tool for doing 5% of the work and then sitting around waiting for someone
> > > else to do the other 95% so you can sue them.
> -- End of PGP section, PGP failed!
> 

From pgsql-hackers-owner+M78019=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 08:20:11 2005
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
From: Simon Riggs <simon@2ndquadrant.com>
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Martijn van Oosterhout <kleptog@svana.org>, Tom Lane <tgl@sss.pgh.pa.us>, 
	   Greg Stark <gsstark@mit.edu>, Rod Taylor <pg@rbt.ca>, 
	   Qingqing Zhou <zhouqq@cs.toronto.edu>, pgsql-hackers@postgresql.org
In-Reply-To: <200512290158.jBT1wEK28785@candle.pha.pa.us>
References: <200512290158.jBT1wEK28785@candle.pha.pa.us>
Date: Thu, 29 Dec 2005 13:19:45 +0000
Message-ID: <1135862385.2964.804.camel@localhost.localdomain>
X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) 
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.112 required=5 tests=[AWL=0.112]
X-Spam-Score: 0.112
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:  7139

On Wed, 2005-12-28 at 20:58 -0500, Bruce Momjian wrote:
> Having read through this thread, I would like to propose a
> syntax/behavior.
> 
> I think we all now agree that the logging is more part of the table than
> the command itself.  Right now we have a COPY LOCK patch, but people are
> going to want to control logging for INSERT INTO ... SELECT, and UPDATE,
> and all sorts of other things, so I think we are best adding an ALTER
> TABLE capability.  I am thinking of this syntax:
> 
> 	ALTER TABLE name RELIABILITY option
> 
> where "option" is:
> 
> 	DROP [ TABLE ON CRASH ]
> 	DELETE [ ROWS ON CRASH ]
> 	EXCLUSIVE
> 	SHARE
> 
> Let me explain each option.  DROP would drop the table on a restart
> after a non-clean shutdown.  It would do _no_ logging on the table and
> allow concurrent access, plus index access.  DELETE is the same as DROP,
> but it just truncates the table (perhaps TRUNCATE is a better word).
> 
> EXCLUSIVE would allow only a single session to modify the table, and
> would do all changes by appending to the table, similar to COPY LOCK. 
> EXCLUSIVE would also not allow indexes because those can not be isolated
> like appending to the heap.  EXCLUSIVE would write all dirty shared
> buffers for the table and fsync them before committing.  SHARE is the
> functionality we have now, with full logging.
> 
> Does this get us any closer to a TODO item?  It isn't great, but I think
> it is pretty clear, and I assume pg_dump would use ALTER to load each
> table.  The advanage is that the COPY statements themselves are
> unchanged so they would work in loading into older versions of
> PostgreSQL.

First off, thanks for summarising a complex thread.

My view would be that this thread has been complex because everybody has
expressed a somewhat different requirement, which could be broken down
as:
1. The need for a multi-user-accessible yet temporary table
2. Loading data into a table immediately after it is created (i.e. in
same transaction), including but not limited to a reload from pg_dump
3. How to load data quickly into an existing table (COPY)
4. How to add/modify data quickly in an existing table (INSERT SELECT,
UPDATE)

I can see the need for all of those individually; my existing patch
submission covers (2) and (3) only. I very much like your thought to
coalesce these various requirements into a single coherent model.

For requirement (1), table level options make sense. We would:
- CREATE TABLE ALLTHINGS
- ALTER TABLE ALLTHINGS RELIABILITY DELETE ROWS ON RECOVERY
- lots of SQL, all fast because not logged

(2) is catered for adequately by the existing COPY patch i.e. it will
detect whether a table has just been created and then avoid writing WAL.
In the patch, pg_dump has *not* been altered to use COPY LOCK, so a
pg_dump *will* work with any other version of PostgreSQL, which *would
not* be the case if we added ALTER TABLE ... RELIABILITY statements into
it. Also, a pg_dump created at an earlier version could also be loaded
faster using the patch. The only requirement is to issue all SQL as part
of the same transaction - which is catered for by the
--single-transaction option on pg_restore and psql. So (2) is catered
for fully without the need for an ALTER TABLE ... RELIABILITY statement
or COPY LOCK.

For requirement (3), I would use table level options like this:
(the table already exists and is reasonably big; we should not assume
that everybody can and does use partitioning)
- ALTER TABLE RELIABILITY ALLTHINGS2 EXCLUSIVE
- COPY
- ALTER TABLE RELIABILITY ALLTHINGS2 SHARE

For a load into an existing table I would always do all three actions
together. COPY LOCK does exactly that *and* does it atomically. 

The two ways of doing (3) have a few pros/cons either way:
Pro for ALTER TABLE:
- same syntax as req (1)
- doesn't need the keyword LOCK 
- allows INSERT SELECT, UPDATE operations also (req 4)
Cons:
- existing programs have to add additional statements to take advantage
of this; with COPY LOCK we would add just a single keyword
- operation is not atomic, which might lead to some operations waiting
for a lock to operate as unlogged, since they would execute before the
second ALTER TABLE gets there
- operation will be understood by some, but not others. They will forget
to switch the RELIABILITY back on and then lose their whole table when
the database crashes. (watch...)

...but would it be a problem to have both?


So, my thinking would be to separate things into two:
a) Add a TODO item "shared temp tables" that caters for (1) and (4)

	ALTER TABLE name RELIABILITY 
		{DELETE ROWS AT RECOVERY | FULL RECOVERY}
(syntax TBD)

which would 
- truncate all rows and remove all index entries during recovery
- use shared_buffers, not temp_buffers
- never write xlog records, even when in PITR mode
- would avoid writing WAL for both heap *and* index tuples

b) Leave the COPY patch as is, since it caters for reqs (2) and (3) as
*separate* optimizations (but using a common infrastructure in code).
[This work was based upon discussions on -hackers only 6 months ago, so
its not like its been snuck in or anything
http://archives.postgresql.org/pgsql-hackers/2005-06/msg00069.php
http://archives.postgresql.org/pgsql-hackers/2005-06/msg00075.php ]

These two thoughts are separable. There is no need to
have-both-or-neither within PostgreSQL.

Eventually, I'd like all of these options, as a database designer.

Best Regards, Simon Riggs

> --------------------------------------------------------------------------
> 
> Martijn van Oosterhout wrote:
> -- Start of PGP signed section.
> > On Mon, Dec 26, 2005 at 12:03:27PM +0000, Simon Riggs wrote:
> > > I would not be against such a table-level switch, but the exact
> > > behaviour would need to be specified more closely before this became a
> > > TODO item, IMHO.
> > 
> > Well, I think at a per table level is the only sensible level. If a
> > table isn't logged, neither are the indexes. After an unclean shutdown
> > the data could be anywhere between OK and rubbish, with no way of
> > finding out which way.
> > 
> > > If someone has a 100 GB table, they would not appreciate the table being
> > > truncated if a transaction to load 1 GB of data aborts, forcing recovery
> > > of the 100 GB table.
> > 
> > Ah, but wouldn't such a large table be partitioned in such a way that
> > you could have the most recent partition having the loaded data.
> > Personally, I think these "shared temp tables" have more applications
> > than meet the eye. I've had systems with cache tables which could be
> > wiped on boot. Though I think my preference would be to TRUNCATE rather
> > than DROP on unclean shutdown.
> > 
> > Have a nice day,
> > -- 
> > Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> > > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> > > tool for doing 5% of the work and then sitting around waiting for someone
> > > else to do the other 95% so you can sue them.
> -- End of PGP section, PGP failed!
> 


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

From pgsql-hackers-owner+M78021=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 09:35:58 2005
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
From: Rod Taylor <pg@rbt.ca>
To: Simon Riggs <simon@2ndquadrant.com>
cc: Bruce Momjian <pgman@candle.pha.pa.us>, 
	   Martijn van Oosterhout <kleptog@svana.org>, Tom Lane <tgl@sss.pgh.pa.us>, 
	   Greg Stark <gsstark@mit.edu>, Qingqing Zhou <zhouqq@cs.toronto.edu>, 
	  pgsql-hackers@postgresql.org
In-Reply-To: <1135862385.2964.804.camel@localhost.localdomain>
References: <200512290158.jBT1wEK28785@candle.pha.pa.us>
	  <1135862385.2964.804.camel@localhost.localdomain>
Date: Thu, 29 Dec 2005 09:35:27 -0500
Message-ID: <1135866927.61038.13.camel@home>
X-Mailer: Evolution 2.4.2.1 FreeBSD GNOME Team Port 
X-SA-Exim-Mail-From: pg@rbt.ca
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
X-SA-Exim-Version: 3.1 (built Tue Feb 24 05:09:27 GMT 2004)
X-SA-Exim-Scanned: Yes
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.024 required=5 tests=[AWL=0.024, UPPERCASE_25_50=0]
X-Spam-Score: 0.024
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:   506


> So, my thinking would be to separate things into two:
> a) Add a TODO item "shared temp tables" that caters for (1) and (4)
> 
> 	ALTER TABLE name RELIABILITY 
> 		{DELETE ROWS AT RECOVERY | FULL RECOVERY}
> (syntax TBD)

DELETE ROWS AT RECOVERY would need to be careful or disallowed when
referenced via a foreign key to ensure the database is not restored in
an inconsistent state.

-- 


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

From pg@rbt.ca Thu Dec 29 09:35:35 2005
From: Rod Taylor <pg@rbt.ca>
To: Simon Riggs <simon@2ndquadrant.com>
cc: Bruce Momjian <pgman@candle.pha.pa.us>, 
	   Martijn van Oosterhout <kleptog@svana.org>, Tom Lane <tgl@sss.pgh.pa.us>, 
	   Greg Stark <gsstark@mit.edu>, Qingqing Zhou <zhouqq@cs.toronto.edu>, 
	  pgsql-hackers@postgresql.org
In-Reply-To: <1135862385.2964.804.camel@localhost.localdomain>
References: <200512290158.jBT1wEK28785@candle.pha.pa.us>
	  <1135862385.2964.804.camel@localhost.localdomain>
Date: Thu, 29 Dec 2005 09:35:27 -0500
Message-ID: <1135866927.61038.13.camel@home>
X-Mailer: Evolution 2.4.2.1 FreeBSD GNOME Team Port 
X-SA-Exim-Mail-From: pg@rbt.ca
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on psi.look.ca
X-Spam-Level: 
X-Spam-Status: No, hits=0.7 required=9.0 tests=UPPERCASE_25_50 autolearn=no 
	version=2.63
X-SA-Exim-Version: 3.1 (built Tue Feb 24 05:09:27 GMT 2004)
X-SA-Exim-Scanned: Yes
Content-Length:   393


> So, my thinking would be to separate things into two:
> a) Add a TODO item "shared temp tables" that caters for (1) and (4)
> 
> 	ALTER TABLE name RELIABILITY 
> 		{DELETE ROWS AT RECOVERY | FULL RECOVERY}
> (syntax TBD)

DELETE ROWS AT RECOVERY would need to be careful or disallowed when
referenced via a foreign key to ensure the database is not restored in
an inconsistent state.

-- 

From pgsql-hackers-owner+M78022=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 10:10:57 2005
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
From: Simon Riggs <simon@2ndquadrant.com>
To: Rod Taylor <pg@rbt.ca>
cc: Bruce Momjian <pgman@candle.pha.pa.us>, 
	   Martijn van Oosterhout <kleptog@svana.org>, Tom Lane <tgl@sss.pgh.pa.us>, 
	   Greg Stark <gsstark@mit.edu>, Qingqing Zhou <zhouqq@cs.toronto.edu>, 
	  pgsql-hackers@postgresql.org
In-Reply-To: <1135866927.61038.13.camel@home>
References: <200512290158.jBT1wEK28785@candle.pha.pa.us>
	  <1135862385.2964.804.camel@localhost.localdomain>
	  <1135866927.61038.13.camel@home>
Date: Thu, 29 Dec 2005 15:10:40 +0000
Message-ID: <1135869040.2964.824.camel@localhost.localdomain>
X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) 
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.113 required=5 tests=[AWL=0.113]
X-Spam-Score: 0.113
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:   888

On Thu, 2005-12-29 at 09:35 -0500, Rod Taylor wrote:
> > So, my thinking would be to separate things into two:
> > a) Add a TODO item "shared temp tables" that caters for (1) and (4)
> > 
> > 	ALTER TABLE name RELIABILITY 
> > 		{DELETE ROWS AT RECOVERY | FULL RECOVERY}
> > (syntax TBD)
> 
> DELETE ROWS AT RECOVERY would need to be careful or disallowed when
> referenced via a foreign key to ensure the database is not restored in
> an inconsistent state.

I think we'd need to apply the same rule as we do for temp tables: they
cannot be referenced by a permanent table.

There are possibly some other restrictions also. Anyone?

Best Regards, Simon Riggs


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

From tgl@sss.pgh.pa.us Thu Dec 29 11:12:13 2005
To: Simon Riggs <simon@2ndquadrant.com>
cc: Bruce Momjian <pgman@candle.pha.pa.us>, 
	   Martijn van Oosterhout <kleptog@svana.org>, Greg Stark <gsstark@mit.edu>, 
	   Rod Taylor <pg@rbt.ca>, Qingqing Zhou <zhouqq@cs.toronto.edu>, 
	  pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and 
In-Reply-To: <1135862385.2964.804.camel@localhost.localdomain> 
References: <200512290158.jBT1wEK28785@candle.pha.pa.us>  <1135862385.2964.804.camel@localhost.localdomain>
Comments: In-reply-to Simon Riggs <simon@2ndquadrant.com>
	message dated "Thu, 29 Dec 2005 13:19:45 +0000"
Date: Thu, 29 Dec 2005 11:12:11 -0500
Message-ID: <7273.1135872731@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Content-Length:  1963

Simon Riggs <simon@2ndquadrant.com> writes:
> My view would be that this thread has been complex because everybody has
> expressed a somewhat different requirement, which could be broken down
> as:
> 1. The need for a multi-user-accessible yet temporary table
> 2. Loading data into a table immediately after it is created (i.e. in
> same transaction), including but not limited to a reload from pg_dump
> 3. How to load data quickly into an existing table (COPY)
> 4. How to add/modify data quickly in an existing table (INSERT SELECT,
> UPDATE)

> I can see the need for all of those individually; my existing patch
> submission covers (2) and (3) only. I very much like your thought to
> coalesce these various requirements into a single coherent model.

However, you then seem to be arguing for still using the COPY LOCK
syntax, which I think Bruce intended would go away in favor of using
these ALTER commands.  Certainly that's what I'd prefer --- COPY has
got too darn many options already.

> In the patch, pg_dump has *not* been altered to use COPY LOCK, so a
> pg_dump *will* work with any other version of PostgreSQL, which *would
> not* be the case if we added ALTER TABLE ... RELIABILITY statements into
> it.

Wrong --- the good thing about ALTER TABLE is that an old version of
Postgres would simply reject it and keep going.  Therefore we could get
the speedup in dumps without losing compatibility, which is not true
of COPY LOCK.

BTW, this is a perfect example of the use-case for not abandoning a
dump-file load simply because one command fails.  (We have relied on
this sort of reasoning many times before, too, for example by using
"SET default_with_oids" in preference to CREATE TABLE WITH/WITHOUT OIDS.)
I don't think that "wrap the whole load into begin/end" is really a very
workable answer, because there are far too many scenarios where you
can't do that.  Another one where it doesn't help is a data-only dump.

			regards, tom lane

From pgsql-hackers-owner+M78028=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 11:12:41 2005
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
To: Simon Riggs <simon@2ndquadrant.com>
cc: Bruce Momjian <pgman@candle.pha.pa.us>, 
	   Martijn van Oosterhout <kleptog@svana.org>, Greg Stark <gsstark@mit.edu>, 
	   Rod Taylor <pg@rbt.ca>, Qingqing Zhou <zhouqq@cs.toronto.edu>, 
	  pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and 
In-Reply-To: <1135862385.2964.804.camel@localhost.localdomain> 
References: <200512290158.jBT1wEK28785@candle.pha.pa.us>  <1135862385.2964.804.camel@localhost.localdomain>
Comments: In-reply-to Simon Riggs <simon@2ndquadrant.com>
	message dated "Thu, 29 Dec 2005 13:19:45 +0000"
Date: Thu, 29 Dec 2005 11:12:11 -0500
Message-ID: <7273.1135872731@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.053 required=5 tests=[AWL=0.053]
X-Spam-Score: 0.053
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:  2075

Simon Riggs <simon@2ndquadrant.com> writes:
> My view would be that this thread has been complex because everybody has
> expressed a somewhat different requirement, which could be broken down
> as:
> 1. The need for a multi-user-accessible yet temporary table
> 2. Loading data into a table immediately after it is created (i.e. in
> same transaction), including but not limited to a reload from pg_dump
> 3. How to load data quickly into an existing table (COPY)
> 4. How to add/modify data quickly in an existing table (INSERT SELECT,
> UPDATE)

> I can see the need for all of those individually; my existing patch
> submission covers (2) and (3) only. I very much like your thought to
> coalesce these various requirements into a single coherent model.

However, you then seem to be arguing for still using the COPY LOCK
syntax, which I think Bruce intended would go away in favor of using
these ALTER commands.  Certainly that's what I'd prefer --- COPY has
got too darn many options already.

> In the patch, pg_dump has *not* been altered to use COPY LOCK, so a
> pg_dump *will* work with any other version of PostgreSQL, which *would
> not* be the case if we added ALTER TABLE ... RELIABILITY statements into
> it.

Wrong --- the good thing about ALTER TABLE is that an old version of
Postgres would simply reject it and keep going.  Therefore we could get
the speedup in dumps without losing compatibility, which is not true
of COPY LOCK.

BTW, this is a perfect example of the use-case for not abandoning a
dump-file load simply because one command fails.  (We have relied on
this sort of reasoning many times before, too, for example by using
"SET default_with_oids" in preference to CREATE TABLE WITH/WITHOUT OIDS.)
I don't think that "wrap the whole load into begin/end" is really a very
workable answer, because there are far too many scenarios where you
can't do that.  Another one where it doesn't help is a data-only dump.

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

From pgsql-hackers-owner+M78025=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 10:57:46 2005
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
Message-ID: <51082.68.143.134.146.1135872877.squirrel@www.dunslane.net>
Date: Thu, 29 Dec 2005 10:14:37 -0600 (CST)
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
From: Andrew Dunstan <andrew@dunslane.net>
To: <pgman@candle.pha.pa.us>
In-Reply-To: <200512290158.jBT1wEK28785@candle.pha.pa.us>
References: <200512290158.jBT1wEK28785@candle.pha.pa.us>
X-Priority: 3
Importance: Normal
X-MSMail-Priority: Normal
cc: <kleptog@svana.org>,  <simon@2ndquadrant.com>,  <tgl@sss.pgh.pa.us>, 
	   <gsstark@mit.edu>,  <pg@rbt.ca>,  <zhouqq@cs.toronto.edu>, 
	   <pgsql-hackers@postgresql.org>
X-Mailer: SquirrelMail (version 1.2.5)
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.082 required=5 tests=[AWL=0.082]
X-Spam-Score: 0.082
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:  1185

Bruce Momjian said:
> DROP would drop the table on a restart
> after a non-clean shutdown.  It would do _no_ logging on the table and
> allow concurrent access, plus index access.  DELETE is the same as
> DROP, but it just truncates the table (perhaps TRUNCATE is a better
> word).
>
> EXCLUSIVE would allow only a single session to modify the table, and
> would do all changes by appending to the table, similar to COPY LOCK.
> EXCLUSIVE would also not allow indexes because those can not be
> isolated like appending to the heap.  EXCLUSIVE would write all dirty
> shared buffers for the table and fsync them before committing.  SHARE
> is the functionality we have now, with full logging.


I an horribly scared that this will be used as a "performance boost" for
normal use. I would at least like to see some restrictions that make it
harder to mis-use. Perhaps restrict to superuser?

cheers

andrew





---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo@postgresql.org so that your
       message can get through to the mailing list cleanly

From tgl@sss.pgh.pa.us Thu Dec 29 11:24:30 2005
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Andrew Dunstan <andrew@dunslane.net>, kleptog@svana.org, 
	  simon@2ndquadrant.com, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, 
	  pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and 
In-Reply-To: <200512291605.jBTG5gi00396@candle.pha.pa.us> 
References: <200512291605.jBTG5gi00396@candle.pha.pa.us>
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
	message dated "Thu, 29 Dec 2005 11:05:42 -0500"
Date: Thu, 29 Dec 2005 11:24:28 -0500
Message-ID: <7966.1135873468@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Content-Length:   612

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Andrew Dunstan wrote:
>> I an horribly scared that this will be used as a "performance boost" for
>> normal use. I would at least like to see some restrictions that make it
>> harder to mis-use. Perhaps restrict to superuser?

> Certainly restrict to table owner.

I can see the argument for superuser-only: decisions about data
integrity tradeoffs should be reserved to the DBA, who is the one who
will get blamed if the database loses data, no matter how stupid his
users are.

But I'm not wedded to that.  I could live with table-owner.

			regards, tom lane

From pgsql-hackers-owner+M78031=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 11:38:17 2005
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
From: Bruce Momjian <pgman@candle.pha.pa.us>
Message-ID: <200512291637.jBTGbdC03848@candle.pha.pa.us>
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
In-Reply-To: <7273.1135872731@sss.pgh.pa.us>
To: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, 29 Dec 2005 11:37:39 -0500 (EST)
cc: Simon Riggs <simon@2ndquadrant.com>, 
	   Martijn van Oosterhout <kleptog@svana.org>, Greg Stark <gsstark@mit.edu>, 
	   Rod Taylor <pg@rbt.ca>, Qingqing Zhou <zhouqq@cs.toronto.edu>, 
	  pgsql-hackers@postgresql.org
X-Mailer: ELM [version 2.4ME+ PL121 (25)]
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.122 required=5 tests=[AWL=0.122]
X-Spam-Score: 0.122
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:  3932

Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > My view would be that this thread has been complex because everybody has
> > expressed a somewhat different requirement, which could be broken down
> > as:
> > 1. The need for a multi-user-accessible yet temporary table
> > 2. Loading data into a table immediately after it is created (i.e. in
> > same transaction), including but not limited to a reload from pg_dump
> > 3. How to load data quickly into an existing table (COPY)
> > 4. How to add/modify data quickly in an existing table (INSERT SELECT,
> > UPDATE)
> 
> > I can see the need for all of those individually; my existing patch
> > submission covers (2) and (3) only. I very much like your thought to
> > coalesce these various requirements into a single coherent model.
> 
> However, you then seem to be arguing for still using the COPY LOCK
> syntax, which I think Bruce intended would go away in favor of using
> these ALTER commands.  Certainly that's what I'd prefer --- COPY has
> got too darn many options already.
> 
> > In the patch, pg_dump has *not* been altered to use COPY LOCK, so a
> > pg_dump *will* work with any other version of PostgreSQL, which *would
> > not* be the case if we added ALTER TABLE ... RELIABILITY statements into
> > it.
> 
> Wrong --- the good thing about ALTER TABLE is that an old version of
> Postgres would simply reject it and keep going.  Therefore we could get
> the speedup in dumps without losing compatibility, which is not true
> of COPY LOCK.
> 
> BTW, this is a perfect example of the use-case for not abandoning a
> dump-file load simply because one command fails.  (We have relied on
> this sort of reasoning many times before, too, for example by using
> "SET default_with_oids" in preference to CREATE TABLE WITH/WITHOUT OIDS.)
> I don't think that "wrap the whole load into begin/end" is really a very
> workable answer, because there are far too many scenarios where you
> can't do that.  Another one where it doesn't help is a data-only dump.

Yep, Tom is echoing my reaction.  There is a temptation to add things up
onto existing commands, e.g. LOCK, and while it works, it makes for some
very complex user API's.  Having COPY behave differently because it is
in a transaction is fine as long as it is user-invisible, but once you
require users to do that to get the speedup, it isn't user-invisible
anymore.

(I can see it now, "Why is pg_dump putting things in transactions?",
"Because it prevents it from being logged."  "Oh, should I be doing that
in my code?"  "Perhaps, if you want ..."  You can see where that
discussion is going.  Having them see "ATER TABLE ... RELIBILITY
TRUNCATE" is very clear, and very clear on how it can be used in user
code.)

I think there is great utility in giving users one API, namely
RELIABILITY (or some other keyword), and telling them that is where they
control logging.  I realize adding one keyword, LOCK, to an existing
command isn't a big deal, but once you decentralize your API enough
times, you end up with a terribly complex database system.  It is this
design rigidity that helps make PostgreSQL so much easier to use than
other database systems.

I do think it is valid concern about someone use the table between the
CREATE and the ALTER TABLE RELIABILITY.  One solution would be to allow
the RELIABILITY as part of the CREATE TABLE, another is to tell users to
create the table inside a transaction.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

From pgsql-hackers-owner+M78036=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 12:21:12 2005
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
To: Andrew Dunstan <andrew@dunslane.net>
cc: <pgman@candle.pha.pa.us>,  <kleptog@svana.org>,  <simon@2ndquadrant.com>, 
	   <tgl@sss.pgh.pa.us>,  <gsstark@mit.edu>,  <pg@rbt.ca>, 
	   <zhouqq@cs.toronto.edu>,  <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
References: <200512290158.jBT1wEK28785@candle.pha.pa.us>
	 <51082.68.143.134.146.1135872877.squirrel@www.dunslane.net>
In-Reply-To: <51082.68.143.134.146.1135872877.squirrel@www.dunslane.net>
From: Greg Stark <gsstark@mit.edu>
Organization: The Emacs Conspiracy; member since 1992
Date: 29 Dec 2005 12:20:32 -0500
Message-ID: <87vex74y73.fsf@stark.xeocode.com>
Lines: 42
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.4
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.112 required=5 tests=[AWL=0.112]
X-Spam-Score: 0.112
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:  1983

"Andrew Dunstan" <andrew@dunslane.net> writes:

> Bruce Momjian said:
> > DROP would drop the table on a restart
> > after a non-clean shutdown.  It would do _no_ logging on the table and
> > allow concurrent access, plus index access.  DELETE is the same as
> > DROP, but it just truncates the table (perhaps TRUNCATE is a better
> > word).
> >
> > EXCLUSIVE would allow only a single session to modify the table, and
> > would do all changes by appending to the table, similar to COPY LOCK.
> > EXCLUSIVE would also not allow indexes because those can not be
> > isolated like appending to the heap.  EXCLUSIVE would write all dirty
> > shared buffers for the table and fsync them before committing.  SHARE
> > is the functionality we have now, with full logging.
> 
> I an horribly scared that this will be used as a "performance boost" for
> normal use. I would at least like to see some restrictions that make it
> harder to mis-use. Perhaps restrict to superuser?

Well that's its whole purpose. At least you can hardly argue that you didn't
realize the consequences of "DELETE ROWS ON RECOVERY"... :)

Some thoughts:

a) I'm not sure I understand the purpose of EXCLUSIVE. When would I ever want to
   use it instead of DELETE ROWS?

b) It seems like the other feature people were talking about of not logging
   for a table created within the same transaction should be handled by
   having this flag implicitly set for any such newly created table.
   Ie, the test for whether to log would look like:

   if (!table->logged && table->xid != myxid) ...

c) Every option in ALTER TABLE should be in CREATE TABLE as well.

d) Yes as someone else mentioned, this should only be allowable on a table
   with no foreign keys referencing it. 

-- 
greg


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

From pgsql-hackers-owner+M78037=pgman=candle.pha.pa.us@postgresql.org Thu Dec 29 12:31:40 2005
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
From: Bruce Momjian <pgman@candle.pha.pa.us>
Message-ID: <200512291730.jBTHUnn09840@candle.pha.pa.us>
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
In-Reply-To: <87vex74y73.fsf@stark.xeocode.com>
To: Greg Stark <gsstark@mit.edu>
Date: Thu, 29 Dec 2005 12:30:49 -0500 (EST)
cc: Andrew Dunstan <andrew@dunslane.net>, kleptog@svana.org, 
	  simon@2ndquadrant.com, tgl@sss.pgh.pa.us, pg@rbt.ca, zhouqq@cs.toronto.edu, 
	  pgsql-hackers@postgresql.org
X-Mailer: ELM [version 2.4ME+ PL121 (25)]
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.122 required=5 tests=[AWL=0.122]
X-Spam-Score: 0.122
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:  3304

Greg Stark wrote:
> "Andrew Dunstan" <andrew@dunslane.net> writes:
> 
> > Bruce Momjian said:
> > > DROP would drop the table on a restart
> > > after a non-clean shutdown.  It would do _no_ logging on the table and
> > > allow concurrent access, plus index access.  DELETE is the same as
> > > DROP, but it just truncates the table (perhaps TRUNCATE is a better
> > > word).
> > >
> > > EXCLUSIVE would allow only a single session to modify the table, and
> > > would do all changes by appending to the table, similar to COPY LOCK.
> > > EXCLUSIVE would also not allow indexes because those can not be
> > > isolated like appending to the heap.  EXCLUSIVE would write all dirty
> > > shared buffers for the table and fsync them before committing.  SHARE
> > > is the functionality we have now, with full logging.
> > 
> > I an horribly scared that this will be used as a "performance boost" for
> > normal use. I would at least like to see some restrictions that make it
> > harder to mis-use. Perhaps restrict to superuser?
> 
> Well that's its whole purpose. At least you can hardly argue that you didn't
> realize the consequences of "DELETE ROWS ON RECOVERY"... :)

True.  I think we are worried about non-owners using it, but the owner
had to grant permissions for others to modify it, so we might be OK.

> Some thoughts:
> 
> a) I'm not sure I understand the purpose of EXCLUSIVE. When would I ever want to
>    use it instead of DELETE ROWS?

Good question.  The use case is doing COPY into a table that already had
data.  EXCLUSIVE allows additions to the table but preserves the
existing data on a crash.

> b) It seems like the other feature people were talking about of not logging
>    for a table created within the same transaction should be handled by
>    having this flag implicitly set for any such newly created table.
>    Ie, the test for whether to log would look like:
> 
>    if (!table->logged && table->xid != myxid) ...

Yes, the question is whether we want to limit users to having this
optimization _only_ when they have created the table in the same
transaction, and the short answer is we don't.

> c) Every option in ALTER TABLE should be in CREATE TABLE as well.

I looked into that and see that things like:

    ALTER [ COLUMN ] column SET STATISTICS integer
    ALTER [ COLUMN ] column SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }

are not supported by CREATE TABLE, and probably shouldn't be because the
value can be changed after the table is created.  I think the only
things we usually support in CREATE TABLE are those that cannot be
altered.

> d) Yes as someone else mentioned, this should only be allowable on a table
>    with no foreign keys referencing it. 

Right, and EXCLUSIVE can not have an index either.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo@postgresql.org so that your
       message can get through to the mailing list cleanly

From simon@2ndquadrant.com Fri Dec 30 08:10:53 2005
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
From: Simon Riggs <simon@2ndquadrant.com>
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Andrew Dunstan <andrew@dunslane.net>, Tom Lane <tgl@sss.pgh.pa.us>, 
	   Martijn van Oosterhout <kleptog@svana.org>, Greg Stark <gsstark@mit.edu>, 
	   Rod Taylor <pg@rbt.ca>, Qingqing Zhou <zhouqq@cs.toronto.edu>, 
	  pgsql-hackers@postgresql.org
In-Reply-To: <200512291637.jBTGbdC03848@candle.pha.pa.us>
References: <200512291637.jBTGbdC03848@candle.pha.pa.us>
Date: Fri, 30 Dec 2005 13:09:12 +0000
Message-ID: <1135948152.2862.113.camel@localhost.localdomain>
X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) 
Content-Length:  6343

On Thu, 2005-12-29 at 11:37 -0500, Bruce Momjian wrote:
> Tom Lane wrote:
> > Simon Riggs <simon@2ndquadrant.com> writes:
> > > My view would be that this thread has been complex because everybody has
> > > expressed a somewhat different requirement, which could be broken down
> > > as:
> > > 1. The need for a multi-user-accessible yet temporary table
> > > 2. Loading data into a table immediately after it is created (i.e. in
> > > same transaction), including but not limited to a reload from pg_dump
> > > 3. How to load data quickly into an existing table (COPY)
> > > 4. How to add/modify data quickly in an existing table (INSERT SELECT,
> > > UPDATE)

> > However, you then seem to be arguing for still using the COPY LOCK
> > syntax, which I think Bruce intended would go away in favor of using
> > these ALTER commands.  Certainly that's what I'd prefer --- COPY has
> > got too darn many options already.

COPY LOCK was Tom's suggestion at the end of a long discussion thread on
this precise issue. Nobody objected to it at that point; I implemented
it *exactly* that way because I wanted to very visibly follow the
consensus of the community, after informed debate.
http://archives.postgresql.org/pgsql-hackers/2005-06/msg00068.php

Please re-read the links to previous discussions.
http://archives.postgresql.org/pgsql-hackers/2005-06/msg00069.php
There are points there, not made by me, that still apply and need to be
considered here, yet have not been.

Just to restate my current thinking:
- agree we should have ALTER TABLE ... RELIABILITY DELETE ROWS
- we should have COPY LOCK rather than 
ALTER TABLE .... RELIABILITY EXCLUSIVE
(Though I welcome better wording and syntax in either case; it is the
behaviour only that I discuss).

It seems now that we have agreed approaches for (1), (2) and (4). Please
note that I have listened to the needs of others with regard to
requirement (1), as espoused by earlier by Hannu and again now by
Martijn. Some of the points about requirement (3) I made in my previous
post have not yet been addressed, IMHO.

My mind is not fixed. AFAICS there are valid points remaining on both
sides of the discussion about loading data quickly into an existing
table.

> I do think it is valid concern about someone use the table between the
> CREATE and the ALTER TABLE RELIABILITY.  One solution would be to allow
> the RELIABILITY as part of the CREATE TABLE, another is to tell users to
> create the table inside a transaction.

Neither solution works for this use case:

> > 3. How to load data quickly into an existing table (COPY)

This is the only use case for which ALTER TABLE ... EXCLUSIVE makes
sense. That option means that any write lock held upon the table would
be an EXCLUSIVE table lock, so would never be a performance gain with
single row INSERT, UPDATE or DELETEs. 

Following Andrew's concerns, I'd also note that ALTER TABLE requires a
much higher level of privilege to operate than does COPY. That sounds
like it will make things more secure, but all it does is open up the
administrative rights, since full ownership rights must be obtained
merely to load data. 

> Having COPY behave differently because it is
> in a transaction is fine as long as it is user-invisible

Good

> I think there is great utility in giving users one API, namely
> RELIABILITY (or some other keyword), and telling them that is where they
> control logging.  I realize adding one keyword, LOCK, to an existing
> command isn't a big deal, but once you decentralize your API enough
> times, you end up with a terribly complex database system.  It is this
> design rigidity that helps make PostgreSQL so much easier to use than
> other database systems.

I do see the appeal of your suggestion...

TRUNCATE is a special command to delete quickly. There is no requirement
to do an ALTER TABLE statement before that command executes.

Balance would suggest that a special command to load data quickly would
be reasonably accepted by users.




Minor points below:

> > > In the patch, pg_dump has *not* been altered to use COPY LOCK, so a
> > > pg_dump *will* work with any other version of PostgreSQL, which *would
> > > not* be the case if we added ALTER TABLE ... RELIABILITY statements into
> > > it.
> > 
> > Wrong --- the good thing about ALTER TABLE is that an old version of
> > Postgres would simply reject it and keep going.  Therefore we could get
> > the speedup in dumps without losing compatibility, which is not true
> > of COPY LOCK.

That was pointing out one of Bruce's objections was not relevant because
it assumed COPY LOCK was required to make pg_restore go faster; that was
not the case - so there is no valid objection either way now.

> > BTW, this is a perfect example of the use-case for not abandoning a
> > dump-file load simply because one command fails.  (We have relied on
> > this sort of reasoning many times before, too, for example by using
> > "SET default_with_oids" in preference to CREATE TABLE WITH/WITHOUT OIDS.)
> > I don't think that "wrap the whole load into begin/end" is really a very
> > workable answer, because there are far too many scenarios where you
> > can't do that.  Another one where it doesn't help is a data-only dump.

Which is why --single-transaction is not the default, per the earlier
discussion on that point (on -patches).

> Yep, Tom is echoing my reaction.  There is a temptation to add things up
> onto existing commands, e.g. LOCK, and while it works, it makes for some
> very complex user API's.  Having COPY behave differently because it is
> in a transaction is fine as long as it is user-invisible, but once you
> require users to do that to get the speedup, it isn't user-invisible
> anymore.
> 
> (I can see it now, "Why is pg_dump putting things in transactions?",
> "Because it prevents it from being logged."  "Oh, should I be doing that
> in my code?"  "Perhaps, if you want ..."  You can see where that
> discussion is going.  Having them see "ATER TABLE ... RELIBILITY
> TRUNCATE" is very clear, and very clear on how it can be used in user
> code.)

The above case is not an argument against COPY LOCK. Exactly what you
say above would still occur even when we have ALTER TABLE ...
RELIABILITY statement, since COPY LOCK and
COPY-optimized-within-same-transaction are different things.

Best Regards, Simon Riggs

From pgsql-hackers-owner+M78064=pgman=candle.pha.pa.us@postgresql.org Fri Dec 30 11:50:49 2005
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
From: Bruce Momjian <pgman@candle.pha.pa.us>
Message-ID: <200512301649.jBUGnxn21488@candle.pha.pa.us>
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
In-Reply-To: <1135948152.2862.113.camel@localhost.localdomain>
To: Simon Riggs <simon@2ndquadrant.com>
Date: Fri, 30 Dec 2005 11:49:59 -0500 (EST)
cc: Andrew Dunstan <andrew@dunslane.net>, Tom Lane <tgl@sss.pgh.pa.us>, 
	   Martijn van Oosterhout <kleptog@svana.org>, Greg Stark <gsstark@mit.edu>, 
	   Rod Taylor <pg@rbt.ca>, Qingqing Zhou <zhouqq@cs.toronto.edu>, 
	  pgsql-hackers@postgresql.org
X-Mailer: ELM [version 2.4ME+ PL121 (25)]
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.12 required=5 tests=[AWL=0.120]
X-Spam-Score: 0.12
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:  8888

Simon Riggs wrote:
> On Thu, 2005-12-29 at 11:37 -0500, Bruce Momjian wrote:
> > Tom Lane wrote:
> > > Simon Riggs <simon@2ndquadrant.com> writes:
> > > > My view would be that this thread has been complex because everybody has
> > > > expressed a somewhat different requirement, which could be broken down
> > > > as:
> > > > 1. The need for a multi-user-accessible yet temporary table
> > > > 2. Loading data into a table immediately after it is created (i.e. in
> > > > same transaction), including but not limited to a reload from pg_dump
> > > > 3. How to load data quickly into an existing table (COPY)
> > > > 4. How to add/modify data quickly in an existing table (INSERT SELECT,
> > > > UPDATE)
> 
> > > However, you then seem to be arguing for still using the COPY LOCK
> > > syntax, which I think Bruce intended would go away in favor of using
> > > these ALTER commands.  Certainly that's what I'd prefer --- COPY has
> > > got too darn many options already.
> 
> COPY LOCK was Tom's suggestion at the end of a long discussion thread on
> this precise issue. Nobody objected to it at that point; I implemented
> it *exactly* that way because I wanted to very visibly follow the
> consensus of the community, after informed debate.
> http://archives.postgresql.org/pgsql-hackers/2005-06/msg00068.php
> 
> Please re-read the links to previous discussions.
> http://archives.postgresql.org/pgsql-hackers/2005-06/msg00069.php
> There are points there, not made by me, that still apply and need to be
> considered here, yet have not been.

Yes, I know we agreed to the COPY LOCK, but new features now being
requested, so we have to re-evaluate where we are going with COPY LOCK
to get a more consistent solution.

> Just to restate my current thinking:
> - agree we should have ALTER TABLE ... RELIABILITY DELETE ROWS
> - we should have COPY LOCK rather than 
> ALTER TABLE .... RELIABILITY EXCLUSIVE
> (Though I welcome better wording and syntax in either case; it is the
> behaviour only that I discuss).
> 
> It seems now that we have agreed approaches for (1), (2) and (4). Please
> note that I have listened to the needs of others with regard to
> requirement (1), as espoused by earlier by Hannu and again now by
> Martijn. Some of the points about requirement (3) I made in my previous
> post have not yet been addressed, IMHO.
> 
> My mind is not fixed. AFAICS there are valid points remaining on both
> sides of the discussion about loading data quickly into an existing
> table.
> 
> > I do think it is valid concern about someone use the table between the
> > CREATE and the ALTER TABLE RELIABILITY.  One solution would be to allow
> > the RELIABILITY as part of the CREATE TABLE, another is to tell users to
> > create the table inside a transaction.
> 
> Neither solution works for this use case:
> 
> > > 3. How to load data quickly into an existing table (COPY)
> 
> This is the only use case for which ALTER TABLE ... EXCLUSIVE makes
> sense. That option means that any write lock held upon the table would
> be an EXCLUSIVE table lock, so would never be a performance gain with
> single row INSERT, UPDATE or DELETEs. 

Ah, but people wanted fast INSERT INTO ... SELECT, and that would use
EXCLUSIVE too.  What about a massive UPDATE?  Perhaps that could use
EXCLUSIVE?  We don't want to add "LOCK" to every command that might use
EXCLUSIVE.  ALTER is much better for this.

I agree if we thought EXCLUSIVE would only be used for COPY, we could
use LOCK, but I am thinking it will be used for other commands as well.

> Following Andrew's concerns, I'd also note that ALTER TABLE requires a
> much higher level of privilege to operate than does COPY. That sounds
> like it will make things more secure, but all it does is open up the
> administrative rights, since full ownership rights must be obtained
> merely to load data. 

True, but as pointed out by others, I don't see that happening too
often.

> > Having COPY behave differently because it is
> > in a transaction is fine as long as it is user-invisible
> 
> Good
> 
> > I think there is great utility in giving users one API, namely
> > RELIABILITY (or some other keyword), and telling them that is where they
> > control logging.  I realize adding one keyword, LOCK, to an existing
> > command isn't a big deal, but once you decentralize your API enough
> > times, you end up with a terribly complex database system.  It is this
> > design rigidity that helps make PostgreSQL so much easier to use than
> > other database systems.
> 
> I do see the appeal of your suggestion...
> 
> TRUNCATE is a special command to delete quickly. There is no requirement
> to do an ALTER TABLE statement before that command executes.

The TRUNCATE happens during recovery.  There is no user interaction.  It
happens because we can't restore the contents of the table in a
consistent state because no logging was used.  Basically, a table marked
RELIABILITY TRUNCATE would be truncated on a recovery start of the
postmaster.

> Balance would suggest that a special command to load data quickly would
> be reasonably accepted by users.
> 
> 
> 
> 
> Minor points below:
> 
> > > > In the patch, pg_dump has *not* been altered to use COPY LOCK, so a
> > > > pg_dump *will* work with any other version of PostgreSQL, which *would
> > > > not* be the case if we added ALTER TABLE ... RELIABILITY statements into
> > > > it.
> > > 
> > > Wrong --- the good thing about ALTER TABLE is that an old version of
> > > Postgres would simply reject it and keep going.  Therefore we could get
> > > the speedup in dumps without losing compatibility, which is not true
> > > of COPY LOCK.
> 
> That was pointing out one of Bruce's objections was not relevant because
> it assumed COPY LOCK was required to make pg_restore go faster; that was
> not the case - so there is no valid objection either way now.

I don't consider the single-transaction to be a no-cost solution.  You
are adding flags to commands, and you are using a dump layout for
performance where the purpose for the layout is not clear.  The ALTER is
clear to the user, and it allows nologging operations to happen after
the table is created.

In fact, for use in pg_dump, I think DROP is the proper operation for
loading, not your transaction wrapping solution.  We already agree we
need DROP (or TRUNCATE), so why not use that rather than the transaction
wrap idea?

> > > BTW, this is a perfect example of the use-case for not abandoning a
> > > dump-file load simply because one command fails.  (We have relied on
> > > this sort of reasoning many times before, too, for example by using
> > > "SET default_with_oids" in preference to CREATE TABLE WITH/WITHOUT OIDS.)
> > > I don't think that "wrap the whole load into begin/end" is really a very
> > > workable answer, because there are far too many scenarios where you
> > > can't do that.  Another one where it doesn't help is a data-only dump.
> 
> Which is why --single-transaction is not the default, per the earlier
> discussion on that point (on -patches).

Right, but why not use DROP/TRUNCATE?  That works for old dumps too, and
has no downsides, meaning it can be always on.

> > Yep, Tom is echoing my reaction.  There is a temptation to add things up
> > onto existing commands, e.g. LOCK, and while it works, it makes for some
> > very complex user API's.  Having COPY behave differently because it is
> > in a transaction is fine as long as it is user-invisible, but once you
> > require users to do that to get the speedup, it isn't user-invisible
> > anymore.
> > 
> > (I can see it now, "Why is pg_dump putting things in transactions?",
> > "Because it prevents it from being logged."  "Oh, should I be doing that
> > in my code?"  "Perhaps, if you want ..."  You can see where that
> > discussion is going.  Having them see "ATER TABLE ... RELIBILITY
> > TRUNCATE" is very clear, and very clear on how it can be used in user
> > code.)
> 
> The above case is not an argument against COPY LOCK. Exactly what you
> say above would still occur even when we have ALTER TABLE ...
> RELIABILITY statement, since COPY LOCK and
> COPY-optimized-within-same-transaction are different things.

See my posting above that we might want EXCLUSIVE for other commands,
meaning ALTER makes more sense.

So, to summarize, I think we should add DROP/TRUNCATE, and use that by
default (or optionally off?) in pg_dump, and, assuming we want EXCLUSIVE
for more than just COPY, we need to add ALTER TABLE EXCLUSIVE.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

From pgsql-hackers-owner+M78065=pgman=candle.pha.pa.us@postgresql.org Fri Dec 30 12:40:48 2005
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
Message-ID: <43B570C9.6060406@dunslane.net>
Date: Fri, 30 Dec 2005 12:39:21 -0500
From: Andrew Dunstan <andrew@dunslane.net>
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Fedora/1.7.12-1.3.1
X-Accept-Language: en-us, en
To: Tom Lane <tgl@sss.pgh.pa.us>
cc: simon@2ndquadrant.com, pgman@candle.pha.pa.us, kleptog@svana.org, 
	  gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, 
	  pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
References: <1135948152.2862.113.camel@localhost.localdomain>  <56737.68.143.134.146.1135954413.squirrel@www.dunslane.net>  <11876.1135954626@sss.pgh.pa.us>
In-Reply-To: <11876.1135954626@sss.pgh.pa.us>
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.041 required=5 tests=[AWL=0.041]
X-Spam-Score: 0.041
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:  1815



Tom Lane wrote:

>"Andrew Dunstan" <andrew@dunslane.net> writes:
>  
>
>>Simon Riggs said:
>>    
>>
>>>Following Andrew's concerns, I'd also note that ALTER TABLE requires a
>>>much higher level of privilege to operate than does COPY. That sounds
>>>like it will make things more secure, but all it does is open up the
>>>administrative rights, since full ownership rights must be obtained
>>>merely to load data.
>>>      
>>>
>
>  
>
>>My concern is more about making plain that this is for special operations,
>>not normal operations. Or maybe I have misunderstood the purpose.
>>    
>>
>
>Rephrase that as "full ownership rights must be obtained to load data in
>a way that requires dropping any existing indexes and locking out other
>users of the table".  I don't think the use-case for this will be very
>large for non-owners, or indeed even for owners except during initial
>table creation; and so I don't think the above argument is strong.
>
>			
>  
>

Those restrictions aren't true of Bruce's proposed drop and
delete/truncate recovery modes, are they?

People do crazy things in pursuit of performance. Illustration: a few
months ago I was instrumenting an app (based on MySQL/ISAM) and I
noticed that under load it simply didn't update the inventory properly -
of 1000 orders placed within a few seconds it might reduce inventory by
3 or 4. I reported this and they shrugged their shoulders and said
"well, we'd have to lock the table and that would slow everything down
...".

I just want to be sure we aren't providing a footgun. "Oh, just set
recovery mode to delete. It won't make any difference unless you crash
and you'll run faster."

cheers

andrew




---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

From pgsql-hackers-owner+M78066=pgman=candle.pha.pa.us@postgresql.org Fri Dec 30 12:58:52 2005
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
From: Bruce Momjian <pgman@candle.pha.pa.us>
Message-ID: <200512301758.jBUHwFv03107@candle.pha.pa.us>
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
In-Reply-To: <43B570C9.6060406@dunslane.net>
To: Andrew Dunstan <andrew@dunslane.net>
Date: Fri, 30 Dec 2005 12:58:15 -0500 (EST)
cc: Tom Lane <tgl@sss.pgh.pa.us>, simon@2ndquadrant.com, kleptog@svana.org, 
	  gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, 
	  pgsql-hackers@postgresql.org
X-Mailer: ELM [version 2.4ME+ PL121 (25)]
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.12 required=5 tests=[AWL=0.120]
X-Spam-Score: 0.12
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:  1996

Andrew Dunstan wrote:
> >>My concern is more about making plain that this is for special operations,
> >>not normal operations. Or maybe I have misunderstood the purpose.
> >>    
> >>
> >
> >Rephrase that as "full ownership rights must be obtained to load data in
> >a way that requires dropping any existing indexes and locking out other
> >users of the table".  I don't think the use-case for this will be very
> >large for non-owners, or indeed even for owners except during initial
> >table creation; and so I don't think the above argument is strong.
> >
> >			
> >  
> >
> 
> Those restrictions aren't true of Bruce's proposed drop and
> delete/truncate recovery modes, are they?

Only the owner could do the ALTER, for sure, but once the owner sets it,
any user with permission to write to the table would have those
characteristics.

> People do crazy things in pursuit of performance. Illustration: a few
> months ago I was instrumenting an app (based on MySQL/ISAM) and I
> noticed that under load it simply didn't update the inventory properly -
> of 1000 orders placed within a few seconds it might reduce inventory by
> 3 or 4. I reported this and they shrugged their shoulders and said
> "well, we'd have to lock the table and that would slow everything down
> ...".
> 
> I just want to be sure we aren't providing a footgun. "Oh, just set
> recovery mode to delete. It won't make any difference unless you crash
> and you'll run faster."

I think we have to trust the object owner in this case.  I don't know of
any super-user-only ALTER commands, but I suppose we could set it up
that way if we wanted.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

From pgsql-hackers-owner+M78070=pgman=candle.pha.pa.us@postgresql.org Fri Dec 30 14:29:06 2005
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
From: Simon Riggs <simon@2ndquadrant.com>
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Andrew Dunstan <andrew@dunslane.net>, Tom Lane <tgl@sss.pgh.pa.us>, 
	   Martijn van Oosterhout <kleptog@svana.org>, Greg Stark <gsstark@mit.edu>, 
	   Rod Taylor <pg@rbt.ca>, Qingqing Zhou <zhouqq@cs.toronto.edu>, 
	  pgsql-hackers@postgresql.org
In-Reply-To: <200512301649.jBUGnxn21488@candle.pha.pa.us>
References: <200512301649.jBUGnxn21488@candle.pha.pa.us>
Date: Fri, 30 Dec 2005 19:28:41 +0000
Message-ID: <1135970921.5052.68.camel@localhost.localdomain>
X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) 
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.034 required=5 tests=[AWL=0.034]
X-Spam-Score: 0.034
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:  3112

On Fri, 2005-12-30 at 11:49 -0500, Bruce Momjian wrote:

> Yes, I know we agreed to the COPY LOCK, but new features now being
> requested, so we have to re-evaluate where we are going with COPY LOCK
> to get a more consistent solution.

Thank you. 

> Ah, but people wanted fast INSERT INTO ... SELECT, and that would use
> EXCLUSIVE too.  What about a massive UPDATE?  Perhaps that could use
> EXCLUSIVE?  We don't want to add "LOCK" to every command that might use
> EXCLUSIVE.  ALTER is much better for this.

> I agree if we thought EXCLUSIVE would only be used for COPY, we could
> use LOCK, but I am thinking it will be used for other commands as well.

Agreed, I will look to implement this.

Could the internals of my recent patch be reviewed? Changing the user
interface is less of a problem than changing the internals, which is
where the hard work takes place. I do not want to extend this work
further only to have that part rejected later. 

The implications of EXCLUSIVE are:
- there will be a check on each and every I, U, D to check the state of
the relation
- *every* operation that attempts a write lock will attempt to acquire
an EXCLUSIVE full table lock instead
- following successful completion of *each* DML statement, the relation
will be heap_sync'd involving a full scan of the buffer cache

Can I clarify the wording of the syntax? Is EXCLUSIVE the right word?
How about FASTLOAD or BULKLOAD? Those words seem less likely to be
misused in the future - i.e. we are invoking a special mode, rather than
invoking a special "go faster" option.

> I don't consider the single-transaction to be a no-cost solution.  You
> are adding flags to commands, and you are using a dump layout for
> performance where the purpose for the layout is not clear.  The ALTER is
> clear to the user, and it allows nologging operations to happen after
> the table is created.
> 
> In fact, for use in pg_dump, I think DROP is the proper operation for
> loading, not your transaction wrapping solution.  We already agree we
> need DROP (or TRUNCATE), so why not use that rather than the transaction
> wrap idea?

This was discussed on-list by 2 core team members, a committer and
myself, but I see no requirements change here. You even accepted the
invisible COPY optimization in your last post - why unpick that now?
Please forgive my tone, but I am lost for reasonable yet expressive
words. 

The --single-transaction mode would apply even if the dump was created
using an earlier version of pg_dump. pg_dump has *not* been altered at
all. (And I would again add that the idea was not my own)

> So, to summarize, I think we should add DROP/TRUNCATE, and use that by
> default (or optionally off?) in pg_dump, and, assuming we want EXCLUSIVE
> for more than just COPY, we need to add ALTER TABLE EXCLUSIVE.

Would you mind stating again what you mean, just so I can understand
this? Your summary isn't enough.

Best Regards, Simon Riggs


---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

From pgsql-hackers-owner+M78072=pgman=candle.pha.pa.us@postgresql.org Fri Dec 30 16:15:30 2005
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
From: Bruce Momjian <pgman@candle.pha.pa.us>
Message-ID: <200512302114.jBULEno02301@candle.pha.pa.us>
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
In-Reply-To: <1135970921.5052.68.camel@localhost.localdomain>
To: Simon Riggs <simon@2ndquadrant.com>
Date: Fri, 30 Dec 2005 16:14:49 -0500 (EST)
cc: Andrew Dunstan <andrew@dunslane.net>, Tom Lane <tgl@sss.pgh.pa.us>, 
	   Martijn van Oosterhout <kleptog@svana.org>, Greg Stark <gsstark@mit.edu>, 
	   Rod Taylor <pg@rbt.ca>, Qingqing Zhou <zhouqq@cs.toronto.edu>, 
	  pgsql-hackers@postgresql.org
X-Mailer: ELM [version 2.4ME+ PL121 (25)]
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.12 required=5 tests=[AWL=0.120]
X-Spam-Score: 0.12
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:  7285

Simon Riggs wrote:
> On Fri, 2005-12-30 at 11:49 -0500, Bruce Momjian wrote:
> 
> > Yes, I know we agreed to the COPY LOCK, but new features now being
> > requested, so we have to re-evaluate where we are going with COPY LOCK
> > to get a more consistent solution.
> 
> Thank you. 

Good.  I think we can be happy that COPY LOCK didn't get into a release,
so we don't have to support it forever.  When we are adding features, we
have to consider not only the current release, but future releases and
what people will ask for in the future so the syntax can be expanded
without breaking previous usage.

> > Ah, but people wanted fast INSERT INTO ... SELECT, and that would use
> > EXCLUSIVE too.  What about a massive UPDATE?  Perhaps that could use
> > EXCLUSIVE?  We don't want to add "LOCK" to every command that might use
> > EXCLUSIVE.  ALTER is much better for this.
> 
> > I agree if we thought EXCLUSIVE would only be used for COPY, we could
> > use LOCK, but I am thinking it will be used for other commands as well.
> 
> Agreed, I will look to implement this.
> 
> Could the internals of my recent patch be reviewed? Changing the user
> interface is less of a problem than changing the internals, which is
> where the hard work takes place. I do not want to extend this work
> further only to have that part rejected later. 

OK, I will look it over this week or next.

> The implications of EXCLUSIVE are:
> - there will be a check on each and every I, U, D to check the state of
> the relation
> - *every* operation that attempts a write lock will attempt to acquire
> an EXCLUSIVE full table lock instead
> - following successful completion of *each* DML statement, the relation
> will be heap_sync'd involving a full scan of the buffer cache

Yes, I think that is it.  What we can do is implement EXCLUSIVE to
affect only COPY at this point, and document that, and later add other
commands.

> Can I clarify the wording of the syntax? Is EXCLUSIVE the right word?
> How about FASTLOAD or BULKLOAD? Those words seem less likely to be
> misused in the future - i.e. we are invoking a special mode, rather than
> invoking a special "go faster" option.

The problem with the FASTLOAD/BULKLOAD words is that EXCLUSIVE mode is
probably not the best for loading.  I would think TRUNCATE would be a
better option.

In fact, in loading a table, I think both EXCLUSIVE and TRUNCATE would be
the same, mostly.  You would create the table, set its RELIABILITY to
TRUNCATE, COPY into the table, then set the RELIABILITY to SHARE or
DEFAULT.  The second ALTER has to sync all the dirty data blocks, which
the same thing EXCLUSIVE does at the conclusion of COPY.

So, we need a name for EXCLUSIVE mode that suggests how it is different
from TRUNCATE, and in this case, the difference is that EXCLUSIVE
preserves the previous contents of the table on recovery, while TRUNCATE
does not.  Do you want to call the mode PRESERVE, or EXCLUSIVE WRITER?
Anyway, the keywords are easy to modify, even after the patch is
submitted.  FYI, I usually go through keywords.c looking for a keyword
we already use.

> > I don't consider the single-transaction to be a no-cost solution.  You
> > are adding flags to commands, and you are using a dump layout for
> > performance where the purpose for the layout is not clear.  The ALTER is
> > clear to the user, and it allows nologging operations to happen after
> > the table is created.
> > 
> > In fact, for use in pg_dump, I think DROP is the proper operation for
> > loading, not your transaction wrapping solution.  We already agree we
> > need DROP (or TRUNCATE), so why not use that rather than the transaction
> > wrap idea?
> 
> This was discussed on-list by 2 core team members, a committer and
> myself, but I see no requirements change here. You even accepted the
> invisible COPY optimization in your last post - why unpick that now?
> Please forgive my tone, but I am lost for reasonable yet expressive
> words. 

Do you think you are the only one who has rewritten a patch multiple
times?  We all have.  The goal is to get the functionality into the
system in the most seamless way possible.  Considering the number of
people who use PostgreSQL, if it takes use 10 tries, it is worth it
considering the thousands of people who will use it.   Would you have us
include a sub-optimal patch and have thousands of people adjust to its
non-optimal functionality?  I am sure you would not.  Perhaps a company
would say, "Oh, just ship it", but we don't.

> The --single-transaction mode would apply even if the dump was created
> using an earlier version of pg_dump. pg_dump has *not* been altered at
> all. (And I would again add that the idea was not my own)

I assume you mean this:

	http://archives.postgresql.org/pgsql-patches/2005-12/msg00257.php

I guess with the ALTER commands I don't see much value in the
--single-transaction flag.  I am sure others suggested it, but would
they suggest it now given our current direction.  The fact that the
patch was submitted does not give it any more weight --- the question is
does this feature make sense for 8.2.  The goal is not to cram as many
optimizations into PostgreSQL as possible, the goal is to present a
consistent usable system to users.

> > So, to summarize, I think we should add DROP/TRUNCATE, and use that by
> > default (or optionally off?) in pg_dump, and, assuming we want EXCLUSIVE
> > for more than just COPY, we need to add ALTER TABLE EXCLUSIVE.
> 
> Would you mind stating again what you mean, just so I can understand
> this? Your summary isn't enough.

New ALTER TABLE mode, perhaps call it PERSISTENCE:

	ALTER TABLE tab PERSISTENCE DROP ON RECOVERY
	ALTER TABLE tab PERSISTENCE TRUNCATE ON RECOVERY

These would drop or truncate all tables with this flag on a non-clean
start of the postmaster, and write something in the server logs. 
However, I don't know that we have the code in place to DROP/TRUNCATE in
recovery mode, and it would affect all databases, so it could be quite
complex to implement.  In this mode, no WAL logs would be written for
table modifications, though DDL commands would have to be logged.

	ALTER TABLE tab PERSISTENCE PRESERVE (or STABLE?)

Table contents are preserved across recoveries, but data modifications
can happen only one at a time.  I don't think we have a lock mode that
does this, so I am worried a new lock mode will have to be created.  A
simplified solution at this stage would be to take an exclusive lock on
the table, but really we just need a single-writer table lock, which I
don't think we have. initially this can implemented to only affect COPY
but later can be done for other commands. 

	ALTER TABLE tab PERSISTENCE DEFAULT

This would be our current default mode, which is full concurrency and
persistence.

It took me over an hour to write this, but I feel the time is worth it
because of the number of users who use our software.
 
-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

From pgsql-hackers-owner+M78076=pgman=candle.pha.pa.us@postgresql.org Fri Dec 30 17:37:00 2005
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Simon Riggs <simon@2ndquadrant.com>, Andrew Dunstan <andrew@dunslane.net>, 
	   Tom Lane <tgl@sss.pgh.pa.us>, Martijn van Oosterhout <kleptog@svana.org>, 
	   Greg Stark <gsstark@mit.edu>, Rod Taylor <pg@rbt.ca>, 
	   Qingqing Zhou <zhouqq@cs.toronto.edu>, pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
References: <200512302114.jBULEno02301@candle.pha.pa.us>
In-Reply-To: <200512302114.jBULEno02301@candle.pha.pa.us>
From: Greg Stark <gsstark@mit.edu>
Organization: The Emacs Conspiracy; member since 1992
Date: 30 Dec 2005 17:36:24 -0500
Message-ID: <87mzii8b6f.fsf@stark.xeocode.com>
Lines: 28
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.4
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.113 required=5 tests=[AWL=0.113]
X-Spam-Score: 0.113
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:  1424


As far as EXCLUSIVE or COPY LOCK goes, I think this would be useful
functionality but perhaps there doesn't have to be any proprietary user
interface to it at all. Why not just check if the conditions are already
present to allow the optimization and if so go ahead.

That is, if the current transaction already has an exclusive lock on the table
and there are no indexes (and PITR isn't active) then Postgres could go ahead
and use the same WAL skipping logic as the other operations that already so
so. This would work for inserts whether coming from COPY or plain SQL INSERTs.

The nice thing about this is that the user's SQL wouldn't need any proprietary
extensions at all. Just tell people to do

BEGIN;
LOCK TABLE foo;
COPY foo from ...
COMMIT;

There could be a COPY LOCK option to obtain a lock, but it would be purely for
user convenience so they don't have to bother with BEGIN and COMMIt.

The only downside is a check to see if an exclusive table lock is present on
every copy and insert. That might be significant but perhaps there are ways to
finess that. If not perhaps only doing it on COPY would be a good compromise.

-- 
greg


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo@postgresql.org so that your
       message can get through to the mailing list cleanly

From pgsql-hackers-owner+M78077=pgman=candle.pha.pa.us@postgresql.org Fri Dec 30 17:47:18 2005
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
From: Bruce Momjian <pgman@candle.pha.pa.us>
Message-ID: <200512302246.jBUMkjF25196@candle.pha.pa.us>
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
In-Reply-To: <87mzii8b6f.fsf@stark.xeocode.com>
To: Greg Stark <gsstark@mit.edu>
Date: Fri, 30 Dec 2005 17:46:45 -0500 (EST)
cc: Simon Riggs <simon@2ndquadrant.com>, Andrew Dunstan <andrew@dunslane.net>, 
	   Tom Lane <tgl@sss.pgh.pa.us>, Martijn van Oosterhout <kleptog@svana.org>, 
	   Rod Taylor <pg@rbt.ca>, Qingqing Zhou <zhouqq@cs.toronto.edu>, 
	  pgsql-hackers@postgresql.org
X-Mailer: ELM [version 2.4ME+ PL121 (25)]
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.12 required=5 tests=[AWL=0.120]
X-Spam-Score: 0.12
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:  2135

Greg Stark wrote:
> 
> As far as EXCLUSIVE or COPY LOCK goes, I think this would be useful
> functionality but perhaps there doesn't have to be any proprietary user
> interface to it at all. Why not just check if the conditions are already
> present to allow the optimization and if so go ahead.
> 
> That is, if the current transaction already has an exclusive lock on the table
> and there are no indexes (and PITR isn't active) then Postgres could go ahead
> and use the same WAL skipping logic as the other operations that already so
> so. This would work for inserts whether coming from COPY or plain SQL INSERTs.
> 
> The nice thing about this is that the user's SQL wouldn't need any proprietary
> extensions at all. Just tell people to do
> 
> BEGIN;
> LOCK TABLE foo;
> COPY foo from ...
> COMMIT;
> 
> There could be a COPY LOCK option to obtain a lock, but it would be purely for
> user convenience so they don't have to bother with BEGIN and COMMIt.
> 
> The only downside is a check to see if an exclusive table lock is present on
> every copy and insert. That might be significant but perhaps there are ways to
> finess that. If not perhaps only doing it on COPY would be a good compromise.

Well, again, if we wanted to use EXCLUSIVE only for COPY, this might
make sense.  However, also consider that the idea for EXCLUSIVE was that
users could continue read-only queries on the table while it is being
loaded (like COPY allows now), and that in EXCLUSIVE mode, we are only
going to write into new pages.  

If someone has an exclusive lock on the table and does a COPY or SELECT
INTO do we want to assume we are only going to write into new pages, and
do we want to force an exclusive lock rather than a single-writer lock? 
I don't think so.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

From mpaesold@gmx.at Sat Dec 31 06:59:51 2005
Date: Sat, 31 Dec 2005 12:59:44 +0100 (MET)
From: Michael Paesold <mpaesold@gmx.at>
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: simon@2ndquadrant.com, andrew@dunslane.net, tgl@sss.pgh.pa.us, 
	  kleptog@svana.org, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, 
	  pgsql-hackers@postgresql.org
References: <200512302114.jBULEno02301@candle.pha.pa.us>
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
X-Priority: 3 (Normal)
X-Authenticated: #1946847
Message-ID: <14969.1136030384@www6.gmx.net>
X-Mailer: WWW-Mail 1.6 (Global Message Exchange)
X-Flags: 0001
Content-Length:  1305

Bruce Momjian wrote:

> > The --single-transaction mode would apply even if the dump was created
> > using an earlier version of pg_dump. pg_dump has *not* been altered at
> > all. (And I would again add that the idea was not my own)
> 
> I assume you mean this:
> 
> 	http://archives.postgresql.org/pgsql-patches/2005-12/msg00257.php
> 
> I guess with the ALTER commands I don't see much value in the
> --single-transaction flag.  I am sure others suggested it, but would
> they suggest it now given our current direction.

I just want to add that --single-transaction has a value of it's own. There
were times when I wanted to restore parts of a dump all-or-nothing. 

This is possible with PostgreSQL, unlike many other DBM systems, because
people like Tom Lane have invested in ensuring that all DDL is working
without implicitly committing an enclosing transaction.

Using pg_restore directly into a database, it is not possible to get a
single transaction right now. One has to restore to a file and manually
added BEGIN/COMMIT. Just for that I think --single-transaction is a great
addition and a missing feature.

I think more people have a use-case for that.

Best Regards,
Michael Paesold

-- 
Telefonieren Sie schon oder sparen Sie noch?
NEU: GMX Phone_Flat http://www.gmx.net/de/go/telefonie

From pgsql-hackers-owner+M78213=pgman=candle.pha.pa.us@postgresql.org Tue Jan  3 12:08:43 2006
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
From: Bruce Momjian <pgman@candle.pha.pa.us>
Message-ID: <200601031708.k03H85j27170@candle.pha.pa.us>
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
In-Reply-To: <17173.1136306881@sss.pgh.pa.us>
To: Tom Lane <tgl@sss.pgh.pa.us>
Date: Tue, 3 Jan 2006 12:08:05 -0500 (EST)
cc: Jim C. Nasby <jnasby@pervasive.com>, 
	   Andrew Dunstan <andrew@dunslane.net>, simon@2ndquadrant.com, 
	  kleptog@svana.org, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, 
	  pgsql-hackers@postgresql.org
X-Mailer: ELM [version 2.4ME+ PL121 (25)]
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.121 required=5 tests=[AWL=0.121]
X-Spam-Score: 0.121
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:  1125

Tom Lane wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > On Tue, Jan 03, 2006 at 11:26:51AM -0500, Tom Lane wrote:
> >> Such an ALTER would certainly require exclusive lock on the table,
> >> so I'm not sure that I see much use-case for doing it like that.
> >> You'd want to do the ALTER and commit so as not to lock other people
> >> out of the table entirely while doing the bulk data-pushing.
> 
> > Maybe this just isn't clear, but would EXCLUSIVE block writes from all
> > other sessions then?
> 
> I don't think it should (which implies that EXCLUSIVE is a bad name).

Agreed, EXCLUSIVE was used to mean an _exclusive_ writer.  The new words
I proposed were PRESERVE or STABLE.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org

From tgl@sss.pgh.pa.us Tue Jan  3 12:37:34 2006
To: Stephen Frost <sfrost@snowman.net>
cc: Jim C. Nasby <jnasby@pervasive.com>, 
	   Bruce Momjian <pgman@candle.pha.pa.us>, 
	   Andrew Dunstan <andrew@dunslane.net>, kleptog@svana.org, 
	  simon@2ndquadrant.com, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, 
	  pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and 
In-Reply-To: <20060103165359.GP6026@ns.snowman.net> 
References: <200512291605.jBTG5gi00396@candle.pha.pa.us>  <7966.1135873468@sss.pgh.pa.us>  <20060103154521.GC82560@pervasive.com>  <20060103162137.GO6026@ns.snowman.net>  <16856.1136305742@sss.pgh.pa.us>  <20060103165359.GP6026@ns.snowman.net>
Comments: In-reply-to Stephen Frost <sfrost@snowman.net>
	message dated "Tue, 03 Jan 2006 11:54:01 -0500"
Date: Tue, 03 Jan 2006 12:37:32 -0500
Message-ID: <17841.1136309852@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Content-Length:   976

Stephen Frost <sfrost@snowman.net> writes:
> The problem is that you might want to grant 'truncate' to people who
> *aren't* particularly trusted.  For truncate, at least I have a
> real-world use-case for it.

I don't find this use-case particularly convincing.  If the users are
allowed to delete all data in a given table, then that table must be
dedicated to them anyway; so it's not that easy to see why you can't
risk giving them ownership rights on it.  The worst they can do is
screw up their own data, no?

In any case, I don't see what's so wrong with the model of using
SECURITY DEFINER interface functions when you want a security
restriction that's finer-grain than the system provides.  I really
*don't* want to see us trying to, say, categorize every variety of
ALTER TABLE as a separately grantable privilege.  I could live with
something like a catchall "ADMIN" privilege ... except it's not
clear how that would differ from ownership.

			regards, tom lane

From pgsql-hackers-owner+M78221=pgman=candle.pha.pa.us@postgresql.org Tue Jan  3 13:30:34 2006
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
Date: Tue, 3 Jan 2006 13:30:56 -0500
From: Stephen Frost <sfrost@snowman.net>
To: Tom Lane <tgl@sss.pgh.pa.us>
cc: Jim C. Nasby <jnasby@pervasive.com>, 
	   Bruce Momjian <pgman@candle.pha.pa.us>, 
	   Andrew Dunstan <andrew@dunslane.net>, kleptog@svana.org, 
	  simon@2ndquadrant.com, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, 
	  pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
Message-ID: <20060103183056.GR6026@ns.snowman.net>
Mail-Followup-To: Tom Lane <tgl@sss.pgh.pa.us>,
	"Jim C. Nasby" <jnasby@pervasive.com>,
	Bruce Momjian <pgman@candle.pha.pa.us>,
	Andrew Dunstan <andrew@dunslane.net>, kleptog@svana.org,
	simon@2ndquadrant.com, gsstark@mit.edu, pg@rbt.ca,
	zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org
References: <200512291605.jBTG5gi00396@candle.pha.pa.us>  <7966.1135873468@sss.pgh.pa.us>  <20060103154521.GC82560@pervasive.com>  <20060103162137.GO6026@ns.snowman.net>  <16856.1136305742@sss.pgh.pa.us>  <20060103165359.GP6026@ns.snowman.net>  <17841.1136309852@sss.pgh.pa.us>
Content-Disposition: inline
In-Reply-To: <17841.1136309852@sss.pgh.pa.us>
X-Editor: Vim http://www.vim.org/
X-Info: http://www.snowman.net
X-Operating-System: Linux/2.4.24ns.3.0 (i686)
X-Uptime: 12:39:16 up 206 days,  9:50, 11 users,  load average: 0.02, 0.05, 0.05
User-Agent: Mutt/1.5.9i
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.105 required=5 tests=[AWL=0.105]
X-Spam-Score: 0.105
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:  2666

-- Start of PGP signed section.
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I don't find this use-case particularly convincing.  If the users are
> allowed to delete all data in a given table, then that table must be
> dedicated to them anyway; so it's not that easy to see why you can't
> risk giving them ownership rights on it.  The worst they can do is
> screw up their own data, no?

Being able to delete all data in a given table in no way implies
ownership rights.  The tables are part of a specification which the
users are being asked to respond to.  Being able to change the table
types or remove the constraints put on the tables would allow the 
users to upload garbage which would then affect downstream processing.

We can't guarentee this won't happen anyway but we try to confine the
things they can mess up to a reasonable set which we can check for (and
do, through a rather involved error checking system).  There are *alot*
of things built on top of the table structures and having them change
would basically break the whole system (without the appropriate changes
being made to the other parts of the system).

> In any case, I don't see what's so wrong with the model of using
> SECURITY DEFINER interface functions when you want a security
> restriction that's finer-grain than the system provides.  I really
> *don't* want to see us trying to, say, categorize every variety of
> ALTER TABLE as a separately grantable privilege.  I could live with
> something like a catchall "ADMIN" privilege ... except it's not
> clear how that would differ from ownership.

I don't think anyone's asked for 'ALTER TABLE' privileges to be
seperately grantable.  It seems to me that the privileges which *need*
to be grantable are ones associated with DML statements.  I would 
classify TRUNCATE, VACUUM and ANALYZE as DML statements (along with 
select, insert, update, and delete).  They're PostgreSQL-specific DML 
statements but they still fall into that category.  I don't think 
it's a coincidence that the SQL-defined DML statements are all, 
individually, grantable.

That doesn't mean I think we should get rid of RULE, REFERENCES or
TRIGGER, though honestly I've very rarely needed to grant any of them 
(I don't think I've ever granted RULE or TRIGGER...).  References is
DDL-oriented, but for *other* tables; RULE and TRIGGER are DDL and I
can't really justify why someone other than the owner would need them
but I'm guessing someone's using them.  I don't think their existance
should imply that if we ever change the grants again we have to include
all types of 'ALTER TABLE', etc, though.

	Thanks,

		Stephen
-- End of PGP section, PGP failed!

From sfrost@snowman.net Tue Jan  3 13:30:13 2006
Date: Tue, 3 Jan 2006 13:30:56 -0500
From: Stephen Frost <sfrost@snowman.net>
To: Tom Lane <tgl@sss.pgh.pa.us>
cc: Jim C. Nasby <jnasby@pervasive.com>, 
	   Bruce Momjian <pgman@candle.pha.pa.us>, 
	   Andrew Dunstan <andrew@dunslane.net>, kleptog@svana.org, 
	  simon@2ndquadrant.com, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, 
	  pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
Message-ID: <20060103183056.GR6026@ns.snowman.net>
Mail-Followup-To: Tom Lane <tgl@sss.pgh.pa.us>,
	"Jim C. Nasby" <jnasby@pervasive.com>,
	Bruce Momjian <pgman@candle.pha.pa.us>,
	Andrew Dunstan <andrew@dunslane.net>, kleptog@svana.org,
	simon@2ndquadrant.com, gsstark@mit.edu, pg@rbt.ca,
	zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org
References: <200512291605.jBTG5gi00396@candle.pha.pa.us>  <7966.1135873468@sss.pgh.pa.us>  <20060103154521.GC82560@pervasive.com>  <20060103162137.GO6026@ns.snowman.net>  <16856.1136305742@sss.pgh.pa.us>  <20060103165359.GP6026@ns.snowman.net>  <17841.1136309852@sss.pgh.pa.us>
Content-Disposition: inline
In-Reply-To: <17841.1136309852@sss.pgh.pa.us>
X-Editor: Vim http://www.vim.org/
X-Info: http://www.snowman.net
X-Operating-System: Linux/2.4.24ns.3.0 (i686)
X-Uptime: 12:39:16 up 206 days,  9:50, 11 users,  load average: 0.02, 0.05, 0.05
User-Agent: Mutt/1.5.9i
Content-Length:  2666

-- Start of PGP signed section.
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I don't find this use-case particularly convincing.  If the users are
> allowed to delete all data in a given table, then that table must be
> dedicated to them anyway; so it's not that easy to see why you can't
> risk giving them ownership rights on it.  The worst they can do is
> screw up their own data, no?

Being able to delete all data in a given table in no way implies
ownership rights.  The tables are part of a specification which the
users are being asked to respond to.  Being able to change the table
types or remove the constraints put on the tables would allow the 
users to upload garbage which would then affect downstream processing.

We can't guarentee this won't happen anyway but we try to confine the
things they can mess up to a reasonable set which we can check for (and
do, through a rather involved error checking system).  There are *alot*
of things built on top of the table structures and having them change
would basically break the whole system (without the appropriate changes
being made to the other parts of the system).

> In any case, I don't see what's so wrong with the model of using
> SECURITY DEFINER interface functions when you want a security
> restriction that's finer-grain than the system provides.  I really
> *don't* want to see us trying to, say, categorize every variety of
> ALTER TABLE as a separately grantable privilege.  I could live with
> something like a catchall "ADMIN" privilege ... except it's not
> clear how that would differ from ownership.

I don't think anyone's asked for 'ALTER TABLE' privileges to be
seperately grantable.  It seems to me that the privileges which *need*
to be grantable are ones associated with DML statements.  I would 
classify TRUNCATE, VACUUM and ANALYZE as DML statements (along with 
select, insert, update, and delete).  They're PostgreSQL-specific DML 
statements but they still fall into that category.  I don't think 
it's a coincidence that the SQL-defined DML statements are all, 
individually, grantable.

That doesn't mean I think we should get rid of RULE, REFERENCES or
TRIGGER, though honestly I've very rarely needed to grant any of them 
(I don't think I've ever granted RULE or TRIGGER...).  References is
DDL-oriented, but for *other* tables; RULE and TRIGGER are DDL and I
can't really justify why someone other than the owner would need them
but I'm guessing someone's using them.  I don't think their existance
should imply that if we ever change the grants again we have to include
all types of 'ALTER TABLE', etc, though.

	Thanks,

		Stephen
-- End of PGP section, PGP failed!

From pgsql-hackers-owner+M78233=pgman=candle.pha.pa.us@postgresql.org Tue Jan  3 17:39:06 2006
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
From: Bruce Momjian <pgman@candle.pha.pa.us>
Message-ID: <200601032238.k03McP804163@candle.pha.pa.us>
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
In-Reply-To: <20060103212750.GT82560@pervasive.com>
To: Jim C. Nasby <jnasby@pervasive.com>
Date: Tue, 3 Jan 2006 17:38:25 -0500 (EST)
cc: Tom Lane <tgl@sss.pgh.pa.us>, Andrew Dunstan <andrew@dunslane.net>, 
	  simon@2ndquadrant.com, kleptog@svana.org, gsstark@mit.edu, pg@rbt.ca, 
	  zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org
X-Mailer: ELM [version 2.4ME+ PL121 (25)]
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.121 required=5 tests=[AWL=0.121]
X-Spam-Score: 0.121
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:  1714

Jim C. Nasby wrote:
> > We would be creating a new lock type for this.
> 
> Sorry if I've just missed this in the thread, but what would  the new
> lock type do? My impression is that as it stands you can either do:
> 
> BEGIN;
> ALTER TABLE EXCLUSIVE;
> ...
> ALTER TABLE SHARE; --fsync
> COMMIT;
> 
> Which would block all other access to the table as soon as the first
> ALTER TABLE happens. Or you can:
> 
> ALTER TABLE EXCLUSIVE;
> ...
> ALTER TABLE SHARE;
> 
> Which means that between the two ALTER TABLES every backend that does
> DML on that table will not have that DML logged, but because there's no
> exclusive lock that DML would be allowed to occur.

Right, the DML will be single-threaded and fsync of all dirty pages will
happen before commit of each transaction.

> BTW, there might be some usecase for the second scenario, in which case
> it would probably be better to tell the user to aquire a table-lock on
> their own rather than do it automatically as part of the update...

> > Basically meaning your idea of update while EXCLUSIVE/PRESERVE/STABLE is
> > happening is never going to be implemented because it is just too hard
> > to do, and too prone to error.
> 
> What I figured. Never hurts to ask though. :)

Actually, it does hurt because it generates discussion volume for no
purpose.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

From pgsql-hackers-owner+M78234=pgman=candle.pha.pa.us@postgresql.org Tue Jan  3 17:54:16 2006
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
From: Simon Riggs <simon@2ndquadrant.com>
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Andrew Dunstan <andrew@dunslane.net>, Tom Lane <tgl@sss.pgh.pa.us>, 
	   Martijn van Oosterhout <kleptog@svana.org>, Greg Stark <gsstark@mit.edu>, 
	   Rod Taylor <pg@rbt.ca>, Qingqing Zhou <zhouqq@cs.toronto.edu>, 
	  pgsql-hackers@postgresql.org
In-Reply-To: <200512302114.jBULEno02301@candle.pha.pa.us>
References: <200512302114.jBULEno02301@candle.pha.pa.us>
Date: Tue, 03 Jan 2006 22:53:53 +0000
Message-ID: <1136328833.5052.223.camel@localhost.localdomain>
X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) 
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.04 required=5 tests=[AWL=0.040]
X-Spam-Score: 0.04
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:  5373

On Fri, 2005-12-30 at 16:14 -0500, Bruce Momjian wrote:
> Simon Riggs wrote:
> > The implications of EXCLUSIVE are:
> > - there will be a check on each and every I, U, D to check the state of
> > the relation
> > - *every* operation that attempts a write lock will attempt to acquire
> > an EXCLUSIVE full table lock instead
> > - following successful completion of *each* DML statement, the relation
> > will be heap_sync'd involving a full scan of the buffer cache
> 
> Yes, I think that is it.  What we can do is implement EXCLUSIVE to
> affect only COPY at this point, and document that, and later add other
> commands.
> 
> > Can I clarify the wording of the syntax? Is EXCLUSIVE the right word?
> > How about FASTLOAD or BULKLOAD? Those words seem less likely to be
> > misused in the future - i.e. we are invoking a special mode, rather than
> > invoking a special "go faster" option.
> 
> The problem with the FASTLOAD/BULKLOAD words is that EXCLUSIVE mode is
> probably not the best for loading.  I would think TRUNCATE would be a
> better option.
> 
> In fact, in loading a table, I think both EXCLUSIVE and TRUNCATE would be
> the same, mostly.  You would create the table, set its RELIABILITY to
> TRUNCATE, COPY into the table, then set the RELIABILITY to SHARE or
> DEFAULT.  The second ALTER has to sync all the dirty data blocks, which
> the same thing EXCLUSIVE does at the conclusion of COPY.
> 
> So, we need a name for EXCLUSIVE mode that suggests how it is different
> from TRUNCATE, and in this case, the difference is that EXCLUSIVE
> preserves the previous contents of the table on recovery, while TRUNCATE
> does not.  Do you want to call the mode PRESERVE, or EXCLUSIVE WRITER?
> Anyway, the keywords are easy to modify, even after the patch is
> submitted.  FYI, I usually go through keywords.c looking for a keyword
> we already use.

I'm very happy for suggestions on what these new modes are called.

> > > So, to summarize, I think we should add DROP/TRUNCATE, and use that by
> > > default (or optionally off?) in pg_dump, and, assuming we want EXCLUSIVE
> > > for more than just COPY, we need to add ALTER TABLE EXCLUSIVE.
> > 
> > Would you mind stating again what you mean, just so I can understand
> > this? Your summary isn't enough.
> 
> New ALTER TABLE mode, perhaps call it PERSISTENCE:
> 
> 	ALTER TABLE tab PERSISTENCE DROP ON RECOVERY
> 	ALTER TABLE tab PERSISTENCE TRUNCATE ON RECOVERY
> 
> These would drop or truncate all tables with this flag on a non-clean
> start of the postmaster, and write something in the server logs. 
> However, I don't know that we have the code in place to DROP/TRUNCATE in
> recovery mode, and it would affect all databases, so it could be quite
> complex to implement.  In this mode, no WAL logs would be written for
> table modifications, though DDL commands would have to be logged.

Right now, this will be a TODO item... it looks like it will take some
thought to implement correctly.

> 	ALTER TABLE tab PERSISTENCE PRESERVE (or STABLE?)
> 
> Table contents are preserved across recoveries, but data modifications
> can happen only one at a time.  I don't think we have a lock mode that
> does this, so I am worried a new lock mode will have to be created.  A
> simplified solution at this stage would be to take an exclusive lock on
> the table, but really we just need a single-writer table lock, which I
> don't think we have. initially this can implemented to only affect COPY
> but later can be done for other commands. 

ExclusiveLock locks out everything apart from readers, no new lock mode
AFAICS. Implementing that is little additional work for COPY.

Tom had a concern about setting this for I, U, D commands via the
executor. Not sure what the details of that are, as yet.

We can use either of the unlogged modes for pg_dump, so I'd suggest its
this one. Everybody happy with this being the new default in pg_dump, or
should it be an option?

> 	ALTER TABLE tab PERSISTENCE DEFAULT
> 
> This would be our current default mode, which is full concurrency and
> persistence.

I'm thinking whether the ALTER TABLE statement might be better with two
bool flags rather than a 3-state char.

flag 1: ENABLE LOGGING | DISABLE LOGGING

flag 2: FULL RECOVERY | TRUNCATE ON RECOVERY

Giving 3 possible sets of options:

-- the default
ALTER TABLE mytable ENABLE LOGGING FULL RECOVERY; (default)

-- EXCLUSIVE mode
ALTER TABLE mytable DISABLE LOGGING FULL RECOVERY;
...which would be used like this
	ALTER TABLE mytable DISABLE LOGGING;
	COPY or other bulk data manipulation SQL
	ALTER TABLE mytable ENABLE LOGGING;
...since FULL RECOVERY is the default.

-- multiuser temp table mode
ALTER TABLE mytable DISABLE LOGGING TRUNCATE ON RECOVERY;
...which would usually be left on all the time

which only uses one new keyword LOGGING and yet all the modes are fairly
explicit as to what they do.

An alternative might be the slightly more verbose:
	ALTER TABLE mytable DISABLE LOGGING FORCE EXCLUSIVE TABLE LOCK;
which would be turned off by
	ALTER TABLE mytable ENABLE LOGGING;

Comments?

Best Regards, Simon Riggs


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo@postgresql.org so that your
       message can get through to the mailing list cleanly

From simon@2ndquadrant.com Tue Jan  3 18:10:32 2006
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
From: Simon Riggs <simon@2ndquadrant.com>
To: Jim C. Nasby <jnasby@pervasive.com>, 
	   Bruce Momjian <pgman@candle.pha.pa.us>
cc: Tom Lane <tgl@sss.pgh.pa.us>, Andrew Dunstan <andrew@dunslane.net>, 
	  kleptog@svana.org, gsstark@mit.edu, pg@rbt.ca, zhouqq@cs.toronto.edu, 
	  pgsql-hackers@postgresql.org
In-Reply-To: <200601032120.k03LKl609990@candle.pha.pa.us>
References: <200601032120.k03LKl609990@candle.pha.pa.us>
Date: Tue, 03 Jan 2006 23:10:16 +0000
Message-ID: <1136329816.5052.239.camel@localhost.localdomain>
X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) 
Content-Length:  2118

On Tue, 2006-01-03 at 16:20 -0500, Bruce Momjian wrote:
> Jim C. Nasby wrote:

> > Idealistically, if EXCLUSIVE/PRESERVE/STABLE does it's thing by only
> > appending new pages, it would be nice if other backends could continue
> > performing updates at the same time, assuming there's free space
> > available elsewhere within the table (and that you'd be able to recover
> > those logged changes regardless of the non-logged operations). But
> > that's a pretty lofty goal...
> 
> "Idealistically", yep.  It would be great if we could put a helmet on
> and the computer would read your mind.  :-)
> 
> Basically meaning your idea of update while EXCLUSIVE/PRESERVE/STABLE is
> happening is never going to be implemented because it is just too hard
> to do, and too prone to error.

The reason for locking the whole table was to ensure that we do not have
a mixture of logged and non-logged writers writing to the same data
blocks, since that could damage blocks unrecoverably in the event of a
crash. (Though perhaps only if full_block_writes is on)

The ALTER TABLE .. EXCLUSIVE/(insert name) mode would mean that *any*
backend who took a write lock on the table, would lock out the whole
table. So this new mode is not restricted to the job/user who ran the
ALTER TABLE command. (I would note that that is how Oracle and Teradata
do this for pre-load utility table locking, but why should we follow
them on that?)

Currently, when we add a new row when the FSM is empty, we check the
last block of the table. That would cause multiple writers to access the
same blocks and so we would be in danger. The only way to avoid that
would be for logged writers (who would use the FSM if it were not empty)
to notify back to the FSM that they have just added a block - and remove
the behaviour to look for the last block.

Anyway, one step at a time. *Maybe* we can do that in the future, but
right now I'd like to add the basic fast write/load functionality.

Also, I think I will do the docs first this time, just so everyone can
read what we're getting ahead of time, to ensure we all agree.

Best Regards, Simon Riggs

From pgsql-hackers-owner+M78236=pgman=candle.pha.pa.us@postgresql.org Tue Jan  3 18:24:20 2006
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
From: Simon Riggs <simon@2ndquadrant.com>
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Jim C. Nasby <jnasby@pervasive.com>, Tom Lane <tgl@sss.pgh.pa.us>, 
	   Andrew Dunstan <andrew@dunslane.net>, kleptog@svana.org, gsstark@mit.edu, 
	  pg@rbt.ca, zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org
In-Reply-To: <200601032238.k03McP804163@candle.pha.pa.us>
References: <200601032238.k03McP804163@candle.pha.pa.us>
Date: Tue, 03 Jan 2006 23:23:54 +0000
Message-ID: <1136330634.5052.247.camel@localhost.localdomain>
X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) 
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.043 required=5 tests=[AWL=0.043]
X-Spam-Score: 0.043
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:   725

On Tue, 2006-01-03 at 17:38 -0500, Bruce Momjian wrote:

> Right, the DML will be single-threaded and fsync of all dirty pages will
> happen before commit of each transaction.

heap_sync() would occur at end of statement, as it does with CTAS. We
could delay until EOT but I'm not sure I see why; in most cases they'd
be the same point anyway.

I'd been toying with the idea of making the freshly added blocks live
only in temp_buffers to avoid the shared_buffers overhead, but that was
starting to sounds too wierd for my liking.

Best Regards, Simon Riggs


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org

From simon@2ndquadrant.com Tue Jan  3 18:58:13 2006
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
From: Simon Riggs <simon@2ndquadrant.com>
To: Michael Paesold <mpaesold@gmx.at>
cc: Bruce Momjian <pgman@candle.pha.pa.us>, andrew@dunslane.net, 
	  tgl@sss.pgh.pa.us, kleptog@svana.org, gsstark@mit.edu, pg@rbt.ca, 
	  zhouqq@cs.toronto.edu, pgsql-hackers@postgresql.org
In-Reply-To: <14969.1136030384@www6.gmx.net>
References: <200512302114.jBULEno02301@candle.pha.pa.us>
	  <14969.1136030384@www6.gmx.net>
Date: Tue, 03 Jan 2006 23:58:09 +0000
Message-ID: <1136332689.5052.263.camel@localhost.localdomain>
X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) 
Content-Length:  1493

On Sat, 2005-12-31 at 12:59 +0100, Michael Paesold wrote:
> Bruce Momjian wrote:
> 
> > > The --single-transaction mode would apply even if the dump was created
> > > using an earlier version of pg_dump. pg_dump has *not* been altered at
> > > all. (And I would again add that the idea was not my own)
> > 
> > I assume you mean this:
> > 
> > 	http://archives.postgresql.org/pgsql-patches/2005-12/msg00257.php
> > 
> > I guess with the ALTER commands I don't see much value in the
> > --single-transaction flag.  I am sure others suggested it, but would
> > they suggest it now given our current direction.
> 
> I just want to add that --single-transaction has a value of it's own. There
> were times when I wanted to restore parts of a dump all-or-nothing. 
> 
> This is possible with PostgreSQL, unlike many other DBM systems, because
> people like Tom Lane have invested in ensuring that all DDL is working
> without implicitly committing an enclosing transaction.
> 
> Using pg_restore directly into a database, it is not possible to get a
> single transaction right now. One has to restore to a file and manually
> added BEGIN/COMMIT. Just for that I think --single-transaction is a great
> addition and a missing feature.
> 
> I think more people have a use-case for that.

I did originally separate the --single-transaction patch for this
reason. I think its a valid patch on its own and its wrapped and ready
to go, with some deletions from the doc patch.

Best Regards, Simon Riggs

From pgsql-hackers-owner+M78239=pgman=candle.pha.pa.us@postgresql.org Tue Jan  3 19:12:18 2006
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
From: Simon Riggs <simon@2ndquadrant.com>
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Tom Lane <tgl@sss.pgh.pa.us>, Martijn van Oosterhout <kleptog@svana.org>, 
	   Greg Stark <gsstark@mit.edu>, Rod Taylor <pg@rbt.ca>, 
	   Qingqing Zhou <zhouqq@cs.toronto.edu>, pgsql-hackers@postgresql.org
In-Reply-To: <200512291637.jBTGbdC03848@candle.pha.pa.us>
References: <200512291637.jBTGbdC03848@candle.pha.pa.us>
Date: Wed, 04 Jan 2006 00:11:55 +0000
Message-ID: <1136333515.5052.273.camel@localhost.localdomain>
X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) 
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.045 required=5 tests=[AWL=0.045]
X-Spam-Score: 0.045
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:  1200

On Thu, 2005-12-29 at 11:37 -0500, Bruce Momjian wrote:
> Having COPY behave differently because it is
> in a transaction is fine as long as it is user-invisible, but once you
> require users to do that to get the speedup, it isn't user-invisible
> anymore.

Since we're agreed on adding ALTER TABLE rather than COPY LOCK, we have
our explicit mechanism for speedup.

However, it costs a single line of code and very very little execution
time to add in the optimization to COPY to make it bypass WAL when
executed in the same transaction that created the table. Everything else
is already there.

As part of the use_wal test:
+ 	if (resultRelInfo->ri_NumIndices == 0 && 
+         !XLogArchivingActive()            &&
>>         (cstate->rel->rd_createSubid != InvalidSubTransactionId ))
+             use_wal = false;

the value is already retrieved from cache...

Can anyone see a reason *not* to put that change in also? We just don't
advertise it as the "suggested" route to gaining performance, nor would
we rely on it for pg_dump/restore performance. 

Best Regards, Simon Riggs


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

From pgsql-hackers-owner+M78303=pgman=candle.pha.pa.us@postgresql.org Thu Jan  5 12:23:39 2006
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
X-Greylist: from auto-whitelisted by SQLgrey-
From: Bruce Momjian <pgman@candle.pha.pa.us>
Message-ID: <200601051722.k05HMSM02052@candle.pha.pa.us>
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
In-Reply-To: <1136328833.5052.223.camel@localhost.localdomain>
To: Simon Riggs <simon@2ndquadrant.com>
Date: Thu, 5 Jan 2006 12:22:28 -0500 (EST)
cc: Andrew Dunstan <andrew@dunslane.net>, Tom Lane <tgl@sss.pgh.pa.us>, 
	   Martijn van Oosterhout <kleptog@svana.org>, Greg Stark <gsstark@mit.edu>, 
	   Rod Taylor <pg@rbt.ca>, Qingqing Zhou <zhouqq@cs.toronto.edu>, 
	  pgsql-hackers@postgresql.org
X-Mailer: ELM [version 2.4ME+ PL121 (25)]
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Status: No, score=0.12 required=5 tests=[AWL=0.120]
X-Spam-Score: 0.12
X-Spam-Level: 
X-Mailing-List: pgsql-hackers
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
List-Help: <mailto:majordomo@postgresql.org?body=help>
List-Id: <pgsql-hackers.postgresql.org>
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
List-Post: <mailto:pgsql-hackers@postgresql.org>
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Content-Length:  6020

Simon Riggs wrote:
> > So, we need a name for EXCLUSIVE mode that suggests how it is different
> > from TRUNCATE, and in this case, the difference is that EXCLUSIVE
> > preserves the previous contents of the table on recovery, while TRUNCATE
> > does not.  Do you want to call the mode PRESERVE, or EXCLUSIVE WRITER?
> > Anyway, the keywords are easy to modify, even after the patch is
> > submitted.  FYI, I usually go through keywords.c looking for a keyword
> > we already use.
> 
> I'm very happy for suggestions on what these new modes are called.
> 
> > > > So, to summarize, I think we should add DROP/TRUNCATE, and use that by
> > > > default (or optionally off?) in pg_dump, and, assuming we want EXCLUSIVE
> > > > for more than just COPY, we need to add ALTER TABLE EXCLUSIVE.
> > > 
> > > Would you mind stating again what you mean, just so I can understand
> > > this? Your summary isn't enough.
> > 
> > New ALTER TABLE mode, perhaps call it PERSISTENCE:
> > 
> > 	ALTER TABLE tab PERSISTENCE DROP ON RECOVERY
> > 	ALTER TABLE tab PERSISTENCE TRUNCATE ON RECOVERY
> > 
> > These would drop or truncate all tables with this flag on a non-clean
> > start of the postmaster, and write something in the server logs. 
> > However, I don't know that we have the code in place to DROP/TRUNCATE in
> > recovery mode, and it would affect all databases, so it could be quite
> > complex to implement.  In this mode, no WAL logs would be written for
> > table modifications, though DDL commands would have to be logged.
> 
> Right now, this will be a TODO item... it looks like it will take some
> thought to implement correctly.

OK, I know my suggestions have made it more complicated.

TODO added:

* Allow control over which tables are WAL-logged

  Allow tables to bypass WAL writes and just fsync() dirty pages on
  commit.  To do this, only a single writer can modify the table, and
  writes must happen only on new pages.  Readers can continue accessing
  the table.  This would affect COPY, and perhaps INSERT/UPDATE too.
  Another option is to avoid transaction logging entirely and truncate
  or drop the table on crash recovery.  These should be implemented
  using ALTER TABLE, e.g. ALTER TABLE PERSISTENCE [ DROP | TRUNCATE |
  STABLE | DEFAULT ].  Tables using non-default logging should not use
  referential integrity with default-logging tables, and tables using
  stable logging probably can not have indexes.  [walcontrol]


> > 	ALTER TABLE tab PERSISTENCE PRESERVE (or STABLE?)
> > 
> > Table contents are preserved across recoveries, but data modifications
> > can happen only one at a time.  I don't think we have a lock mode that
> > does this, so I am worried a new lock mode will have to be created.  A
> > simplified solution at this stage would be to take an exclusive lock on
> > the table, but really we just need a single-writer table lock, which I
> > don't think we have. initially this can implemented to only affect COPY
> > but later can be done for other commands. 
> 
> ExclusiveLock locks out everything apart from readers, no new lock mode
> AFAICS. Implementing that is little additional work for COPY.

Nice.

> Tom had a concern about setting this for I, U, D commands via the
> executor. Not sure what the details of that are, as yet.

That is much more complicated than the COPY-only idea, for sure.  I am
thinking we could add the ALTER syntax and just do COPY at this stage,
meaning that I/U/D still do full logging until we get to improving them.
The big benefit is that the user API doesn't need to change when we
improve the code.  In fact I think we could do the TRUNCATE/DROP easily
for I/U/D, but the STABLE option would require work and we don't need to
implement it in the first patch.

> We can use either of the unlogged modes for pg_dump, so I'd suggest its
> this one. Everybody happy with this being the new default in pg_dump, or
> should it be an option?
> 
> > 	ALTER TABLE tab PERSISTENCE DEFAULT
> > 
> > This would be our current default mode, which is full concurrency and
> > persistence.
> 
> I'm thinking whether the ALTER TABLE statement might be better with two
> bool flags rather than a 3-state char.
> 
> flag 1: ENABLE LOGGING | DISABLE LOGGING
> 
> flag 2: FULL RECOVERY | TRUNCATE ON RECOVERY
> 
> Giving 3 possible sets of options:
> 
> -- the default
> ALTER TABLE mytable ENABLE LOGGING FULL RECOVERY; (default)
> 
> -- EXCLUSIVE mode
> ALTER TABLE mytable DISABLE LOGGING FULL RECOVERY;
> ...which would be used like this
> 	ALTER TABLE mytable DISABLE LOGGING;
> 	COPY or other bulk data manipulation SQL
> 	ALTER TABLE mytable ENABLE LOGGING;
> ...since FULL RECOVERY is the default.
> 
> -- multiuser temp table mode
> ALTER TABLE mytable DISABLE LOGGING TRUNCATE ON RECOVERY;
> ...which would usually be left on all the time
> 
> which only uses one new keyword LOGGING and yet all the modes are fairly
> explicit as to what they do.
> 
> An alternative might be the slightly more verbose:
> 	ALTER TABLE mytable DISABLE LOGGING FORCE EXCLUSIVE TABLE LOCK;
> which would be turned off by
> 	ALTER TABLE mytable ENABLE LOGGING;
> 
> Comments?

I had the same idea originally, but avoided it because the logging
really does affect what other options you can use.  For example, if you
want truncate on recovery, you certainly do not want logging, so it
seems the options are not really independent.  In fact if someone asks
for truncate on recovery, do we automatically turn off logging for them,
or throw an error, or a warning.  It just seemed too error-prone and
confusing, though perhaps more logical.  Of course, if others like the
above, we can do it.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

From simon@2ndquadrant.com Thu Jan  5 16:56:25 2006
Return-path: <simon@2ndquadrant.com>
Received: from smtp.nildram.co.uk (smtp.nildram.co.uk [195.112.4.54])
	by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id k05LuPb02246
	for <pgman@candle.pha.pa.us>; Thu, 5 Jan 2006 16:56:25 -0500 (EST)
Received: from [192.168.0.3] (unknown [84.12.184.6])
	by smtp.nildram.co.uk (Postfix) with ESMTP
	id A9F0F268C4E; Thu,  5 Jan 2006 21:56:18 +0000 (GMT)
Subject: Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and
From: Simon Riggs <simon@2ndquadrant.com>
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Tom Lane <tgl@sss.pgh.pa.us>, Martijn van Oosterhout <kleptog@svana.org>,
   Greg Stark <gsstark@mit.edu>, Rod Taylor <pg@rbt.ca>,
   Qingqing Zhou <zhouqq@cs.toronto.edu>, pgsql-hackers@postgresql.org
In-Reply-To: <200601051727.k05HR5p02803@candle.pha.pa.us>
References: <200601051727.k05HR5p02803@candle.pha.pa.us>
Content-Type: text/plain
Date: Thu, 05 Jan 2006 21:56:21 +0000
Message-ID: <1136498181.21025.285.camel@localhost.localdomain>
MIME-Version: 1.0
X-Mailer: Evolution 2.2.3 (2.2.3-2.fc4) 
Content-Transfer-Encoding: 7bit
Status: OR

On Thu, 2006-01-05 at 12:27 -0500, Bruce Momjian wrote:

> Seems like a nice optimization.

Negative thoughts: Toast tables have a toast index on them, yes? We have
agreed that we cannot use the optimization if we have indexes on the
main table. It follows that we cannot use the optimization if we have
*any* toasted data, since that would require a pointer between two
blocks, which would not be correctly recovered following a crash. If we
log the toast table then there could be a mismatch between heap and
toast table; if we don't log the toast table there could be a mismatch
between toast table and toast index.

We can test to see if the toast table is empty when we do ALTER TABLE,
but loading operations may try to create toasted data rows.

Presumably that means we must either:
i) abort a COPY if we get a toastable value
ii) if we get a toastable value, insert the row into a new block, which
we do logging of, then also log the toast insert and the toast index
insert - i.e. some blocks we log, others not

This is still useful for many applications, IMHO, but the list of
restrictions seems to be growing. Worse, we wouldn't know that the toast
tables were empty until after we did the COPY TO for a pg_dump, so we
wouldn't be able to retrospectively add an ALTER TABLE command ahead of
the COPY. 

Thoughts? Hopefully there are some flaws in my thinking here,

Best Regards, Simon Riggs