Sophie

Sophie

distrib > Mandriva > 2006.0 > x86_64 > by-pkgid > b8f4049de69feba5041d49ed4382e582 > files > 575

postgresql-docs-8.0.11-0.1.20060mdk.x86_64.rpm

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML
><HEAD
><TITLE
>TOAST</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.79"><LINK
REV="MADE"
HREF="mailto:pgsql-docs@postgresql.org"><LINK
REL="HOME"
TITLE="PostgreSQL 8.0.11 Documentation"
HREF="index.html"><LINK
REL="UP"
TITLE="Database Physical Storage"
HREF="storage.html"><LINK
REL="PREVIOUS"
TITLE="Database Physical Storage"
HREF="storage.html"><LINK
REL="NEXT"
TITLE="Database Page Layout"
HREF="storage-page-layout.html"><LINK
REL="STYLESHEET"
TYPE="text/css"
HREF="stylesheet.css"><META
HTTP-EQUIV="Content-Type"
CONTENT="text/html; charset=ISO-8859-1"><META
NAME="creation"
CONTENT="2007-02-02T03:57:22"></HEAD
><BODY
CLASS="SECT1"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="5"
ALIGN="center"
VALIGN="bottom"
>PostgreSQL 8.0.11 Documentation</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="top"
><A
HREF="storage.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="top"
><A
HREF="storage.html"
>Fast Backward</A
></TD
><TD
WIDTH="60%"
ALIGN="center"
VALIGN="bottom"
>Chapter 49. Database Physical Storage</TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="top"
><A
HREF="storage.html"
>Fast Forward</A
></TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="top"
><A
HREF="storage-page-layout.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="SECT1"
><H1
CLASS="SECT1"
><A
NAME="STORAGE-TOAST"
>49.2. TOAST</A
></H1
><A
NAME="AEN58192"
></A
><A
NAME="AEN58194"
></A
><P
>This section provides an overview of <ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
> (The
Oversized-Attribute Storage Technique).</P
><P
>Since <SPAN
CLASS="PRODUCTNAME"
>PostgreSQL</SPAN
> uses a fixed page size (commonly
8Kb), and does not allow tuples to span multiple pages, it's not possible to
store very large field values directly.  Before <SPAN
CLASS="PRODUCTNAME"
>PostgreSQL</SPAN
> 7.1
there was a hard limit of just under one page on the total amount of data that
could be put into a table row.  In release 7.1 and later, this limit is
overcome by allowing large field values to be compressed and/or broken up into
multiple physical rows.  This happens transparently to the user, with only
small impact on most of the backend code.  The technique is affectionately
known as <ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
> (or <SPAN
CLASS="QUOTE"
>"the best thing since sliced bread"</SPAN
>).</P
><P
>Only certain data types support <ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
> &mdash; there is no need to
impose the overhead on data types that cannot produce large field values.
To support <ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
>, a data type must have a variable-length
(<I
CLASS="FIRSTTERM"
>varlena</I
>) representation, in which the first 32-bit word of any
stored value contains the total length of the value in bytes (including
itself).  <ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
> does not constrain the rest of the representation.
All the C-level functions supporting a <ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
>-able data type must
be careful to handle <ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
>ed input values.  (This is normally done
by invoking <CODE
CLASS="FUNCTION"
>PG_DETOAST_DATUM</CODE
> before doing anything with an input
value; but in some cases more efficient approaches are possible.)</P
><P
><ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
> usurps the high-order two bits of the varlena length word,
thereby limiting the logical size of any value of a <ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
>-able
data type to 1Gb (2<SUP
>30</SUP
> - 1 bytes).  When both bits are zero,
the value is an ordinary un-<ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
>ed value of the data type.  One
of these bits, if set, indicates that the value has been compressed and must
be decompressed before use.  The other bit, if set, indicates that the value
has been stored out-of-line.  In this case the remainder of the value is
actually just a pointer, and the correct data has to be found elsewhere.  When
both bits are set, the out-of-line data has been compressed too.  In each case
the length in the low-order bits of the varlena word indicates the actual size
of the datum, not the size of the logical value that would be extracted by
decompression or fetching of the out-of-line data.</P
><P
>If any of the columns of a table are <ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
>-able, the table will
have an associated <ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
> table, whose OID is stored in the table's
<TT
CLASS="STRUCTNAME"
>pg_class</TT
>.<TT
CLASS="STRUCTFIELD"
>reltoastrelid</TT
> entry.  Out-of-line
<ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
>ed values are kept in the <ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
> table, as
described in more detail below.</P
><P
>The compression technique used is a fairly simple and very fast member
of the LZ family of compression techniques.  See
<TT
CLASS="FILENAME"
>src/backend/utils/adt/pg_lzcompress.c</TT
> for the details.</P
><P
>Out-of-line values are divided (after compression if used) into chunks of at
most <TT
CLASS="LITERAL"
>TOAST_MAX_CHUNK_SIZE</TT
> bytes (this value is a little less than
<TT
CLASS="LITERAL"
>BLCKSZ/4</TT
>, or about 2000 bytes by default).  Each chunk is stored
as a separate row in the <ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
> table for the owning table.  Every
<ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
> table has the columns <TT
CLASS="STRUCTFIELD"
>chunk_id</TT
> (an OID
identifying the particular <ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
>ed value),
<TT
CLASS="STRUCTFIELD"
>chunk_seq</TT
> (a sequence number for the chunk within its value),
and <TT
CLASS="STRUCTFIELD"
>chunk_data</TT
> (the actual data of the chunk).  A unique index
on <TT
CLASS="STRUCTFIELD"
>chunk_id</TT
> and <TT
CLASS="STRUCTFIELD"
>chunk_seq</TT
> provides fast
retrieval of the values.  A pointer datum representing an out-of-line
<ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
>ed value therefore needs to store the OID of the
<ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
> table in which to look and the OID of the specific value
(its <TT
CLASS="STRUCTFIELD"
>chunk_id</TT
>).  For convenience, pointer datums also store the
logical datum size (original uncompressed data length) and actual stored size
(different if compression was applied).  Allowing for the varlena header word,
the total size of a <ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
> pointer datum is therefore 20 bytes
regardless of the actual size of the represented value.</P
><P
>The <ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
> code is triggered only
when a row value to be stored in a table is wider than <TT
CLASS="LITERAL"
>BLCKSZ/4</TT
>
bytes (normally 2Kb).  The <ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
> code will compress and/or move
field values out-of-line until the row value is shorter than
<TT
CLASS="LITERAL"
>BLCKSZ/4</TT
> bytes or no more gains can be had.  During an UPDATE
operation, values of unchanged fields are normally preserved as-is; so an
UPDATE of a row with out-of-line values incurs no <ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
> costs if
none of the out-of-line values change.</P
><P
>The <ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
> code recognizes four different strategies for storing
<ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
>-able columns:

   <P
></P
></P><UL
><LI
><P
>      <TT
CLASS="LITERAL"
>PLAIN</TT
> prevents either compression or
      out-of-line storage.  This is the only possible strategy for
      columns of non-<ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
>-able data types.
     </P
></LI
><LI
><P
>      <TT
CLASS="LITERAL"
>EXTENDED</TT
> allows both compression and out-of-line
      storage.  This is the default for most <ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
>-able data types.
      Compression will be attempted first, then out-of-line storage if
      the row is still too big.
     </P
></LI
><LI
><P
>      <TT
CLASS="LITERAL"
>EXTERNAL</TT
> allows out-of-line storage but not
      compression.  Use of <TT
CLASS="LITERAL"
>EXTERNAL</TT
> will
      make substring operations on wide <TT
CLASS="TYPE"
>text</TT
> and
      <TT
CLASS="TYPE"
>bytea</TT
> columns faster (at the penalty of increased storage
      space) because these operations are optimized to fetch only the
      required parts of the out-of-line value when it is not compressed.
     </P
></LI
><LI
><P
>      <TT
CLASS="LITERAL"
>MAIN</TT
> allows compression but not out-of-line
      storage.  (Actually, out-of-line storage will still be performed
      for such columns, but only as a last resort when there is no other
      way to make the row small enough.)
     </P
></LI
></UL
><P>

Each <ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
>-able data type specifies a default strategy for columns
of that data type, but the strategy for a given table column can be altered
with <TT
CLASS="COMMAND"
>ALTER TABLE SET STORAGE</TT
>.</P
><P
>This scheme has a number of advantages compared to a more straightforward
approach such as allowing row values to span pages.  Assuming that queries are
usually qualified by comparisons against relatively small key values, most of
the work of the executor will be done using the main row entry. The big values
of <ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
>ed attributes will only be pulled out (if selected at all)
at the time the result set is sent to the client. Thus, the main table is much
smaller and more of its rows fit in the shared buffer cache than would be the
case without any out-of-line storage. Sort sets shrink also, and sorts will
more often be done entirely in memory. A little test showed that a table
containing typical HTML pages and their URLs was stored in about half of the
raw data size including the <ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
> table, and that the main table
contained only about 10% of the entire data (the URLs and some small HTML
pages). There was no runtime difference compared to an un-<ACRONYM
CLASS="ACRONYM"
>TOAST</ACRONYM
>ed
comparison table, in which all the HTML pages were cut down to 7Kb to fit.</P
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="storage.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="storage-page-layout.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Database Physical Storage</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="storage.html"
ACCESSKEY="U"
>Up</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Database Page Layout</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>