<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML
><HEAD
><TITLE
>Warm Standby Servers for High Availability</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.79"><LINK
REV="MADE"
HREF="mailto:pgsql-docs@postgresql.org"><LINK
REL="HOME"
TITLE="PostgreSQL 8.3.6 Documentation"
HREF="index.html"><LINK
REL="UP"
TITLE="Backup and Restore"
HREF="backup.html"><LINK
REL="PREVIOUS"
TITLE="Continuous Archiving and Point-In-Time Recovery (PITR)"
HREF="continuous-archiving.html"><LINK
REL="NEXT"
TITLE="Migration Between Releases"
HREF="migration.html"><LINK
REL="STYLESHEET"
TYPE="text/css"
HREF="stylesheet.css"><META
HTTP-EQUIV="Content-Type"
CONTENT="text/html; charset=ISO-8859-1"><META
NAME="creation"
CONTENT="2009-02-03T04:34:16"></HEAD
><BODY
CLASS="SECT1"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="5"
ALIGN="center"
VALIGN="bottom"
>PostgreSQL 8.3.6 Documentation</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="top"
><A
HREF="continuous-archiving.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="top"
><A
HREF="backup.html"
>Fast Backward</A
></TD
><TD
WIDTH="60%"
ALIGN="center"
VALIGN="bottom"
>Chapter 24. Backup and Restore</TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="top"
><A
HREF="backup.html"
>Fast Forward</A
></TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="top"
><A
HREF="migration.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="SECT1"
><H1
CLASS="SECT1"
><A
NAME="WARM-STANDBY"
>24.4. Warm Standby Servers for High Availability</A
></H1
><A
NAME="AEN28747"
></A
><A
NAME="AEN28749"
></A
><A
NAME="AEN28751"
></A
><A
NAME="AEN28753"
></A
><A
NAME="AEN28755"
></A
><A
NAME="AEN28757"
></A
><A
NAME="AEN28759"
></A
><P
>   Continuous archiving can be used to create a <I
CLASS="FIRSTTERM"
>high
   availability</I
> (HA) cluster configuration with one or more
   <I
CLASS="FIRSTTERM"
>standby servers</I
> ready to take over operations if the
   primary server fails. This capability is widely referred to as
   <I
CLASS="FIRSTTERM"
>warm standby</I
> or <I
CLASS="FIRSTTERM"
>log shipping</I
>.
  </P
><P
>   The primary and standby servers work together to provide this capability,
   though the servers are only loosely coupled. The primary server operates
   in continuous archiving mode, while each standby server operates in
   continuous recovery mode, reading the WAL files from the primary. No
   changes to the database tables are required to enable this capability,
   so it offers low administration overhead in comparison with some other
   replication approaches. This configuration also has relatively low
   performance impact on the primary server.
  </P
><P
>   Directly moving WAL records from one database server to another
   is typically described as log shipping. <SPAN
CLASS="PRODUCTNAME"
>PostgreSQL</SPAN
>
   implements file-based log shipping, which means that WAL records are
   transferred one file (WAL segment) at a time. WAL files (16MB) can be
   shipped easily and cheaply over any distance, whether it be to an
   adjacent system, another system on the same site or another system on
   the far side of the globe. The bandwidth required for this technique
   varies according to the transaction rate of the primary server.
   Record-based log shipping is also possible with custom-developed
   procedures, as discussed in <A
HREF="warm-standby.html#WARM-STANDBY-RECORD"
>Section 24.4.4</A
>.
  </P
><P
>   It should be noted that log shipping is asynchronous, i.e., the WAL
   records are shipped after transaction commit. As a result there is a
   window for data loss should the primary server suffer a catastrophic
   failure: transactions not yet shipped will be lost.  The length of the
   window of data loss can be limited by use of the
   <TT
CLASS="VARNAME"
>archive_timeout</TT
> parameter, which can be set as low
   as a few seconds if required.  However, such low settings will
   substantially increase the bandwidth requirements for file shipping.
   If you need a window of less than a minute or so, it's probably better
   to look into record-based log shipping.
  </P
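><P
>   For illustration, a primary that needs to keep the window of potential data
   loss to about a minute might set, in <TT CLASS="FILENAME">postgresql.conf</TT>
   (the value shown is an assumption for the example, not a recommendation):
</P><PRE
CLASS="PROGRAMLISTING"
># postgresql.conf on the primary: force a WAL segment switch, and hence a
# shipped file, at least once every 60 seconds
archive_timeout = 60</PRE
><P
>   Each forced switch ships a full 16MB segment file even if it is mostly
   empty, which is the source of the extra bandwidth mentioned above.
   </P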
><P
>   The standby server is not available for access, since it is continually
   performing recovery processing. Recovery performance is sufficiently
   good that the standby will typically be only moments away from full
   availability once it has been activated. As a result, we refer to this
   capability as a warm standby configuration that offers high
   availability. Restoring a server from an archived base backup and
   rollforward will take considerably longer, so that technique only
   offers a solution for disaster recovery, not high availability.
  </P
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="WARM-STANDBY-PLANNING"
>24.4.1. Planning</A
></H2
><P
>    It is usually wise to create the primary and standby servers
    so that they are as similar as possible, at least from the
    perspective of the database server.  In particular, the path names
    associated with tablespaces will be passed across as-is, so both
    primary and standby servers must have the same mount paths for
    tablespaces if that feature is used.  Keep in mind that if
    <A
HREF="sql-createtablespace.html"
><I
>CREATE TABLESPACE</I
></A
>
    is executed on the primary, any new mount point needed for it must
    be created on both the primary and all standby servers before the command
    is executed. Hardware need not be exactly the same, but experience shows
    that maintaining two identical systems is easier than maintaining two
    dissimilar ones over the lifetime of the application and system.
    In any case the hardware architecture must be the same &mdash; shipping
    from, say, a 32-bit to a 64-bit system will not work.
   </P
><P
>    In general, log shipping between servers running different major
    <SPAN
CLASS="PRODUCTNAME"
>PostgreSQL</SPAN
> release
    levels will not be possible. It is the policy of the PostgreSQL Global
    Development Group not to make changes to disk formats during minor release
    upgrades, so it is likely that running different minor release levels
    on primary and standby servers will work successfully. However, no
    formal support for that is offered and you are advised to keep primary
    and standby servers at the same release level as much as possible.
    When updating to a new minor release, the safest policy is to update
    the standby servers first &mdash; a new minor release is more likely
    to be able to read WAL files from a previous minor release than vice
    versa.
   </P
><P
>    There is no special mode required to enable a standby server. The
    operations that occur on both primary and standby servers are entirely
    normal continuous archiving and recovery tasks. The only point of
    contact between the two database servers is the archive of WAL files
    that both share: primary writing to the archive, standby reading from
    the archive. Care must be taken to ensure that WAL archives for separate
    primary servers do not become mixed together or confused. The archive
    need not be large, if it is only required for the standby operation.
   </P
><P
>    The magic that makes the two loosely coupled servers work together is
    simply a <TT
CLASS="VARNAME"
>restore_command</TT
> used on the standby that waits
    for the next WAL file to become available from the primary. The
    <TT
CLASS="VARNAME"
>restore_command</TT
> is specified in the
    <TT
CLASS="FILENAME"
>recovery.conf</TT
> file on the standby server. Normal recovery
    processing would request a file from the WAL archive, reporting failure
    if the file was unavailable.  For standby processing it is normal for
    the next file to be unavailable, so we must be patient and wait for
    it to appear. A waiting <TT
CLASS="VARNAME"
>restore_command</TT
> can be written as
    a custom script that loops after polling for the existence of the next
    WAL file. There must also be some way to trigger failover, which should
    interrupt the <TT
CLASS="VARNAME"
>restore_command</TT
>, break the loop and return
    a file-not-found error to the standby server. This ends recovery and
    the standby will then come up as a normal server.
   </P
><P
>    Pseudocode for a suitable <TT
CLASS="VARNAME"
>restore_command</TT
> is:
</P><PRE
CLASS="PROGRAMLISTING"
>/* NextWALFileReady(), CheckForExternalTrigger() and CopyWALFileForRecovery()
 * are placeholders for site-specific tests and actions. */
triggered = false;
while (!NextWALFileReady() &amp;&amp; !triggered)
{
    usleep(100000L);        /* wait for ~0.1 sec (100,000 microseconds) */
    if (CheckForExternalTrigger())
        triggered = true;
}
if (!triggered)
    CopyWALFileForRecovery();</PRE
><P>
   </P
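><P
>    As a minimal sketch only, the same loop can be written as a shell script;
    the script name, archive path and trigger-file path below are assumptions
    made for the example, and the <SPAN CLASS="APPLICATION">pg_standby</SPAN>
    module described next is a more complete implementation:
</P><PRE
CLASS="PROGRAMLISTING"
>#!/bin/sh
# Invoked from recovery.conf as:
#   restore_command = '/usr/local/bin/wait_restore.sh %f %p'
ARCHIVE=/var/lib/pgsql/wal_archive       # where the primary ships WAL files
TRIGGER=/var/lib/pgsql/failover_trigger  # created by an administrator to force failover
WALFILE="$1"                             # %f: file name requested by the server
TARGET="$2"                              # %p: path to copy the file to

while true; do
    if [ -f "$ARCHIVE/$WALFILE" ]; then
        cp "$ARCHIVE/$WALFILE" "$TARGET"
        exit $?                          # success: recovery continues with this file
    fi
    if [ -f "$TRIGGER" ]; then
        exit 1                           # file not found: recovery ends, standby comes up
    fi
    sleep 1                              # poll roughly once per second
done</PRE
><P
>    Returning nonzero once the trigger file appears is what converts the
    waiting loop into a failover: recovery ends and the standby starts
    accepting connections as a normal server.
   </P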
><P
>    A working example of a waiting <TT
CLASS="VARNAME"
>restore_command</TT
> is provided
    as a <TT
CLASS="FILENAME"
>contrib</TT
> module named <SPAN
CLASS="APPLICATION"
>pg_standby</SPAN
>. This
    example can be extended as needed to support specific configurations or
    environments.
   </P
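><P
>    For example, a <TT CLASS="FILENAME">recovery.conf</TT> entry of the
    following general form is commonly used; the paths here are assumptions,
    and the exact options should be checked against the
    <SPAN CLASS="APPLICATION">pg_standby</SPAN> documentation for your release:
</P><PRE
CLASS="PROGRAMLISTING"
># recovery.conf on the standby: wait for each WAL file in the archive,
# give up and end recovery if the trigger file appears
restore_command = 'pg_standby -t /tmp/pgsql.trigger /var/lib/pgsql/wal_archive %f %p %r'</PRE
><P
>    Passing <TT CLASS="LITERAL">%r</TT> as the last argument can also allow
    <SPAN CLASS="APPLICATION">pg_standby</SPAN> to remove archive files that
    are no longer needed for a restart, as discussed below.
   </P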
><P
>    <SPAN
CLASS="PRODUCTNAME"
>PostgreSQL</SPAN
> does not provide the system
    software required to identify a failure on the primary and notify
    the standby database server that it should take over. Many such
    tools exist and are well integrated with other aspects required for
    successful failover, such as IP address migration.
   </P
><P
>    The means for triggering failover is an important part of planning and
    design. The <TT
CLASS="VARNAME"
>restore_command</TT
> is executed in full once
    for each WAL file. The process running the <TT
CLASS="VARNAME"
>restore_command</TT
>
    is therefore created and terminates for each file, so there is no
    long-lived daemon or server process, and signals with a signal handler
    cannot be used. A more persistent notification is required to trigger the
    failover. It is possible to use a simple timeout facility,
    especially if used in conjunction with a known
    <TT
CLASS="VARNAME"
>archive_timeout</TT
> setting on the primary. This is
    somewhat error prone since a network problem or busy primary server might
    be sufficient to initiate failover. A notification mechanism such
    as the explicit creation of a trigger file is less error prone, if
    this can be arranged.
   </P
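><P
>    With a trigger-file arrangement such as the sketches above, initiating
    failover is then simply a matter of creating the agreed file on the
    standby host; the path is whatever the waiting script or
    <SPAN CLASS="APPLICATION">pg_standby</SPAN> has been told to watch:
</P><PRE
CLASS="PROGRAMLISTING"
># run on the standby host to initiate failover
touch /var/lib/pgsql/failover_trigger</PRE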
><P
>    The size of the WAL archive can be minimized by using the <TT
CLASS="LITERAL"
>%r</TT
>
    option of the <TT
CLASS="VARNAME"
>restore_command</TT
>. This option specifies the
    last archive file name that needs to be kept to allow the recovery to
    restart correctly. This can be used to truncate the archive once
    files are no longer required, if the archive is writable from the
    standby server.
   </P
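><P
>    As one sketch of this, assuming the archive directory holds only WAL
    segment files (whose names sort in the order they were produced) and using
    the same illustrative paths as above, a custom restore script invoked with
    <TT CLASS="LITERAL">%r</TT> as its third argument could prune the archive
    like this:
</P><PRE
CLASS="PROGRAMLISTING"
># bash fragment: remove archived files older than the restart file (%r)
ARCHIVE=/var/lib/pgsql/wal_archive
RESTART="$3"                 # %r: earliest file that must still be kept
for f in "$ARCHIVE"/*; do
    if [[ "$(basename "$f")" &lt; "$RESTART" ]]; then
        rm -f "$f"           # older than the restart file; safe to discard
    fi
done</PRE
><P
>    <SPAN CLASS="APPLICATION">pg_standby</SPAN> can perform equivalent cleanup
    itself when <TT CLASS="LITERAL">%r</TT> is passed to it, as in the example
    shown earlier.
   </P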
></DIV
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="WARM-STANDBY-CONFIG"
>24.4.2. Implementation</A
></H2
><P
>    The short procedure for configuring a standby server is as follows. For
    full details of each step, refer to previous sections as noted. A
    consolidated sketch of the corresponding settings, using illustrative
    paths, follows the list.
    <P
></P
></P><OL
TYPE="1"
><LI
><P
>       Set up primary and standby systems as near identically as
       possible, including two identical copies of
       <SPAN
CLASS="PRODUCTNAME"
>PostgreSQL</SPAN
> at the same release level.
      </P
></LI
><LI
><P
>       Set up continuous archiving from the primary to a WAL archive located
       in a directory on the standby server. Ensure that
       <A
HREF="runtime-config-wal.html#GUC-ARCHIVE-MODE"
>archive_mode</A
>,
       <A
HREF="runtime-config-wal.html#GUC-ARCHIVE-COMMAND"
>archive_command</A
> and
       <A
HREF="runtime-config-wal.html#GUC-ARCHIVE-TIMEOUT"
>archive_timeout</A
>
       are set appropriately on the primary
       (see <A
HREF="continuous-archiving.html#BACKUP-ARCHIVING-WAL"
>Section 24.3.1</A
>).
      </P
></LI
><LI
><P
>       Make a base backup of the primary server (see <A
HREF="continuous-archiving.html#BACKUP-BASE-BACKUP"
>Section 24.3.2</A
>), and load this data onto the standby.
      </P
></LI
><LI
><P
>       Begin recovery on the standby server from the local WAL
       archive, using a <TT
CLASS="FILENAME"
>recovery.conf</TT
> that specifies a
       <TT
CLASS="VARNAME"
>restore_command</TT
> that waits as described
       previously (see <A
HREF="continuous-archiving.html#BACKUP-PITR-RECOVERY"
>Section 24.3.3</A
>).
      </P
></LI
></OL
><P>
   </P
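><P
>    As a consolidated sketch of the steps above; every path, host name and
    value here is an assumption made for the example, not a requirement:
</P><PRE
CLASS="PROGRAMLISTING"
># Steps 1-2: postgresql.conf on the primary, shipping WAL to an archive
# directory on the standby host
archive_mode    = on
archive_command = 'rsync -a %p standby-host:/var/lib/pgsql/wal_archive/%f'
archive_timeout = 60

# Step 3: base backup of the primary, loaded onto the standby
# (see Section 24.3.2 for the full procedure and its caveats)
psql -c "SELECT pg_start_backup('standby_base')"
rsync -a --exclude=pg_xlog /var/lib/pgsql/data/ standby-host:/var/lib/pgsql/data/
psql -c "SELECT pg_stop_backup()"

# Step 4: recovery.conf on the standby, using a waiting restore_command
restore_command = 'pg_standby -t /tmp/pgsql.trigger /var/lib/pgsql/wal_archive %f %p %r'</PRE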
><P
>    Recovery treats the WAL archive as read-only, so once a WAL file has
    been copied to the standby system it can be copied to tape at the same
    time as it is being read by the standby database server.
    Thus, running a standby server for high availability can be performed at
    the same time as files are stored for longer term disaster recovery
    purposes.
   </P
><P
>    For testing purposes, it is possible to run both primary and standby
    servers on the same system. This does not provide any worthwhile
    improvement in server robustness, nor would it be described as HA.
   </P
></DIV
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="WARM-STANDBY-FAILOVER"
>24.4.3. Failover</A
></H2
><P
>    If the primary server fails then the standby server should begin
    failover procedures.
   </P
><P
>    If the standby server fails then no failover need take place. If the
    standby server can be restarted, even some time later, then the recovery
    process can also be immediately restarted, taking advantage of
    restartable recovery. If the standby server cannot be restarted, then a
    full new standby server instance should be created.
   </P
><P
>    If the primary server fails and then immediately restarts, you must have
    a mechanism for informing it that it is no longer the primary. This is
    sometimes known as STONITH (Shoot the Other Node In The Head), which is
    necessary to avoid situations where both systems think they are the
    primary, which will lead to confusion and ultimately data loss.
   </P
><P
>    Many failover systems use just two systems, the primary and the standby,
    connected by some kind of heartbeat mechanism to continually verify the
    connectivity between the two and the viability of the primary. It is
    also possible to use a third system (called a witness server) to prevent
    some cases of inappropriate failover, but the additional complexity
    might not be worthwhile unless it is set up with sufficient care and
    rigorous testing.
   </P
><P
>    Once failover to the standby occurs, we have only a
    single server in operation. This is known as a degenerate state.
    The former standby is now the primary, but the former primary is down
    and might stay down.  To return to normal operation we must
    fully recreate a standby server,
    either on the former primary system when it comes up, or on a third,
    possibly new, system. Once complete the primary and standby can be
    considered to have switched roles. Some people choose to use a third
    server to provide backup to the new primary until the new standby
    server is recreated,
    though clearly this complicates the system configuration and
    operational processes.
   </P
><P
>    So, switching from primary to standby server can be fast but requires
    some time to re-prepare the failover cluster. Regular switching from
    primary to standby is useful, since it allows regular downtime on
    each system for maintenance. This also serves as a test of the
    failover mechanism to ensure that it will really work when you need it.
    Written administration procedures are advised.
   </P
></DIV
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="WARM-STANDBY-RECORD"
>24.4.4. Record-based Log Shipping</A
></H2
><P
>    <SPAN
CLASS="PRODUCTNAME"
>PostgreSQL</SPAN
> directly supports file-based
    log shipping as described above. It is also possible to implement
    record-based log shipping, though this requires custom development.
   </P
><P
>    An external program can call the <CODE
CLASS="FUNCTION"
>pg_xlogfile_name_offset()</CODE
>
    function (see <A
HREF="functions-admin.html"
>Section 9.23</A
>)
    to find out the file name and the exact byte offset within it of
    the current end of WAL.  It can then access the WAL file directly
    and copy the data from the last known end of WAL through the current end
    over to the standby server(s).  With this approach, the window for data
    loss is the polling cycle time of the copying program, which can be very
    small, but there is no wasted bandwidth from forcing partially-used
    segment files to be archived.  Note that the standby servers'
    <TT
CLASS="VARNAME"
>restore_command</TT
> scripts still deal in whole WAL files,
    so the incrementally copied data is not ordinarily made available to
    the standby servers.  It is of use only when the primary dies &mdash;
    then the last partial WAL file is fed to the standby before allowing
    it to come up.  So correct implementation of this process requires
    cooperation of the <TT
CLASS="VARNAME"
>restore_command</TT
> script with the data
    copying program.
   </P
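><P
>    A minimal illustration of the polling side, run against the primary; the
    copying logic itself is left to the custom program:
</P><PRE
CLASS="PROGRAMLISTING"
># report the file name and byte offset of the current end of WAL
psql -At -c "SELECT * FROM pg_xlogfile_name_offset(pg_current_xlog_location())"</PRE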
></DIV
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="BACKUP-INCREMENTAL-UPDATED"
>24.4.5. Incrementally Updated Backups</A
></H2
><A
NAME="AEN28844"
></A
><A
NAME="AEN28846"
></A
><P
>    In a warm standby configuration, it is possible to offload the expense of
    taking periodic base backups from the primary server; instead, base backups
    can be made by backing
    up a standby server's files.  This concept is generally known as
    incrementally updated backups, log change accumulation, or more simply,
    change accumulation.
   </P
><P
>    If we take a backup of the standby server's data directory while it is processing
    logs shipped from the primary, we will be able to reload that data and
    restart the standby's recovery process from the last restart point.
    We no longer need to keep WAL files from before the restart point.
    If we need to recover, it will be faster to recover from the incrementally
    updated backup than from the original base backup.
   </P
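><P
>    As a minimal sketch, such a backup might be taken with nothing more than a
    file-level copy of the standby's data directory; the directory and target
    names below are assumptions:
</P><PRE
CLASS="PROGRAMLISTING"
># archive the standby's data directory while it continues recovery
tar -czf /backups/standby-base-$(date +%Y%m%d).tar.gz -C /var/lib/pgsql standby_data</PRE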
><P
>    Since the standby server is not <SPAN
CLASS="QUOTE"
>"live"</SPAN
>, it is not possible to
    use <CODE
CLASS="FUNCTION"
>pg_start_backup()</CODE
> and <CODE
CLASS="FUNCTION"
>pg_stop_backup()</CODE
>
    to manage the backup process; it will be up to you to determine how
    far back you need to keep WAL segment files to have a recoverable
    backup.  You can do this by running <SPAN
CLASS="APPLICATION"
>pg_controldata</SPAN
>
    on the standby server to inspect the control file and determine the
    current checkpoint WAL location, or by using the
    <TT
CLASS="VARNAME"
>log_restartpoints</TT
> option to print values to the server log.
   </P
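><P
>    For example (the data directory path is an assumption, and the exact
    output labels can vary between releases):
</P><PRE
CLASS="PROGRAMLISTING"
># inspect the standby's control file for the current checkpoint location
pg_controldata /var/lib/pgsql/standby_data | grep -i 'checkpoint location'</PRE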
></DIV
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="continuous-archiving.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="migration.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Continuous Archiving and Point-In-Time Recovery (PITR)</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="backup.html"
ACCESSKEY="U"
>Up</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Migration Between Releases</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>