<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <HTML ><HEAD ><TITLE >Character Set Support</TITLE ><META NAME="GENERATOR" CONTENT="Modular DocBook HTML Stylesheet Version 1.79"><LINK REV="MADE" HREF="mailto:pgsql-docs@postgresql.org"><LINK REL="HOME" TITLE="PostgreSQL 8.0.11 Documentation" HREF="index.html"><LINK REL="UP" TITLE="Localization" HREF="charset.html"><LINK REL="PREVIOUS" TITLE="Localization" HREF="charset.html"><LINK REL="NEXT" TITLE="Routine Database Maintenance Tasks" HREF="maintenance.html"><LINK REL="STYLESHEET" TYPE="text/css" HREF="stylesheet.css"><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1"><META NAME="creation" CONTENT="2007-02-02T03:57:22"></HEAD ><BODY CLASS="SECT1" ><DIV CLASS="NAVHEADER" ><TABLE SUMMARY="Header navigation table" WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0" ><TR ><TH COLSPAN="5" ALIGN="center" VALIGN="bottom" >PostgreSQL 8.0.11 Documentation</TH ></TR ><TR ><TD WIDTH="10%" ALIGN="left" VALIGN="top" ><A HREF="charset.html" ACCESSKEY="P" >Prev</A ></TD ><TD WIDTH="10%" ALIGN="left" VALIGN="top" ><A HREF="charset.html" >Fast Backward</A ></TD ><TD WIDTH="60%" ALIGN="center" VALIGN="bottom" >Chapter 20. Localization</TD ><TD WIDTH="10%" ALIGN="right" VALIGN="top" ><A HREF="charset.html" >Fast Forward</A ></TD ><TD WIDTH="10%" ALIGN="right" VALIGN="top" ><A HREF="maintenance.html" ACCESSKEY="N" >Next</A ></TD ></TR ></TABLE ><HR ALIGN="LEFT" WIDTH="100%"></DIV ><DIV CLASS="SECT1" ><H1 CLASS="SECT1" ><A NAME="MULTIBYTE" >20.2. Character Set Support</A ></H1 ><A NAME="AEN20500" ></A ><P > The character set support in <SPAN CLASS="PRODUCTNAME" >PostgreSQL</SPAN > allows you to store text in a variety of character sets, including single-byte character sets such as the ISO 8859 series and multiple-byte character sets such as <ACRONYM CLASS="ACRONYM" >EUC</ACRONYM > (Extended Unix Code), Unicode, and Mule internal code. All character sets can be used transparently throughout the server. (If you use extension functions from other sources, it depends on whether they wrote their code correctly.) The default character set is selected while initializing your <SPAN CLASS="PRODUCTNAME" >PostgreSQL</SPAN > database cluster using <TT CLASS="COMMAND" >initdb</TT >. It can be overridden when you create a database using <TT CLASS="COMMAND" >createdb</TT > or by using the SQL command <TT CLASS="COMMAND" >CREATE DATABASE</TT >. So you can have multiple databases each with a different character set. </P ><DIV CLASS="SECT2" ><H2 CLASS="SECT2" ><A NAME="MULTIBYTE-CHARSET-SUPPORTED" >20.2.1. Supported Character Sets</A ></H2 ><P > <A HREF="multibyte.html#CHARSET-TABLE" >Table 20-1</A > shows the character sets available for use in the server. </P ><DIV CLASS="TABLE" ><A NAME="CHARSET-TABLE" ></A ><P ><B >Table 20-1. Server Character Sets</B ></P ><TABLE BORDER="1" CLASS="CALSTABLE" ><COL><COL><THEAD ><TR ><TH >Name</TH ><TH >Description</TH ></TR ></THEAD ><TBODY ><TR ><TD ><TT CLASS="LITERAL" >SQL_ASCII</TT ></TD ><TD ><ACRONYM CLASS="ACRONYM" >ASCII</ACRONYM ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >EUC_JP</TT ></TD ><TD >Japanese <ACRONYM CLASS="ACRONYM" >EUC</ACRONYM ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >EUC_CN</TT ></TD ><TD >Chinese <ACRONYM CLASS="ACRONYM" >EUC</ACRONYM ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >EUC_KR</TT ></TD ><TD >Korean <ACRONYM CLASS="ACRONYM" >EUC</ACRONYM ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >JOHAB</TT ></TD ><TD >Korean <ACRONYM CLASS="ACRONYM" >EUC</ACRONYM > (Hangle base)</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >EUC_TW</TT ></TD ><TD >Taiwan <ACRONYM CLASS="ACRONYM" >EUC</ACRONYM ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >UNICODE</TT ></TD ><TD >Unicode (<ACRONYM CLASS="ACRONYM" >UTF</ACRONYM >-8)</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >MULE_INTERNAL</TT ></TD ><TD >Mule internal code</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN1</TT ></TD ><TD >ISO 8859-1/<ACRONYM CLASS="ACRONYM" >ECMA</ACRONYM > 94 (Latin alphabet no.1)</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN2</TT ></TD ><TD >ISO 8859-2/<ACRONYM CLASS="ACRONYM" >ECMA</ACRONYM > 94 (Latin alphabet no.2)</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN3</TT ></TD ><TD >ISO 8859-3/<ACRONYM CLASS="ACRONYM" >ECMA</ACRONYM > 94 (Latin alphabet no.3)</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN4</TT ></TD ><TD >ISO 8859-4/<ACRONYM CLASS="ACRONYM" >ECMA</ACRONYM > 94 (Latin alphabet no.4)</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN5</TT ></TD ><TD >ISO 8859-9/<ACRONYM CLASS="ACRONYM" >ECMA</ACRONYM > 128 (Latin alphabet no.5)</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN6</TT ></TD ><TD >ISO 8859-10/<ACRONYM CLASS="ACRONYM" >ECMA</ACRONYM > 144 (Latin alphabet no.6)</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN7</TT ></TD ><TD >ISO 8859-13 (Latin alphabet no.7)</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN8</TT ></TD ><TD >ISO 8859-14 (Latin alphabet no.8)</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN9</TT ></TD ><TD >ISO 8859-15 (Latin alphabet no.9)</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN10</TT ></TD ><TD >ISO 8859-16/<ACRONYM CLASS="ACRONYM" >ASRO</ACRONYM > SR 14111 (Latin alphabet no.10)</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >ISO_8859_5</TT ></TD ><TD >ISO 8859-5/<ACRONYM CLASS="ACRONYM" >ECMA</ACRONYM > 113 (Latin/Cyrillic)</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >ISO_8859_6</TT ></TD ><TD >ISO 8859-6/<ACRONYM CLASS="ACRONYM" >ECMA</ACRONYM > 114 (Latin/Arabic)</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >ISO_8859_7</TT ></TD ><TD >ISO 8859-7/<ACRONYM CLASS="ACRONYM" >ECMA</ACRONYM > 118 (Latin/Greek)</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >ISO_8859_8</TT ></TD ><TD >ISO 8859-8/<ACRONYM CLASS="ACRONYM" >ECMA</ACRONYM > 121 (Latin/Hebrew)</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >KOI8</TT ></TD ><TD ><ACRONYM CLASS="ACRONYM" >KOI</ACRONYM >8-R(U)</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >ALT</TT ></TD ><TD >Windows CP866</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN874</TT ></TD ><TD >Windows CP874 (Thai)</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN1250</TT ></TD ><TD >Windows CP1250</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN</TT ></TD ><TD >Windows CP1251</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN1256</TT ></TD ><TD >Windows CP1256 (Arabic)</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >TCVN</TT ></TD ><TD ><ACRONYM CLASS="ACRONYM" >TCVN</ACRONYM >-5712/Windows CP1258 (Vietnamese)</TD ></TR ></TBODY ></TABLE ></DIV ><DIV CLASS="IMPORTANT" ><BLOCKQUOTE CLASS="IMPORTANT" ><P ><B >Important: </B > Before <SPAN CLASS="PRODUCTNAME" >PostgreSQL</SPAN > 7.2, <TT CLASS="LITERAL" >LATIN5</TT > mistakenly meant ISO 8859-5. From 7.2 on, <TT CLASS="LITERAL" >LATIN5</TT > means ISO 8859-9. If you have a <TT CLASS="LITERAL" >LATIN5</TT > database created on 7.1 or earlier and want to migrate to 7.2 or later, you should be careful about this change. </P ></BLOCKQUOTE ></DIV ><P > Not all <ACRONYM CLASS="ACRONYM" >API</ACRONYM >s support all the listed character sets. For example, the <SPAN CLASS="PRODUCTNAME" >PostgreSQL</SPAN > JDBC driver does not support <TT CLASS="LITERAL" >MULE_INTERNAL</TT >, <TT CLASS="LITERAL" >LATIN6</TT >, <TT CLASS="LITERAL" >LATIN8</TT >, and <TT CLASS="LITERAL" >LATIN10</TT >. </P ></DIV ><DIV CLASS="SECT2" ><H2 CLASS="SECT2" ><A NAME="AEN20670" >20.2.2. Setting the Character Set</A ></H2 ><P > <TT CLASS="COMMAND" >initdb</TT > defines the default character set for a <SPAN CLASS="PRODUCTNAME" >PostgreSQL</SPAN > cluster. For example, </P><PRE CLASS="SCREEN" >initdb -E EUC_JP</PRE ><P> sets the default character set (encoding) to <TT CLASS="LITERAL" >EUC_JP</TT > (Extended Unix Code for Japanese). You can use <TT CLASS="OPTION" >--encoding</TT > instead of <TT CLASS="OPTION" >-E</TT > if you prefer to type longer option strings. If no <TT CLASS="OPTION" >-E</TT > or <TT CLASS="OPTION" >--encoding</TT > option is given, <TT CLASS="LITERAL" >SQL_ASCII</TT > is used. </P ><P > You can create a database with a different character set: </P><PRE CLASS="SCREEN" >createdb -E EUC_KR korean</PRE ><P> This will create a database named <TT CLASS="LITERAL" >korean</TT > that uses the character set <TT CLASS="LITERAL" >EUC_KR</TT >. Another way to accomplish this is to use this SQL command: </P><PRE CLASS="PROGRAMLISTING" >CREATE DATABASE korean WITH ENCODING 'EUC_KR';</PRE ><P> The encoding for a database is stored in the system catalog <TT CLASS="LITERAL" >pg_database</TT >. You can see that by using the <TT CLASS="OPTION" >-l</TT > option or the <TT CLASS="COMMAND" >\l</TT > command of <TT CLASS="COMMAND" >psql</TT >. </P><PRE CLASS="SCREEN" >$ <KBD CLASS="USERINPUT" >psql -l</KBD > List of databases Database | Owner | Encoding ---------------+---------+--------------- euc_cn | t-ishii | EUC_CN euc_jp | t-ishii | EUC_JP euc_kr | t-ishii | EUC_KR euc_tw | t-ishii | EUC_TW mule_internal | t-ishii | MULE_INTERNAL regression | t-ishii | SQL_ASCII template1 | t-ishii | EUC_JP test | t-ishii | EUC_JP unicode | t-ishii | UNICODE (9 rows)</PRE ><P> </P ><DIV CLASS="IMPORTANT" ><BLOCKQUOTE CLASS="IMPORTANT" ><P ><B >Important: </B > Although you can specify any encoding you want for a database, it is unwise to choose an encoding that is not what is expected by the locale you have selected. The <TT CLASS="LITERAL" >LC_COLLATE</TT > and <TT CLASS="LITERAL" >LC_CTYPE</TT > settings imply a particular encoding, and locale-dependent operations (such as sorting) are likely to misinterpret data that is in an incompatible encoding. </P ><P > Since these locale settings are frozen by <TT CLASS="COMMAND" >initdb</TT >, the apparent flexibility to use different encodings in different databases of a cluster is more theoretical than real. It is likely that these mechanisms will be revisited in future versions of <SPAN CLASS="PRODUCTNAME" >PostgreSQL</SPAN >. </P ><P > One way to use multiple encodings safely is to set the locale to <TT CLASS="LITERAL" >C</TT > or <TT CLASS="LITERAL" >POSIX</TT > during <TT CLASS="COMMAND" >initdb</TT >, thus disabling any real locale awareness. </P ></BLOCKQUOTE ></DIV ></DIV ><DIV CLASS="SECT2" ><H2 CLASS="SECT2" ><A NAME="AEN20704" >20.2.3. Automatic Character Set Conversion Between Server and Client</A ></H2 ><P > <SPAN CLASS="PRODUCTNAME" >PostgreSQL</SPAN > supports automatic character set conversion between server and client for certain character sets. The conversion information is stored in the <TT CLASS="LITERAL" >pg_conversion</TT > system catalog. You can create a new conversion by using the SQL command <TT CLASS="COMMAND" >CREATE CONVERSION</TT >. <SPAN CLASS="PRODUCTNAME" >PostgreSQL</SPAN > comes with some predefined conversions. They are listed in <A HREF="multibyte.html#MULTIBYTE-TRANSLATION-TABLE" >Table 20-2</A >. </P ><DIV CLASS="TABLE" ><A NAME="MULTIBYTE-TRANSLATION-TABLE" ></A ><P ><B >Table 20-2. Client/Server Character Set Conversions</B ></P ><TABLE BORDER="1" CLASS="CALSTABLE" ><COL><COL><THEAD ><TR ><TH >Server Character Set</TH ><TH >Available Client Character Sets</TH ></TR ></THEAD ><TBODY ><TR ><TD ><TT CLASS="LITERAL" >SQL_ASCII</TT ></TD ><TD ><TT CLASS="LITERAL" >SQL_ASCII</TT >, <TT CLASS="LITERAL" >UNICODE</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >EUC_JP</TT ></TD ><TD ><TT CLASS="LITERAL" >EUC_JP</TT >, <TT CLASS="LITERAL" >SJIS</TT >, <TT CLASS="LITERAL" >UNICODE</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >EUC_CN</TT ></TD ><TD ><TT CLASS="LITERAL" >EUC_CN</TT >, <TT CLASS="LITERAL" >UNICODE</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >EUC_KR</TT ></TD ><TD ><TT CLASS="LITERAL" >EUC_KR</TT >, <TT CLASS="LITERAL" >UNICODE</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >JOHAB</TT ></TD ><TD ><TT CLASS="LITERAL" >JOHAB</TT >, <TT CLASS="LITERAL" >UNICODE</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >EUC_TW</TT ></TD ><TD ><TT CLASS="LITERAL" >EUC_TW</TT >, <TT CLASS="LITERAL" >BIG5</TT >, <TT CLASS="LITERAL" >UNICODE</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN1</TT ></TD ><TD ><TT CLASS="LITERAL" >LATIN1</TT >, <TT CLASS="LITERAL" >UNICODE</TT > <TT CLASS="LITERAL" >MULE_INTERNAL</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN2</TT ></TD ><TD ><TT CLASS="LITERAL" >LATIN2</TT >, <TT CLASS="LITERAL" >WIN1250</TT >, <TT CLASS="LITERAL" >UNICODE</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN3</TT ></TD ><TD ><TT CLASS="LITERAL" >LATIN3</TT >, <TT CLASS="LITERAL" >UNICODE</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN4</TT ></TD ><TD ><TT CLASS="LITERAL" >LATIN4</TT >, <TT CLASS="LITERAL" >UNICODE</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN5</TT ></TD ><TD ><TT CLASS="LITERAL" >LATIN5</TT >, <TT CLASS="LITERAL" >UNICODE</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN6</TT ></TD ><TD ><TT CLASS="LITERAL" >LATIN6</TT >, <TT CLASS="LITERAL" >UNICODE</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN7</TT ></TD ><TD ><TT CLASS="LITERAL" >LATIN7</TT >, <TT CLASS="LITERAL" >UNICODE</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN8</TT ></TD ><TD ><TT CLASS="LITERAL" >LATIN8</TT >, <TT CLASS="LITERAL" >UNICODE</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN9</TT ></TD ><TD ><TT CLASS="LITERAL" >LATIN9</TT >, <TT CLASS="LITERAL" >UNICODE</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN10</TT ></TD ><TD ><TT CLASS="LITERAL" >LATIN10</TT >, <TT CLASS="LITERAL" >UNICODE</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >ISO_8859_5</TT ></TD ><TD ><TT CLASS="LITERAL" >ISO_8859_5</TT >, <TT CLASS="LITERAL" >UNICODE</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT >, <TT CLASS="LITERAL" >WIN</TT >, <TT CLASS="LITERAL" >ALT</TT >, <TT CLASS="LITERAL" >KOI8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >ISO_8859_6</TT ></TD ><TD ><TT CLASS="LITERAL" >ISO_8859_6</TT >, <TT CLASS="LITERAL" >UNICODE</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >ISO_8859_7</TT ></TD ><TD ><TT CLASS="LITERAL" >ISO_8859_7</TT >, <TT CLASS="LITERAL" >UNICODE</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >ISO_8859_8</TT ></TD ><TD ><TT CLASS="LITERAL" >ISO_8859_8</TT >, <TT CLASS="LITERAL" >UNICODE</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >UNICODE</TT ></TD ><TD > <TT CLASS="LITERAL" >EUC_JP</TT >, <TT CLASS="LITERAL" >SJIS</TT >, <TT CLASS="LITERAL" >EUC_KR</TT >, <TT CLASS="LITERAL" >UHC</TT >, <TT CLASS="LITERAL" >JOHAB</TT >, <TT CLASS="LITERAL" >EUC_CN</TT >, <TT CLASS="LITERAL" >GBK</TT >, <TT CLASS="LITERAL" >EUC_TW</TT >, <TT CLASS="LITERAL" >BIG5</TT >, <TT CLASS="LITERAL" >LATIN1</TT > to <TT CLASS="LITERAL" >LATIN10</TT >, <TT CLASS="LITERAL" >ISO_8859_5</TT >, <TT CLASS="LITERAL" >ISO_8859_6</TT >, <TT CLASS="LITERAL" >ISO_8859_7</TT >, <TT CLASS="LITERAL" >ISO_8859_8</TT >, <TT CLASS="LITERAL" >WIN</TT >, <TT CLASS="LITERAL" >ALT</TT >, <TT CLASS="LITERAL" >KOI8</TT >, <TT CLASS="LITERAL" >WIN1256</TT >, <TT CLASS="LITERAL" >TCVN</TT >, <TT CLASS="LITERAL" >WIN874</TT >, <TT CLASS="LITERAL" >GB18030</TT >, <TT CLASS="LITERAL" >WIN1250</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >MULE_INTERNAL</TT ></TD ><TD ><TT CLASS="LITERAL" >EUC_JP</TT >, <TT CLASS="LITERAL" >SJIS</TT >, <TT CLASS="LITERAL" >EUC_KR</TT >, <TT CLASS="LITERAL" >EUC_CN</TT >, <TT CLASS="LITERAL" >EUC_TW</TT >, <TT CLASS="LITERAL" >BIG5</TT >, <TT CLASS="LITERAL" >LATIN1</TT > to <TT CLASS="LITERAL" >LATIN5</TT >, <TT CLASS="LITERAL" >WIN</TT >, <TT CLASS="LITERAL" >ALT</TT >, <TT CLASS="LITERAL" >WIN1250</TT >, <TT CLASS="LITERAL" >BIG5</TT >, <TT CLASS="LITERAL" >ISO_8859_5</TT >, <TT CLASS="LITERAL" >KOI8</TT ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >KOI8</TT ></TD ><TD ><TT CLASS="LITERAL" >ISO_8859_5</TT >, <TT CLASS="LITERAL" >WIN</TT >, <TT CLASS="LITERAL" >ALT</TT >, <TT CLASS="LITERAL" >KOI8</TT >, <TT CLASS="LITERAL" >UNICODE</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >ALT</TT ></TD ><TD ><TT CLASS="LITERAL" >ISO_8859_5</TT >, <TT CLASS="LITERAL" >WIN</TT >, <TT CLASS="LITERAL" >ALT</TT >, <TT CLASS="LITERAL" >KOI8</TT >, <TT CLASS="LITERAL" >UNICODE</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN874</TT ></TD ><TD ><TT CLASS="LITERAL" >WIN874</TT >, <TT CLASS="LITERAL" >UNICODE</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN1250</TT ></TD ><TD ><TT CLASS="LITERAL" >LATIN2</TT >, <TT CLASS="LITERAL" >WIN1250</TT >, <TT CLASS="LITERAL" >UNICODE</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN</TT ></TD ><TD ><TT CLASS="LITERAL" >ISO_8859_5</TT >, <TT CLASS="LITERAL" >WIN</TT >, <TT CLASS="LITERAL" >ALT</TT >, <TT CLASS="LITERAL" >KOI8</TT >, <TT CLASS="LITERAL" >UNICODE</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN1256</TT ></TD ><TD ><TT CLASS="LITERAL" >WIN1256</TT >, <TT CLASS="LITERAL" >UNICODE</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >TCVN</TT ></TD ><TD ><TT CLASS="LITERAL" >TCVN</TT >, <TT CLASS="LITERAL" >UNICODE</TT > </TD ></TR ></TBODY ></TABLE ></DIV ><P > To enable the automatic character set conversion, you have to tell <SPAN CLASS="PRODUCTNAME" >PostgreSQL</SPAN > the character set (encoding) you would like to use in the client. There are several ways to accomplish this: <P ></P ></P><UL ><LI ><P > Using the <TT CLASS="COMMAND" >\encoding</TT > command in <SPAN CLASS="APPLICATION" >psql</SPAN >. <TT CLASS="COMMAND" >\encoding</TT > allows you to change client encoding on the fly. For example, to change the encoding to <TT CLASS="LITERAL" >SJIS</TT >, type: </P><PRE CLASS="PROGRAMLISTING" >\encoding SJIS</PRE ><P> </P ></LI ><LI ><P > Using <SPAN CLASS="APPLICATION" >libpq</SPAN > functions. <TT CLASS="COMMAND" >\encoding</TT > actually calls <CODE CLASS="FUNCTION" >PQsetClientEncoding()</CODE > for its purpose. </P><PRE CLASS="SYNOPSIS" >int PQsetClientEncoding(PGconn *<TT CLASS="REPLACEABLE" ><I >conn</I ></TT >, const char *<TT CLASS="REPLACEABLE" ><I >encoding</I ></TT >);</PRE ><P> where <TT CLASS="REPLACEABLE" ><I >conn</I ></TT > is a connection to the server, and <TT CLASS="REPLACEABLE" ><I >encoding</I ></TT > is the encoding you want to use. If the function successfully sets the encoding, it returns 0, otherwise -1. The current encoding for this connection can be determined by using: </P><PRE CLASS="SYNOPSIS" >int PQclientEncoding(const PGconn *<TT CLASS="REPLACEABLE" ><I >conn</I ></TT >);</PRE ><P> Note that it returns the encoding ID, not a symbolic string such as <TT CLASS="LITERAL" >EUC_JP</TT >. To convert an encoding ID to an encoding name, you can use: </P><PRE CLASS="SYNOPSIS" >char *pg_encoding_to_char(int <TT CLASS="REPLACEABLE" ><I >encoding_id</I ></TT >);</PRE ><P> </P ></LI ><LI ><P > Using <TT CLASS="COMMAND" >SET client_encoding TO</TT >. Setting the client encoding can be done with this SQL command: </P><PRE CLASS="PROGRAMLISTING" >SET CLIENT_ENCODING TO '<TT CLASS="REPLACEABLE" ><I >value</I ></TT >';</PRE ><P> Also you can use the more standard SQL syntax <TT CLASS="LITERAL" >SET NAMES</TT > for this purpose: </P><PRE CLASS="PROGRAMLISTING" >SET NAMES '<TT CLASS="REPLACEABLE" ><I >value</I ></TT >';</PRE ><P> To query the current client encoding: </P><PRE CLASS="PROGRAMLISTING" >SHOW client_encoding;</PRE ><P> To return to the default encoding: </P><PRE CLASS="PROGRAMLISTING" >RESET client_encoding;</PRE ><P> </P ></LI ><LI ><P > Using <TT CLASS="ENVAR" >PGCLIENTENCODING</TT >. If the environment variable <TT CLASS="ENVAR" >PGCLIENTENCODING</TT > is defined in the client's environment, that client encoding is automatically selected when a connection to the server is made. (This can subsequently be overridden using any of the other methods mentioned above.) </P ></LI ><LI ><P > Using the configuration variable <A HREF="runtime-config.html#GUC-CLIENT-ENCODING" >client_encoding</A >. If the <TT CLASS="VARNAME" >client_encoding</TT > variable is set, that client encoding is automatically selected when a connection to the server is made. (This can subsequently be overridden using any of the other methods mentioned above.) </P ></LI ></UL ><P> </P ><P > If the conversion of a particular character is not possible — suppose you chose <TT CLASS="LITERAL" >EUC_JP</TT > for the server and <TT CLASS="LITERAL" >LATIN1</TT > for the client, then some Japanese characters cannot be converted to <TT CLASS="LITERAL" >LATIN1</TT > — it is transformed to its hexadecimal byte values in parentheses, e.g., <TT CLASS="LITERAL" >(826C)</TT >. </P ></DIV ><DIV CLASS="SECT2" ><H2 CLASS="SECT2" ><A NAME="AEN21010" >20.2.4. Further Reading</A ></H2 ><P > These are good sources to start learning about various kinds of encoding systems. <P ></P ></P><DIV CLASS="VARIABLELIST" ><DL ><DT ><A HREF="ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf" TARGET="_top" >ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf</A ></DT ><DD ><P > Detailed explanations of <TT CLASS="LITERAL" >EUC_JP</TT >, <TT CLASS="LITERAL" >EUC_CN</TT >, <TT CLASS="LITERAL" >EUC_KR</TT >, <TT CLASS="LITERAL" >EUC_TW</TT > appear in section 3.2. </P ></DD ><DT ><A HREF="http://www.unicode.org/" TARGET="_top" >http://www.unicode.org/</A ></DT ><DD ><P > The web site of the Unicode Consortium </P ></DD ><DT >RFC 2044</DT ><DD ><P > <ACRONYM CLASS="ACRONYM" >UTF</ACRONYM >-8 is defined here. </P ></DD ></DL ></DIV ><P> </P ></DIV ></DIV ><DIV CLASS="NAVFOOTER" ><HR ALIGN="LEFT" WIDTH="100%"><TABLE SUMMARY="Footer navigation table" WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0" ><TR ><TD WIDTH="33%" ALIGN="left" VALIGN="top" ><A HREF="charset.html" ACCESSKEY="P" >Prev</A ></TD ><TD WIDTH="34%" ALIGN="center" VALIGN="top" ><A HREF="index.html" ACCESSKEY="H" >Home</A ></TD ><TD WIDTH="33%" ALIGN="right" VALIGN="top" ><A HREF="maintenance.html" ACCESSKEY="N" >Next</A ></TD ></TR ><TR ><TD WIDTH="33%" ALIGN="left" VALIGN="top" >Localization</TD ><TD WIDTH="34%" ALIGN="center" VALIGN="top" ><A HREF="charset.html" ACCESSKEY="U" >Up</A ></TD ><TD WIDTH="33%" ALIGN="right" VALIGN="top" >Routine Database Maintenance Tasks</TD ></TR ></TABLE ></DIV ></BODY ></HTML >