<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><title>Chapter 9. Internationalization and Localization</title><link rel="stylesheet" href="mysql-html.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.69.1"><link rel="start" href="index.html" title="MySQL 5.0 Reference Manual"><link rel="up" href="index.html" title="MySQL 5.0 Reference Manual"><link rel="prev" href="language-structure.html" title="Chapter 8. Language Structure"><link rel="next" href="data-types.html" title="Chapter 10. Data Types"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Chapter 9. Internationalization and Localization</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="language-structure.html">Prev</a> </td><th width="60%" align="center"> </th><td width="20%" align="right"> <a accesskey="n" href="data-types.html">Next</a></td></tr></table><hr></div><div class="chapter" lang="en"><div class="titlepage"><div><div><h2 class="title"><a name="internationalization-localization"></a>Chapter 9. Internationalization and Localization</h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="internationalization-localization.html#charset">9.1. Character Set Support</a></span></dt><dd><dl><dt><span class="section"><a href="internationalization-localization.html#charset-general">9.1.1. Character Sets and Collations in General</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-mysql">9.1.2. Character Sets and Collations in MySQL</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-syntax">9.1.3. Specifying Character Sets and Collations</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-connection">9.1.4. Connection Character Sets and Collations</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-collations">9.1.5. Collation Issues</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-repertoire">9.1.6. String Repertoire</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-operations">9.1.7. Operations Affected by Character Set Support</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-unicode">9.1.8. Unicode Support</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-metadata">9.1.9. UTF-8 for Metadata</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-conversion">9.1.10. Column Character Set Conversion</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-charsets">9.1.11. Character Sets and Collations That MySQL Supports</a></span></dt></dl></dd><dt><span class="section"><a href="internationalization-localization.html#charset-configuration">9.2. The Character Set Used for Data and Sorting</a></span></dt><dd><dl><dt><span class="section"><a href="internationalization-localization.html#german-character-set">9.2.1. Using the German Character Set</a></span></dt></dl></dd><dt><span class="section"><a href="internationalization-localization.html#error-message-language">9.3. Setting the Error Message Language</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#adding-character-set">9.4. Adding a New Character Set</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#character-arrays">9.5. The Character Definition Arrays</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#string-collating">9.6. String Collating Support</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#multi-byte-characters">9.7. Multi-Byte Character Support</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#problems-with-character-sets">9.8. Problems With Character Sets</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#time-zone-support">9.9. MySQL Server Time Zone Support</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#locale-support">9.10. MySQL Server Locale Support</a></span></dt></dl></div><a class="indexterm" name="id2741492"></a><a class="indexterm" name="id2741501"></a><p> This chapter covers issues of internationalization (MySQL's capabilities for adapting to local use) and localization (selecting particular local conventions): </p><div class="itemizedlist"><ul type="disc"><li><p> MySQL support for character sets in SQL statements. </p></li><li><p> How to configure the server to support different character sets. </p></li><li><p> Selecting the language for error messages. </p></li><li><p> How to set the server's time zone and enable per-connection time zone support. </p></li><li><p> Selecting the locale for day and month names. </p></li></ul></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="charset"></a>9.1. Character Set Support</h2></div></div></div><div class="toc"><dl><dt><span class="section"><a href="internationalization-localization.html#charset-general">9.1.1. Character Sets and Collations in General</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-mysql">9.1.2. Character Sets and Collations in MySQL</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-syntax">9.1.3. Specifying Character Sets and Collations</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-connection">9.1.4. Connection Character Sets and Collations</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-collations">9.1.5. Collation Issues</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-repertoire">9.1.6. String Repertoire</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-operations">9.1.7. Operations Affected by Character Set Support</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-unicode">9.1.8. Unicode Support</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-metadata">9.1.9. UTF-8 for Metadata</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-conversion">9.1.10. Column Character Set Conversion</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-charsets">9.1.11. Character Sets and Collations That MySQL Supports</a></span></dt></dl></div><a class="indexterm" name="id2741566"></a><a class="indexterm" name="id2741575"></a><a class="indexterm" name="id2741584"></a><a class="indexterm" name="id2741593"></a><p> MySQL includes character set support that enables you to store data using a variety of character sets and perform comparisons according to a variety of collations. You can specify character sets at the server, database, table, and column level. MySQL supports the use of character sets for the <code class="literal">MyISAM</code>, <code class="literal">MEMORY</code>, <code class="literal">NDBCluster</code>, and <code class="literal">InnoDB</code> storage engines. </p><p> This chapter discusses the following topics: </p><div class="itemizedlist"><ul type="disc"><li><p> What are character sets and collations? </p></li><li><p> The multiple-level default system for character set assignment </p></li><li><p> Syntax for specifying character sets and collations </p></li><li><p> Affected functions and operations </p></li><li><p> Unicode support </p></li><li><p> The character sets and collations that are available, with notes </p></li></ul></div><p> Character set issues affect not only data storage, but also communication between client programs and the MySQL server. If you want the client program to communicate with the server using a character set different from the default, you'll need to indicate which one. For example, to use the <code class="literal">utf8</code> Unicode character set, issue this statement after connecting to the server: </p><pre class="programlisting">SET NAMES 'utf8'; </pre><p> For more information about character set-related issues in client/server communication, see <a href="internationalization-localization.html#charset-connection" title="9.1.4. Connection Character Sets and Collations">Section 9.1.4, “Connection Character Sets and Collations”</a>. </p><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-general"></a>9.1.1. Character Sets and Collations in General</h3></div></div></div><p> A <em class="firstterm">character set</em> is a set of symbols and encodings. A <em class="firstterm">collation</em> is a set of rules for comparing characters in a character set. Let's make the distinction clear with an example of an imaginary character set. </p><p> Suppose that we have an alphabet with four letters: “<span class="quote"><code class="literal">A</code></span>”, “<span class="quote"><code class="literal">B</code></span>”, “<span class="quote"><code class="literal">a</code></span>”, “<span class="quote"><code class="literal">b</code></span>”. We give each letter a number: “<span class="quote"><code class="literal">A</code></span>” = 0, “<span class="quote"><code class="literal">B</code></span>” = 1, “<span class="quote"><code class="literal">a</code></span>” = 2, “<span class="quote"><code class="literal">b</code></span>” = 3. The letter “<span class="quote"><code class="literal">A</code></span>” is a symbol, the number 0 is the <span class="bold"><strong>encoding</strong></span> for “<span class="quote"><code class="literal">A</code></span>”, and the combination of all four letters and their encodings is a <span class="bold"><strong>character set</strong></span>. </p><p> Suppose that we want to compare two string values, “<span class="quote"><code class="literal">A</code></span>” and “<span class="quote"><code class="literal">B</code></span>”. The simplest way to do this is to look at the encodings: 0 for “<span class="quote"><code class="literal">A</code></span>” and 1 for “<span class="quote"><code class="literal">B</code></span>”. Because 0 is less than 1, we say “<span class="quote"><code class="literal">A</code></span>” is less than “<span class="quote"><code class="literal">B</code></span>”. What we've just done is apply a collation to our character set. The collation is a set of rules (only one rule in this case): “<span class="quote">compare the encodings.</span>” We call this simplest of all possible collations a <em class="firstterm">binary</em> collation. </p><p> But what if we want to say that the lowercase and uppercase letters are equivalent? Then we would have at least two rules: (1) treat the lowercase letters “<span class="quote"><code class="literal">a</code></span>” and “<span class="quote"><code class="literal">b</code></span>” as equivalent to “<span class="quote"><code class="literal">A</code></span>” and “<span class="quote"><code class="literal">B</code></span>”; (2) then compare the encodings. We call this a <em class="firstterm">case-insensitive</em> collation. It's a little more complex than a binary collation. </p><p> In real life, most character sets have many characters: not just “<span class="quote"><code class="literal">A</code></span>” and “<span class="quote"><code class="literal">B</code></span>” but whole alphabets, sometimes multiple alphabets or eastern writing systems with thousands of characters, along with many special symbols and punctuation marks. Also in real life, most collations have many rules, not just for whether to distinguish lettercase, but also for whether to distinguish accents (an “<span class="quote">accent</span>” is a mark attached to a character as in German “<span class="quote"><code class="literal">Ö</code></span>”), and for multiple-character mappings (such as the rule that “<span class="quote"><code class="literal">Ö</code></span>” = “<span class="quote"><code class="literal">OE</code></span>” in one of the two German collations). </p><p> MySQL can do these things for you: </p><div class="itemizedlist"><ul type="disc"><li><p> Store strings using a variety of character sets </p></li><li><p> Compare strings using a variety of collations </p></li><li><p> Mix strings with different character sets or collations in the same server, the same database, or even the same table </p></li><li><p> Allow specification of character set and collation at any level </p></li></ul></div><p> In these respects, MySQL is far ahead of most other database management systems. However, to use these features effectively, you need to know what character sets and collations are available, how to change the defaults, and how they affect the behavior of string operators and functions. </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-mysql"></a>9.1.2. Character Sets and Collations in MySQL</h3></div></div></div><p> The MySQL server can support multiple character sets. To list the available character sets, use the <code class="literal">SHOW CHARACTER SET</code> statement. A partial listing follows. For more complete information, see <a href="internationalization-localization.html#charset-charsets" title="9.1.11. Character Sets and Collations That MySQL Supports">Section 9.1.11, “Character Sets and Collations That MySQL Supports”</a>. </p><pre class="programlisting">mysql> <strong class="userinput"><code>SHOW CHARACTER SET;</code></strong> +----------+-----------------------------+---------------------+--------+ | Charset | Description | Default collation | Maxlen | +----------+-----------------------------+---------------------+--------+ | big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 | | dec8 | DEC West European | dec8_swedish_ci | 1 | | cp850 | DOS West European | cp850_general_ci | 1 | | hp8 | HP West European | hp8_english_ci | 1 | | koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 | | latin1 | cp1252 West European | latin1_swedish_ci | 1 | | latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 | | swe7 | 7bit Swedish | swe7_swedish_ci | 1 | | ascii | US ASCII | ascii_general_ci | 1 | | ujis | EUC-JP Japanese | ujis_japanese_ci | 3 | | sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 | | hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 | | tis620 | TIS620 Thai | tis620_thai_ci | 1 | | euckr | EUC-KR Korean | euckr_korean_ci | 2 | | koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 | | gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 | | greek | ISO 8859-7 Greek | greek_general_ci | 1 | | cp1250 | Windows Central European | cp1250_general_ci | 1 | | gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 | | latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 | ... </pre><p> Any given character set always has at least one collation. It may have several collations. To list the collations for a character set, use the <code class="literal">SHOW COLLATION</code> statement. For example, to see the collations for the <code class="literal">latin1</code> (cp1252 West European) character set, use this statement to find those collation names that begin with <code class="literal">latin1</code>: </p><pre class="programlisting">mysql> <strong class="userinput"><code>SHOW COLLATION LIKE 'latin1%';</code></strong> +---------------------+---------+----+---------+----------+---------+ | Collation | Charset | Id | Default | Compiled | Sortlen | +---------------------+---------+----+---------+----------+---------+ | latin1_german1_ci | latin1 | 5 | | | 0 | | latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 | | latin1_danish_ci | latin1 | 15 | | | 0 | | latin1_german2_ci | latin1 | 31 | | Yes | 2 | | latin1_bin | latin1 | 47 | | Yes | 1 | | latin1_general_ci | latin1 | 48 | | | 0 | | latin1_general_cs | latin1 | 49 | | | 0 | | latin1_spanish_ci | latin1 | 94 | | | 0 | +---------------------+---------+----+---------+----------+---------+ </pre><p> The <code class="literal">latin1</code> collations have the following meanings: </p><div class="informaltable"><table border="1"><colgroup><col><col></colgroup><tbody><tr><td><span class="bold"><strong>Collation</strong></span></td><td><span class="bold"><strong>Meaning</strong></span></td></tr><tr><td><code class="literal">latin1_german1_ci</code></td><td>German DIN-1</td></tr><tr><td><code class="literal">latin1_swedish_ci</code></td><td>Swedish/Finnish</td></tr><tr><td><code class="literal">latin1_danish_ci</code></td><td>Danish/Norwegian</td></tr><tr><td><code class="literal">latin1_german2_ci</code></td><td>German DIN-2</td></tr><tr><td><code class="literal">latin1_bin</code></td><td>Binary according to <code class="literal">latin1</code> encoding</td></tr><tr><td><code class="literal">latin1_general_ci</code></td><td>Multilingual (Western European)</td></tr><tr><td><code class="literal">latin1_general_cs</code></td><td>Multilingual (ISO Western European), case sensitive</td></tr><tr><td><code class="literal">latin1_spanish_ci</code></td><td>Modern Spanish</td></tr></tbody></table></div><p> Collations have these general characteristics: </p><div class="itemizedlist"><ul type="disc"><li><p> Two different character sets cannot have the same collation. </p></li><li><p> Each character set has one collation that is the <span class="emphasis"><em>default collation</em></span>. For example, the default collation for <code class="literal">latin1</code> is <code class="literal">latin1_swedish_ci</code>. The output for <code class="literal">SHOW CHARACTER SET</code> indicates which collation is the default for each displayed character set. </p></li><li><p> There is a convention for collation names: They start with the name of the character set with which they are associated, they usually include a language name, and they end with <code class="literal">_ci</code> (case insensitive), <code class="literal">_cs</code> (case sensitive), or <code class="literal">_bin</code> (binary). </p></li></ul></div><p> In cases where a character set has multiple collations, it might not be clear which collation is most suitable for a given application. To avoid choosing the wrong collation, it can be helpful to perform some comparisons with representative data values to make sure that a given collation sorts values the way you expect. </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-syntax"></a>9.1.3. Specifying Character Sets and Collations</h3></div></div></div><div class="toc"><dl><dt><span class="section"><a href="internationalization-localization.html#charset-server">9.1.3.1. Server Character Set and Collation</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-database">9.1.3.2. Database Character Set and Collation</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-table">9.1.3.3. Table Character Set and Collation</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-column">9.1.3.4. Column Character Set and Collation</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-literal">9.1.3.5. Character String Literal Character Set and Collation</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-national">9.1.3.6. National Character Set</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-examples">9.1.3.7. Examples of Character Set and Collation Assignment</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-compatibility">9.1.3.8. Compatibility with Other DBMSs</a></span></dt></dl></div><p> There are default settings for character sets and collations at four levels: server, database, table, and column. The description in the following sections may appear complex, but it has been found in practice that multiple-level defaulting leads to natural and obvious results. </p><p> <code class="literal">CHARACTER SET</code> is used in clauses that specify a character set. <code class="literal">CHARSET</code> may be used as a synonym for <code class="literal">CHARACTER SET</code>. </p><p> Character set issues affect not only data storage, but also communication between client programs and the MySQL server. If you want the client program to communicate with the server using a character set different from the default, you'll need to indicate which one. For example, to use the <code class="literal">utf8</code> Unicode character set, issue this statement after connecting to the server: </p><pre class="programlisting">SET NAMES 'utf8'; </pre><p> For more information about character set-related issues in client/server communication, see <a href="internationalization-localization.html#charset-connection" title="9.1.4. Connection Character Sets and Collations">Section 9.1.4, “Connection Character Sets and Collations”</a>. </p><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-server"></a>9.1.3.1. Server Character Set and Collation</h4></div></div></div><p> MySQL Server has a server character set and a server collation. These can be set at server startup on the command line or in an option file and changed at runtime. </p><p> Initially, the server character set and collation depend on the options that you use when you start <span><strong class="command">mysqld</strong></span>. You can use <code class="option">--character-set-server</code> for the character set. Along with it, you can add <code class="option">--collation-server</code> for the collation. If you don't specify a character set, that is the same as saying <code class="option">--character-set-server=latin1</code>. If you specify only a character set (for example, <code class="literal">latin1</code>) but not a collation, that is the same as saying <code class="option">--character-set-server=latin1</code> <code class="option">--collation-server=latin1_swedish_ci</code> because <code class="literal">latin1_swedish_ci</code> is the default collation for <code class="literal">latin1</code>. Therefore, the following three commands all have the same effect: </p><pre class="programlisting">shell> <strong class="userinput"><code>mysqld</code></strong> shell> <strong class="userinput"><code>mysqld --character-set-server=latin1</code></strong> shell> <strong class="userinput"><code>mysqld --character-set-server=latin1 \</code></strong> <strong class="userinput"><code>--collation-server=latin1_swedish_ci</code></strong> </pre><p> One way to change the settings is by recompiling. If you want to change the default server character set and collation when building from sources, use: <code class="option">--with-charset</code> and <code class="option">--with-collation</code> as arguments for <span><strong class="command">configure</strong></span>. For example: </p><pre class="programlisting">shell> <strong class="userinput"><code>./configure --with-charset=latin1</code></strong> </pre><p> Or: </p><pre class="programlisting">shell> <strong class="userinput"><code>./configure --with-charset=latin1 \</code></strong> <strong class="userinput"><code>--with-collation=latin1_german1_ci</code></strong> </pre><p> Both <span><strong class="command">mysqld</strong></span> and <span><strong class="command">configure</strong></span> verify that the character set/collation combination is valid. If not, each program displays an error message and terminates. </p><p> The server character set and collation are used as default values if the database character set and collation are not specified in <code class="literal">CREATE DATABASE</code> statements. They have no other purpose. </p><p> The current server character set and collation can be determined from the values of the <code class="literal">character_set_server</code> and <code class="literal">collation_server</code> system variables. These variables can be changed at runtime. </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-database"></a>9.1.3.2. Database Character Set and Collation</h4></div></div></div><p> Every database has a database character set and a database collation. The <code class="literal">CREATE DATABASE</code> and <code class="literal">ALTER DATABASE</code> statements have optional clauses for specifying the database character set and collation: </p><pre class="programlisting">CREATE DATABASE <em class="replaceable"><code>db_name</code></em> [[DEFAULT] CHARACTER SET <em class="replaceable"><code>charset_name</code></em>] [[DEFAULT] COLLATE <em class="replaceable"><code>collation_name</code></em>] ALTER DATABASE <em class="replaceable"><code>db_name</code></em> [[DEFAULT] CHARACTER SET <em class="replaceable"><code>charset_name</code></em>] [[DEFAULT] COLLATE <em class="replaceable"><code>collation_name</code></em>] </pre><p> The keyword <code class="literal">SCHEMA</code> can be used instead of <code class="literal">DATABASE</code>. </p><p> All database options are stored in a text file named <code class="filename">db.opt</code> that can be found in the database directory. </p><p> The <code class="literal">CHARACTER SET</code> and <code class="literal">COLLATE</code> clauses make it possible to create databases with different character sets and collations on the same MySQL server. </p><p> Example: </p><pre class="programlisting">CREATE DATABASE <em class="replaceable"><code>db_name</code></em> CHARACTER SET latin1 COLLATE latin1_swedish_ci; </pre><p> MySQL chooses the database character set and database collation in the following manner: </p><div class="itemizedlist"><ul type="disc"><li><p> If both <code class="literal">CHARACTER SET <em class="replaceable"><code>X</code></em></code> and <code class="literal">COLLATE <em class="replaceable"><code>Y</code></em></code> were specified, then character set <em class="replaceable"><code>X</code></em> and collation <em class="replaceable"><code>Y</code></em>. </p></li><li><p> If <code class="literal">CHARACTER SET <em class="replaceable"><code>X</code></em></code> was specified without <code class="literal">COLLATE</code>, then character set <em class="replaceable"><code>X</code></em> and its default collation. </p></li><li><p> If <code class="literal">COLLATE <em class="replaceable"><code>Y</code></em></code> was specified without <code class="literal">CHARACTER SET</code>, then the character set associated with <em class="replaceable"><code>Y</code></em> and collation <em class="replaceable"><code>Y</code></em>. </p></li><li><p> Otherwise, the server character set and server collation. </p></li></ul></div><p> The database character set and collation are used as default values if the table character set and collation are not specified in <code class="literal">CREATE TABLE</code> statements. The database character set also is used by <code class="literal">LOAD DATA INFILE</code>. The character set and collation have no other purposes. </p><p> The character set and collation for the default database can be determined from the values of the <code class="literal">character_set_database</code> and <code class="literal">collation_database</code> system variables. The server sets these variables whenever the default database changes. If there is no default database, the variables have the same value as the corresponding server-level system variables, <code class="literal">character_set_server</code> and <code class="literal">collation_server</code>. </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-table"></a>9.1.3.3. Table Character Set and Collation</h4></div></div></div><p> Every table has a table character set and a table collation. The <code class="literal">CREATE TABLE</code> and <code class="literal">ALTER TABLE</code> statements have optional clauses for specifying the table character set and collation: </p><pre class="programlisting">CREATE TABLE <em class="replaceable"><code>tbl_name</code></em> (<em class="replaceable"><code>column_list</code></em>) [[DEFAULT] CHARACTER SET <em class="replaceable"><code>charset_name</code></em>] [COLLATE <em class="replaceable"><code>collation_name</code></em>]] ALTER TABLE <em class="replaceable"><code>tbl_name</code></em> [[DEFAULT] CHARACTER SET <em class="replaceable"><code>charset_name</code></em>] [COLLATE <em class="replaceable"><code>collation_name</code></em>] </pre><p> Example: </p><pre class="programlisting">CREATE TABLE t1 ( ... ) CHARACTER SET latin1 COLLATE latin1_danish_ci; </pre><p> MySQL chooses the table character set and collation in the following manner: </p><div class="itemizedlist"><ul type="disc"><li><p> If both <code class="literal">CHARACTER SET <em class="replaceable"><code>X</code></em></code> and <code class="literal">COLLATE <em class="replaceable"><code>Y</code></em></code> were specified, then character set <em class="replaceable"><code>X</code></em> and collation <em class="replaceable"><code>Y</code></em>. </p></li><li><p> If <code class="literal">CHARACTER SET <em class="replaceable"><code>X</code></em></code> was specified without <code class="literal">COLLATE</code>, then character set <em class="replaceable"><code>X</code></em> and its default collation. </p></li><li><p> If <code class="literal">COLLATE <em class="replaceable"><code>Y</code></em></code> was specified without <code class="literal">CHARACTER SET</code>, then the character set associated with <em class="replaceable"><code>Y</code></em> and collation <em class="replaceable"><code>Y</code></em>. </p></li><li><p> Otherwise, the database character set and collation. </p></li></ul></div><p> The table character set and collation are used as default values if the column character set and collation are not specified in individual column definitions. The table character set and collation are MySQL extensions; there are no such things in standard SQL. </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-column"></a>9.1.3.4. Column Character Set and Collation</h4></div></div></div><p> Every “<span class="quote">character</span>” column (that is, a column of type <code class="literal">CHAR</code>, <code class="literal">VARCHAR</code>, or <code class="literal">TEXT</code>) has a column character set and a column collation. Column definition syntax for <code class="literal">CREATE TABLE</code> and <code class="literal">ALTER TABLE</code> has optional clauses for specifying the column character set and collation: </p><pre class="programlisting"><em class="replaceable"><code>col_name</code></em> {CHAR | VARCHAR | TEXT} (<em class="replaceable"><code>col_length</code></em>) [CHARACTER SET <em class="replaceable"><code>charset_name</code></em>] [COLLATE <em class="replaceable"><code>collation_name</code></em>] </pre><p> Examples: </p><pre class="programlisting">CREATE TABLE Table1 ( column1 VARCHAR(5) CHARACTER SET latin1 COLLATE latin1_german1_ci ); ALTER TABLE Table1 MODIFY column1 VARCHAR(5) CHARACTER SET latin1 COLLATE latin1_swedish_ci; </pre><p> If you convert a column from one character set to another, <code class="literal">ALTER TABLE</code> attempts to map the data values, but if the character sets are incompatible, there may be data loss. </p><p> MySQL chooses the column character set and collation in the following manner: </p><div class="itemizedlist"><ul type="disc"><li><p> If both <code class="literal">CHARACTER SET <em class="replaceable"><code>X</code></em></code> and <code class="literal">COLLATE <em class="replaceable"><code>Y</code></em></code> were specified, then character set <em class="replaceable"><code>X</code></em> and collation <em class="replaceable"><code>Y</code></em> are used. </p></li><li><p> If <code class="literal">CHARACTER SET <em class="replaceable"><code>X</code></em></code> was specified without <code class="literal">COLLATE</code>, then character set <em class="replaceable"><code>X</code></em> and its default collation are used. </p></li><li><p> If <code class="literal">COLLATE <em class="replaceable"><code>Y</code></em></code> was specified without <code class="literal">CHARACTER SET</code>, then the character set associated with <em class="replaceable"><code>Y</code></em> and collation <em class="replaceable"><code>Y</code></em>. </p></li><li><p> Otherwise, the table character set and collation are used. </p></li></ul></div><p> The <code class="literal">CHARACTER SET</code> and <code class="literal">COLLATE</code> clauses are standard SQL. </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-literal"></a>9.1.3.5. Character String Literal Character Set and Collation</h4></div></div></div><p> Every character string literal has a character set and a collation. </p><p> A character string literal may have an optional character set introducer and <code class="literal">COLLATE</code> clause: </p><a class="indexterm" name="id2743422"></a><a class="indexterm" name="id2743434"></a><pre class="programlisting">[_<em class="replaceable"><code>charset_name</code></em>]'<em class="replaceable"><code>string</code></em>' [COLLATE <em class="replaceable"><code>collation_name</code></em>] </pre><p> Examples: </p><pre class="programlisting">SELECT '<em class="replaceable"><code>string</code></em>'; SELECT _latin1'<em class="replaceable"><code>string</code></em>'; SELECT _latin1'<em class="replaceable"><code>string</code></em>' COLLATE latin1_danish_ci; </pre><p> For the simple statement <code class="literal">SELECT '<em class="replaceable"><code>string</code></em>'</code>, the string has the character set and collation defined by the <code class="literal">character_set_connection</code> and <code class="literal">collation_connection</code> system variables. </p><p> The <code class="literal">_<em class="replaceable"><code>charset_name</code></em></code> expression is formally called an <span class="emphasis"><em>introducer</em></span>. It tells the parser, “<span class="quote">the string that is about to follow uses character set <em class="replaceable"><code>X</code></em>.</span>” Because this has confused people in the past, we emphasize that an introducer does not change the string to the introducer character set like <a href="functions.html#function_convert"><code class="literal">CONVERT()</code></a> would do. It does not change the string's value, although padding may occur. The introducer is just a signal. An introducer is also legal before standard hex literal and numeric hex literal notation (<code class="literal">x'<em class="replaceable"><code>literal</code></em>'</code> and <code class="literal">0x<em class="replaceable"><code>nnnn</code></em></code>). </p><p> Examples: </p><pre class="programlisting">SELECT _latin1 x'AABBCC'; SELECT _latin1 0xAABBCC; </pre><p> MySQL determines a literal's character set and collation in the following manner: </p><div class="itemizedlist"><ul type="disc"><li><p> If both <em class="replaceable"><code>_X</code></em> and <code class="literal">COLLATE <em class="replaceable"><code>Y</code></em></code> were specified, then character set <em class="replaceable"><code>X</code></em> and collation <em class="replaceable"><code>Y</code></em> are used. </p></li><li><p> If <em class="replaceable"><code>_X</code></em> is specified but <code class="literal">COLLATE</code> is not specified, then character set <em class="replaceable"><code>X</code></em> and its default collation are used. </p></li><li><p> Otherwise, the character set and collation given by the <code class="literal">character_set_connection</code> and <code class="literal">collation_connection</code> system variables are used. </p></li></ul></div><p> Examples: </p><div class="itemizedlist"><ul type="disc"><li><p> A string with <code class="literal">latin1</code> character set and <code class="literal">latin1_german1_ci</code> collation: </p><pre class="programlisting">SELECT _latin1'Müller' COLLATE latin1_german1_ci; </pre></li><li><p> A string with <code class="literal">latin1</code> character set and its default collation (that is, <code class="literal">latin1_swedish_ci</code>): </p><pre class="programlisting">SELECT _latin1'Müller'; </pre></li><li><p> A string with the connection default character set and collation: </p><pre class="programlisting">SELECT 'Müller'; </pre></li></ul></div><p> Character set introducers and the <code class="literal">COLLATE</code> clause are implemented according to standard SQL specifications. </p><p> An introducer indicates the character set for the following string, but does not change now how the parser performs escape processing within the string. Escapes are always interpreted by the parser according to the character set given by <code class="literal">character_set_connection</code>. </p><p> The following examples show that escape processsing occurs using <code class="literal">character_set_connection</code> even in the presence of an introducer. The examples use <code class="literal">SET NAMES</code> (which changes <code class="literal">character_set_connection</code>, as discussed in <a href="internationalization-localization.html#charset-connection" title="9.1.4. Connection Character Sets and Collations">Section 9.1.4, “Connection Character Sets and Collations”</a>), and display the resulting strings using the <a href="functions.html#function_hex"><code class="literal">HEX()</code></a> function so that the exact string contents can be seen. </p><p> Example 1: </p><pre class="programlisting">mysql> <strong class="userinput"><code>SET NAMES latin1;</code></strong> Query OK, 0 rows affected (0.01 sec) mysql> <strong class="userinput"><code>SELECT HEX('à\n'), HEX(_sjis'à\n');</code></strong> +------------+-----------------+ | HEX('à\n') | HEX(_sjis'à\n') | +------------+-----------------+ | E00A | E00A | +------------+-----------------+ 1 row in set (0.00 sec) </pre><p> Here, “<span class="quote"><code class="literal">à</code></span>” (hex value <code class="literal">E0</code>) is followed by “<span class="quote"><code class="literal">\n</code></span>”, the escape sequence for newline. The escape sequence is interpreted using the <code class="literal">character_set_connection</code> value of <code class="literal">latin1</code> to produce a literal newline (hex value <code class="literal">0A</code>). This happens even for the second string. That is, the introducer of <code class="literal">_sjis</code> does not affect the parser's escape processing. </p><p> Example 2: </p><pre class="programlisting">mysql> <strong class="userinput"><code>SET NAMES sjis;</code></strong> Query OK, 0 rows affected (0.00 sec) mysql> <strong class="userinput"><code>SELECT HEX('à\n'), HEX(_latin1'à\n');</code></strong> +------------+-------------------+ | HEX('à\n') | HEX(_latin1'à\n') | +------------+-------------------+ | E05C6E | E05C6E | +------------+-------------------+ 1 row in set (0.04 sec) </pre><p> Here, <code class="literal">character_set_connection</code> is <code class="literal">sjis</code>, a character set in which the sequence of “<span class="quote"><code class="literal">à</code></span>” followed by “<span class="quote"><code class="literal">\</code></span>” (hex values <code class="literal">05</code> and <code class="literal">5C</code>) is a valid multi-byte character. Hence, the first two bytes of the string are interpreted as a single <code class="literal">sjis</code> character, and the “<span class="quote"><code class="literal">\</code></span>” is not intrepreted as an escape character. The following “<span class="quote"><code class="literal">n</code></span>” (hex value <code class="literal">6E</code>) is not interpreted as part of an escape sequence. This is true even for the second string; the introducer of <code class="literal">_latin1</code> does not affect escape processing. </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-national"></a>9.1.3.6. National Character Set</h4></div></div></div><p> Standard SQL defines <code class="literal">NCHAR</code> or <code class="literal">NATIONAL CHAR</code> as a way to indicate that a <code class="literal">CHAR</code> column should use some predefined character set. MySQL 5.0 uses <code class="literal">utf8</code> as this predefined character set. For example, these data type declarations are equivalent: </p><pre class="programlisting">CHAR(10) CHARACTER SET utf8 NATIONAL CHARACTER(10) NCHAR(10) </pre><p> As are these: </p><pre class="programlisting">VARCHAR(10) CHARACTER SET utf8 NATIONAL VARCHAR(10) NCHAR VARCHAR(10) NATIONAL CHARACTER VARYING(10) NATIONAL CHAR VARYING(10) </pre><p> You can use <code class="literal">N'<em class="replaceable"><code>literal</code></em>'</code> (or <code class="literal">n'<em class="replaceable"><code>literal</code></em>'</code>) to create a string in the national character set. These statements are equivalent: </p><pre class="programlisting">SELECT N'some text'; SELECT n'some text'; SELECT _utf8'some text'; </pre><p> For information on upgrading character sets to MySQL 5.0 from versions prior to 4.1, see the <em class="citetitle">MySQL 3.23, 4.0, 4.1 Reference Manual</em>. </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-examples"></a>9.1.3.7. Examples of Character Set and Collation Assignment</h4></div></div></div><p> The following examples show how MySQL determines default character set and collation values. </p><p> <span class="bold"><strong>Example 1: Table and Column Definition</strong></span> </p><pre class="programlisting">CREATE TABLE t1 ( c1 CHAR(10) CHARACTER SET latin1 COLLATE latin1_german1_ci ) DEFAULT CHARACTER SET latin2 COLLATE latin2_bin; </pre><p> Here we have a column with a <code class="literal">latin1</code> character set and a <code class="literal">latin1_german1_ci</code> collation. The definition is explicit, so that's straightforward. Notice that there is no problem with storing a <code class="literal">latin1</code> column in a <code class="literal">latin2</code> table. </p><p> <span class="bold"><strong>Example 2: Table and Column Definition</strong></span> </p><pre class="programlisting">CREATE TABLE t1 ( c1 CHAR(10) CHARACTER SET latin1 ) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci; </pre><p> This time we have a column with a <code class="literal">latin1</code> character set and a default collation. Although it might seem natural, the default collation is not taken from the table level. Instead, because the default collation for <code class="literal">latin1</code> is always <code class="literal">latin1_swedish_ci</code>, column <code class="literal">c1</code> has a collation of <code class="literal">latin1_swedish_ci</code> (not <code class="literal">latin1_danish_ci</code>). </p><p> <span class="bold"><strong>Example 3: Table and Column Definition</strong></span> </p><pre class="programlisting">CREATE TABLE t1 ( c1 CHAR(10) ) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci; </pre><p> We have a column with a default character set and a default collation. In this circumstance, MySQL checks the table level to determine the column character set and collation. Consequently, the character set for column <code class="literal">c1</code> is <code class="literal">latin1</code> and its collation is <code class="literal">latin1_danish_ci</code>. </p><p> <span class="bold"><strong>Example 4: Database, Table, and Column Definition</strong></span> </p><pre class="programlisting">CREATE DATABASE d1 DEFAULT CHARACTER SET latin2 COLLATE latin2_czech_ci; USE d1; CREATE TABLE t1 ( c1 CHAR(10) ); </pre><p> We create a column without specifying its character set and collation. We're also not specifying a character set and a collation at the table level. In this circumstance, MySQL checks the database level to determine the table settings, which thereafter become the column settings.) Consequently, the character set for column <code class="literal">c1</code> is <code class="literal">latin2</code> and its collation is <code class="literal">latin2_czech_ci</code>. </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-compatibility"></a>9.1.3.8. Compatibility with Other DBMSs</h4></div></div></div><p> For MaxDB compatibility these two statements are the same: </p><pre class="programlisting">CREATE TABLE t1 (f1 CHAR(<em class="replaceable"><code>N</code></em>) UNICODE); CREATE TABLE t1 (f1 CHAR(<em class="replaceable"><code>N</code></em>) CHARACTER SET ucs2); </pre></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-connection"></a>9.1.4. Connection Character Sets and Collations</h3></div></div></div><a class="indexterm" name="id2744345"></a><a class="indexterm" name="id2744357"></a><p> Several character set and collation system variables relate to a client's interaction with the server. Some of these have been mentioned in earlier sections: </p><div class="itemizedlist"><ul type="disc"><li><p> The server character set and collation can be determined from the values of the <code class="literal">character_set_server</code> and <code class="literal">collation_server</code> system variables. </p></li><li><p> The character set and collation of the default database can be determined from the values of the <code class="literal">character_set_database</code> and <code class="literal">collation_database</code> system variables. </p></li></ul></div><p> Additional character set and collation system variables are involved in handling traffic for the connection between a client and the server. Every client has connection-related character set and collation system variables. </p><p> Consider what a “<span class="quote">connection</span>” is: It's what you make when you connect to the server. The client sends SQL statements, such as queries, over the connection to the server. The server sends responses, such as result sets, over the connection back to the client. This leads to several questions about character set and collation handling for client connections, each of which can be answered in terms of system variables: </p><div class="itemizedlist"><ul type="disc"><li><p> What character set is the statement in when it leaves the client? </p><p> The server takes the <code class="literal">character_set_client</code> system variable to be the character set in which statements are sent by the client. </p></li><li><p> What character set should the server translate a statement to after receiving it? </p><p> For this, the server uses the <code class="literal">character_set_connection</code> and <code class="literal">collation_connection</code> system variables. It converts statements sent by the client from <code class="literal">character_set_client</code> to <code class="literal">character_set_connection</code> (except for string literals that have an introducer such as <code class="literal">_latin1</code> or <code class="literal">_utf8</code>). <code class="literal">collation_connection</code> is important for comparisons of literal strings. For comparisons of strings with column values, <code class="literal">collation_connection</code> does not matter because columns have their own collation, which has a higher collation precedence. </p></li><li><p> What character set should the server translate to before shipping result sets or error messages back to the client? </p><p> The <code class="literal">character_set_results</code> system variable indicates the character set in which the server returns query results to the client. This includes result data such as column values, and result metadata such as column names. </p></li></ul></div><p> You can fine-tune the settings for these variables, or you can depend on the defaults (in which case, you can skip the rest of this section). </p><p> There are two statements that affect the connection character sets: </p><pre class="programlisting">SET NAMES '<em class="replaceable"><code>charset_name</code></em>' SET CHARACTER SET <em class="replaceable"><code>charset_name</code></em> </pre><p> <code class="literal">SET NAMES</code> indicates what character set the client will use to send SQL statements to the server. Thus, <code class="literal">SET NAMES 'cp1251'</code> tells the server “<span class="quote">future incoming messages from this client are in character set <code class="literal">cp1251</code>.</span>” It also specifies the character set that the server should use for sending results back to the client. (For example, it indicates what character set to use for column values if you use a <code class="literal">SELECT</code> statement.) </p><p> A <code class="literal">SET NAMES '<em class="replaceable"><code>x</code></em>'</code> statement is equivalent to these three statements: </p><pre class="programlisting">SET character_set_client = <em class="replaceable"><code>x</code></em>; SET character_set_results = <em class="replaceable"><code>x</code></em>; SET character_set_connection = <em class="replaceable"><code>x</code></em>; </pre><p> Setting <code class="literal">character_set_connection</code> to <em class="replaceable"><code>x</code></em> also sets <code class="literal">collation_connection</code> to the default collation for <em class="replaceable"><code>x</code></em>. It is not necessary to set that collation explicitly. To specify a particular collation for the character sets, use the optional <code class="literal">COLLATE</code> clause: </p><pre class="programlisting">SET NAMES '<em class="replaceable"><code>charset_name</code></em>' COLLATE '<em class="replaceable"><code>collation_name</code></em>' </pre><p> <code class="literal">SET CHARACTER SET</code> is similar to <code class="literal">SET NAMES</code> but sets <code class="literal">character_set_connection</code> and <code class="literal">collation_connection</code> to <code class="literal">character_set_database</code> and <code class="literal">collation_database</code>. A <code class="literal">SET CHARACTER SET <em class="replaceable"><code>x</code></em></code> statement is equivalent to these three statements: </p><pre class="programlisting">SET character_set_client = <em class="replaceable"><code>x</code></em>; SET character_set_results = <em class="replaceable"><code>x</code></em>; SET collation_connection = @@collation_database; </pre><p> Setting <code class="literal">collation_connection</code> also sets <code class="literal">character_set_connection</code> to the character set associated with the collation (equivalent to executing <code class="literal">SET character_set_connection = @@character_set_database</code>). It is not necessary to set <code class="literal">character_set_connection</code> explicitly. </p><p> When a client connects, it sends to the server the name of the character set that it wants to use. The server uses the name to set the <code class="literal">character_set_client</code>, <code class="literal">character_set_results</code>, and <code class="literal">character_set_connection</code> system variables. In effect, the server performs a <code class="literal">SET NAMES</code> operation using the character set name. </p><p> With the <span><strong class="command">mysql</strong></span> client, it is not necessary to execute <code class="literal">SET NAMES</code> every time you start up if you want to use a character set different from the default. You can add the <code class="option">--default-character-set</code> option setting to your <span><strong class="command">mysql</strong></span> statement line, or in your option file. For example, the following option file setting changes the three character set variables set to <code class="literal">koi8r</code> each time you invoke <span><strong class="command">mysql</strong></span>: </p><pre class="programlisting">[mysql] default-character-set=koi8r </pre><p> If you are using the <span><strong class="command">mysql</strong></span> client with auto-reconnect enabled (which is not recommended), it is preferable to use the <code class="literal">charset</code> command rather than <code class="literal">SET NAMES</code>. For example: </p><pre class="programlisting">mysql> <strong class="userinput"><code>charset utf8</code></strong> Charset changed </pre><p> The <code class="literal">charset</code> command issues a <code class="literal">SET NAMES</code> statement, and also changes the default character set that is used if <span><strong class="command">mysql</strong></span> reconnects after the connection has dropped. </p><p> Example: Suppose that <code class="literal">column1</code> is defined as <code class="literal">CHAR(5) CHARACTER SET latin2</code>. If you do not say <code class="literal">SET NAMES</code> or <code class="literal">SET CHARACTER SET</code>, then for <code class="literal">SELECT column1 FROM t</code>, the server sends back all the values for <code class="literal">column1</code> using the character set that the client specified when it connected. On the other hand, if you say <code class="literal">SET NAMES 'latin1'</code> or <code class="literal">SET CHARACTER SET latin1</code> before issuing the <code class="literal">SELECT</code> statement, the server converts the <code class="literal">latin2</code> values to <code class="literal">latin1</code> just before sending results back. Conversion may be lossy if there are characters that are not in both character sets. </p><p> If you do not want the server to perform any conversion of result sets, set <code class="literal">character_set_results</code> to <code class="literal">NULL</code>: </p><pre class="programlisting">SET character_set_results = NULL; </pre><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p> Currently, UCS-2 cannot be used as a client character set, which means that <code class="literal">SET NAMES 'ucs2'</code> does not work. </p></div><p> To see the values of the character set and collation system variables that apply to your connection, use these statements: </p><pre class="programlisting">SHOW VARIABLES LIKE 'character_set%'; SHOW VARIABLES LIKE 'collation%'; </pre><p> You must also consider the environment within which your MySQL application executes. For example, if you will send statements using UTF-8 test taken from a file that you create in an editor, you should edit the file with the locale of your environment set to UTF-8 so that the file's encoding is correct and so that the operating system handles it correctly. For a script that executes in a Web environment, the script must handle the character encoding properly for its interaction with the MySQL server, and it must generate pages that correctly indicate the encoding so that browsers know now to display the content of the pages. </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-collations"></a>9.1.5. Collation Issues</h3></div></div></div><div class="toc"><dl><dt><span class="section"><a href="internationalization-localization.html#charset-collate">9.1.5.1. Using <code class="literal">COLLATE</code> in SQL Statements</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-collate-precedence">9.1.5.2. <code class="literal">COLLATE</code> Clause Precedence</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-binary-op">9.1.5.3. <code class="literal">BINARY</code> Operator</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-collate-tricky">9.1.5.4. Some Special Cases Where the Collation Determination Is Tricky</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-collation-charset">9.1.5.5. Collations Must Be for the Right Character Set</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-collation-effect">9.1.5.6. Examples of the Effect of Collation</a></span></dt></dl></div><p> The following sections discuss various aspects of character set collations. </p><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-collate"></a>9.1.5.1. Using <code class="literal">COLLATE</code> in SQL Statements</h4></div></div></div><p> With the <code class="literal">COLLATE</code> clause, you can override whatever the default collation is for a comparison. <code class="literal">COLLATE</code> may be used in various parts of SQL statements. Here are some examples: </p><div class="itemizedlist"><ul type="disc"><li><p> With <code class="literal">ORDER BY</code>: </p><pre class="programlisting">SELECT k FROM t1 ORDER BY k COLLATE latin1_german2_ci; </pre></li><li><p> With <code class="literal">AS</code>: </p><pre class="programlisting">SELECT k COLLATE latin1_german2_ci AS k1 FROM t1 ORDER BY k1; </pre></li><li><p> With <code class="literal">GROUP BY</code>: </p><pre class="programlisting">SELECT k FROM t1 GROUP BY k COLLATE latin1_german2_ci; </pre></li><li><p> With aggregate functions: </p><pre class="programlisting">SELECT MAX(k COLLATE latin1_german2_ci) FROM t1; </pre></li><li><p> With <code class="literal">DISTINCT</code>: </p><pre class="programlisting">SELECT DISTINCT k COLLATE latin1_german2_ci FROM t1; </pre></li><li><p> With <code class="literal">WHERE</code>: </p><pre class="programlisting"> SELECT * FROM t1 WHERE _latin1 'Müller' COLLATE latin1_german2_ci = k; </pre><pre class="programlisting"> SELECT * FROM t1 WHERE k LIKE _latin1 'Müller' COLLATE latin1_german2_ci; </pre></li><li><p> With <code class="literal">HAVING</code>: </p><pre class="programlisting">SELECT k FROM t1 GROUP BY k HAVING k = _latin1 'Müller' COLLATE latin1_german2_ci; </pre></li></ul></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-collate-precedence"></a>9.1.5.2. <code class="literal">COLLATE</code> Clause Precedence</h4></div></div></div><p> The <code class="literal">COLLATE</code> clause has high precedence (higher than <a href="functions.html#operator_or"><code class="literal">||</code></a>), so the following two expressions are equivalent: </p><pre class="programlisting">x || y COLLATE z x || (y COLLATE z) </pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-binary-op"></a>9.1.5.3. <code class="literal">BINARY</code> Operator</h4></div></div></div><p> The <code class="literal">BINARY</code> operator casts the string following it to a binary string. This is an easy way to force a comparison to be done byte by byte rather than character by character. <code class="literal">BINARY</code> also causes trailing spaces to be significant. </p><pre class="programlisting">mysql> <strong class="userinput"><code>SELECT 'a' = 'A';</code></strong> -> 1 mysql> <strong class="userinput"><code>SELECT BINARY 'a' = 'A';</code></strong> -> 0 mysql> <strong class="userinput"><code>SELECT 'a' = 'a ';</code></strong> -> 1 mysql> <strong class="userinput"><code>SELECT BINARY 'a' = 'a ';</code></strong> -> 0 </pre><p> <code class="literal">BINARY <em class="replaceable"><code>str</code></em></code> is shorthand for <a href="functions.html#function_cast"><code class="literal">CAST(<em class="replaceable"><code>str</code></em> AS BINARY)</code></a>. </p><p> The <code class="literal">BINARY</code> attribute in character column definitions has a different effect. A character column defined with the <code class="literal">BINARY</code> attribute is assigned the binary collation of the column's character set. Every character set has a binary collation. For example, the binary collation for the <code class="literal">latin1</code> character set is <code class="literal">latin1_bin</code>, so if the table default character set is <code class="literal">latin1</code>, these two column definitions are equivalent: </p><pre class="programlisting">CHAR(10) BINARY CHAR(10) CHARACTER SET latin1 COLLATE latin1_bin </pre><p> The effect of <code class="literal">BINARY</code> as a column attribute differs from its effect prior to MySQL 4.1. Formerly, <code class="literal">BINARY</code> resulted in a column that was treated as a binary string. A binary string is a string of bytes that has no character set or collation, which differs from a non-binary character string that has a binary collation. For both types of strings, comparisons are based on the numeric values of the string unit, but for non-binary strings the unit is the character and some character sets allow multi-byte characters. <a href="data-types.html#binary-varbinary" title="10.4.2. The BINARY and VARBINARY Types">Section 10.4.2, “The <code class="literal">BINARY</code> and <code class="literal">VARBINARY</code> Types”</a>. </p><p> The use of <code class="literal">CHARACTER SET binary</code> in the definition of a <code class="literal">CHAR</code>, <code class="literal">VARCHAR</code>, or <code class="literal">TEXT</code> column causes the column to be treated as a binary data type. For example, the following pairs of definitions are equivalent: </p><pre class="programlisting">CHAR(10) CHARACTER SET binary BINARY(10) VARCHAR(10) CHARACTER SET binary VARBINARY(10) TEXT CHARACTER SET binary BLOB </pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-collate-tricky"></a>9.1.5.4. Some Special Cases Where the Collation Determination Is Tricky</h4></div></div></div><p> In the great majority of statements, it is obvious what collation MySQL uses to resolve a comparison operation. For example, in the following cases, it should be clear that the collation is the collation of column <code class="literal">x</code>: </p><pre class="programlisting">SELECT x FROM T ORDER BY x; SELECT x FROM T WHERE x = x; SELECT DISTINCT x FROM T; </pre><p> However, when multiple operands are involved, there can be ambiguity. For example: </p><pre class="programlisting">SELECT x FROM T WHERE x = 'Y'; </pre><p> Should this query use the collation of the column <code class="literal">x</code>, or of the string literal <code class="literal">'Y'</code>? </p><p> Standard SQL resolves such questions using what used to be called “<span class="quote">coercibility</span>” rules. Basically, this means: Both <code class="literal">x</code> and <code class="literal">'Y'</code> have collations, so which collation takes precedence? This can be difficult to resolve, but the following rules cover most situations: </p><div class="itemizedlist"><ul type="disc"><li><p> An explicit <code class="literal">COLLATE</code> clause has a coercibility of 0. (Not coercible at all.) </p></li><li><p> The concatenation of two strings with different collations has a coercibility of 1. </p></li><li><p> The collation of a column or a stored routine parameter or local variable has a coercibility of 2. </p></li><li><p> A “<span class="quote">system constant</span>” (the string returned by functions such as <a href="functions.html#function_user"><code class="literal">USER()</code></a> or <a href="functions.html#function_version"><code class="literal">VERSION()</code></a>) has a coercibility of 3. </p></li><li><p> A literal's collation has a coercibility of 4. </p></li><li><p> <code class="literal">NULL</code> or an expression that is derived from <code class="literal">NULL</code> has a coercibility of 5. </p></li></ul></div><p> The preceding coercibility values are current as of MySQL 5.0.3. In MySQL 5.0 prior to 5.0.3, there is no system constant or ignorable coercibility. Functions such as <a href="functions.html#function_user"><code class="literal">USER()</code></a> have a coercibility of 2 rather than 3, and literals have a coercibility of 3 rather than 4. </p><p> Those rules resolve ambiguities in the following manner: </p><div class="itemizedlist"><ul type="disc"><li><p> Use the collation with the lowest coercibility value. </p></li><li><p> If both sides have the same coercibility, then: </p><div class="itemizedlist"><ul type="circle"><li><p> If both sides are Unicode, or both sides are not Unicode, it is an error. </p></li><li><p> If one of the sides has a Unicode character set, and another side has a non-Unicode character set, the side with Unicode character set wins, and automatic character set conversion is applied to the non-Unicode side. For example, the following statement will not return an error: </p><pre class="programlisting">SELECT CONCAT(utf8_column, latin1_column) FROM t1; </pre><p> It will return a result, and the character set of the result will be <code class="literal">utf8</code>. The collation of the result will be the collation of <code class="literal">utf8_column</code>. Values of <code class="literal">latin1_column</code> will be automatically converted to <code class="literal">utf8</code> before concatenating. </p></li></ul></div></li></ul></div><p> Although automatic conversion is not in the SQL standard, the SQL standard document does say that every character set is (in terms of supported characters) a “<span class="quote">subset</span>” of Unicode. Because it is a well-known principle that “<span class="quote">what applies to a superset can apply to a subset,</span>” we believe that a collation for Unicode can apply for comparisons with non-Unicode strings. </p><p> Examples: </p><div class="informaltable"><table border="1"><colgroup><col><col></colgroup><tbody><tr><td><code class="literal">column1 = 'A'</code></td><td>Use collation of <code class="literal">column1</code></td></tr><tr><td><code class="literal">column1 = 'A' COLLATE x</code></td><td>Use collation of <code class="literal">'A' COLLATE x</code></td></tr><tr><td><code class="literal">column1 COLLATE x = 'A' COLLATE y</code></td><td>Error</td></tr></tbody></table></div><p> The <a href="functions.html#function_coercibility"><code class="literal">COERCIBILITY()</code></a> function can be used to determine the coercibility of a string expression: </p><pre class="programlisting">mysql> <strong class="userinput"><code>SELECT COERCIBILITY('A' COLLATE latin1_swedish_ci);</code></strong> -> 0 mysql> <strong class="userinput"><code>SELECT COERCIBILITY(VERSION());</code></strong> -> 3 mysql> <strong class="userinput"><code>SELECT COERCIBILITY('A');</code></strong> -> 4 </pre><p> See <a href="functions.html#information-functions" title="11.10.3. Information Functions">Section 11.10.3, “Information Functions”</a>. </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-collation-charset"></a>9.1.5.5. Collations Must Be for the Right Character Set</h4></div></div></div><p> Each character set has one or more collations, but each collation is associated with one and only one character set. Therefore, the following statement causes an error message because the <code class="literal">latin2_bin</code> collation is not legal with the <code class="literal">latin1</code> character set: </p><pre class="programlisting">mysql> <strong class="userinput"><code>SELECT _latin1 'x' COLLATE latin2_bin;</code></strong> ERROR 1253 (42000): COLLATION 'latin2_bin' is not valid for CHARACTER SET 'latin1' </pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-collation-effect"></a>9.1.5.6. Examples of the Effect of Collation</h4></div></div></div><p> <span class="bold"><strong>Example 1: Sorting German Umlauts</strong></span> </p><p> Suppose that column <code class="literal">X</code> in table <code class="literal">T</code> has these <code class="literal">latin1</code> column values: </p><pre class="programlisting">Muffler Müller MX Systems MySQL </pre><p> Suppose also that the column values are retrieved using the following statement: </p><pre class="programlisting">SELECT X FROM T ORDER BY X COLLATE <em class="replaceable"><code>collation_name</code></em>; </pre><p> The following table shows the resulting order of the values if we use <code class="literal">ORDER BY</code> with different collations: </p><div class="informaltable"><table border="1"><colgroup><col><col><col></colgroup><tbody><tr><td><code class="literal">latin1_swedish_ci</code></td><td><code class="literal">latin1_german1_ci</code></td><td><code class="literal">latin1_german2_ci</code></td></tr><tr><td>Muffler</td><td>Muffler</td><td>Müller</td></tr><tr><td>MX Systems</td><td>Müller</td><td>Muffler</td></tr><tr><td>Müller</td><td>MX Systems</td><td>MX Systems</td></tr><tr><td>MySQL</td><td>MySQL</td><td>MySQL</td></tr></tbody></table></div><p> The character that causes the different sort orders in this example is the U with two dots over it (<code class="literal">ü</code>), which the Germans call “<span class="quote">U-umlaut.</span>” </p><div class="itemizedlist"><ul type="disc"><li><p> The first column shows the result of the <code class="literal">SELECT</code> using the Swedish/Finnish collating rule, which says that U-umlaut sorts with Y. </p></li><li><p> The second column shows the result of the <code class="literal">SELECT</code> using the German DIN-1 rule, which says that U-umlaut sorts with U. </p></li><li><p> The third column shows the result of the <code class="literal">SELECT</code> using the German DIN-2 rule, which says that U-umlaut sorts with UE. </p></li></ul></div><p> <span class="bold"><strong>Example 2: Searching for German Umlauts</strong></span> </p><p> Suppose that you have three tables that differ only by the character set and collation used: </p><pre class="programlisting">mysql> <strong class="userinput"><code>CREATE TABLE german1 (</code></strong> -> <strong class="userinput"><code> c CHAR(10)</code></strong> -> <strong class="userinput"><code>) CHARACTER SET latin1 COLLATE latin1_german1_ci;</code></strong> mysql> <strong class="userinput"><code>CREATE TABLE german2 (</code></strong> -> <strong class="userinput"><code> c CHAR(10)</code></strong> -> <strong class="userinput"><code>) CHARACTER SET latin1 COLLATE latin1_german2_ci;</code></strong> mysql> <strong class="userinput"><code>CREATE TABLE germanutf8 (</code></strong> -> <strong class="userinput"><code> c CHAR(10)</code></strong> -> <strong class="userinput"><code>) CHARACTER SET utf8 COLLATE utf8_unicode_ci;</code></strong> </pre><p> Each table contains two records: </p><pre class="programlisting">mysql> <strong class="userinput"><code>INSERT INTO german1 VALUES ('Bar'), ('Bär');</code></strong> mysql> <strong class="userinput"><code>INSERT INTO german2 VALUES ('Bar'), ('Bär');</code></strong> mysql> <strong class="userinput"><code>INSERT INTO germanutf8 VALUES ('Bar'), ('Bär');</code></strong> </pre><p> Two of the above collations have an <code class="literal">A = Ä</code> equality, and one has no such equality (<code class="literal">latin1_german2_ci</code>). For that reason, you'll get these results in comparisons: </p><pre class="programlisting">mysql> <strong class="userinput"><code>SELECT * FROM german1 WHERE c = 'Bär';</code></strong> +------+ | c | +------+ | Bar | | Bär | +------+ mysql> <strong class="userinput"><code>SELECT * FROM german2 WHERE c = 'Bär';</code></strong> +------+ | c | +------+ | Bär | +------+ mysql> <strong class="userinput"><code>SELECT * FROM germanutf8 WHERE c = 'Bär';</code></strong> +------+ | c | +------+ | Bar | | Bär | +------+ </pre><p> This is not a bug but rather a consequence of the sorting that <code class="literal">latin1_german1_ci</code> or <code class="literal">utf8_unicode_ci</code> do (the sorting shown is done according to the German DIN 5007 standard). </p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-repertoire"></a>9.1.6. String Repertoire</h3></div></div></div><p> As of MySQL 5.0.48, string expressions have an attribute known as <em class="firstterm">repertoire</em>, which can have two values: </p><div class="itemizedlist"><ul type="disc"><li><p> <code class="literal">ASCII</code>: The expression can contain only characters in the Unicode range <code class="literal">U+0000</code> to <code class="literal">U+007F</code>. </p></li><li><p> <code class="literal">UNICODE</code>: The expression can contain characters in the Unicode range <code class="literal">U+0000</code> to <code class="literal">U+FFFF</code>. </p></li></ul></div><p> The <code class="literal">ASCII</code> range is a subset of <code class="literal">UNICODE</code> range, so a string with <code class="literal">ASCII</code> repertoire can be converted safely without loss of information to the character set of any string with <code class="literal">UNICODE</code> repertoire or to a character set that is a superset of <code class="literal">ASCII</code>. (All MySQL character sets are supersets of <code class="literal">ASCII</code> with the exception of <code class="literal">swe7</code>, which reuses some punctuation characters for Swedish accented characters.) The use of repertoire enables character set conversion in expressions for many cases where MySQL would otherwise return an “<span class="quote">illegal mix of collations</span>” error. </p><p> The following discussion provides examples of expressions and their repertoires, and describes how the use of repertoire changes string expression evaluation: </p><div class="itemizedlist"><ul type="disc"><li><p> The repertoire for string constants depends on string content: </p><pre class="programlisting">SET NAMES utf8; SELECT 'abc'; SELECT _utf8'def'; SELECT N'MySQL'; </pre><p> Although the character set is <code class="literal">utf8</code> in each of the preceding cases, the strings do not actually contain any characters outside the ASCII range, so their repertoire is <code class="literal">ASCII</code> rather than <code class="literal">UNICODE</code>. </p></li><li><p> Columns having the <code class="literal">ascii</code> character set have <code class="literal">ASCII</code> repertoire because of their character set. In the following table, <code class="literal">c1</code> has <code class="literal">ASCII</code> repertoire: </p><pre class="programlisting">CREATE TABLE t1 (c1 CHAR(1) CHARACTER SET ascii); </pre><p> The following example illustrates how repertoire enables a result to be determined in a case where an error occurs without repertoire: </p><pre class="programlisting">CREATE TABLE t1 ( c1 CHAR(1) CHARACTER SET latin1, c2 CHAR(1) CHARACTER SET ascii ); INSERT INTO t1 VALUES ('a','b'); SELECT CONCAT(c1,c2) FROM t1; </pre><p> Without repertoire, this error occurs: </p><pre class="programlisting">ERROR 1267 (HY000): Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (ascii_general_ci,IMPLICIT) for operation 'concat' </pre><p> Using repertoire, subset to superset (<code class="literal">ascii</code> to <code class="literal">latin1</code>) conversion can occur and a result is returned: </p><pre class="programlisting">+---------------+ | CONCAT(c1,c2) | +---------------+ | ab | +---------------+ </pre></li><li><p> Functions with one string argument inherit the repertoire of their argument. The result of <a href="functions.html#function_upper"><code class="literal">UPPER(_utf8'<em class="replaceable"><code>abc</code></em>')</code></a> has <code class="literal">ASCII</code> repertoire, because its argument has <code class="literal">ASCII</code> repertoire. </p></li><li><p> For functions that return a string but do not have string arguments and use <code class="literal">character_set_connection</code> as the result character set, the result repertoire is <code class="literal">ASCII</code> if <code class="literal">character_set_connection</code> is <code class="literal">ascii</code>, and <code class="literal">UNICODE</code> otherwise: </p><pre class="programlisting">FORMAT(<em class="replaceable"><code>numeric_column</code></em>, 4); </pre><p> Use of repertoire changes how MySQL evaluates the following example: </p><pre class="programlisting">SET NAMES ascii; CREATE TABLE t1 (a INT, b VARCHAR(10) CHARACTER SET latin1); INSERT INTO t1 VALUES (1,'b'); SELECT CONCAT(FORMAT(a, 4), b) FROM t1; </pre><p> Without repertoire, this error occurs: </p><pre class="programlisting">ERROR 1267 (HY000): Illegal mix of collations (ascii_general_ci,COERCIBLE) and (latin1_swedish_ci,IMPLICIT) for operation 'concat' </pre><p> With repertoire, a result is returned: </p><pre class="programlisting">+-------------------------+ | CONCAT(FORMAT(a, 4), b) | +-------------------------+ | 1.0000b | +-------------------------+ </pre></li><li><p> Functions with two or more string arguments use the “<span class="quote">widest</span>” argument repertoire for the result repertoire (<code class="literal">UNICODE</code> is wider than <code class="literal">ASCII</code>). Consider the following <a href="functions.html#function_concat"><code class="literal">CONCAT()</code></a> calls: </p><pre class="programlisting">CONCAT(_ucs2 0x0041, _ucs2 0x0042) CONCAT(_ucs2 0x0041, _ucs2 0x00C2) </pre><p> For the first call, the repertoire is <code class="literal">ASCII</code> because both arguments are within the range of the <code class="literal">ascii</code> character set. For the second call, the repertoire is <code class="literal">UNICODE</code> because the second argument is outside the <code class="literal">ascii</code> character set range. </p></li><li><p> The repertoire for function return values is determined based only on the repertoire of the arguments that affect the result's character set and collation. </p><pre class="programlisting">IF(column1 < column2, 'smaller', 'greater') </pre><p> The result repertoire is <code class="literal">ASCII</code> because the two string arguments (the second argument and the third argument) both have <code class="literal">ASCII</code> repertoire. The first argument does not matter for the result repertoire, even if the expression uses string values. </p></li></ul></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-operations"></a>9.1.7. Operations Affected by Character Set Support</h3></div></div></div><div class="toc"><dl><dt><span class="section"><a href="internationalization-localization.html#charset-result">9.1.7.1. Result Strings</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-convert">9.1.7.2. <code class="literal">CONVERT()</code> and <code class="literal">CAST()</code></a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-show">9.1.7.3. <code class="literal">SHOW</code> Statements and <code class="literal">INFORMATION_SCHEMA</code></a></span></dt></dl></div><p> This section describes operations that take character set information into account. </p><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-result"></a>9.1.7.1. Result Strings</h4></div></div></div><p> MySQL has many operators and functions that return a string. This section answers the question: What is the character set and collation of such a string? </p><p> For simple functions that take string input and return a string result as output, the output's character set and collation are the same as those of the principal input value. For example, <a href="functions.html#function_upper"><code class="literal">UPPER(<em class="replaceable"><code>X</code></em>)</code></a> returns a string whose character string and collation are the same as that of <em class="replaceable"><code>X</code></em>. The same applies for <a href="functions.html#function_instr"><code class="literal">INSTR()</code></a>, <a href="functions.html#function_lcase"><code class="literal">LCASE()</code></a>, <a href="functions.html#function_lower"><code class="literal">LOWER()</code></a>, <a href="functions.html#function_ltrim"><code class="literal">LTRIM()</code></a>, <a href="functions.html#function_mid"><code class="literal">MID()</code></a>, <a href="functions.html#function_repeat"><code class="literal">REPEAT()</code></a>, <a href="functions.html#function_replace"><code class="literal">REPLACE()</code></a>, <a href="functions.html#function_reverse"><code class="literal">REVERSE()</code></a>, <a href="functions.html#function_right"><code class="literal">RIGHT()</code></a>, <a href="functions.html#function_rpad"><code class="literal">RPAD()</code></a>, <a href="functions.html#function_rtrim"><code class="literal">RTRIM()</code></a>, <a href="functions.html#function_soundex"><code class="literal">SOUNDEX()</code></a>, <a href="functions.html#function_substring"><code class="literal">SUBSTRING()</code></a>, <a href="functions.html#function_trim"><code class="literal">TRIM()</code></a>, <a href="functions.html#function_ucase"><code class="literal">UCASE()</code></a>, and <a href="functions.html#function_upper"><code class="literal">UPPER()</code></a>. </p><p> Note: The <a href="functions.html#function_replace"><code class="literal">REPLACE()</code></a> function, unlike all other functions, always ignores the collation of the string input and performs a case-sensitive comparison. </p><p> If a string input or function result is a binary string, the string has no character set or collation. This can be check by using the <a href="functions.html#function_charset"><code class="literal">CHARSET()</code></a> and <a href="functions.html#function_collation"><code class="literal">COLLATION()</code></a> functions, both of which return <code class="literal">binary</code> to indicate that their argument is a binary string: </p><pre class="programlisting">mysql> <strong class="userinput"><code>SELECT CHARSET(BINARY 'a'), COLLATION(BINARY 'a');</code></strong> +---------------------+-----------------------+ | CHARSET(BINARY 'a') | COLLATION(BINARY 'a') | +---------------------+-----------------------+ | binary | binary | +---------------------+-----------------------+ </pre><p> For operations that combine multiple string inputs and return a single string output, the “<span class="quote">aggregation rules</span>” of standard SQL apply for determining the collation of the result: </p><div class="itemizedlist"><ul type="disc"><li><p> If an explicit <code class="literal">COLLATE <em class="replaceable"><code>X</code></em></code> occurs, use <em class="replaceable"><code>X</code></em>. </p></li><li><p> If explicit <code class="literal">COLLATE <em class="replaceable"><code>X</code></em></code> and <code class="literal">COLLATE <em class="replaceable"><code>Y</code></em></code> occur, raise an error. </p></li><li><p> Otherwise, if all collations are <em class="replaceable"><code>X</code></em>, use <em class="replaceable"><code>X</code></em>. </p></li><li><p> Otherwise, the result has no collation. </p></li></ul></div><p> For example, with <code class="literal">CASE ... WHEN a THEN b WHEN b THEN c COLLATE <em class="replaceable"><code>X</code></em> END</code>, the resulting collation is <em class="replaceable"><code>X</code></em>. The same applies for <code class="literal">UNION</code>, <a href="functions.html#operator_or"><code class="literal">||</code></a>, <a href="functions.html#function_concat"><code class="literal">CONCAT()</code></a>, <a href="functions.html#function_elt"><code class="literal">ELT()</code></a>, <a href="functions.html#function_greatest"><code class="literal">GREATEST()</code></a>, <a href="functions.html#function_if"><code class="literal">IF()</code></a>, and <a href="functions.html#function_least"><code class="literal">LEAST()</code></a>. </p><p> For operations that convert to character data, the character set and collation of the strings that result from the operations are defined by the <code class="literal">character_set_connection</code> and <code class="literal">collation_connection</code> system variables. This applies only to <a href="functions.html#function_cast"><code class="literal">CAST()</code></a>, <a href="functions.html#function_conv"><code class="literal">CONV()</code></a>, <a href="functions.html#function_format"><code class="literal">FORMAT()</code></a>, <a href="functions.html#function_hex"><code class="literal">HEX()</code></a>, <a href="functions.html#function_space"><code class="literal">SPACE()</code></a>. Before MySQL 5.0.15, it also applies to <a href="functions.html#function_char"><code class="literal">CHAR()</code></a>. </p><p> If you are uncertain about the character set or collation of the result returned by a string function, you can use the <a href="functions.html#function_charset"><code class="literal">CHARSET()</code></a> or <a href="functions.html#function_collation"><code class="literal">COLLATION()</code></a> function to find out: </p><pre class="programlisting">mysql> <strong class="userinput"><code>SELECT USER(), CHARSET(USER()), COLLATION(USER());</code></strong> +----------------+-----------------+-------------------+ | USER() | CHARSET(USER()) | COLLATION(USER()) | +----------------+-----------------+-------------------+ | test@localhost | utf8 | utf8_general_ci | +----------------+-----------------+-------------------+ </pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-convert"></a>9.1.7.2. <a href="functions.html#function_convert"><code class="literal">CONVERT()</code></a> and <a href="functions.html#function_cast"><code class="literal">CAST()</code></a></h4></div></div></div><p> <a href="functions.html#function_convert"><code class="literal">CONVERT()</code></a> provides a way to convert data between different character sets. The syntax is: </p><pre class="programlisting">CONVERT(<em class="replaceable"><code>expr</code></em> USING <em class="replaceable"><code>transcoding_name</code></em>) </pre><p> In MySQL, transcoding names are the same as the corresponding character set names. </p><p> Examples: </p><pre class="programlisting">SELECT CONVERT(_latin1'Müller' USING utf8); INSERT INTO utf8table (utf8column) SELECT CONVERT(latin1field USING utf8) FROM latin1table; </pre><p> <a href="functions.html#function_convert"><code class="literal">CONVERT(... USING ...)</code></a> is implemented according to the standard SQL specification. </p><p> You may also use <a href="functions.html#function_cast"><code class="literal">CAST()</code></a> to convert a string to a different character set. The syntax is: </p><pre class="programlisting">CAST(<em class="replaceable"><code>character_string</code></em> AS <em class="replaceable"><code>character_data_type</code></em> CHARACTER SET <em class="replaceable"><code>charset_name</code></em>) </pre><p> Example: </p><pre class="programlisting">SELECT CAST(_latin1'test' AS CHAR CHARACTER SET utf8); </pre><p> If you use <a href="functions.html#function_cast"><code class="literal">CAST()</code></a> without specifying <code class="literal">CHARACTER SET</code>, the resulting character set and collation are defined by the <code class="literal">character_set_connection</code> and <code class="literal">collation_connection</code> system variables. If you use <a href="functions.html#function_cast"><code class="literal">CAST()</code></a> with <code class="literal">CHARACTER SET X</code>, the resulting character set and collation are <code class="literal">X</code> and the default collation of <code class="literal">X</code>. </p><p> You may not use a <code class="literal">COLLATE</code> clause inside a <a href="functions.html#function_cast"><code class="literal">CAST()</code></a>, but you may use it outside. That is, <a href="functions.html#function_cast"><code class="literal">CAST(... COLLATE ...)</code></a> is illegal, but <a href="functions.html#function_cast"><code class="literal">CAST(...) COLLATE ...</code></a> is legal. </p><p> Example: </p><pre class="programlisting">SELECT CAST(_latin1'test' AS CHAR CHARACTER SET utf8) COLLATE utf8_bin; </pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-show"></a>9.1.7.3. <code class="literal">SHOW</code> Statements and <code class="literal">INFORMATION_SCHEMA</code></h4></div></div></div><p> Several <code class="literal">SHOW</code> statements provide additional character set information. These include <code class="literal">SHOW CHARACTER SET</code>, <code class="literal">SHOW COLLATION</code>, <code class="literal">SHOW CREATE DATABASE</code>, <code class="literal">SHOW CREATE TABLE</code> and <code class="literal">SHOW COLUMNS</code>. These statements are described here briefly. For more information, see <a href="sql-syntax.html#show" title="12.5.4. SHOW Syntax">Section 12.5.4, “<code class="literal">SHOW</code> Syntax”</a>. </p><p> <code class="literal">INFORMATION_SCHEMA</code> has several tables that contain information similar to that displayed by the <code class="literal">SHOW</code> statements. For example, the <code class="literal">CHARACTER_SETS</code> and <code class="literal">COLLATIONS</code> tables contain the information displayed by <code class="literal">SHOW CHARACTER SET</code> and <code class="literal">SHOW COLLATION</code>. <a href="information-schema.html" title="Chapter 21. INFORMATION_SCHEMA Tables">Chapter 21, <i><code class="literal">INFORMATION_SCHEMA</code> Tables</i></a>. </p><p> The <code class="literal">SHOW CHARACTER SET</code> command shows all available character sets. It takes an optional <code class="literal">LIKE</code> clause that indicates which character set names to match. For example: </p><pre class="programlisting">mysql> <strong class="userinput"><code>SHOW CHARACTER SET LIKE 'latin%';</code></strong> +---------+-----------------------------+-------------------+--------+ | Charset | Description | Default collation | Maxlen | +---------+-----------------------------+-------------------+--------+ | latin1 | cp1252 West European | latin1_swedish_ci | 1 | | latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 | | latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 | | latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 | +---------+-----------------------------+-------------------+--------+ </pre><p> The output from <code class="literal">SHOW COLLATION</code> includes all available character sets. It takes an optional <code class="literal">LIKE</code> clause that indicates which collation names to match. For example: </p><pre class="programlisting">mysql> <strong class="userinput"><code>SHOW COLLATION LIKE 'latin1%';</code></strong> +-------------------+---------+----+---------+----------+---------+ | Collation | Charset | Id | Default | Compiled | Sortlen | +-------------------+---------+----+---------+----------+---------+ | latin1_german1_ci | latin1 | 5 | | | 0 | | latin1_swedish_ci | latin1 | 8 | Yes | Yes | 0 | | latin1_danish_ci | latin1 | 15 | | | 0 | | latin1_german2_ci | latin1 | 31 | | Yes | 2 | | latin1_bin | latin1 | 47 | | Yes | 0 | | latin1_general_ci | latin1 | 48 | | | 0 | | latin1_general_cs | latin1 | 49 | | | 0 | | latin1_spanish_ci | latin1 | 94 | | | 0 | +-------------------+---------+----+---------+----------+---------+ </pre><p> <code class="literal">SHOW CREATE DATABASE</code> displays the <code class="literal">CREATE DATABASE</code> statement that creates a given database: </p><pre class="programlisting">mysql> <strong class="userinput"><code>SHOW CREATE DATABASE test;</code></strong> +----------+-----------------------------------------------------------------+ | Database | Create Database | +----------+-----------------------------------------------------------------+ | test | CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET latin1 */ | +----------+-----------------------------------------------------------------+ </pre><p> If no <code class="literal">COLLATE</code> clause is shown, the default collation for the character set applies. </p><p> <code class="literal">SHOW CREATE TABLE</code> is similar, but displays the <code class="literal">CREATE TABLE</code> statement to create a given table. The column definitions indicate any character set specifications, and the table options include character set information. </p><p> The <code class="literal">SHOW COLUMNS</code> statement displays the collations of a table's columns when invoked as <code class="literal">SHOW FULL COLUMNS</code>. Columns with <code class="literal">CHAR</code>, <code class="literal">VARCHAR</code>, or <code class="literal">TEXT</code> data types have collations. Numeric and other non-character types have no collation (indicated by <code class="literal">NULL</code> as the <code class="literal">Collation</code> value). For example: </p><pre class="programlisting">mysql> <strong class="userinput"><code>SHOW FULL COLUMNS FROM person\G</code></strong> *************************** 1. row *************************** Field: id Type: smallint(5) unsigned Collation: NULL Null: NO Key: PRI Default: NULL Extra: auto_increment Privileges: select,insert,update,references Comment: *************************** 2. row *************************** Field: name Type: char(60) Collation: latin1_swedish_ci Null: NO Key: Default: Extra: Privileges: select,insert,update,references Comment: </pre><p> The character set is not part of the display but is implied by the collation name. </p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-unicode"></a>9.1.8. Unicode Support</h3></div></div></div><p> MySQL 5.0 supports two character sets for storing Unicode data: </p><div class="itemizedlist"><ul type="disc"><li><p> <code class="literal">ucs2</code>, the UCS-2 Unicode character set. </p></li><li><p> <code class="literal">utf8</code>, the UTF-8 encoding of the Unicode character set. </p></li></ul></div><p> In UCS-2 (binary Unicode representation), every character is represented by a two-byte Unicode code with the most significant byte first. For example: <code class="literal">LATIN CAPITAL LETTER A</code> has the code <code class="literal">0x0041</code> and it is stored as a two-byte sequence: <code class="literal">0x00 0x41</code>. <code class="literal">CYRILLIC SMALL LETTER YERU</code> (Unicode <code class="literal">0x044B</code>) is stored as a two-byte sequence: <code class="literal">0x04 0x4B</code>. For Unicode characters and their codes, please refer to the <a href="http://www.unicode.org/" target="_top">Unicode Home Page</a>. </p><p> The MySQL implementation of UCS-2 stores characters in big-endian byte order and does not use a byte order mark (BOM) at the beginning of UCS-2 values. Other database systems might use little-ending byte order or a BOM, in which case conversion of UCS-2 values will need to be performed when transferring data between those systems and MySQL. </p><p> Currently, UCS-2 cannot be used as a client character set, which means that <code class="literal">SET NAMES 'ucs2'</code> does not work. </p><p> UTF-8 (Unicode Transformation Format with 8-bit units) is an alternative way to store Unicode data. It is implemented according to RFC 3629. The idea of UTF-8 is that various Unicode characters are encoded using byte sequences of different lengths: </p><div class="itemizedlist"><ul type="disc"><li><p> Basic Latin letters, digits, and punctuation signs use one byte. </p></li><li><p> Most European and Middle East script letters fit into a two-byte sequence: extended Latin letters (with tilde, macron, acute, grave and other accents), Cyrillic, Greek, Armenian, Hebrew, Arabic, Syriac, and others. </p></li><li><p> Korean, Chinese, and Japanese ideographs use three-byte sequences. </p></li></ul></div><p> RFC 3629 describes encoding sequences that take from one to four bytes. Currently, MySQL support for UTF-8 does not include four-byte sequences. (An older standard for UTF-8 encoding is given by RFC 2279, which describes UTF-8 sequences that take from one to six bytes. RFC 3629 renders RFC 2279 obsolete; for this reason, sequences with five and six bytes are no longer used.) </p><p> MySQL uses no BOM for UTF-8 values. </p><p> <span class="bold"><strong>Tip</strong></span>: To save space with UTF-8, use <code class="literal">VARCHAR</code> instead of <code class="literal">CHAR</code>. Otherwise, MySQL must reserve three bytes for each character in a <code class="literal">CHAR CHARACTER SET utf8</code> column because that is the maximum possible length. For example, MySQL must reserve 30 bytes for a <code class="literal">CHAR(10) CHARACTER SET utf8</code> column. </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-metadata"></a>9.1.9. UTF-8 for Metadata</h3></div></div></div><p> <em class="firstterm">Metadata</em> is “<span class="quote">the data about the data.</span>” Anything that <span class="emphasis"><em>describes</em></span> the database — as opposed to being the <span class="emphasis"><em>contents</em></span> of the database — is metadata. Thus column names, database names, usernames, version names, and most of the string results from <code class="literal">SHOW</code> are metadata. This is also true of the contents of tables in <code class="literal">INFORMATION_SCHEMA</code>, because those tables by definition contain information about database objects. </p><p> Representation of metadata must satisfy these requirements: </p><div class="itemizedlist"><ul type="disc"><li><p> All metadata must be in the same character set. Otherwise, neither the <code class="literal">SHOW</code> commands nor <code class="literal">SELECT</code> statements for tables in <code class="literal">INFORMATION_SCHEMA</code> would work properly because different rows in the same column of the results of these operations would be in different character sets. </p></li><li><p> Metadata must include all characters in all languages. Otherwise, users would not be able to name columns and tables using their own languages. </p></li></ul></div><p> To satisfy both requirements, MySQL stores metadata in a Unicode character set, namely UTF-8. This does not cause any disruption if you never use accented or non-Latin characters. But if you do, you should be aware that metadata is in UTF-8. </p><p> The metadata requirements mean that the return values of the <a href="functions.html#function_user"><code class="literal">USER()</code></a>, <a href="functions.html#function_current-user"><code class="literal">CURRENT_USER()</code></a>, <a href="functions.html#function_session-user"><code class="literal">SESSION_USER()</code></a>, <a href="functions.html#function_system-user"><code class="literal">SYSTEM_USER()</code></a>, <a href="functions.html#function_database"><code class="literal">DATABASE()</code></a>, and <a href="functions.html#function_version"><code class="literal">VERSION()</code></a> functions have the UTF-8 character set by default. </p><p> The server sets the <code class="literal">character_set_system</code> system variable to the name of the metadata character set: </p><pre class="programlisting">mysql> <strong class="userinput"><code>SHOW VARIABLES LIKE 'character_set_system';</code></strong> +----------------------+-------+ | Variable_name | Value | +----------------------+-------+ | character_set_system | utf8 | +----------------------+-------+ </pre><p> Storage of metadata using Unicode does <span class="emphasis"><em>not</em></span> mean that the server returns headers of columns and the results of <code class="literal">DESCRIBE</code> functions in the <code class="literal">character_set_system</code> character set by default. When you use <code class="literal">SELECT column1 FROM t</code>, the name <code class="literal">column1</code> itself is returned from the server to the client in the character set determined by the value of the <code class="literal">character_set_results</code> system variable, which has a default value of <code class="literal">latin1</code>. If you want the server to pass metadata results back in a different character set, use the <code class="literal">SET NAMES</code> statement to force the server to perform character set conversion. <code class="literal">SET NAMES</code> sets the <code class="literal">character_set_results</code> and other related system variables. (See <a href="internationalization-localization.html#charset-connection" title="9.1.4. Connection Character Sets and Collations">Section 9.1.4, “Connection Character Sets and Collations”</a>.) Alternatively, a client program can perform the conversion after receiving the result from the server. It is more efficient for the client perform the conversion, but this option is not always available for all clients. </p><p> If <code class="literal">character_set_results</code> is set to <code class="literal">NULL</code>, no conversion is performed and the server returns metadata using its original character set (the set indicated by <code class="literal">character_set_system</code>). </p><p> Error messages returned from the server to the client are converted to the client character set automatically, as with metadata. </p><p> If you are using (for example) the <a href="functions.html#function_user"><code class="literal">USER()</code></a> function for comparison or assignment within a single statement, don't worry. MySQL performs some automatic conversion for you. </p><pre class="programlisting">SELECT * FROM Table1 WHERE USER() = latin1_column; </pre><p> This works because the contents of <code class="literal">latin1_column</code> are automatically converted to UTF-8 before the comparison. </p><pre class="programlisting">INSERT INTO Table1 (latin1_column) SELECT USER(); </pre><p> This works because the contents of <a href="functions.html#function_user"><code class="literal">USER()</code></a> are automatically converted to <code class="literal">latin1</code> before the assignment. Automatic conversion is not fully implemented yet, but should work correctly in a later version. </p><p> Although automatic conversion is not in the SQL standard, the SQL standard document does say that every character set is (in terms of supported characters) a “<span class="quote">subset</span>” of Unicode. Because it is a well-known principle that “<span class="quote">what applies to a superset can apply to a subset,</span>” we believe that a collation for Unicode can apply for comparisons with non-Unicode strings. For more information about coercion of strings, see <a href="internationalization-localization.html#charset-collate-tricky" title="9.1.5.4. Some Special Cases Where the Collation Determination Is Tricky">Section 9.1.5.4, “Some Special Cases Where the Collation Determination Is Tricky”</a>. </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-conversion"></a>9.1.10. Column Character Set Conversion</h3></div></div></div><p> To convert a binary or non-binary string column to use a particular character set, use <code class="literal">ALTER TABLE</code>. For successful conversion to occur, one of the following conditions must apply: </p><div class="itemizedlist"><ul type="disc"><li><p> If the column has a binary data type (<code class="literal">BINARY</code>, <code class="literal">VARBINARY</code>, <code class="literal">BLOB</code>), all the values that it contains must be encoded using a single character set (the character set you're converting the column to). If you use a binary column to store information in multiple character sets, MySQL has no way to know which values use which character set and cannot convert the data properly. </p></li><li><p> If the column has a non-binary data type (<code class="literal">CHAR</code>, <code class="literal">VARCHAR</code>, <code class="literal">TEXT</code>), its contents should be encoded in the column's character set, not some other character set. If the contents are encoded in a different character set, you can convert the column to use a binary data type first, and then to a non-binary column with the desired character set. </p></li></ul></div><p> Suppose that a table <code class="literal">t</code> has a binary column named <code class="literal">col1</code> defined as <code class="literal">BINARY(50)</code>. Assuming that the information in the column is encoded using a single character set, you can convert it to a non-binary column that has that character set. For example, if <code class="literal">col1</code> contains binary data representing characters in the <code class="literal">greek</code> character set, you can convert it as follows: </p><pre class="programlisting">ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET greek; </pre><p> Suppose that table <code class="literal">t</code> has a non-binary column named <code class="literal">col1</code> defined as <code class="literal">CHAR(50) CHARACTER SET latin1</code> but you want to convert it to use <code class="literal">utf8</code> so that you can store values from many languages. The following statement accomplishes this: </p><pre class="programlisting">ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET utf8; </pre><p> Conversion may be lossy if the column contains characters that are not in both character sets. </p><p> A special case occurs if you have old tables from MySQL 4.0 or earlier where a non-binary column contains values that actually are encoded in a character set different from the server's default character set. For example, an application might have stored <code class="literal">sjis</code> values in a column, even though MySQL's default character set was <code class="literal">latin1</code>. It is possible to convert the column to use the proper character set but an additional step is required. Suppose that the server's default character set was <code class="literal">latin1</code> and <code class="literal">col1</code> is defined as <code class="literal">CHAR(50)</code> but its contents are <code class="literal">sjis</code> values. The first step is to convert the column to a binary data type, which removes the existing character set information without performing any character conversion: </p><pre class="programlisting">ALTER TABLE t MODIFY col1 BINARY(50); </pre><p> The next step is to convert the column to a non-binary data type with the proper character set: </p><pre class="programlisting">ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET sjis; </pre><p> This procedure requires that the table not have been modified already with statements such as <code class="literal">INSERT</code> or <code class="literal">UPDATE</code> after an upgrade to MySQL 4.1 or later. In that case, MySQL would store new values in the column using <code class="literal">latin1</code>, and the column will contain a mix of <code class="literal">sjis</code> and <code class="literal">latin1</code> values and cannot be converted properly. </p><p> If you specified attributes when creating a column initially, you should also specify them when altering the table with <code class="literal">ALTER TABLE</code>. For example, if you specified <code class="literal">NOT NULL</code> and an explicit <code class="literal">DEFAULT</code> value, you should also provide them in the <code class="literal">ALTER TABLE</code> statement. Otherwise, the resulting column definition will not include those attributes. </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-charsets"></a>9.1.11. Character Sets and Collations That MySQL Supports</h3></div></div></div><div class="toc"><dl><dt><span class="section"><a href="internationalization-localization.html#charset-unicode-sets">9.1.11.1. Unicode Character Sets</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-we-sets">9.1.11.2. West European Character Sets</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-ce-sets">9.1.11.3. Central European Character Sets</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-se-me-sets">9.1.11.4. South European and Middle East Character Sets</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-baltic-sets">9.1.11.5. Baltic Character Sets</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-cyrillic-sets">9.1.11.6. Cyrillic Character Sets</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-asian-sets">9.1.11.7. Asian Character Sets</a></span></dt></dl></div><p> MySQL supports 70+ collations for 30+ character sets. This section indicates which character sets MySQL supports. There is one subsection for each group of related character sets. For each character set, the allowable collations are listed. </p><p> You can always list the available character sets and their default collations with the <code class="literal">SHOW CHARACTER SET</code> statement: </p><pre class="programlisting">mysql> <strong class="userinput"><code>SHOW CHARACTER SET;</code></strong> +----------+-----------------------------+---------------------+ | Charset | Description | Default collation | +----------+-----------------------------+---------------------+ | big5 | Big5 Traditional Chinese | big5_chinese_ci | | dec8 | DEC West European | dec8_swedish_ci | | cp850 | DOS West European | cp850_general_ci | | hp8 | HP West European | hp8_english_ci | | koi8r | KOI8-R Relcom Russian | koi8r_general_ci | | latin1 | cp1252 West European | latin1_swedish_ci | | latin2 | ISO 8859-2 Central European | latin2_general_ci | | swe7 | 7bit Swedish | swe7_swedish_ci | | ascii | US ASCII | ascii_general_ci | | ujis | EUC-JP Japanese | ujis_japanese_ci | | sjis | Shift-JIS Japanese | sjis_japanese_ci | | hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | | tis620 | TIS620 Thai | tis620_thai_ci | | euckr | EUC-KR Korean | euckr_korean_ci | | koi8u | KOI8-U Ukrainian | koi8u_general_ci | | gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | | greek | ISO 8859-7 Greek | greek_general_ci | | cp1250 | Windows Central European | cp1250_general_ci | | gbk | GBK Simplified Chinese | gbk_chinese_ci | | latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | | armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | | utf8 | UTF-8 Unicode | utf8_general_ci | | ucs2 | UCS-2 Unicode | ucs2_general_ci | | cp866 | DOS Russian | cp866_general_ci | | keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | | macce | Mac Central European | macce_general_ci | | macroman | Mac West European | macroman_general_ci | | cp852 | DOS Central European | cp852_general_ci | | latin7 | ISO 8859-13 Baltic | latin7_general_ci | | cp1251 | Windows Cyrillic | cp1251_general_ci | | cp1256 | Windows Arabic | cp1256_general_ci | | cp1257 | Windows Baltic | cp1257_general_ci | | binary | Binary pseudo charset | binary | | geostd8 | GEOSTD8 Georgian | geostd8_general_ci | | cp932 | SJIS for Windows Japanese | cp932_japanese_ci | | eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | +----------+-----------------------------+---------------------+ </pre><p> In cases where a character set has multiple collations, it might not be clear which collation is most suitable for a given application. To avoid choosing the wrong collation, it can be helpful to perform some comparisons with representative data values to make sure that a given collation sorts values the way you expect. </p><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-unicode-sets"></a>9.1.11.1. Unicode Character Sets</h4></div></div></div><p> MySQL has two Unicode character sets. You can store text in about 650 languages using these character sets. </p><div class="itemizedlist"><ul type="disc"><li><p> <code class="literal">ucs2</code> (UCS-2 Unicode) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">ucs2_bin</code> </p></li><li><p> <code class="literal">ucs2_czech_ci</code> </p></li><li><p> <code class="literal">ucs2_danish_ci</code> </p></li><li><p> <code class="literal">ucs2_esperanto_ci</code> </p></li><li><p> <code class="literal">ucs2_estonian_ci</code> </p></li><li><p> <code class="literal">ucs2_general_ci</code> (default) </p></li><li><p> <code class="literal">ucs2_hungarian_ci</code> </p></li><li><p> <code class="literal">ucs2_icelandic_ci</code> </p></li><li><p> <code class="literal">ucs2_latvian_ci</code> </p></li><li><p> <code class="literal">ucs2_lithuanian_ci</code> </p></li><li><p> <code class="literal">ucs2_persian_ci</code> </p></li><li><p> <code class="literal">ucs2_polish_ci</code> </p></li><li><p> <code class="literal">ucs2_roman_ci</code> </p></li><li><p> <code class="literal">ucs2_romanian_ci</code> </p></li><li><p> <code class="literal">ucs2_slovak_ci</code> </p></li><li><p> <code class="literal">ucs2_slovenian_ci</code> </p></li><li><p> <code class="literal">ucs2_spanish2_ci</code> </p></li><li><p> <code class="literal">ucs2_spanish_ci</code> </p></li><li><p> <code class="literal">ucs2_swedish_ci</code> </p></li><li><p> <code class="literal">ucs2_turkish_ci</code> </p></li><li><p> <code class="literal">ucs2_unicode_ci</code> </p></li></ul></div></li><li><p> <code class="literal">utf8</code> (UTF-8 Unicode) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">utf8_bin</code> </p></li><li><p> <code class="literal">utf8_czech_ci</code> </p></li><li><p> <code class="literal">utf8_danish_ci</code> </p></li><li><p> <code class="literal">utf8_esperanto_ci</code> </p></li><li><p> <code class="literal">utf8_estonian_ci</code> </p></li><li><p> <code class="literal">utf8_general_ci</code> (default) </p></li><li><p> <code class="literal">utf8_hungarian_ci</code> </p></li><li><p> <code class="literal">utf8_icelandic_ci</code> </p></li><li><p> <code class="literal">utf8_latvian_ci</code> </p></li><li><p> <code class="literal">utf8_lithuanian_ci</code> </p></li><li><p> <code class="literal">utf8_persian_ci</code> </p></li><li><p> <code class="literal">utf8_polish_ci</code> </p></li><li><p> <code class="literal">utf8_roman_ci</code> </p></li><li><p> <code class="literal">utf8_romanian_ci</code> </p></li><li><p> <code class="literal">utf8_slovak_ci</code> </p></li><li><p> <code class="literal">utf8_slovenian_ci</code> </p></li><li><p> <code class="literal">utf8_spanish2_ci</code> </p></li><li><p> <code class="literal">utf8_spanish_ci</code> </p></li><li><p> <code class="literal">utf8_swedish_ci</code> </p></li><li><p> <code class="literal">utf8_turkish_ci</code> </p></li><li><p> <code class="literal">utf8_unicode_ci</code> </p></li></ul></div></li></ul></div><p> The MySQL implementation of UCS-2 stores characters in big-endian byte order and does not use a byte order mark (BOM) at the beginning of UCS-2 values. Other database systems might use little-ending byte order or a BOM, in which case conversion of UCS-2 values will need to be performed when transferring data between those systems and MySQL. </p><p> Note that in the <code class="literal">ucs2_roman_ci</code> and <code class="literal">utf8_roman_ci</code> collations, <code class="literal">I</code> and <code class="literal">J</code> compare as equals, and <code class="literal">U</code> and <code class="literal">V</code> compare as equals. </p><p> The <code class="literal">ucs2_esperanto_ci</code> and <code class="literal">utf8_esperanto_ci</code> collations were added in MySQL 5.0.13. The <code class="literal">ucs2_hungarian_ci</code> and <code class="literal">utf8_hungarian_ci</code> collations were added in MySQL 5.0.19. </p><a class="indexterm" name="id2749600"></a><p> MySQL implements the <code class="literal">utf8_unicode_ci</code> collation according to the Unicode Collation Algorithm (UCA) described at <a href="http://www.unicode.org/reports/tr10/" target="_top">http://www.unicode.org/reports/tr10/</a>. The collation uses the version-4.0.0 UCA weight keys: <a href="http://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt" target="_top">http://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt</a>. The following discussion uses <code class="literal">utf8_unicode_ci</code>, but it is also true for <code class="literal">ucs2_unicode_ci</code>. </p><p> Currently, the <code class="literal">utf8_unicode_ci</code> collation has only partial support for the Unicode Collation Algorithm. Some characters are not supported yet. Also, combining marks are not fully supported. This affects primarily Vietnamese, Yoruba, and some smaller languages such as Navajo. </p><p> For any Unicode character set, operations performed using the <code class="literal">_general_ci</code> collation are faster than those for the <code class="literal">_unicode_ci</code> collation. For example, comparisons for the <code class="literal">utf8_general_ci</code> collation are faster, but slightly less correct, than comparisons for <code class="literal">utf8_unicode_ci</code>. The reason for this is that <code class="literal">utf8_unicode_ci</code> supports mappings such as expansions; that is, when one character compares as equal to combinations of other characters. For example, in German and some other languages “<span class="quote"><code class="literal">ß</code></span>” is equal to “<span class="quote"><code class="literal">ss</code></span>”. <code class="literal">utf8_unicode_ci</code> also supports contractions and ignorable characters. <code class="literal">utf8_general_ci</code> is a legacy collation that does not support expansions, contractions, or ignorable characters. It can make only one-to-one comparisons between characters. </p><p> To further illustrate, the following equalities hold in both <code class="literal">utf8_general_ci</code> and <code class="literal">utf8_unicode_ci</code> (for the effect this has in comparisons or when doing searches, see <a href="internationalization-localization.html#charset-collation-effect" title="9.1.5.6. Examples of the Effect of Collation">Section 9.1.5.6, “Examples of the Effect of Collation”</a>): </p><pre class="programlisting">Ä = A Ö = O Ü = U </pre><p> A difference between the collations is that this is true for <code class="literal">utf8_general_ci</code>: </p><pre class="programlisting">ß = s </pre><p> Whereas this is true for <code class="literal">utf8_unicode_ci</code>: </p><pre class="programlisting">ß = ss </pre><p> MySQL implements language-specific collations for the <code class="literal">utf8</code> character set only if the ordering with <code class="literal">utf8_unicode_ci</code> does not work well for a language. For example, <code class="literal">utf8_unicode_ci</code> works fine for German and French, so there is no need to create special <code class="literal">utf8</code> collations for these two languages. </p><p> <code class="literal">utf8_general_ci</code> also is satisfactory for both German and French, except that “<span class="quote"><code class="literal">ß</code></span>” is equal to “<span class="quote"><code class="literal">s</code></span>”, and not to “<span class="quote"><code class="literal">ss</code></span>”. If this is acceptable for your application, then you should use <code class="literal">utf8_general_ci</code> because it is faster. Otherwise, use <code class="literal">utf8_unicode_ci</code> because it is more accurate. </p><p> <code class="literal">utf8_swedish_ci</code>, like other <code class="literal">utf8</code> language-specific collations, is derived from <code class="literal">utf8_unicode_ci</code> with additional language rules. For example, in Swedish, the following relationship holds, which is not something expected by a German or French speaker: </p><pre class="programlisting">Ü = Y < Ö </pre><p> The <code class="literal">utf8_spanish_ci</code> and <code class="literal">utf8_spanish2_ci</code> collations correspond to modern Spanish and traditional Spanish, respectively. In both collations, “<span class="quote"><code class="literal">ñ</code></span>” (n-tilde) is a separate letter between “<span class="quote"><code class="literal">n</code></span>” and “<span class="quote"><code class="literal">o</code></span>”. In addition, for traditional Spanish, “<span class="quote"><code class="literal">ch</code></span>” is a separate letter between “<span class="quote"><code class="literal">c</code></span>” and “<span class="quote"><code class="literal">d</code></span>”, and “<span class="quote"><code class="literal">ll</code></span>” is a separate letter between “<span class="quote"><code class="literal">l</code></span>” and “<span class="quote"><code class="literal">m</code></span>” </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-we-sets"></a>9.1.11.2. West European Character Sets</h4></div></div></div><p> Western European character sets cover most West European languages, such as French, Spanish, Catalan, Basque, Portuguese, Italian, Albanian, Dutch, German, Danish, Swedish, Norwegian, Finnish, Faroese, Icelandic, Irish, Scottish, and English. </p><div class="itemizedlist"><ul type="disc"><li><p> <code class="literal">ascii</code> (US ASCII) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">ascii_bin</code> </p></li><li><p> <code class="literal">ascii_general_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">cp850</code> (DOS West European) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">cp850_bin</code> </p></li><li><p> <code class="literal">cp850_general_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">dec8</code> (DEC Western European) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">dec8_bin</code> </p></li><li><p> <code class="literal">dec8_swedish_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">hp8</code> (HP Western European) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">hp8_bin</code> </p></li><li><p> <code class="literal">hp8_english_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">latin1</code> (cp1252 West European) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">latin1_bin</code> </p></li><li><p> <code class="literal">latin1_danish_ci</code> </p></li><li><p> <code class="literal">latin1_general_ci</code> </p></li><li><p> <code class="literal">latin1_general_cs</code> </p></li><li><p> <code class="literal">latin1_german1_ci</code> </p></li><li><p> <code class="literal">latin1_german2_ci</code> </p></li><li><p> <code class="literal">latin1_spanish_ci</code> </p></li><li><p> <code class="literal">latin1_swedish_ci</code> (default) </p></li></ul></div><p> <code class="literal">latin1</code> is the default character set. MySQL's <code class="literal">latin1</code> is the same as the Windows <code class="literal">cp1252</code> character set. This means it is the same as the official <code class="literal">ISO 8859-1</code> or IANA (Internet Assigned Numbers Authority) <code class="literal">latin1</code>, except that IANA <code class="literal">latin1</code> treats the code points between <code class="literal">0x80</code> and <code class="literal">0x9f</code> as “<span class="quote">undefined,</span>” whereas <code class="literal">cp1252</code>, and therefore MySQL's <code class="literal">latin1</code>, assign characters for those positions. For example, <code class="literal">0x80</code> is the Euro sign. For the “<span class="quote">undefined</span>” entries in <code class="literal">cp1252</code>, MySQL translates <code class="literal">0x81</code> to Unicode <code class="literal">0x0081</code>, <code class="literal">0x8d</code> to <code class="literal">0x008d</code>, <code class="literal">0x8f</code> to <code class="literal">0x008f</code>, <code class="literal">0x90</code> to <code class="literal">0x0090</code>, and <code class="literal">0x9d</code> to <code class="literal">0x009d</code>. </p><p> The <code class="literal">latin1_swedish_ci</code> collation is the default that probably is used by the majority of MySQL customers. Although it is frequently said that it is based on the Swedish/Finnish collation rules, there are Swedes and Finns who disagree with this statement. </p><p> The <code class="literal">latin1_german1_ci</code> and <code class="literal">latin1_german2_ci</code> collations are based on the DIN-1 and DIN-2 standards, where DIN stands for <span class="foreignphrase"><em class="foreignphrase">Deutsches Institut für Normung</em></span> (the German equivalent of ANSI). DIN-1 is called the “<span class="quote">dictionary collation</span>” and DIN-2 is called the “<span class="quote">phone book collation.</span>” For an example of the effect this has in comparisons or when doing searches, see <a href="internationalization-localization.html#charset-collation-effect" title="9.1.5.6. Examples of the Effect of Collation">Section 9.1.5.6, “Examples of the Effect of Collation”</a>. </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">latin1_german1_ci</code> (dictionary) rules: </p><pre class="programlisting">Ä = A Ö = O Ü = U ß = s </pre></li><li><p> <code class="literal">latin1_german2_ci</code> (phone-book) rules: </p><pre class="programlisting">Ä = AE Ö = OE Ü = UE ß = ss </pre></li></ul></div><p> For an example of the effect this has in comparisons or when doing searches, see <a href="internationalization-localization.html#charset-collation-effect" title="9.1.5.6. Examples of the Effect of Collation">Section 9.1.5.6, “Examples of the Effect of Collation”</a>. </p><p> In the <code class="literal">latin1_spanish_ci</code> collation, “<span class="quote"><code class="literal">ñ</code></span>” (n-tilde) is a separate letter between “<span class="quote"><code class="literal">n</code></span>” and “<span class="quote"><code class="literal">o</code></span>”. </p></li><li><p> <code class="literal">macroman</code> (Mac West European) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">macroman_bin</code> </p></li><li><p> <code class="literal">macroman_general_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">swe7</code> (7bit Swedish) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">swe7_bin</code> </p></li><li><p> <code class="literal">swe7_swedish_ci</code> (default) </p></li></ul></div></li></ul></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-ce-sets"></a>9.1.11.3. Central European Character Sets</h4></div></div></div><p> MySQL provides some support for character sets used in the Czech Republic, Slovakia, Hungary, Romania, Slovenia, Croatia, Poland, and Serbia (Latin). </p><div class="itemizedlist"><ul type="disc"><li><p> <code class="literal">cp1250</code> (Windows Central European) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">cp1250_bin</code> </p></li><li><p> <code class="literal">cp1250_croatian_ci</code> </p></li><li><p> <code class="literal">cp1250_czech_cs</code> </p></li><li><p> <code class="literal">cp1250_general_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">cp852</code> (DOS Central European) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">cp852_bin</code> </p></li><li><p> <code class="literal">cp852_general_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">keybcs2</code> (DOS Kamenicky Czech-Slovak) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">keybcs2_bin</code> </p></li><li><p> <code class="literal">keybcs2_general_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">latin2</code> (ISO 8859-2 Central European) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">latin2_bin</code> </p></li><li><p> <code class="literal">latin2_croatian_ci</code> </p></li><li><p> <code class="literal">latin2_czech_cs</code> </p></li><li><p> <code class="literal">latin2_general_ci</code> (default) </p></li><li><p> <code class="literal">latin2_hungarian_ci</code> </p></li></ul></div></li><li><p> <code class="literal">macce</code> (Mac Central European) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">macce_bin</code> </p></li><li><p> <code class="literal">macce_general_ci</code> (default) </p></li></ul></div></li></ul></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-se-me-sets"></a>9.1.11.4. South European and Middle East Character Sets</h4></div></div></div><p> South European and Middle Eastern character sets supported by MySQL include Armenian, Arabic, Georgian, Greek, Hebrew, and Turkish. </p><div class="itemizedlist"><ul type="disc"><li><p> <code class="literal">armscii8</code> (ARMSCII-8 Armenian) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">armscii8_bin</code> </p></li><li><p> <code class="literal">armscii8_general_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">cp1256</code> (Windows Arabic) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">cp1256_bin</code> </p></li><li><p> <code class="literal">cp1256_general_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">geostd8</code> (GEOSTD8 Georgian) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">geostd8_bin</code> </p></li><li><p> <code class="literal">geostd8_general_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">greek</code> (ISO 8859-7 Greek) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">greek_bin</code> </p></li><li><p> <code class="literal">greek_general_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">hebrew</code> (ISO 8859-8 Hebrew) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">hebrew_bin</code> </p></li><li><p> <code class="literal">hebrew_general_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">latin5</code> (ISO 8859-9 Turkish) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">latin5_bin</code> </p></li><li><p> <code class="literal">latin5_turkish_ci</code> (default) </p></li></ul></div></li></ul></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-baltic-sets"></a>9.1.11.5. Baltic Character Sets</h4></div></div></div><p> The Baltic character sets cover Estonian, Latvian, and Lithuanian languages. </p><div class="itemizedlist"><ul type="disc"><li><p> <code class="literal">cp1257</code> (Windows Baltic) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">cp1257_bin</code> </p></li><li><p> <code class="literal">cp1257_general_ci</code> (default) </p></li><li><p> <code class="literal">cp1257_lithuanian_ci</code> </p></li></ul></div></li><li><p> <code class="literal">latin7</code> (ISO 8859-13 Baltic) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">latin7_bin</code> </p></li><li><p> <code class="literal">latin7_estonian_cs</code> </p></li><li><p> <code class="literal">latin7_general_ci</code> (default) </p></li><li><p> <code class="literal">latin7_general_cs</code> </p></li></ul></div></li></ul></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-cyrillic-sets"></a>9.1.11.6. Cyrillic Character Sets</h4></div></div></div><p> The Cyrillic character sets and collations are for use with Belarusian, Bulgarian, Russian, Ukrainian, and Serbian (Cyrillic) languages. </p><div class="itemizedlist"><ul type="disc"><li><p> <code class="literal">cp1251</code> (Windows Cyrillic) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">cp1251_bin</code> </p></li><li><p> <code class="literal">cp1251_bulgarian_ci</code> </p></li><li><p> <code class="literal">cp1251_general_ci</code> (default) </p></li><li><p> <code class="literal">cp1251_general_cs</code> </p></li><li><p> <code class="literal">cp1251_ukrainian_ci</code> </p></li></ul></div></li><li><p> <code class="literal">cp866</code> (DOS Russian) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">cp866_bin</code> </p></li><li><p> <code class="literal">cp866_general_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">koi8r</code> (KOI8-R Relcom Russian) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">koi8r_bin</code> </p></li><li><p> <code class="literal">koi8r_general_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">koi8u</code> (KOI8-U Ukrainian) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">koi8u_bin</code> </p></li><li><p> <code class="literal">koi8u_general_ci</code> (default) </p></li></ul></div></li></ul></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-asian-sets"></a>9.1.11.7. Asian Character Sets</h4></div></div></div><div class="toc"><dl><dt><span class="section"><a href="internationalization-localization.html#charset-cp932">9.1.11.7.1. The <code class="literal">cp932</code> Character Set</a></span></dt></dl></div><p> The Asian character sets that we support include Chinese, Japanese, Korean, and Thai. These can be complicated. For example, the Chinese sets must allow for thousands of different characters. See <a href="internationalization-localization.html#charset-cp932" title="9.1.11.7.1. The cp932 Character Set">Section 9.1.11.7.1, “The <code class="literal">cp932</code> Character Set”</a>, for additional information about the <code class="literal">cp932</code> and <code class="literal">sjis</code> character sets. </p><p> For answers to some common questions and problems relating support for Asian character sets in MySQL, see <a href="faqs.html#faqs-cjk" title="A.11. MySQL 5.0 FAQ — MySQL Chinese, Japanese, and Korean Character Sets">Section A.11, “MySQL 5.0 FAQ — MySQL Chinese, Japanese, and Korean Character Sets”</a>. </p><div class="itemizedlist"><ul type="disc"><li><p> <code class="literal">big5</code> (Big5 Traditional Chinese) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">big5_bin</code> </p></li><li><p> <code class="literal">big5_chinese_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">cp932</code> (SJIS for Windows Japanese) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">cp932_bin</code> </p></li><li><p> <code class="literal">cp932_japanese_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">eucjpms</code> (UJIS for Windows Japanese) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">eucjpms_bin</code> </p></li><li><p> <code class="literal">eucjpms_japanese_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">euckr</code> (EUC-KR Korean) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">euckr_bin</code> </p></li><li><p> <code class="literal">euckr_korean_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">gb2312</code> (GB2312 Simplified Chinese) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">gb2312_bin</code> </p></li><li><p> <code class="literal">gb2312_chinese_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">gbk</code> (GBK Simplified Chinese) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">gbk_bin</code> </p></li><li><p> <code class="literal">gbk_chinese_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">sjis</code> (Shift-JIS Japanese) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">sjis_bin</code> </p></li><li><p> <code class="literal">sjis_japanese_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">tis620</code> (TIS620 Thai) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">tis620_bin</code> </p></li><li><p> <code class="literal">tis620_thai_ci</code> (default) </p></li></ul></div></li><li><p> <code class="literal">ujis</code> (EUC-JP Japanese) collations: </p><div class="itemizedlist"><ul type="circle"><li><p> <code class="literal">ujis_bin</code> </p></li><li><p> <code class="literal">ujis_japanese_ci</code> (default) </p></li></ul></div></li></ul></div><div class="section" lang="en"><div class="titlepage"><div><div><h5 class="title"><a name="charset-cp932"></a>9.1.11.7.1. The <code class="literal">cp932</code> Character Set</h5></div></div></div><p> <span class="bold"><strong>Why is <code class="literal">cp932</code> needed?</strong></span> </p><p> In MySQL, the <code class="literal">sjis</code> character set corresponds to the <code class="literal">Shift_JIS</code> character set defined by IANA, which supports JIS X0201 and JIS X0208 characters. (See <a href="http://www.iana.org/assignments/character-sets" target="_top">http://www.iana.org/assignments/character-sets</a>.) </p><p> However, the meaning of “<span class="quote">SHIFT JIS</span>” as a descriptive term has become very vague and it often includes the extensions to <code class="literal">Shift_JIS</code> that are defined by various vendors. </p><p> For example, “<span class="quote">SHIFT JIS</span>” used in Japanese Windows environments is a Microsoft extension of <code class="literal">Shift_JIS</code> and its exact name is <code class="literal">Microsoft Windows Codepage : 932</code> or <code class="literal">cp932</code>. In addition to the characters supported by <code class="literal">Shift_JIS</code>, <code class="literal">cp932</code> supports extension characters such as NEC special characters, NEC selected — IBM extended characters, and IBM extended characters. </p><p> Many Japanese users have experienced problems using these extension characters. These problems stem from the following factors: </p><div class="itemizedlist"><ul type="disc"><li><p> MySQL automatically converts character sets. </p></li><li><p> Character sets are converted via Unicode (<code class="literal">ucs2</code>). </p></li><li><p> The <code class="literal">sjis</code> character set does not support the conversion of these extension characters. </p></li><li><p> There are several conversion rules from so-called “<span class="quote">SHIFT JIS</span>” to Unicode, and some characters are converted to Unicode differently depending on the conversion rule. MySQL supports only one of these rules (described later). </p></li></ul></div><p> The MySQL <code class="literal">cp932</code> character set is designed to solve these problems. It is available as of MySQL 5.0.3. </p><p> Because MySQL supports character set conversion, it is important to separate IANA <code class="literal">Shift_JIS</code> and <code class="literal">cp932</code> into two different character sets because they provide different conversion rules. </p><p> <span class="bold"><strong>How does <code class="literal">cp932</code> differ from <code class="literal">sjis</code>?</strong></span> </p><p> The <code class="literal">cp932</code> character set differs from <code class="literal">sjis</code> in the following ways: </p><div class="itemizedlist"><ul type="disc"><li><p> <code class="literal">cp932</code> supports NEC special characters, NEC selected — IBM extended characters, and IBM selected characters. </p></li><li><p> Some <code class="literal">cp932</code> characters have two different code points, both of which convert to the same Unicode code point. When converting from Unicode back to <code class="literal">cp932</code>, one of the code points must be selected. For this “<span class="quote">round trip conversion,</span>” the rule recommended by Microsoft is used. (See <a href="http://support.microsoft.com/kb/170559/EN-US/" target="_top">http://support.microsoft.com/kb/170559/EN-US/</a>.) </p><p> The conversion rule works like this: </p><div class="itemizedlist"><ul type="circle"><li><p> If the character is in both JIS X 0208 and NEC special characters, use the code point of JIS X 0208. </p></li><li><p> If the character is in both NEC special characters and IBM selected characters, use the code point of NEC special characters. </p></li><li><p> If the character is in both IBM selected characters and NEC selected — IBM extended characters, use the code point of IBM extended characters. </p></li></ul></div><p> The table shown at <a href="http://www.microsoft.com/globaldev/reference/dbcs/932.htm" target="_top">http://www.microsoft.com/globaldev/reference/dbcs/932.htm</a> provides information about the Unicode values of <code class="literal">cp932</code> characters. For <code class="literal">cp932</code> table entries with characters under which a four-digit number appears, the number represents the corresponding Unicode (<code class="literal">ucs2</code>) encoding. For table entries with an underlined two-digit value appears, there is a range of <code class="literal">cp932</code> character values that begin with those two digits. Clicking such a table entry takes you to a page that displays the Unicode value for each of the <code class="literal">cp932</code> characters that begin with those digits. </p><p> The following links are of special interest. They correspond to the encodings for the following sets of characters: </p><div class="itemizedlist"><ul type="circle"><li><p> NEC special characters: </p><pre class="programlisting"><a href="http://www.microsoft.com/globaldev/reference/dbcs/932/932_87.htm" target="_top">http://www.microsoft.com/globaldev/reference/dbcs/932/932_87.htm</a> </pre></li><li><p> NEC selected — IBM extended characters: </p><pre class="programlisting"><a href="http://www.microsoft.com/globaldev/reference/dbcs/932/932_ED.htm" target="_top">http://www.microsoft.com/globaldev/reference/dbcs/932/932_ED.htm</a> <a href="http://www.microsoft.com/globaldev/reference/dbcs/932/932_EE.htm" target="_top">http://www.microsoft.com/globaldev/reference/dbcs/932/932_EE.htm</a> </pre></li><li><p> IBM selected characters: </p><pre class="programlisting"><a href="http://www.microsoft.com/globaldev/reference/dbcs/932/932_FA.htm" target="_top">http://www.microsoft.com/globaldev/reference/dbcs/932/932_FA.htm</a> <a href="http://www.microsoft.com/globaldev/reference/dbcs/932/932_FB.htm" target="_top">http://www.microsoft.com/globaldev/reference/dbcs/932/932_FB.htm</a> <a href="http://www.microsoft.com/globaldev/reference/dbcs/932/932_FC.htm" target="_top">http://www.microsoft.com/globaldev/reference/dbcs/932/932_FC.htm</a> </pre></li></ul></div></li><li><p> Starting from version 5.0.3, <code class="literal">cp932</code> supports conversion of user-defined characters in combination with <code class="literal">eucjpms</code>, and solves the problems with <code class="literal">sjis</code>/<code class="literal">ujis</code> conversion. For details, please refer to <a href="http://www.opengroup.or.jp/jvc/cde/sjis-euc-e.html" target="_top">http://www.opengroup.or.jp/jvc/cde/sjis-euc-e.html</a>. </p></li></ul></div><p> For some characters, conversion to and from <code class="literal">ucs2</code> is different for <code class="literal">sjis</code> and <code class="literal">cp932</code>. The following tables illustrate these differences. </p><p> Conversion to <code class="literal">ucs2</code>: </p><div class="informaltable"><table border="1"><colgroup><col><col><col></colgroup><tbody><tr><td><span class="bold"><strong><code class="literal">sjis</code>/<code class="literal">cp932</code> Value</strong></span></td><td><span class="bold"><strong><code class="literal">sjis</code> -> <code class="literal">ucs2</code> Conversion</strong></span></td><td><span class="bold"><strong><code class="literal">cp932</code> -> <code class="literal">ucs2</code> Conversion</strong></span></td></tr><tr><td>5C</td><td>005C</td><td>005C</td></tr><tr><td>7E</td><td>007E</td><td>007E</td></tr><tr><td>815C</td><td>2015</td><td>2015</td></tr><tr><td>815F</td><td>005C</td><td>FF3C</td></tr><tr><td>8160</td><td>301C</td><td>FF5E</td></tr><tr><td>8161</td><td>2016</td><td>2225</td></tr><tr><td>817C</td><td>2212</td><td>FF0D</td></tr><tr><td>8191</td><td>00A2</td><td>FFE0</td></tr><tr><td>8192</td><td>00A3</td><td>FFE1</td></tr><tr><td>81CA</td><td>00AC</td><td>FFE2</td></tr></tbody></table></div><p> Conversion from <code class="literal">ucs2</code>: </p><div class="informaltable"><table border="1"><colgroup><col><col><col></colgroup><tbody><tr><td><span class="bold"><strong><code class="literal">ucs2</code> value</strong></span></td><td><span class="bold"><strong><code class="literal">ucs2</code> -> <code class="literal">sjis</code> Conversion</strong></span></td><td><span class="bold"><strong><code class="literal">ucs2</code> -> <code class="literal">cp932</code> Conversion</strong></span></td></tr><tr><td>005C</td><td>815F</td><td>5C</td></tr><tr><td>007E</td><td>7E</td><td>7E</td></tr><tr><td>00A2</td><td>8191</td><td>3F</td></tr><tr><td>00A3</td><td>8192</td><td>3F</td></tr><tr><td>00AC</td><td>81CA</td><td>3F</td></tr><tr><td>2015</td><td>815C</td><td>815C</td></tr><tr><td>2016</td><td>8161</td><td>3F</td></tr><tr><td>2212</td><td>817C</td><td>3F</td></tr><tr><td>2225</td><td>3F</td><td>8161</td></tr><tr><td>301C</td><td>8160</td><td>3F</td></tr><tr><td>FF0D</td><td>3F</td><td>817C</td></tr><tr><td>FF3C</td><td>3F</td><td>815F</td></tr><tr><td>FF5E</td><td>3F</td><td>8160</td></tr><tr><td>FFE0</td><td>3F</td><td>8191</td></tr><tr><td>FFE1</td><td>3F</td><td>8192</td></tr><tr><td>FFE2</td><td>3F</td><td>81CA</td></tr></tbody></table></div><p> Users of any Japanese character sets should be aware that using <code class="option">--character-set-client-handshake</code> (or <code class="option">--skip-character-set-client-handshake</code>) has an important effect. See <a href="server-administration.html#server-options" title="5.2.2. Command Options">Section 5.2.2, “Command Options”</a>. </p></div></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="charset-configuration"></a>9.2. The Character Set Used for Data and Sorting</h2></div></div></div><div class="toc"><dl><dt><span class="section"><a href="internationalization-localization.html#german-character-set">9.2.1. Using the German Character Set</a></span></dt></dl></div><a class="indexterm" name="id2753133"></a><a class="indexterm" name="id2753142"></a><a class="indexterm" name="id2753155"></a><p> By default, MySQL uses the <code class="literal">latin1</code> (cp1252 West European) character set and the <code class="literal">latin1_swedish_ci</code> collation that sorts according to Swedish/Finnish rules. These defaults are suitable for the United States and most of Western Europe. </p><p> All MySQL binary distributions are compiled with <code class="option">--with-extra-charsets=complex</code>. This adds code to all standard programs that enables them to handle <code class="literal">latin1</code> and all multi-byte character sets within the binary. Other character sets are loaded from a character-set definition file when needed. </p><p> The character set determines what characters are allowed in identifiers. The collation determines how strings are sorted by the <code class="literal">ORDER BY</code> and <code class="literal">GROUP BY</code> clauses of the <code class="literal">SELECT</code> statement. </p><p> You can change the default server character set and collation with the <code class="option">--character-set-server</code> and <code class="option">--collation-server</code> options when you start the server. The collation must be a legal collation for the default character set. (Use the <code class="literal">SHOW COLLATION</code> statement to determine which collations are available for each character set.) See <a href="server-administration.html#server-options" title="5.2.2. Command Options">Section 5.2.2, “Command Options”</a>. </p><p> The character sets available depend on the <code class="option">--with-charset=<em class="replaceable"><code>charset_name</code></em></code> and <code class="option">--with-extra-charsets=<em class="replaceable"><code>list-of-charsets</code></em> | complex | all | none</code> options to <span><strong class="command">configure</strong></span>, and the character set configuration files listed in <code class="filename"><em class="replaceable"><code>SHAREDIR</code></em>/charsets/Index</code>. See <a href="installing.html#configure-options" title="2.4.15.2. Typical configure Options">Section 2.4.15.2, “Typical <span><strong class="command">configure</strong></span> Options”</a>. </p><p> If you change the character set when running MySQL, that may also change the sort order. Consequently, you must run <span><strong class="command">myisamchk -r -q --set-collation=<em class="replaceable"><code>collation_name</code></em></strong></span> on all <code class="literal">MyISAM</code> tables, or your indexes may not be ordered correctly. </p><p> When a client connects to a MySQL server, the server indicates to the client what the server's default character set is. The client switches to this character set for this connection. </p><p> You should use <a href="apis.html#mysql-real-escape-string" title="23.2.3.53. mysql_real_escape_string()"><code class="literal">mysql_real_escape_string()</code></a> when escaping strings for an SQL query. <a href="apis.html#mysql-real-escape-string" title="23.2.3.53. mysql_real_escape_string()"><code class="literal">mysql_real_escape_string()</code></a> is identical to the old <a href="apis.html#mysql-escape-string" title="23.2.3.16. mysql_escape_string()"><code class="literal">mysql_escape_string()</code></a> function, except that it takes the <code class="literal">MYSQL</code> connection handle as the first parameter so that the appropriate character set can be taken into account when escaping characters. </p><p> If the client is compiled with paths that differ from where the server is installed and the user who configured MySQL didn't include all character sets in the MySQL binary, you must tell the client where it can find the additional character sets it needs if the server runs with a different character set from the client. You can do this by specifying a <code class="option">--character-sets-dir</code> option to indicate the path to the directory in which the dynamic MySQL character sets are stored. For example, you can put the following in an option file: </p><pre class="programlisting">[client] character-sets-dir=/usr/local/mysql/share/mysql/charsets </pre><p> You can force the client to use specific character set as follows: </p><pre class="programlisting">[client] default-character-set=<em class="replaceable"><code>charset_name</code></em> </pre><p> This is normally unnecessary, however. </p><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="german-character-set"></a>9.2.1. Using the German Character Set</h3></div></div></div><p> In MySQL 5.0, character set and collation are specified separately. This means that if you want German sort order, you should select the <code class="literal">latin1</code> character set and either the <code class="literal">latin1_german1_ci</code> or <code class="literal">latin1_german2_ci</code> collation. For example, to start the server with the <code class="literal">latin1_german1_ci</code> collation, use the <code class="option">--character-set-server=latin1</code> and <code class="option">--collation-server=latin1_german1_ci</code> options. </p><p> For information on the differences between these two collations, see <a href="internationalization-localization.html#charset-we-sets" title="9.1.11.2. West European Character Sets">Section 9.1.11.2, “West European Character Sets”</a>. </p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="error-message-language"></a>9.3. Setting the Error Message Language</h2></div></div></div><a class="indexterm" name="id2753473"></a><a class="indexterm" name="id2753486"></a><a class="indexterm" name="id2753498"></a><a class="indexterm" name="id2753511"></a><p> By default, <span><strong class="command">mysqld</strong></span> produces error messages in English, but they can also be displayed in any of these other languages: Czech, Danish, Dutch, Estonian, French, German, Greek, Hungarian, Italian, Japanese, Korean, Norwegian, Norwegian-ny, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, or Swedish. </p><p> To start <span><strong class="command">mysqld</strong></span> with a particular language for error messages, use the <code class="option">--language</code> or <code class="option">-L</code> option. The option value can be a language name or the full path to the error message file. For example: </p><pre class="programlisting">shell> <strong class="userinput"><code>mysqld --language=swedish</code></strong> </pre><p> Or: </p><pre class="programlisting">shell> <strong class="userinput"><code>mysqld --language=/usr/local/share/swedish</code></strong> </pre><p> The language name should be specified in lowercase. </p><p> By default, the language files are located in the <code class="filename">share/<em class="replaceable"><code>LANGUAGE</code></em></code> directory under the MySQL base directory. </p><p> You can also change the content of the error messages produced by the server. Details can be found in the MySQL Internals manual, available at <a href="http://forge.mysql.com/wiki/MySQL_Internals_Error_Messages" target="_top">http://forge.mysql.com/wiki/MySQL_Internals_Error_Messages</a>. If you upgrade to a newer version of MySQL after changing the error messages, remember to repeat your changes after the upgrade. </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="adding-character-set"></a>9.4. Adding a New Character Set</h2></div></div></div><a class="indexterm" name="id2753627"></a><a class="indexterm" name="id2753639"></a><p> This section discusses the procedure for adding a new character set to MySQL. You must have a MySQL source distribution to use these instructions. To choose the proper procedure, determine whether the character set is simple or complex: </p><div class="itemizedlist"><ul type="disc"><li><p> If the character set does not need to use special string collating routines for sorting and does not need multi-byte character support, it is simple. </p></li><li><p> If it needs either of those features, it is complex. </p></li></ul></div><p> For example, <code class="literal">latin1</code> and <code class="literal">danish</code> are simple character sets, whereas <code class="literal">big5</code> and <code class="literal">czech</code> are complex character sets. </p><p> In the following instructions, the name of the character set is represented by <em class="replaceable"><code>MYSET</code></em>. </p><p> For a simple character set, do the following: </p><div class="orderedlist"><ol type="1"><li><p> Add <em class="replaceable"><code>MYSET</code></em> to the end of the <code class="filename">sql/share/charsets/Index</code> file. Assign a unique number to it. </p></li><li><p> Create the file <code class="filename">sql/share/charsets/<em class="replaceable"><code>MYSET</code></em>.conf</code>. (You can use a copy of <code class="filename">sql/share/charsets/latin1.conf</code> as the basis for this file.) </p><p> The syntax for the file is very simple: </p><div class="itemizedlist"><ul type="disc"><li><p> Comments start with a “<span class="quote"><code class="literal">#</code></span>” character and continue to the end of the line. </p></li><li><p> Words are separated by arbitrary amounts of whitespace. </p></li><li><p> When defining the character set, every word must be a number in hexadecimal format. </p></li><li><p> The <code class="literal">ctype</code> array takes up the first 257 words. The <code class="literal">to_lower[]</code>, <code class="literal">to_upper[]</code> and <code class="literal">sort_order[]</code> arrays take up 256 words each after that. </p></li></ul></div><p> See <a href="internationalization-localization.html#character-arrays" title="9.5. The Character Definition Arrays">Section 9.5, “The Character Definition Arrays”</a>. </p></li><li><p> Add the character set name to the <code class="literal">CHARSETS_AVAILABLE</code> and <code class="literal">COMPILED_CHARSETS</code> lists in <code class="filename">configure.in</code>. </p></li><li><p> Reconfigure, recompile, and test. </p></li></ol></div><p> For a complex character set, do the following: </p><div class="orderedlist"><ol type="1"><li><p> Create the file <code class="filename">strings/ctype-<em class="replaceable"><code>MYSET</code></em>.c</code> in the MySQL source distribution. </p></li><li><p> Add <em class="replaceable"><code>MYSET</code></em> to the end of the <code class="filename">sql/share/charsets/Index</code> file. Assign a unique number to it. </p></li><li><p> Look at one of the existing <code class="filename">ctype-*.c</code> files (such as <code class="filename">strings/ctype-big5.c</code>) to see what needs to be defined. Note that the arrays in your file must have names like <code class="literal">ctype_<em class="replaceable"><code>MYSET</code></em></code>, <code class="literal">to_lower_<em class="replaceable"><code>MYSET</code></em></code>, and so on. These correspond to the arrays for a simple character set. See <a href="internationalization-localization.html#character-arrays" title="9.5. The Character Definition Arrays">Section 9.5, “The Character Definition Arrays”</a>. </p></li><li><p> Near the top of the file, place a special comment like this: </p><pre class="programlisting">/* * This comment is parsed by configure to create ctype.c, * so don't change it unless you know what you are doing. * * .configure. number_<em class="replaceable"><code>MYSET</code></em>=<em class="replaceable"><code>MYNUMBER</code></em> * .configure. strxfrm_multiply_<em class="replaceable"><code>MYSET</code></em>=<em class="replaceable"><code>N</code></em> * .configure. mbmaxlen_<em class="replaceable"><code>MYSET</code></em>=<em class="replaceable"><code>N</code></em> */ </pre><p> The <span><strong class="command">configure</strong></span> program uses this comment to include the character set into the MySQL library automatically. </p><p> The <code class="literal">strxfrm_multiply</code> and <code class="literal">mbmaxlen</code> lines are explained in the following sections. You need include them only if you need the string collating functions or the multi-byte character set functions, respectively. </p></li><li><p> You should then create some of the following functions: </p><div class="itemizedlist"><ul type="disc"><li><p> <code class="literal">my_strncoll_<em class="replaceable"><code>MYSET</code></em>()</code> </p></li><li><p> <code class="literal">my_strcoll_<em class="replaceable"><code>MYSET</code></em>()</code> </p></li><li><p> <code class="literal">my_strxfrm_<em class="replaceable"><code>MYSET</code></em>()</code> </p></li><li><p> <code class="literal">my_like_range_<em class="replaceable"><code>MYSET</code></em>()</code> </p></li></ul></div><p> See <a href="internationalization-localization.html#string-collating" title="9.6. String Collating Support">Section 9.6, “String Collating Support”</a>. </p></li><li><p> Add the character set name to the <code class="literal">CHARSETS_AVAILABLE</code> and <code class="literal">COMPILED_CHARSETS</code> lists in <code class="literal">configure.in</code>. </p></li><li><p> Reconfigure, recompile, and test. </p></li></ol></div><p> The <code class="filename">sql/share/charsets/README</code> file includes additional instructions. </p><p> If you want to have the character set included in the MySQL distribution, mail a patch to the MySQL <code class="literal">internals</code> mailing list. See <a href="introduction.html#mailing-lists" title="1.6.1. MySQL Mailing Lists">Section 1.6.1, “MySQL Mailing Lists”</a>. </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="character-arrays"></a>9.5. The Character Definition Arrays</h2></div></div></div><p> <code class="literal">to_lower[]</code> and <code class="literal">to_upper[]</code> are simple arrays that hold the lowercase and uppercase characters corresponding to each member of the character set. For example: </p><pre class="programlisting">to_lower['A'] should contain 'a' to_upper['a'] should contain 'A' </pre><p> <code class="literal">sort_order[]</code> is a map indicating how characters should be ordered for comparison and sorting purposes. Quite often (but not for all character sets) this is the same as <code class="literal">to_upper[]</code>, which means that sorting is case-insensitive. MySQL sorts characters based on the values of <code class="literal">sort_order[]</code> elements. For more complicated sorting rules, see the discussion of string collating in <a href="internationalization-localization.html#string-collating" title="9.6. String Collating Support">Section 9.6, “String Collating Support”</a>. </p><p> <code class="literal">ctype[]</code> is an array of bit values, with one element for one character. (Note that <code class="literal">to_lower[]</code>, <code class="literal">to_upper[]</code>, and <code class="literal">sort_order[]</code> are indexed by character value, but <code class="literal">ctype[]</code> is indexed by character value + 1. This is an old legacy convention for handling <code class="literal">EOF</code>.) </p><p> You can find the following bitmask definitions in <code class="filename">m_ctype.h</code>: </p><pre class="programlisting">#define _U 01 /* Uppercase */ #define _L 02 /* Lowercase */ #define _N 04 /* Numeral (digit) */ #define _S 010 /* Spacing character */ #define _P 020 /* Punctuation */ #define _C 040 /* Control character */ #define _B 0100 /* Blank */ #define _X 0200 /* heXadecimal digit */ </pre><p> The <code class="literal">ctype[]</code> entry for each character should be the union of the applicable bitmask values that describe the character. For example, <code class="literal">'A'</code> is an uppercase character (<code class="literal">_U</code>) as well as a hexadecimal digit (<code class="literal">_X</code>), so <code class="literal">ctype['A'+1]</code> should contain the value: </p><pre class="programlisting">_U + _X = 01 + 0200 = 0201 </pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="string-collating"></a>9.6. String Collating Support</h2></div></div></div><a class="indexterm" name="id2754373"></a><a class="indexterm" name="id2754385"></a><p> If the sorting rules for your language are too complex to be handled with the simple <code class="literal">sort_order[]</code> table, you need to use the string collating functions. </p><p> The best documentation for this is the existing character sets. Look at the <code class="literal">big5</code>, <code class="literal">czech</code>, <code class="literal">gbk</code>, <code class="literal">sjis</code>, and <code class="literal">tis160</code> character sets for examples. </p><p> You must specify the <code class="literal">strxfrm_multiply_<em class="replaceable"><code>MYSET</code></em>=<em class="replaceable"><code>N</code></em></code> value in the special comment at the top of the file. <em class="replaceable"><code>N</code></em> should be set to the maximum ratio the strings may grow during <code class="literal">my_strxfrm_<em class="replaceable"><code>MYSET</code></em></code> (it must be a positive integer). </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="multi-byte-characters"></a>9.7. Multi-Byte Character Support</h2></div></div></div><a class="indexterm" name="id2754481"></a><a class="indexterm" name="id2754493"></a><p> If you want to add support for a new character set that includes multi-byte characters, you need to use the multi-byte character functions. </p><p> The best documentation for this is the existing character sets. Look at the <code class="literal">euc_kr</code>, <code class="literal">gb2312</code>, <code class="literal">gbk</code>, <code class="literal">sjis</code>, and <code class="literal">ujis</code> character sets for examples. These are implemented in the <code class="filename">ctype-<em class="replaceable"><code>charset_name</code></em>.c</code> files in the <code class="filename">strings</code> directory. </p><p> You must specify the <code class="literal">mbmaxlen_<em class="replaceable"><code>MYSET</code></em>=<em class="replaceable"><code>N</code></em></code> value in the special comment at the top of the source file. <em class="replaceable"><code>N</code></em> should be set to the size in bytes of the largest character in the set. </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="problems-with-character-sets"></a>9.8. Problems With Character Sets</h2></div></div></div><p> If you try to use a character set that is not compiled into your binary, you might run into the following problems: </p><div class="itemizedlist"><ul type="disc"><li><p> Your program uses an incorrect path to determine where the character sets are stored. (Default <code class="filename">/usr/local/mysql/share/mysql/charsets</code>). This can be fixed by using the <code class="option">--character-sets-dir</code> option when you run the program in question. </p></li><li><p> The character set is a multi-byte character set that cannot be loaded dynamically. In this case, you must recompile the program with support for the character set. </p></li><li><p> The character set is a dynamic character set, but you do not have a configure file for it. In this case, you should install the configure file for the character set from a new MySQL distribution. </p></li><li><p> If your <code class="filename">Index</code> file does not contain the name for the character set, your program displays the following error message: </p><pre class="programlisting">ERROR 1105: File '/usr/local/share/mysql/charsets/?.conf' not found (Errcode: 2) </pre><p> In this case, you should either get a new <code class="literal">Index</code> file or manually add the name of any missing character sets to the current file. </p></li></ul></div><p> For <code class="literal">MyISAM</code> tables, you can check the character set name and number for a table with <span><strong class="command">myisamchk -dvv <em class="replaceable"><code>tbl_name</code></em></strong></span>. </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="time-zone-support"></a>9.9. MySQL Server Time Zone Support</h2></div></div></div><p> The MySQL server maintains several time zone settings: </p><div class="itemizedlist"><ul type="disc"><li><p> The system time zone. When the server starts, it attempts to determine the time zone of the host machine and uses it to set the <code class="literal">system_time_zone</code> system variable. The value does not change thereafter. </p><p> You can set the system time zone for MySQL Server at startup with the <code class="option">--timezone=<em class="replaceable"><code>timezone_name</code></em></code> option to <span><strong class="command">mysqld_safe</strong></span>. You can also set it by setting the <code class="literal">TZ</code> environment variable before you start <span><strong class="command">mysqld</strong></span>. The allowable values for <code class="option">--timezone</code> or <code class="literal">TZ</code> are system-dependent. Consult your operating system documentation to see what values are acceptable. </p></li><li><p> The server's current time zone. The global <code class="literal">time_zone</code> system variable indicates the time zone the server currently is operating in. The initial value for <code class="literal">time_zone</code> is <code class="literal">'SYSTEM'</code>, which indicates that the server time zone is the same as the system time zone. </p><p> The initial global server time zone value can be specified explicitly at startup with the <code class="option">--default-time-zone=<em class="replaceable"><code>timezone</code></em></code> option on the command line, or you can use the following line in an option file: </p><pre class="programlisting">default-time-zone='<em class="replaceable"><code>timezone</code></em>' </pre><p> If you have the <code class="literal">SUPER</code> privilege, you can set the global server time zone value at runtime with this statement: </p><pre class="programlisting">mysql> <strong class="userinput"><code>SET GLOBAL time_zone = <em class="replaceable"><code>timezone</code></em>;</code></strong> </pre></li><li><p> Per-connection time zones. Each client that connects has its own time zone setting, given by the session <code class="literal">time_zone</code> variable. Initially, the session variable takes its value from the global <code class="literal">time_zone</code> variable, but the client can change its own time zone with this statement: </p><pre class="programlisting">mysql> <strong class="userinput"><code>SET time_zone = <em class="replaceable"><code>timezone</code></em>;</code></strong> </pre></li></ul></div><p> The current session time zone setting affects display and storage of time values that are zone-sensitive. This includes the values displayed by functions such as <a href="functions.html#function_now"><code class="literal">NOW()</code></a> or <a href="functions.html#function_curtime"><code class="literal">CURTIME()</code></a>, and values stored in and retrieved from <code class="literal">TIMESTAMP</code> columns. Values for <code class="literal">TIMESTAMP</code> columns are converted from the current time zone to UTC for storage, and from UTC to the current time zone for retrieval. The current time zone setting does not affect values displayed by functions such as <a href="functions.html#function_utc-timestamp"><code class="literal">UTC_TIMESTAMP()</code></a> or values in <code class="literal">DATE</code>, <code class="literal">TIME</code>, or <code class="literal">DATETIME</code> columns. </p><p> The current values of the global and client-specific time zones can be retrieved like this: </p><pre class="programlisting">mysql> <strong class="userinput"><code>SELECT @@global.time_zone, @@session.time_zone;</code></strong> </pre><p> <em class="replaceable"><code>timezone</code></em> values can be given in several formats, none of which are case sensitive: </p><div class="itemizedlist"><ul type="disc"><li><p> The value <code class="literal">'SYSTEM'</code> indicates that the time zone should be the same as the system time zone. </p></li><li><p> The value can be given as a string indicating an offset from UTC, such as <code class="literal">'+10:00'</code> or <code class="literal">'-6:00'</code>. </p></li><li><p> The value can be given as a named time zone, such as <code class="literal">'Europe/Helsinki'</code>, <code class="literal">'US/Eastern'</code>, or <code class="literal">'MET'</code>. Named time zones can be used only if the time zone information tables in the <code class="literal">mysql</code> database have been created and populated. </p></li></ul></div><p> The MySQL installation procedure creates the time zone tables in the <code class="literal">mysql</code> database, but does not load them. You must do so manually using the following instructions. (If you are upgrading to MySQL 4.1.3 or later from an earlier version, you can create the tables by upgrading your <code class="literal">mysql</code> database. Use the instructions in <a href="server-administration.html#mysql-upgrade" title="5.5.8. mysql_upgrade — Check Tables for MySQL Upgrade">Section 5.5.8, “<span><strong class="command">mysql_upgrade</strong></span> — Check Tables for MySQL Upgrade”</a>. After creating the tables, you can load them.) </p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p> Loading the time zone information is not necessarily a one-time operation because the information changes occasionally. For example, the rules for Daylight Saving Time in the United States, Mexico, and parts of Canada changed in 2007. When such changes occur, applications that use the old rules become out of date and you may find it necessary to reload the time zone tables to keep the information used by your MySQL server current. See the notes at the end of this section. </p></div><p> If your system has its own <em class="firstterm">zoneinfo</em> database (the set of files describing time zones), you should use the <span><strong class="command">mysql_tzinfo_to_sql</strong></span> program for filling the time zone tables. Examples of such systems are Linux, FreeBSD, Sun Solaris, and Mac OS X. One likely location for these files is the <code class="filename">/usr/share/zoneinfo</code> directory. If your system does not have a zoneinfo database, you can use the downloadable package described later in this section. </p><p> The <span><strong class="command">mysql_tzinfo_to_sql</strong></span> program is used to load the time zone tables. On the command line, pass the zoneinfo directory pathname to <span><strong class="command">mysql_tzinfo_to_sql</strong></span> and send the output into the <span><strong class="command">mysql</strong></span> program. For example: </p><pre class="programlisting">shell> <strong class="userinput"><code>mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql</code></strong> </pre><p> <span><strong class="command">mysql_tzinfo_to_sql</strong></span> reads your system's time zone files and generates SQL statements from them. <span><strong class="command">mysql</strong></span> processes those statements to load the time zone tables. </p><p> <span><strong class="command">mysql_tzinfo_to_sql</strong></span> also can be used to load a single time zone file or to generate leap second information: </p><div class="itemizedlist"><ul type="disc"><li><p> To load a single time zone file <em class="replaceable"><code>tz_file</code></em> that corresponds to a time zone name <em class="replaceable"><code>tz_name</code></em>, invoke <span><strong class="command">mysql_tzinfo_to_sql</strong></span> like this: </p><pre class="programlisting">shell> <strong class="userinput"><code>mysql_tzinfo_to_sql <em class="replaceable"><code>tz_file</code></em> <em class="replaceable"><code>tz_name</code></em> | mysql -u root mysql</code></strong> </pre><p> With this approach, you must execute a separate command to load the time zone file for each named zone that the server needs to know about. </p></li><li><p> If your time zone needs to account for leap seconds, initialize the leap second information like this, where <em class="replaceable"><code>tz_file</code></em> is the name of your time zone file: </p><pre class="programlisting">shell> <strong class="userinput"><code>mysql_tzinfo_to_sql --leap <em class="replaceable"><code>tz_file</code></em> | mysql -u root mysql</code></strong> </pre></li><li><p> After running <span><strong class="command">mysql_tzinfo_to_sql</strong></span>, it is best to restart the server so that it does not continue to use any previously cached time zone data. </p></li></ul></div><p> If your system is one that has no zoneinfo database (for example, Windows or HP-UX), you can use the package of pre-built time zone tables that is available for download at the MySQL Developer Zone: </p><pre class="programlisting"><a href="http://dev.mysql.com/downloads/timezones.html" target="_top">http://dev.mysql.com/downloads/timezones.html</a> </pre><p> This time zone package contains <code class="filename">.frm</code>, <code class="filename">.MYD</code>, and <code class="filename">.MYI</code> files for the <code class="literal">MyISAM</code> time zone tables. These tables should be part of the <code class="literal">mysql</code> database, so you should place the files in the <code class="filename">mysql</code> subdirectory of your MySQL server's data directory. The server should be stopped while you do this and restarted afterward. </p><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Warning</h3><p> Do not use the downloadable package if your system has a zoneinfo database. Use the <span><strong class="command">mysql_tzinfo_to_sql</strong></span> utility instead. Otherwise, you may cause a difference in datetime handling between MySQL and other applications on your system. </p></div><p> For information about time zone settings in replication setup, please see <a href="replication.html#replication-features" title="15.3.1. Replication Features and Issues">Section 15.3.1, “Replication Features and Issues”</a>. </p><p> <span class="bold"><strong>Staying Current with Time Zone Changes</strong></span> </p><p> As mentioned earlier, when the time zone rules change, applications that use the old rules become out of date. To stay current, it is necessary to make sure that your system uses current time zone information is used. For MySQL, there are two factors to consider in staying current: </p><div class="itemizedlist"><ul type="disc"><li><p> The operating system time affects the value that the MySQL server uses for times if its time zone is set to <code class="literal">SYSTEM</code>. Make sure that your operating system is using the latest time zone information. For most operating systems, the latest update or service pack prepares your system for the time changes. Check the Web site for your operating system vendor for an update that addresses the time changes. </p></li><li><p> If you replace the system's <code class="filename">/etc/localtime</code> timezone file with a verion that uses rules differing from those in effect at <span><strong class="command">mysqld</strong></span> startup, you should restart <span><strong class="command">mysqld</strong></span> so that it uses the updated rules. Otherwise, <span><strong class="command">mysqld</strong></span> might not notice when the system changes its time. </p></li><li><p> If you use named time zones with MySQL, make sure that the time zone tables in the <code class="literal">mysql</code> database are up to date. If your system has its own zoneinfo database, you should reload the MySQL time zone tables whenever the zoneinfo database is updated, using the instructions given earlier in this section. For systems that do not have their own zoneinfo database, check the MySQL Developer Zone for updates. When a new update is available, download it and use it to replace your current time zone tables. <span><strong class="command">mysqld</strong></span> caches time zone information that it looks up, so after replacing the time zone tables, you should restart <span><strong class="command">mysqld</strong></span> to make sure that it does not continue to serve outdated time zone data. </p></li></ul></div><p> If you are uncertain whether named time zones are available, for use either as the server's time zone setting or by clients that set their own time zone, check whether your time zone tables are empty. The following query determines whether the table that contains time zone names has any rows: </p><pre class="programlisting">mysql> <strong class="userinput"><code>SELECT COUNT(*) FROM mysql.time_zone_name;</code></strong> +----------+ | COUNT(*) | +----------+ | 0 | +----------+ </pre><p> A count of zero indicates that the table is empty. In this case, no one can be using named time zones, and you don't need to update the tables. A count greater than zero indicates that the table is not empty and that its contents are available to be used for named time zone support. In this case, you should be sure to reload your time zone tables so that anyone who uses named time zones will get correct query results. </p><p> To check whether your MySQL installation is updated properly for a change in Daylight Saving Time rules, use a test like the one following. The example uses values that are appropriate for the 2007 DST 1-hour change that occurs in the United States on March 11 at 2 a.m. </p><p> The test uses these two queries: </p><pre class="programlisting">SELECT CONVERT_TZ('2007-03-11 2:00:00','US/Eastern','US/Central'); SELECT CONVERT_TZ('2007-03-11 3:00:00','US/Eastern','US/Central'); </pre><p> The two time values indicate the times at which the DST change occurs, and the use of named time zones requires that the time zone tables be used. The desired result is that both queries return the same result (the input time, converted to the equivalent value in the 'US/Central' time zone). </p><p> Before updating the time zone tables, you would see an incorrect result like this: </p><pre class="programlisting">mysql> <strong class="userinput"><code>SELECT CONVERT_TZ('2007-03-11 2:00:00','US/Eastern','US/Central');</code></strong> +------------------------------------------------------------+ | CONVERT_TZ('2007-03-11 2:00:00','US/Eastern','US/Central') | +------------------------------------------------------------+ | 2007-03-11 01:00:00 | +------------------------------------------------------------+ mysql> <strong class="userinput"><code>SELECT CONVERT_TZ('2007-03-11 3:00:00','US/Eastern','US/Central');</code></strong> +------------------------------------------------------------+ | CONVERT_TZ('2007-03-11 3:00:00','US/Eastern','US/Central') | +------------------------------------------------------------+ | 2007-03-11 02:00:00 | +------------------------------------------------------------+ </pre><p> After updating the tables, you should see the correct result: </p><pre class="programlisting">mysql> <strong class="userinput"><code>SELECT CONVERT_TZ('2007-03-11 2:00:00','US/Eastern','US/Central');</code></strong> +------------------------------------------------------------+ | CONVERT_TZ('2007-03-11 2:00:00','US/Eastern','US/Central') | +------------------------------------------------------------+ | 2007-03-11 01:00:00 | +------------------------------------------------------------+ mysql> <strong class="userinput"><code>SELECT CONVERT_TZ('2007-03-11 3:00:00','US/Eastern','US/Central');</code></strong> +------------------------------------------------------------+ | CONVERT_TZ('2007-03-11 3:00:00','US/Eastern','US/Central') | +------------------------------------------------------------+ | 2007-03-11 01:00:00 | +------------------------------------------------------------+ </pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="locale-support"></a>9.10. MySQL Server Locale Support</h2></div></div></div><p> Beginning with MySQL 5.0.25, the locale indicated by the <code class="literal">lc_time_names</code> system variable controls the language used to display day and month names and abbreviations. This variable affects the output from the <a href="functions.html#function_date-format"><code class="literal">DATE_FORMAT()</code></a>, <a href="functions.html#function_dayname"><code class="literal">DAYNAME()</code></a> and <a href="functions.html#function_monthname"><code class="literal">MONTHNAME()</code></a> functions. </p><p> Locale names are POSIX-style values such as <code class="literal">'ja_JP'</code> or <code class="literal">'pt_BR'</code>. The default value is <code class="literal">'en_US'</code> regardless of your system's locale setting, but any client can examine or set its <code class="literal">lc_time_names</code> value as shown in the following example: </p><pre class="programlisting">mysql> <strong class="userinput"><code>SET NAMES 'utf8';</code></strong> Query OK, 0 rows affected (0.09 sec) mysql> <strong class="userinput"><code>SELECT @@lc_time_names;</code></strong> +-----------------+ | @@lc_time_names | +-----------------+ | en_US | +-----------------+ 1 row in set (0.00 sec) mysql> <strong class="userinput"><code>SELECT DAYNAME('2010-01-01'), MONTHNAME('2010-01-01');</code></strong> +-----------------------+-------------------------+ | DAYNAME('2010-01-01') | MONTHNAME('2010-01-01') | +-----------------------+-------------------------+ | Friday | January | +-----------------------+-------------------------+ 1 row in set (0.00 sec) mysql> <strong class="userinput"><code>SELECT DATE_FORMAT('2010-01-01','%W %a %M %b');</code></strong> +-----------------------------------------+ | DATE_FORMAT('2010-01-01','%W %a %M %b') | +-----------------------------------------+ | Friday Fri January Jan | +-----------------------------------------+ 1 row in set (0.00 sec) mysql> <strong class="userinput"><code>SET lc_time_names = 'es_MX';</code></strong> Query OK, 0 rows affected (0.00 sec) mysql> <strong class="userinput"><code>SELECT @@lc_time_names;</code></strong> +-----------------+ | @@lc_time_names | +-----------------+ | es_MX | +-----------------+ 1 row in set (0.00 sec) mysql> <strong class="userinput"><code>SELECT DAYNAME('2010-01-01'), MONTHNAME('2010-01-01');</code></strong> +-----------------------+-------------------------+ | DAYNAME('2010-01-01') | MONTHNAME('2010-01-01') | +-----------------------+-------------------------+ | viernes | enero | +-----------------------+-------------------------+ 1 row in set (0.00 sec) mysql> <strong class="userinput"><code>SELECT DATE_FORMAT('2010-01-01','%W %a %M %b');</code></strong> +-----------------------------------------+ | DATE_FORMAT('2010-01-01','%W %a %M %b') | +-----------------------------------------+ | viernes vie enero ene | +-----------------------------------------+ 1 row in set (0.00 sec) </pre><p> The day or month name for each of the affected functions is converted from <code class="literal">utf8</code> to the character set indicated by the <code class="literal">character_set_connection</code> system variable. </p><p> <code class="literal">lc_time_names</code> may be set to any of the following locale values: </p><div class="informaltable"><table border="1"><colgroup><col><col></colgroup><tbody><tr><td><code class="literal">ar_AE</code>: Arabic - United Arab Emirates</td><td><code class="literal">ar_BH</code>: Arabic - Bahrain</td></tr><tr><td><code class="literal">ar_DZ</code>: Arabic - Algeria</td><td><code class="literal">ar_EG</code>: Arabic - Egypt</td></tr><tr><td><code class="literal">ar_IN</code>: Arabic - Iran</td><td><code class="literal">ar_IQ</code>: Arabic - Iraq</td></tr><tr><td><code class="literal">ar_JO</code>: Arabic - Jordan</td><td><code class="literal">ar_KW</code>: Arabic - Kuwait</td></tr><tr><td><code class="literal">ar_LB</code>: Arabic - Lebanon</td><td><code class="literal">ar_LY</code>: Arabic - Libya</td></tr><tr><td><code class="literal">ar_MA</code>: Arabic - Morocco</td><td><code class="literal">ar_OM</code>: Arabic - Oman</td></tr><tr><td><code class="literal">ar_QA</code>: Arabic - Qatar</td><td><code class="literal">ar_SA</code>: Arabic - Saudi Arabia</td></tr><tr><td><code class="literal">ar_SD</code>: Arabic - Sudan</td><td><code class="literal">ar_SY</code>: Arabic - Syria</td></tr><tr><td><code class="literal">ar_TN</code>: Arabic - Tunisia</td><td><code class="literal">ar_YE</code>: Arabic - Yemen</td></tr><tr><td><code class="literal">be_BY</code>: Belarusian - Belarus</td><td><code class="literal">bg_BG</code>: Bulgarian - Bulgaria</td></tr><tr><td><code class="literal">ca_ES</code>: Catalan - Catalan</td><td><code class="literal">cs_CZ</code>: Czech - Czech Republic</td></tr><tr><td><code class="literal">da_DK</code>: Danish - Denmark</td><td><code class="literal">de_AT</code>: German - Austria</td></tr><tr><td><code class="literal">de_BE</code>: German - Belgium</td><td><code class="literal">de_CH</code>: German - Switzerland</td></tr><tr><td><code class="literal">de_DE</code>: German - Germany</td><td><code class="literal">de_LU</code>: German - Luxembourg</td></tr><tr><td><code class="literal">EE</code>: Estonian - Estonia</td><td><code class="literal">en_AU</code>: English - Australia</td></tr><tr><td><code class="literal">en_CA</code>: English - Canada</td><td><code class="literal">en_GB</code>: English - United Kingdom</td></tr><tr><td><code class="literal">en_IN</code>: English - India</td><td><code class="literal">en_NZ</code>: English - New Zealand</td></tr><tr><td><code class="literal">en_PH</code>: English - Philippines</td><td><code class="literal">en_US</code>: English - United States</td></tr><tr><td><code class="literal">en_ZA</code>: English - South Africa</td><td><code class="literal">en_ZW</code>: English - Zimbabwe</td></tr><tr><td><code class="literal">es_AR</code>: Spanish - Argentina</td><td><code class="literal">es_BO</code>: Spanish - Bolivia</td></tr><tr><td><code class="literal">es_CL</code>: Spanish - Chile</td><td><code class="literal">es_CO</code>: Spanish - Columbia</td></tr><tr><td><code class="literal">es_CR</code>: Spanish - Costa Rica</td><td><code class="literal">es_DO</code>: Spanish - Dominican Republic</td></tr><tr><td><code class="literal">es_EC</code>: Spanish - Ecuador</td><td><code class="literal">es_ES</code>: Spanish - Spain</td></tr><tr><td><code class="literal">es_GT</code>: Spanish - Guatemala</td><td><code class="literal">es_HN</code>: Spanish - Honduras</td></tr><tr><td><code class="literal">es_MX</code>: Spanish - Mexico</td><td><code class="literal">es_NI</code>: Spanish - Nicaragua</td></tr><tr><td><code class="literal">es_PA</code>: Spanish - Panama</td><td><code class="literal">es_PE</code>: Spanish - Peru</td></tr><tr><td><code class="literal">es_PR</code>: Spanish - Puerto Rico</td><td><code class="literal">es_PY</code>: Spanish - Paraguay</td></tr><tr><td><code class="literal">es_SV</code>: Spanish - El Salvador</td><td><code class="literal">es_US</code>: Spanish - United States</td></tr><tr><td><code class="literal">es_UY</code>: Spanish - Uruguay</td><td><code class="literal">es_VE</code>: Spanish - Venezuela</td></tr><tr><td><code class="literal">eu_ES</code>: Basque - Basque</td><td><code class="literal">fi_FI</code>: Finnish - Finland</td></tr><tr><td><code class="literal">fo_FO</code>: Faroese - Faroe Islands</td><td><code class="literal">fr_BE</code>: French - Belgium</td></tr><tr><td><code class="literal">fr_CA</code>: French - Canada</td><td><code class="literal">fr_CH</code>: French - Switzerland</td></tr><tr><td><code class="literal">fr_FR</code>: French - France</td><td><code class="literal">fr_LU</code>: French - Luxembourg</td></tr><tr><td><code class="literal">gl_ES</code>: Galician - Galician</td><td><code class="literal">gu_IN</code>: Gujarati - India</td></tr><tr><td><code class="literal">he_IL</code>: Hebrew - Israel</td><td><code class="literal">hi_IN</code>: Hindi - India</td></tr><tr><td><code class="literal">hr_HR</code>: Croatian - Croatia</td><td><code class="literal">hu_HU</code>: Hungarian - Hungary</td></tr><tr><td><code class="literal">id_ID</code>: Indonesian - Indonesia</td><td><code class="literal">is_IS</code>: Icelandic - Iceland</td></tr><tr><td><code class="literal">it_CH</code>: Italian - Switzerland</td><td><code class="literal">it_IT</code>: Italian - Italy</td></tr><tr><td><code class="literal">ja_JP</code>: Japanese - Japan</td><td><code class="literal">ko_KR</code>: Korean - Korea</td></tr><tr><td><code class="literal">lt_LT</code>: Lithuanian - Lithuania</td><td><code class="literal">lv_LV</code>: Latvian - Latvia</td></tr><tr><td><code class="literal">mk_MK</code>: Macedonian - FYROM</td><td><code class="literal">mn_MN</code>: Mongolia - Mongolian</td></tr><tr><td><code class="literal">ms_MY</code>: Malay - Malaysia</td><td><code class="literal">nb_NO</code>: Norwegian(Bokml) - Norway</td></tr><tr><td><code class="literal">nl_BE</code>: Dutch - Belgium</td><td><code class="literal">nl_NL</code>: Dutch - The Netherlands</td></tr><tr><td><code class="literal">no_NO</code>: Norwegian - Norway</td><td><code class="literal">pl_PL</code>: Polish - Poland</td></tr><tr><td><code class="literal">pt_BR</code>: Portugese - Brazil</td><td><code class="literal">pt_PT</code>: Portugese - Portugal</td></tr><tr><td><code class="literal">ro_RO</code>: Romanian - Romania</td><td><code class="literal">ru_RU</code>: Russian - Russia</td></tr><tr><td><code class="literal">ru_UA</code>: Russian - Ukraine</td><td><code class="literal">sk_SK</code>: Slovak - Slovakia</td></tr><tr><td><code class="literal">sl_SI</code>: Slovenian - Slovenia</td><td><code class="literal">sq_AL</code>: Albanian - Albania</td></tr><tr><td><code class="literal">sr_YU</code>: Serbian - Yugoslavia</td><td><code class="literal">sv_FI</code>: Swedish - Finland</td></tr><tr><td><code class="literal">sv_SE</code>: Swedish - Sweden</td><td><code class="literal">ta_IN</code>: Tamil - India</td></tr><tr><td><code class="literal">te_IN</code>: Telugu - India</td><td><code class="literal">th_TH</code>: Thai - Thailand</td></tr><tr><td><code class="literal">tr_TR</code>: Turkish - Turkey</td><td><code class="literal">uk_UA</code>: Ukrainian - Ukraine</td></tr><tr><td><code class="literal">ur_PK</code>: Urdu - Pakistan</td><td><code class="literal">vi_VN</code>: Vietnamese - Vietnam</td></tr><tr><td><code class="literal">zh_CN</code>: Chinese - Peoples Republic of China</td><td><code class="literal">zh_HK</code>: Chinese - Hong Kong SAR</td></tr><tr><td><code class="literal">zh_TW</code>: Chinese - Taiwan</td><td> </td></tr></tbody></table></div><p> <code class="literal">lc_time_names</code> currently does not affect the <a href="functions.html#function_str-to-date"><code class="literal">STR_TO_DATE()</code></a> or <a href="functions.html#function_get-format"><code class="literal">GET_FORMAT()</code></a> function. </p></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="language-structure.html">Prev</a> </td><td width="20%" align="center"> </td><td width="40%" align="right"> <a accesskey="n" href="data-types.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter 8. Language Structure </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> Chapter 10. Data Types</td></tr></table></div></body></html>