<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <HTML ><HEAD ><TITLE >Localization and TDS 7.0</TITLE ><META NAME="GENERATOR" CONTENT="Modular DocBook HTML Stylesheet Version 1.79"><LINK REL="HOME" TITLE="FreeTDS User Guide" HREF="index.htm"><LINK REL="UP" TITLE="Advanced Configurations" HREF="configs.htm"><LINK REL="PREVIOUS" TITLE="Advanced Configurations" HREF="configs.htm"><LINK REL="NEXT" TITLE="Domain Logins" HREF="domains.htm"><LINK REL="STYLESHEET" TYPE="text/css" HREF="userguide.css"><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8"></HEAD ><BODY CLASS="SECT1" BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#840084" ALINK="#0000FF" ><DIV CLASS="NAVHEADER" ><TABLE SUMMARY="Header navigation table" WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0" ><TR ><TH COLSPAN="3" ALIGN="center" ><SPAN CLASS="PRODUCTNAME" >FreeTDS</SPAN > User Guide: A Guide to Installing, Configuring, and Running <SPAN CLASS="PRODUCTNAME" >FreeTDS</SPAN ></TH ></TR ><TR ><TD WIDTH="10%" ALIGN="left" VALIGN="bottom" ><A HREF="configs.htm" ACCESSKEY="P" >Prev</A ></TD ><TD WIDTH="80%" ALIGN="center" VALIGN="bottom" >Chapter 5. Advanced Configurations</TD ><TD WIDTH="10%" ALIGN="right" VALIGN="bottom" ><A HREF="domains.htm" ACCESSKEY="N" >Next</A ></TD ></TR ></TABLE ><HR ALIGN="LEFT" WIDTH="100%"></DIV ><DIV CLASS="SECT1" ><H1 CLASS="SECT1" ><A NAME="LOCALIZATION" >Localization and <ACRONYM CLASS="ACRONYM" >TDS</ACRONYM > 7.0</A ></H1 ><P ><ACRONYM CLASS="ACRONYM" >TDS</ACRONYM > 7.0 uses 2-byte Unicode (technically, <ACRONYM CLASS="ACRONYM" >UCS-2</ACRONYM >) to transfer character data between servers and clients. Included in <SPAN CLASS="QUOTE" >"character data"</SPAN > are query text (i.e., <ACRONYM CLASS="ACRONYM" >SQL</ACRONYM >), metadata (table names and such), and <I CLASS="FOREIGNPHRASE" >bona fide</I > data of datatypes <TT CLASS="LITERAL" >nchar</TT >, <TT CLASS="LITERAL" >nvarchar</TT >, and <TT CLASS="LITERAL" >ntext</TT >. (Background information on Unicode and how it affects <SPAN CLASS="PRODUCTNAME" >FreeTDS</SPAN > can be found in the <A HREF="unicode.htm" >appendix</A >.)</P ><P >Because most Unix tools and environments do not support <ACRONYM CLASS="ACRONYM" >UCS-2</ACRONYM >, <SPAN CLASS="PRODUCTNAME" >FreeTDS</SPAN > provides for conversion by the client to other character sets. The mechanism used is determined by the <TT CLASS="FILENAME" >configure</TT > script, which looks for a <CODE CLASS="FUNCTION" >iconv(3)</CODE > function, an implementation of the <A HREF="http://www.opengroup.org/onlinepubs/7908799/xsh/iconv.html" TARGET="_top" >iconv</A > standard. If no <CODE CLASS="FUNCTION" >iconv</CODE > library is found, or if it is explicitly disabled, <SPAN CLASS="PRODUCTNAME" >FreeTDS</SPAN > will use its built-in <CODE CLASS="FUNCTION" >iconv</CODE > substitute, and will be capable of converting among only <ACRONYM CLASS="ACRONYM" >ISO 8859-1</ACRONYM >, <ACRONYM CLASS="ACRONYM" >UTF-8</ACRONYM >, and <ACRONYM CLASS="ACRONYM" >UCS-2</ACRONYM >.</P ><P >To learn what character set the client wants, <SPAN CLASS="PRODUCTNAME" >FreeTDS</SPAN > prefers the applicable <A HREF="freetdsconf.htm#CLIENTCHARSET" ><TT CLASS="FILENAME" >freetds.conf</TT ></A > <TT CLASS="LITERAL" >client charset</TT > property. If that is not set, it parses the <CODE CLASS="ENVAR" >LANG</CODE > environment variable. In either case, the found string is passed to <CODE CLASS="FUNCTION" >iconv</CODE >(3) (or its built-in replacment). <A NAME="AEN2150" HREF="#FTN.AEN2150" ><SPAN CLASS="footnote" >[1]</SPAN ></A >. If neither is found, <ACRONYM CLASS="ACRONYM" >UCS-2</ACRONYM > data are converted to <ACRONYM CLASS="ACRONYM" >ISO 8859-1</ACRONYM >.</P ><P >To list all supported iconv character sets try <B CLASS="COMMAND" >iconv</B >(1). GNU's does:</P ><PRE CLASS="SCREEN" > <SAMP CLASS="PROMPT" >$ </SAMP ><KBD CLASS="USERINPUT" >iconv --list</KBD ></PRE ><P >For other systems, consult your documentation (most likely <B CLASS="COMMAND" >man iconv</B > will give you some hints).</P ><P >In this example a server named <TT CLASS="LITERAL" >mssql</TT > will return data encoded in the GREEK character set.</P ><DIV CLASS="EXAMPLE" ><A NAME="E.G.GREEK" ></A ><P ><B >Example 5-2. Configuring for GREEK <TT CLASS="FILENAME" >freetds.conf</TT > setting</B ></P ><PRE CLASS="PROGRAMLISTING" > [mssql] host = ntbox.mydomain.com port = 1433 tds version = 7.0 client charset = GREEK </PRE ></DIV ><P >If <SPAN CLASS="PRODUCTNAME" >FreeTDS</SPAN > runs into a character it can not convert, its behavior varies according to the severity of the problem. On retrieving data from the server, <SPAN CLASS="PRODUCTNAME" >FreeTDS</SPAN > substitutes an <ACRONYM CLASS="ACRONYM" >ASCII</ACRONYM > '?' in the character's place, and emits a warning message stating that some characters could not be converted. On sending data to the server, <SPAN CLASS="PRODUCTNAME" >FreeTDS</SPAN > aborts the query and emits an error message. It is well to ensure that the data contained in the database is representable in the client's character set.</P ><P >If you have a mix of character data that can not be contained in a single-byte character set, you may wish to use <ACRONYM CLASS="ACRONYM" >UTF-8</ACRONYM >. <ACRONYM CLASS="ACRONYM" >UTF-8</ACRONYM > is a variable length unicode encoding that is compatible with <ACRONYM CLASS="ACRONYM" >ASCII</ACRONYM > in the range 0 to 127. With <ACRONYM CLASS="ACRONYM" >UTF-8</ACRONYM >, you are guaranteed to never have an unconvertible character.</P ><DIV CLASS="IMPORTANT" ><P ></P ><TABLE CLASS="IMPORTANT" WIDTH="100%" BORDER="0" ><TR ><TD WIDTH="25" ALIGN="CENTER" VALIGN="TOP" ><IMG SRC="../images/important.gif" HSPACE="5" ALT="Important"></TD ><TD ALIGN="LEFT" VALIGN="TOP" ><P ><SPAN CLASS="PRODUCTNAME" >FreeTDS</SPAN > is not fully compatible with multi-byte character sets such as <ACRONYM CLASS="ACRONYM" >UCS-2</ACRONYM >. You must use an ASCII-extension charset (e.g., UTF-8, ISO-8859-*)<A NAME="AEN2184" HREF="#FTN.AEN2184" ><SPAN CLASS="footnote" >[2]</SPAN ></A >. Great care should be taken testing applications using these encodings. Specifically, many applications do not expect the number of characters returned to exceed the column size (in bytes).</P ></TD ></TR ></TABLE ></DIV ><P >In the following example, a server named <TT CLASS="LITERAL" >mssql</TT > will return data encoded in the <ACRONYM CLASS="ACRONYM" >UTF-8</ACRONYM > character set.</P ><DIV CLASS="EXAMPLE" ><A NAME="E.G.UTF8" ></A ><P ><B >Example 5-3. Configuring for <ACRONYM CLASS="ACRONYM" >UTF-8</ACRONYM > <TT CLASS="FILENAME" >freetds.conf</TT > setting</B ></P ><PRE CLASS="PROGRAMLISTING" > [mssql] host = ntbox.mydomain.com port = 1433 tds version = 7.0 client charset = UTF-8 </PRE ></DIV ><P >It is also worth clarifying that <ACRONYM CLASS="ACRONYM" >TDS 7.0</ACRONYM > and above do not accept any specified character set during login, as 4.2 does. A <ACRONYM CLASS="ACRONYM" >TDS 7.0</ACRONYM > login packet uses <ACRONYM CLASS="ACRONYM" >UCS-2</ACRONYM >.</P ><DIV CLASS="SECT2" ><H2 CLASS="SECT2" ><A NAME="LOCALIZATION.SERVERNOTE" >Microsoft Server Note</A ></H2 ><P >String literals in SQL must be prefixed with 'N' unless the enclosed string can be represented in the server's <SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >single-byte</I ></SPAN > character set, irrespective of the column's datatype. For example, in the SQL statement <DIV CLASS="INFORMALEXAMPLE" ><P ></P ><A NAME="AEN2202" ></A ><PRE CLASS="SCREEN" > INSERT INTO tablename (greeting) VALUES ('Hallå')</PRE ><P ></P ></DIV > the string is subject to somewhat surprising treatment by the server.</P ><P >When the server parses the SQL, it extracts the data values for insertion (or update, or comparison, etc.) Unprefixed strings are converted to the single-byte character set of the server/database.<A NAME="AEN2205" HREF="#FTN.AEN2205" ><SPAN CLASS="footnote" >[3]</SPAN ></A > Inserted data are then of course stored in the column. In the case of UCS-2 columns — <TT CLASS="LITERAL" >nchar</TT >, <TT CLASS="LITERAL" >nvarchar</TT >, and <TT CLASS="LITERAL" >ntext</TT > — the value stored is that which results from a <SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >second</I ></SPAN > conversion: from the single-byte form to the <ACRONYM CLASS="ACRONYM" >UCS-2</ACRONYM > form.</P ><P >The <SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >only</I ></SPAN > safe way to enclose strings in SQL text is with an 'N' prefix: <DIV CLASS="INFORMALEXAMPLE" ><P ></P ><A NAME="AEN2214" ></A ><PRE CLASS="SCREEN" > INSERT INTO tablename (greeting) VALUES (N'Hallå')</PRE ><P ></P ></DIV > </P ><H3 CLASS="BRIDGEHEAD" ><A NAME="AEN2216" ></A >Commentary</H3 ><P >What's surprising about this? Versions 7.0 and later of the TDS protocol use UCS-2 to send SQL text. No matter how your local client is configured — with <ACRONYM CLASS="ACRONYM" >UCS-2</ACRONYM > or <ACRONYM CLASS="ACRONYM" >ISO 8859-1</ACRONYM > or anything else — it's converted to <ACRONYM CLASS="ACRONYM" >UCS-2</ACRONYM > before it's sent to the server. And obviously arrives at the server as <ACRONYM CLASS="ACRONYM" >UCS-2</ACRONYM >. If the column into which it's being inserted is also <ACRONYM CLASS="ACRONYM" >UCS-2</ACRONYM >, there's no need of <SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >any</I ></SPAN > conversion, much less two, and <SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >certainly</I ></SPAN > no need to lose infomation.</P ><P >Why this happens is anyone's guess. Here's one: it makes the datatype of the column unimportant. Regardless of whether you use char/varchar/text or nchar/nvarchar/ntext or a mixture of the two, the arriving SQL (if naïvely written) will store exactly the same characters.</P ></DIV ></DIV ><H3 CLASS="FOOTNOTES" >Notes</H3 ><TABLE BORDER="0" CLASS="FOOTNOTES" WIDTH="100%" ><TR ><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="5%" ><A NAME="FTN.AEN2150" HREF="localization.htm#AEN2150" ><SPAN CLASS="footnote" >[1]</SPAN ></A ></TD ><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="95%" ><P >The built-in replacement expects GNU iconv names: <TT CLASS="LITERAL" >ISO-8859-1</TT >, <TT CLASS="LITERAL" >US-ASCII</TT >, or <TT CLASS="LITERAL" >UTF-8</TT >.</P ></TD ></TR ><TR ><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="5%" ><A NAME="FTN.AEN2184" HREF="localization.htm#AEN2184" ><SPAN CLASS="footnote" >[2]</SPAN ></A ></TD ><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="95%" ><P >not EBCDIC or other weird charsets</P ></TD ></TR ><TR ><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="5%" ><A NAME="FTN.AEN2205" HREF="localization.htm#AEN2205" ><SPAN CLASS="footnote" >[3]</SPAN ></A ></TD ><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="95%" ><P >The precise rules are unknown to the author.</P ></TD ></TR ></TABLE ><DIV CLASS="NAVFOOTER" ><HR ALIGN="LEFT" WIDTH="100%"><TABLE SUMMARY="Footer navigation table" WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0" ><TR ><TD WIDTH="33%" ALIGN="left" VALIGN="top" ><A HREF="configs.htm" ACCESSKEY="P" >Prev</A ></TD ><TD WIDTH="34%" ALIGN="center" VALIGN="top" ><A HREF="index.htm" ACCESSKEY="H" >Home</A ></TD ><TD WIDTH="33%" ALIGN="right" VALIGN="top" ><A HREF="domains.htm" ACCESSKEY="N" >Next</A ></TD ></TR ><TR ><TD WIDTH="33%" ALIGN="left" VALIGN="top" >Advanced Configurations</TD ><TD WIDTH="34%" ALIGN="center" VALIGN="top" ><A HREF="configs.htm" ACCESSKEY="U" >Up</A ></TD ><TD WIDTH="33%" ALIGN="right" VALIGN="top" >Domain Logins</TD ></TR ></TABLE ></DIV ></BODY ></HTML >