<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<HTML
><HEAD
><TITLE
>ISO 8859: What everyone would like to forget</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.79"><LINK
REL="HOME"
TITLE="FreeTDS User Guide"
HREF="index.htm"><LINK
REL="UP"
TITLE="About Unicode, UCS-2, and UTF-8"
HREF="aboutunicode.htm"><LINK
REL="PREVIOUS"
TITLE="About Unicode, UCS-2, and UTF-8"
HREF="aboutunicode.htm"><LINK
REL="NEXT"
TITLE="Unicode: East meets West"
HREF="unicode.htm"><LINK
REL="STYLESHEET"
TYPE="text/css"
HREF="userguide.css"><META
HTTP-EQUIV="Content-Type"
CONTENT="text/html; charset=utf-8"></HEAD
><BODY
CLASS="SECTION"
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000FF"
VLINK="#840084"
ALINK="#0000FF"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
><SPAN
CLASS="PRODUCTNAME"
>FreeTDS</SPAN
> User Guide: A Guide to Installing, Configuring, and Running <SPAN
CLASS="PRODUCTNAME"
>FreeTDS</SPAN
></TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="aboutunicode.htm"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
>Appendix C. About Unicode, UCS-2, and UTF-8</TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="unicode.htm"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="SECTION"
><H1
CLASS="SECTION"
><A
NAME="ISO8859"
>ISO 8859: What everyone would like to forget</A
></H1
><P
><ACRONYM
CLASS="ACRONYM"
>ASCII</ACRONYM
> won, it would seem, but the race goes not to the swift.  <ACRONYM
CLASS="ACRONYM"
>ASCII</ACRONYM
> has many limitations, the most egregious being that it is not much good for anything besides English.  It encodes (almost) all the letters and punctuation of English, but is useless for German, Russian, and Greek, to say nothing of Chinese.</P
><P
><ACRONYM
CLASS="ACRONYM"
>ASCII</ACRONYM
> assigns one byte to every character, but uses only 7 of the 8 available bits, giving the range 0-127 (with the <SPAN
CLASS="QUOTE"
>"high bit"</SPAN
> always zero).  Demand for computers that could display and print languages besides English &mdash; even English with em dashes and cent (&cent;) signs &mdash; arrived soon enough, with the Marketing Department way out in front of the propeller heads.  The predictable result was an array of <SPAN
CLASS="QUOTE"
>"8-bit <ACRONYM
CLASS="ACRONYM"
>ASCII</ACRONYM
>"</SPAN
> encoding standards for a wide variety of alphabets.  Eventually, they were standardized (or at least enumerated and documented) by the ISO.  These are what our friendly database vendors are referring to when they talk about <SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>character sets</I
></SPAN
>.  More information on this subject can be found at <A
HREF="http://www.webreference.com/dlab/books/html/39-1.html"
TARGET="_top"
>webreference.com</A
>.</P
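><P
>As a rough illustration of the 7-bit limit, here is a minimal Python sketch.  Python and its codec names are choices made for this example, not something FreeTDS itself uses.  It assumes the standard <TT
CLASS="LITERAL"
>ascii</TT
> and <TT
CLASS="LITERAL"
>iso-8859-1</TT
> codecs, and shows that the cent sign has no ASCII code but fits in a single byte once the high bit is available.</P
><PRE
CLASS="PROGRAMLISTING"
>
```python
# ASCII codes all fit in 7 bits, so the high bit of every byte is zero.
ascii_bytes = "English only".encode("ascii")
high_bits = [b // 128 for b in ascii_bytes]   # 0 for each byte

# The cent sign (U+00A2) has no ASCII code at all.
cent = "\u00a2"
try:
    cent.encode("ascii")
    cent_in_ascii = True
except UnicodeEncodeError:
    cent_in_ascii = False

# ISO 8859-1, one of the "8-bit ASCII" sets, puts it at 162 (high bit set).
cent_latin1 = cent.encode("iso-8859-1")

print(high_bits, cent_in_ascii, list(cent_latin1))
```
</PRE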
><P
>The upshot is that there is no uniform standard, no agreement on the meaning of a byte, particularly if that byte's value is greater than 127.  Let's say your client machine sends <TT
CLASS="LITERAL"
>HELLO</TT
> and your database stores it as the byte values <TT
CLASS="LITERAL"
>72 69 76 76 79</TT
>.  When another client retrieves that value, it will convert it into human-readable form by applying an encoding standard.  If everything's tightly wrapped, it will use the very same encoding that your database used (and the same one you had in mind when you sent it), and that client will also see <TT
CLASS="LITERAL"
>HELLO</TT
>.  If things are not so tightly wrapped but that client is fortunate enough to be using a standard similar to yours, say, ISO 8859-1, it will still see <TT
CLASS="LITERAL"
>HELLO</TT
>.  Most Western European languages can be represented by ISO 8859-1, and text in them is thus interchangeable.  Beyond that, things quickly get messy.  Greek clients, for one, are not so lucky: Greek has several 8-bit encodings in common use (ISO 8859-7, Windows-1253, and the DOS code pages 737 and 869), all mutually incompatible.
For more information, see <A
HREF="http://czyborra.com/charsets/iso8859.html"
TARGET="_top"
>ISO 8859 Alphabet Soup</A
>.  Roman Czyborra's site is very informative; take your time there if you don't want your head to spin.</P
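><P
>The difference between bytes below and above 127 can be sketched in a few lines of Python.  This is only an illustration, assuming Python's built-in codecs for three ISO 8859 parts; the byte value <TT
CLASS="LITERAL"
>0xE1</TT
> is an arbitrary choice for the example.  The five stored bytes decode to the same text everywhere, while the single high byte becomes a different letter under each character set.</P
><PRE
CLASS="PROGRAMLISTING"
>
```python
# The five byte values from the text above.
stored = bytes([72, 69, 76, 76, 79])

# Values below 128 mean the same thing in ASCII and in every ISO 8859 part.
charsets = ("ascii", "iso-8859-1", "iso-8859-5", "iso-8859-7")
decoded = {name: stored.decode(name) for name in charsets}

# One byte above 127, read under three different character sets.
high = bytes([0xE1])
as_latin1 = high.decode("iso-8859-1")    # Latin small letter a with acute
as_cyrillic = high.decode("iso-8859-5")  # Cyrillic small letter es
as_greek = high.decode("iso-8859-7")     # Greek small letter alpha

print(decoded)
print(as_latin1, as_cyrillic, as_greek)
```
</PRE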
><P
>Database servers need to know what encoding standard to employ, too.  It's not obvious at first, but notions like <SPAN
CLASS="QUOTE"
>"uppercase"</SPAN
> and <SPAN
CLASS="QUOTE"
>"lowercase"</SPAN
>, trailing blanks, and collation rules all depend on what letter is meant by what number.  (Collation even depends on what culture is interpreting the letters.)</P
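><P
>To see why case conversion depends on the character set, consider this minimal Python sketch.  The helper <TT
CLASS="LITERAL"
>upper_bytes</TT
> is hypothetical, written only for this illustration, and it does not reflect how any particular database server implements uppercasing.  The same byte is a lowercase letter under ISO 8859-5 and already a capital under ISO 8859-1, so the result of uppercasing it differs.</P
><PRE
CLASS="PROGRAMLISTING"
>
```python
def upper_bytes(raw, charset):
    """Uppercase raw bytes by interpreting them in the given character set."""
    return raw.decode(charset).upper().encode(charset)

raw = bytes([0xD0])

# In ISO 8859-5, 0xD0 is a lowercase Cyrillic letter; its capital is 0xB0.
cyrillic_upper = upper_bytes(raw, "iso-8859-5")

# In ISO 8859-1, 0xD0 is already a capital letter (Eth), so nothing changes.
latin1_upper = upper_bytes(raw, "iso-8859-1")

print(cyrillic_upper.hex(), latin1_upper.hex())
```
</PRE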
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="aboutunicode.htm"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.htm"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="unicode.htm"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>About Unicode, UCS-2, and UTF-8</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="aboutunicode.htm"
ACCESSKEY="U"
>Up</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Unicode: East meets West</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>