



<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

<html xmlns="">
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Unicode Objects and Codecs &mdash; Python v2.7 documentation</title>
    <link rel="stylesheet" href="../_static/default.css" type="text/css" />
    <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
    <link rel="author" title="About these documents" href="../about.html" />
    <link rel="copyright" title="Copyright" href="../copyright.html" />
    <link rel="top" title="Python v2.7 documentation" href="../index.html" />
    <link rel="up" title="Concrete Objects Layer" href="concrete.html" />
    <link rel="next" title="Buffers and Memoryview Objects" href="buffer.html" />
    <link rel="prev" title="String/Bytes Objects" href="string.html" />
    <link rel="shortcut icon" type="image/png" href="../_static/py.png" />


    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body">
  <div class="section" id="unicode-objects-and-codecs">
<span id="unicodeobjects"></span><h1>Unicode Objects and Codecs<a class="headerlink" href="#unicode-objects-and-codecs" title="Permalink to this headline">¶</a></h1>
<div class="section" id="unicode-objects">
<h2>Unicode Objects<a class="headerlink" href="#unicode-objects" title="Permalink to this headline">¶</a></h2>
<div class="section" id="unicode-type">
<h3>Unicode Type<a class="headerlink" href="#unicode-type" title="Permalink to this headline">¶</a></h3>
<p>These are the basic Unicode object types used for the Unicode implementation in
Python:</p>
<p>Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep
this in mind when writing extensions or interfaces.</p>
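<p>One way to see which kind of build is running is to look at the largest code
point the interpreter reports. The following is a minimal Python-level sketch,
not part of the C API; the helper name <tt class="docutils literal"><span class="pre">unicode_build_width</span></tt> is hypothetical.
Narrow (UCS2) builds report 0xFFFF and wide (UCS4) builds report 0x10FFFF.</p>

```python
import sys


def unicode_build_width(maxunicode=None):
    """Classify a build by the largest code point it supports.

    Hypothetical helper: narrow (UCS2) builds report 0xFFFF,
    wide (UCS4) builds report 0x10FFFF.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    return "UCS2" if maxunicode <= 0xFFFF else "UCS4"


print(unicode_build_width())
```

<p>An extension loader could compare this value against the width the extension
was compiled for before importing it.</p>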
<p>The following APIs are really C macros and can be used to do fast checks and to
access internal read-only data of Unicode objects:</p>
<div class="section" id="unicode-character-properties">
<h3>Unicode Character Properties<a class="headerlink" href="#unicode-character-properties" title="Permalink to this headline">¶</a></h3>
<p>Unicode provides many different character properties. The most commonly needed
ones are available through these macros, which are mapped to C functions
depending on the Python configuration.</p>
<p>These APIs can be used for fast direct character conversions:</p>
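<p>At the Python level, comparable property checks and conversions are exposed as
string methods and through the <tt class="docutils literal"><span class="pre">unicodedata</span></tt> module. The sketch below
assumes those correspond to the kind of properties and conversions meant here;
it illustrates the idea and does not call the C macros themselves.</p>

```python
import unicodedata

digit = u"\u0663"    # ARABIC-INDIC DIGIT THREE
e_acute = u"\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

is_digit = digit.isdigit()                  # property check
decimal_value = unicodedata.decimal(digit)  # character-to-number conversion
upper = e_acute.upper()                     # case conversion

print(is_digit, decimal_value, upper == u"\u00c9")
```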
<div class="section" id="plain-py-unicode">
<h3>Plain Py_UNICODE<a class="headerlink" href="#plain-py-unicode" title="Permalink to this headline">¶</a></h3>
<p>To create Unicode objects and access their basic sequence properties, use these
APIs:</p>
<p>If the platform supports <tt class="docutils literal"><span class="pre">wchar_t</span></tt> and provides a header file <tt class="docutils literal"><span class="pre">wchar.h</span></tt>,
Python can interface directly to this type using the following functions.
Support is optimized if Python&#8217;s own <tt class="docutils literal"><span class="pre">Py_UNICODE</span></tt> type is identical to
the system&#8217;s <tt class="docutils literal"><span class="pre">wchar_t</span></tt>.</p>
<div class="section" id="wchar-t-support">
<h3>wchar_t Support<a class="headerlink" href="#wchar-t-support" title="Permalink to this headline">¶</a></h3>
<p><tt class="docutils literal"><span class="pre">wchar_t</span></tt> support, on platforms that provide the type:</p>
<div class="section" id="built-in-codecs">
<span id="builtincodecs"></span><h2>Built-in Codecs<a class="headerlink" href="#built-in-codecs" title="Permalink to this headline">¶</a></h2>
<p>Python provides a set of built-in codecs which are written in C for speed. All of
these codecs are directly usable via the following functions.</p>
<p>Many of the following APIs take two arguments, <em>encoding</em> and <em>errors</em>. These
parameters have the same semantics as those of the
built-in <a class="reference internal" href="../library/functions.html#unicode" title="unicode"><tt class="xref py py-func docutils literal"><span class="pre">unicode()</span></tt></a> Unicode object constructor.</p>
<p>Setting <em>encoding</em> to <em>NULL</em> causes the default encoding to be used, which is
ASCII.  File system calls should use <tt class="docutils literal"><span class="pre">Py_FileSystemDefaultEncoding</span></tt>
as the encoding for file names. This variable should be treated as read-only: on
some systems it will be a pointer to a static string, while on others it will change
at run-time (such as when the application invokes setlocale).</p>
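<p>From Python code, the file system encoding can be inspected with
<tt class="docutils literal"><span class="pre">sys.getfilesystemencoding()</span></tt>. The following is a small sketch; the helper name
<tt class="docutils literal"><span class="pre">filesystem_codec_name</span></tt> is hypothetical, and the fallback covers Python 2 systems
where the call returns <tt class="docutils literal"><span class="pre">None</span></tt> to signal that the default encoding is used.</p>

```python
import codecs
import sys


def filesystem_codec_name():
    """Return the normalized name of the codec used for file names."""
    name = sys.getfilesystemencoding()
    if name is None:
        # Python 2 may return None: the system default encoding applies.
        name = sys.getdefaultencoding()
    # codecs.lookup() normalizes aliases such as "UTF8" to "utf-8".
    return codecs.lookup(name).name


print(filesystem_codec_name())
```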
<p>Error handling is set by <em>errors</em>, which may also be set to <em>NULL</em>, meaning that
the default handling defined for the codec is used.  Default error handling for all
built-in codecs is &#8220;strict&#8221; (<a class="reference internal" href="../library/exceptions.html#exceptions.ValueError" title="exceptions.ValueError"><tt class="xref py py-exc docutils literal"><span class="pre">ValueError</span></tt></a> is raised).</p>
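<p>The effect of the <em>errors</em> argument can be observed from Python, where the same
handler names are accepted. This is a minimal sketch using the ASCII codec; the
sample bytes are arbitrary and the variable names are illustrative only.</p>

```python
data = b"caf\xc3\xa9"  # UTF-8 bytes for "cafe" with an accent; not valid ASCII

try:
    data.decode("ascii")           # errors defaults to "strict"
    strict_result = None
except ValueError as exc:          # UnicodeDecodeError subclasses ValueError
    strict_result = type(exc).__name__

# Non-strict handlers substitute or drop the offending bytes instead.
replaced = data.decode("ascii", "replace")
ignored = data.decode("ascii", "ignore")
```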
<p>The codecs all use a similar interface.  For simplicity, only deviations from the
following generic ones are documented.</p>
<div class="section" id="generic-codecs">
<h3>Generic Codecs<a class="headerlink" href="#generic-codecs" title="Permalink to this headline">¶</a></h3>
<p>These are the generic codec APIs:</p>
<div class="section" id="utf-8-codecs">
<h3>UTF-8 Codecs<a class="headerlink" href="#utf-8-codecs" title="Permalink to this headline">¶</a></h3>
<p>These are the UTF-8 codec APIs:</p>
<div class="section" id="utf-32-codecs">
<h3>UTF-32 Codecs<a class="headerlink" href="#utf-32-codecs" title="Permalink to this headline">¶</a></h3>
<p>These are the UTF-32 codec APIs:</p>
<div class="section" id="utf-16-codecs">
<h3>UTF-16 Codecs<a class="headerlink" href="#utf-16-codecs" title="Permalink to this headline">¶</a></h3>
<p>These are the UTF-16 codec APIs:</p>
<div class="section" id="unicode-escape-codecs">
<h3>Unicode-Escape Codecs<a class="headerlink" href="#unicode-escape-codecs" title="Permalink to this headline">¶</a></h3>
<p>These are the &#8220;Unicode Escape&#8221; codec APIs:</p>
<div class="section" id="raw-unicode-escape-codecs">
<h3>Raw-Unicode-Escape Codecs<a class="headerlink" href="#raw-unicode-escape-codecs" title="Permalink to this headline">¶</a></h3>
<p>These are the &#8220;Raw Unicode Escape&#8221; codec APIs:</p>
<div class="section" id="latin-1-codecs">
<h3>Latin-1 Codecs<a class="headerlink" href="#latin-1-codecs" title="Permalink to this headline">¶</a></h3>
<p>These are the Latin-1 codec APIs. Latin-1 corresponds to the first 256 Unicode
ordinals, and only these are accepted by the codecs during encoding.</p>
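<p>The one-to-one relationship between Latin-1 bytes and the first 256 Unicode
ordinals can be checked from Python. In this short sketch the helper name
<tt class="docutils literal"><span class="pre">latin1_encodable</span></tt> is hypothetical.</p>

```python
# Every byte value 0..255 decodes to the code point with the same ordinal.
all_bytes = bytes(bytearray(range(256)))
text = all_bytes.decode("latin-1")
ordinals = [ord(ch) for ch in text]


def latin1_encodable(s):
    """Return True if every character of s has an ordinal below 256."""
    try:
        s.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True
```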
<div class="section" id="ascii-codecs">
<h3>ASCII Codecs<a class="headerlink" href="#ascii-codecs" title="Permalink to this headline">¶</a></h3>
<p>These are the ASCII codec APIs.  Only 7-bit ASCII data is accepted. All other
codes generate errors.</p>
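<p>The 7-bit boundary sits between 0x7F and 0x80. The sketch below shows it from
Python; <tt class="docutils literal"><span class="pre">is_7bit</span></tt> is a hypothetical helper, not a library function.</p>

```python
def is_7bit(data):
    """Return True if every byte in data is below 0x80."""
    return all(b < 0x80 for b in bytearray(data))


highest_ok = b"\x7f".decode("ascii")   # last code accepted by the codec

try:
    b"\x80".decode("ascii")            # first code that is rejected
    rejected = False
except UnicodeDecodeError:
    rejected = True
```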
<div class="section" id="character-map-codecs">
<h3>Character Map Codecs<a class="headerlink" href="#character-map-codecs" title="Permalink to this headline">¶</a></h3>
<p>These are the mapping codec APIs:</p>
<p>This codec is special in that it can be used to implement many different codecs
(and this is in fact what was done to obtain most of the standard codecs
included in the <tt class="xref py py-mod docutils literal"><span class="pre">encodings</span></tt> package). The codec uses mappings to encode and
decode characters.</p>
<p>Decoding mappings must map single string characters to single Unicode
characters, integers (which are then interpreted as Unicode ordinals) or None
(meaning &#8220;undefined mapping&#8221; and causing an error).</p>
<p>Encoding mappings must map single Unicode characters to single string
characters, integers (which are then interpreted as Latin-1 ordinals) or None
(meaning &#8220;undefined mapping&#8221; and causing an error).</p>
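<p>The module-level helpers <tt class="docutils literal"><span class="pre">codecs.charmap_decode()</span></tt> and
<tt class="docutils literal"><span class="pre">codecs.charmap_encode()</span></tt> accept such mappings from Python. The sketch below uses
a made-up two-symbol encoding and assumes dictionaries keyed by integer ordinals,
which is how these helpers look entries up; the map contents are hypothetical.</p>

```python
import codecs

# Hypothetical encoding: byte 0x01 <-> "a", byte 0x02 <-> "b".
# Decoding values may be a character, an integer ordinal, or None.
decoding_map = {0x01: u"a", 0x02: ord(u"b"), 0x03: None}
# Encoding values here are integer byte ordinals.
encoding_map = {ord(u"a"): 0x01, ord(u"b"): 0x02}

decoded, consumed = codecs.charmap_decode(b"\x01\x02", "strict", decoding_map)
encoded, written = codecs.charmap_encode(u"ab", "strict", encoding_map)

try:
    codecs.charmap_decode(b"\x03", "strict", decoding_map)
    undefined_raises = False
except UnicodeDecodeError:
    # A None entry means "undefined mapping" and is an error under "strict".
    undefined_raises = True
```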
<p>The mapping objects provided must only support the __getitem__ mapping
interface.</p>
<p>If a character lookup fails with a LookupError, the character is copied as-is,
meaning that its ordinal value will be interpreted as a Unicode or Latin-1 ordinal,
respectively. Because of this, mappings only need to contain those entries which map
characters to different code points.</p>
<p>The following codec API is special in that it maps Unicode to Unicode.</p>
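<p>At the Python level, Unicode-to-Unicode mapping of this kind is available as the
<tt class="docutils literal"><span class="pre">translate()</span></tt> method of Unicode strings. The sketch below assumes a table
keyed by integer ordinals; characters missing from the table pass through
unchanged, and a <tt class="docutils literal"><span class="pre">None</span></tt> entry removes the character. The table itself is made up
for illustration.</p>

```python
# Hypothetical translation table keyed by Unicode ordinals.
table = {
    ord(u"a"): u"A",        # map to a replacement character
    ord(u"b"): ord(u"B"),   # map to another ordinal
    ord(u"-"): None,        # drop the character entirely
}

result = u"a-b-c".translate(table)  # "c" is not in the table and is kept
```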
<p>These are the MBCS codec APIs. They are currently only available on Windows and
use the Win32 MBCS converters to implement the conversions.  Note that MBCS (or
DBCS) is a class of encodings, not just one.  The target encoding is defined by
the user settings on the machine running the codec.</p>
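<p>Because the codec exists only on Windows, portable Python code usually probes for
it before use. In this minimal sketch the helper name <tt class="docutils literal"><span class="pre">mbcs_available</span></tt> is
hypothetical.</p>

```python
import codecs
import sys


def mbcs_available():
    """Return True if the "mbcs" codec can be looked up on this platform."""
    try:
        codecs.lookup("mbcs")
    except LookupError:
        return False
    return True


print(sys.platform, mbcs_available())
```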
<div class="section" id="mbcs-codecs-for-windows">
<h3>MBCS codecs for Windows<a class="headerlink" href="#mbcs-codecs-for-windows" title="Permalink to this headline">¶</a></h3>
<div class="section" id="methods-slots">
<h3>Methods &amp; Slots<a class="headerlink" href="#methods-slots" title="Permalink to this headline">¶</a></h3>
<div class="section" id="methods-and-slot-functions">
<span id="unicodemethodsandslots"></span><h2>Methods and Slot Functions<a class="headerlink" href="#methods-and-slot-functions" title="Permalink to this headline">¶</a></h2>
<p>The following APIs are capable of handling Unicode objects and strings on input
(we refer to them as strings in the descriptions) and return Unicode objects or
integers as appropriate.</p>
<p>They all return <em>NULL</em> or <tt class="docutils literal"><span class="pre">-1</span></tt> if an exception occurs.</p>

    <div class="footer">
    &copy; <a href="../copyright.html">Copyright</a> 1990-2010, Python Software Foundation.
    <br />
    The Python Software Foundation is a non-profit corporation.  
    <a href="">Please donate.</a>
    <br />
    Last updated on Aug 09, 2010.
    <a href="../bugs.html">Found a bug</a>?
    <br />
    Created using <a href="">Sphinx</a> 1.0b2.
