This is the LaTeX2e style package CJK Version 4.8.1 (10-Aug-2008)
=================================================================

It is freely distributable under the GNU General Public License.


        **************************************************
        *                                                *
        * You need LaTeX 2e version 2001/06/01 or newer! *
        *                                                *
        **************************************************


Usage
-----

Use CJK.sty as a package, e.g.,

    \documentclass{article}
    \usepackage[<option>]{CJK}            .

See section `Caveats' below for the available options. Normally, you don't
need them.

Two new environments,

    \begin{CJK}[<fontencoding>]{<encoding>}{<family>}
    ...
    \end{CJK}

and

    \begin{CJK*}[<fontencoding>]{<encoding>}{<family>}
    ...
    \end{CJK*}

are defined. The parameters have the following meaning:

    <encoding>      These character sets and encodings are currently
                    implemented in CJK.enc:

                        Bg5  (For traditional Chinese. Mainly used in Taiwan.
                              Character set: Big 5.
                              Encoding: Big 5 without UDA2 and UDA3.)
                        Bg5+ (For traditional Chinese. Obsolete.
                              Character set: Big 5+.
                              Encoding: GBK.)

                        HK   (For traditional Chinese.  Used in Hong Kong.
                              Character set: Big 5 + HKSCS-2004.
                              Encoding: Full Big 5.)

                        GB   (For simplified Chinese. Mainly used in
                              PR China. Also called `EUC-CN'.
                              Character set: GB 2312-1980.
                              Encoding: EUC.)
                        GBt  (For traditional Chinese. Rarely used in
                              PR China.
                              Character set: GB/T 12345-1990.
                              Encoding: EUC.)
                        GBK  (For Chinese. An extension of GB 2312.
                              Character set: GBK.
                              Encoding: GBK.)

                        JIS  (For Japanese.
                              Character set: JIS X 0208:1997.
                              Encoding: EUC.)
                        JIS2 (Japanese supplementary character set,
                              Character set: JIS X 0212-1990.
                              Encoding: EUC.)
                        SJIS (For Japanese. Used mainly on PCs. Also known
                              as `MS Kanji'.
                              Character sets:
                                1-byte characters from JIS X 0201-1997
                                (half-width katakana),
                                2-byte characters from JIS X 0208:1997.
                              Encoding: SJIS.)

                        KS   (For Korean. Also called `EUC-KR'.
                              Character set: KS X 1001:1992 = KS C 5601-1992.
                              Encoding: EUC.)

                        UTF8 (Unicode Transformation format 8, also called
                              `UTF-2' or `FSS-UTF'.
                              Character set: Unicode.
                              Encoding: UTF-8.)

                        CNS1 (Chinese National Standard Plane 1,
                              Character set: CNS 11643-1992 plane 1.
                              Encoding: EUC.)
                        CNS2
                        ...
                        CNS7 (Character set: CNS 11643-1992 plane 2 - 7.
                              Encoding: EUC.)

                        CEFX (reserved CEF character set for IRIZ.
                              Encoding: EUC.)
                        CEFY (private CEF character set.
                              Encoding: EUC.)

                    Note: The value `HK' can also be used for complete Big 5
                          support, which needs user-defined areas 2 and 3
                          (UDA2 and UDA3), located in the ranges
                          0x8E40-0xA0FE and 0x8140-0x8DFE, respectively.

                          For details on HKSCS-2004 see

                            http://www.info.gov.hk/digital21/eng
                                   /hkscs/download/e_sect3_2004.pdf


                    These encodings (except Big 5, Big 5+, HK, GBK, SJIS, and
                    UTF-8) are simplified EUC (Extended Unix Code) character
                    sets without single shifts. They use the character set
                    slot G1, i.e., two-byte codes whose bytes are both taken
                    from the GR (Graphic Right) range 0xA1-0xFE (as defined
                    in ISO 2022).

                    Note that CNS1 and CNS2 contain almost the same
                    characters in the same order as Big 5 (but in EUC).

                    For CEF and CNS character sets see CEF.txt also.

                    Big 5+ and GBK have exactly the same encoding layout
                    (but their origins differ).

                    Additionally, the following encodings *with* single
                    shifts are implemented, using some of the above defined
                    character sets:

                        EUC-JP (for Japanese.
                                Character sets:
                                  Half-width katakana (from JIS X 0201-1997),
                                  JIS X 0208:1997,
                                  JIS X 0212-1992.)

                        EUC-TW (for traditional Chinese.
                                Character sets:
                                  CNS 11643-1992 planes 1-7.)

                    EUC-JP, EUC-TW, and UTF-8 encodings can't be used in
                    preprocessed mode (see below) because preprocessing them
                    makes no sense. (To be more precise, UTF-8 sequences with
                    more than two bytes can't be used.)


                    Specifying this parameter is equivalent to calling
                    \CJKenc. For example, writing

                      \begin{CJK}{Bg5}{...}
                      ...

                    is identical to

                      \begin{CJK}{}{...}
                      \CJKenc{Bg5}
                      ...

                    Note: A `character set' is an ordered collection of
                          glyphs. The order of the glyphs is just for
                          defining purposes and for reference.

                          An `encoding' is an ordering scheme to access a
                          character set. LaTeX 2e also uses the term `input
                          encoding'.

                          A character set can have many encodings
                          (cf. JIS X 0208 -> EUC, SJIS).

                          An encoding can be used for many character sets
                          (cf. EUC -> KS X 1001, GB 2312, etc.).

                          Sometimes, the character set has the same name
                          as the encoding (Big 5, Big 5+, GBK).

                          For more details I suggest reading the document
                          cjk.inf by Ken Lunde; it is available from

                            ftp://ftp.ora.com/pub/examples/cjkvinfo/
                                              doc/cjk.inf

                          A really thorough reference is his latest book
                          `CJKV Information Processing' (O'Reilly).

                          Throughout this CJK documentation, `encoding'
                          refers to the valid encoding/character set
                          combinations defined just above.

    <fontencoding>  These font encodings are currently defined: `' (empty;
                    the default), `pmC' (available for Bg5, GB, GBt, JIS,
                    and KS), `dnp' (for JIS and SJIS), `wn' (for JIS), and
                    `HL' (for KS).

                    `Font encoding' means the order of characters in the
                    subfonts themselves. A change of the font encoding
                    neither alters the meaning of a CJK character nor
                    changes the character code in the selected encoding.

                    The font encoding `pmC' is defined for compatibility
                    with the (now obsolete) pmC package. Using this font
                    encoding is discouraged because it wastes subfonts. If
                    possible, convert your original CJK bitmap fonts to CJK
                    encodings with hbf2gf (see hbf2gf.txt) or other tools.

                    `dnp' implements the character order of the Dai Nippon
                    Printing fonts and is only available for JIS and SJIS
                    encoding. `wn' (only available for JIS) is the font
                    encoding for Watanabe jfonts. A linking package maps
                    the Watanabe jfonts onto the dnp
                    naming scheme (thus you can use the real dnp fonts for
                    printing and the mapped jfonts for previewing). See the
                    documentation files in the `japanese' subdirectory for
                    further details.

                    `HL' allows the use of the new HLaTeX fonts (starting
                    with version 1.0); note that the definition of fonts is
                    rather different compared to HLaTeX. See the section
                    `Korean input' below for a detailed description.

                    You can change the font encoding per encoding with the
                    command \CJKfontenc; the first parameter is the
                    encoding, the second the font encoding.

    <family>        It is impossible to know in advance what fonts are
                    available at your site; look at the example FD (font
                    definition) files to see how to create or modify FD
                    files suiting your needs. See also fonts.txt for
                    further hints.

                    If this parameter is empty, the default value given in
                    CJK.enc is selected: `song' for all encodings except KS
                    (which defaults to `mj'). Specifying this parameter is
                    equivalent to calling \CJKfamily; all encodings then use
                    this family:

                      \begin{CJK}{...}{song}
                      ...

                    is identical to

                      \begin{CJK}{...}{}
                      \CJKfamily{song}
                      ...

                    You can change the families per encoding (and font
                    encoding) with the command \CJKencfamily; the first
                    parameter is the encoding, the second the family, the
                    optional argument is the font encoding. This overrides
                    the default value.

                    Note that \CJKfamily or a non-empty `family' parameter
                    of the CJK environment overrides any \CJKencfamily
                    commands. Say `\CJKfamily{}' to enable \CJKencfamily
                    again.


    The CJK* environment swallows unprotected spaces and newlines after a
    CJK character (the usual habit for Chinese and Japanese text), whereas
    CJK does not (for European and Korean text). You can switch between
    these two `modes' with \CJKspace (CJK* -> CJK) and \CJKnospace (CJK ->
    CJK*).

    If you use cjk-enc.el, you don't need to specify a CJK environment. This
    is done automatically. See cjk-enc.txt for details.


This is a typical example:

    \begin{CJK*}{GB}{kai}
      ...
      Chinese simplified text in GB encoding
      ...
    \end{CJK*}


How it works
------------

Asian logographs can't be represented completely with one byte per
character. (At least) two bytes are needed, and the most common encoding
schemes (UTF-8, GB, Big 5, JIS, KS, etc.) have a certain range for the first
byte (usually 0xA1-0xFE or a part of it) which signals that this and the
next byte represent an Asian logograph. This means almost all plain ASCII
characters (characters between 0x00 and 0x7E) are left undisturbed, and the
remaining character codes (0x80-0xFF) are assigned to a CJK encoding,
creating a multiple-byte encoding with 1-byte and 2-byte characters (and
even 3-byte and 4-byte characters for UTF-8).
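To make the byte-range rule concrete, here is a minimal C sketch (not part
of the CJK package; the names `gb_count' and `gb_counts' are hypothetical).
It splits a GB (EUC-CN) byte string into one-byte ASCII characters and
two-byte characters, assuming, as for \CJK@standardEncoding described below,
that both bytes of a CJK character lie in the range 0xA1-0xFE.

```c
#include <assert.h>
#include <stddef.h>

/* Tally of the characters found in a byte string. */
typedef struct {
    size_t ascii;    /* one-byte characters 0x00-0x7F */
    size_t twobyte;  /* two-byte characters, both bytes in 0xA1-0xFE */
    size_t invalid;  /* stray, truncated, or out-of-range bytes */
} gb_counts;

/* Walks a GB (EUC-CN) byte string: a byte below 0x80 is a one-byte
   character; a byte in 0xA1-0xFE followed by another byte in 0xA1-0xFE
   forms one two-byte character; everything else is counted as invalid. */
gb_counts gb_count(const unsigned char *s, size_t len)
{
    gb_counts c = {0, 0, 0};
    size_t i = 0;

    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char b = s[i];
        if (b < 0x80) {
            c.ascii++;
            i += 1;
        } else if (b >= 0xA1 && b <= 0xFE && i + 1 < len
                   && s[i + 1] >= 0xA1 && s[i + 1] <= 0xFE) {
            c.twobyte++;
            i += 2;
        } else {
            c.invalid++;
            i += 1;
        }
    }
    return c;
}
```

For example, the five bytes `A' 0xD6 0xD0 0xCE 0xC4 (an ASCII letter
followed by two GB 2312 characters) yield one one-byte and two two-byte
characters.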

The character 0x7F is reserved also for the CJK package. See the section
`Preprocessors' below.

Encodings like EUC-TW access additional character sets using escape
characters (0x8E and 0x8F), which signal that the next character comes
from another character set (which is `shifted' to the GR range); up to
four bytes are needed for a single character. Example:

  0x8E 0xA3 0xB7 0xCE

    0x8E is a single shift escape character; 0xA3 selects CNS plane 3, and
    0xB7CE is the character code (in GR representation) in this plane.
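The following C sketch decodes such sequences, assuming the EUC-TW layout
given above: a plain GR byte pair is a plane 1 character, and the single
shift 0x8E is followed by a plane byte 0xA1-0xA7 (plane = byte - 0xA0) and
a GR pair. The function `euctw_decode' and its result type are hypothetical
illustrations, not CJK internals.

```c
#include <assert.h>
#include <stddef.h>

/* Result of decoding one EUC-TW character. */
typedef struct {
    int plane;          /* CNS 11643 plane 1-7, or 0 if not a valid sequence */
    unsigned int code;  /* two-byte code in GR representation, e.g. 0xB7CE */
    size_t length;      /* number of input bytes consumed */
} cns_char;

static int is_gr(unsigned char b) { return b >= 0xA1 && b <= 0xFE; }

/* Decodes one character at the start of s[0..len-1]. */
cns_char euctw_decode(const unsigned char *s, size_t len)
{
    cns_char r = {0, 0, 0};

    assert(s != NULL || len == 0);
    if (len >= 2 && is_gr(s[0]) && is_gr(s[1])) {
        /* plain GR pair: plane 1 */
        r.plane = 1;
        r.code = (unsigned int)s[0] << 8 | s[1];
        r.length = 2;
    } else if (len >= 4 && s[0] == 0x8E && s[1] >= 0xA1 && s[1] <= 0xA7
               && is_gr(s[2]) && is_gr(s[3])) {
        /* single shift 0x8E, plane byte, then the GR pair */
        r.plane = s[1] - 0xA0;
        r.code = (unsigned int)s[2] << 8 | s[3];
        r.length = 4;
    }
    return r;
}
```

Applied to the example bytes 0x8E 0xA3 0xB7 0xCE, it reports plane 3 and
code 0xB7CE, consuming four bytes.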

CJK.sty makes the character codes 0x7F and 0x81-0xFE active inside of the
CJK environment and assigns macros to the active characters which then
select the proper font and character. The real mechanism is a bit more
complex in order to ensure robustness (it was borrowed from LaTeX 2e's
inputenc.sty and modified) and correct handling of punctuation characters.

*   emTeX users: you must activate 8bit input and output while creating the
*   LaTeX2e format file! Do this by using the switches -o and -8 (additional
*   to the iniTeX switch -i).
*
*   Example:
*
*       tex386 -i -o -8 latex.ltx


Some internals
--------------

Internally three levels (bindings, encodings, character macro sets) are
defined:

        active characters
            |
            +--------------> bindings (standard, SJIS, UTF8)
            |
        active character macros
            |
            +--------------> encodings (GB, Big 5, ...) + 
            |                font encodings (none, dnp, wn, pmC, HL)
            |
        subfont selecting macros
            |
            +--------------> character macro sets (standard, Big 5, ...)
            |
        character selecting macros

Only the encoding and the font encoding are user-selectable (as explained
above); the other levels are selected by the CJK package.

These levels correspond to the following internal macros:

  \CJK@xxxxBinding (`xxxx.bdg' files):
    Possible values for `xxxx' are: standard, SJIS, UTF8, EUC-JP, and
    EUC-TW.

  \CJK@xxxxEncoding (`xxxx.enc' files):
    Possible values for `xxxx' are: standard, extended, Bg5, SJIS, KS, UTF8,
    pmCsmall, pmCbig, JISdnp, SJISdnp, KSHL, EUC-JP, and EUC-TW.

  \CJK@xxxxChr (`xxxx.chr' files):
    Possible values for `xxxx' are: standard, Bg5, KS, SJIS, UTF8, pmC,
    HLaTeX, EUC-JP, and EUC-TW.

In preprocessed mode (see below), no bindings are used.


And now a more detailed description of the various encodings. Please note
that you should never access these macros directly.

  \CJK@standardEncoding is used for EUC encodings with the first and second
  byte in the range 0xA1-0xFE (GB, GBt, JIS, JIS2, CNS, CEF).

  \CJK@extendedEncoding is used for Big 5+ and GBK encodings. The first byte
  is in the range 0x81-0xFE, the second byte in the range 0x40-0xFE (with a
  gap at 0x7F).

  \CJK@Bg5Encoding is used for Big 5 encoding with the first byte in the
  range 0xA1-0xFE and the second byte in the range 0x40-0xFE (with a gap
  from 0x7F-0xA0).

  \CJK@SJISEncoding is used for SJIS encoding; one-byte characters are in
  the range 0xA1-0xDF, two-byte characters have the first byte in the ranges
  0x81-0x9F and 0xE0-0xEF, the second byte runs from 0x40 to 0xFC except
  0x7F. Since SJIS only squeezes the JIS X 0208 character set into a new
  scheme without changing the ordering, fonts produced by hbf2gf or ttf2pk
  look the same for EUC and SJIS encoding, except for one-byte SJIS
  characters. For more details see the section `SJIS encoding' below.

  \CJK@KSEncoding is used for the KS X 1001 character set in EUC encoding.
  Two sets of subfonts are defined, one for Hangul syllables and elements,
  and a second for Hanja. For more details see the section `Korean input'
  below.

  \CJK@UTF8Encoding is used for Unicode in UTF-8 encoding. The first byte is
  in the range 0xC0-0xDF for two-byte values, 0xE0-0xEF for three-byte
  values, and 0xF0-0xF4 for four-byte values. The other byte(s) are in the
  range 0x80-0xBF. Note that CJK expects two hexadecimal digits as a running
  number in the font name (as defined in UTF8.enc) instead of two decimal
  digits for subfonts covering characters up to U+FFFF. Subfonts for Unicode
  values greater than 0xFFFF use four hexadecimal digits in the font name.
  Select the option `unicode yes' in the hbf2gf config file if you use
  hbf2gf to transform bitmap fonts in HBF format to PK fonts as used by
  CJK.sty . Three commands (\CJKCJKchar, \CJKhangulchar, and \CJKlatinchar)
  control the handling of intercharacter glue: \CJKCJKchar (the default)
  selects CJK style (using \CJKglue), \CJKhangulchar selects hangul style
  (using \CJKtolerance), and \CJKlatinchar selects none of them. This
  encoding does not work in preprocessed mode.

  \CJK@pmCsmallEncoding and \CJK@pmCbigEncoding can be activated with
  \pmCsmall (this is the default) and \pmCbig inside the CJK environment.
  Note that the original pmC fonts have two character sizes per font (the
  bigger ones with an offset of -128); Bg5pmC encoded fonts cannot contain
  big characters. The names of the fonts in the FD files reflect the
  modifications added by Marc Leisher <mleisher@nmsu.edu> to the original
  poor man's Chinese (pmC) package written by Thomas Ridgeway
  <ridgeway@blackbox.hacc.washington.edu>.

  \CJK@JISdnpEncoding is the JIS X 0208 character set in EUC encoding with
  dnp fonts. The main difference (besides the offsets) is the composition of
  real font names; a dnp font name consists of name stem + subfont name +
  designsize: an example is dmjkata10. Note that the wadalab PS fonts omit
  the designsize part in the font names, thus it is sufficient (and even
  better) to use the `CJK' size functions in FD files instead of the `DNP'
  ones. \CJK@JISwnEncoding is similar to JISdnp encoding but uses Watanabe
  jfonts; \CJK@SJISdnpEncoding maps SJIS onto dnp encoded fonts.

  \CJK@KSHLEncoding finally uses the new fonts of the HLaTeX package for
  Korean; three internal encodings are necessary to represent it. See the
  next section for details.

  \CJK@EUC-TWEncoding and \CJK@EUC-JPEncoding are quite similar to
  \CJK@standardEncoding but implement single shift access additionally. They
  can't be used in preprocessed mode.
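As an illustration of the UTF-8 rules in the description of
\CJK@UTF8Encoding above, the following C sketch decodes one UTF-8 sequence
using exactly those lead-byte ranges and derives a subfont suffix. The
suffix computation is an assumption: it supposes that each subfont covers
256 code points and that the running number is the code point shifted right
by eight bits; UTF8.enc is the authoritative source. All names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Decodes one UTF-8 sequence with the lead-byte ranges given above.
   Returns the number of bytes consumed (0 on error) and stores the code
   point in *cp.  Overlong forms are not rejected in this sketch. */
size_t utf8_decode(const unsigned char *s, size_t len, unsigned long *cp)
{
    size_t n, i;
    unsigned long v;

    assert(cp != NULL);
    if (len == 0)
        return 0;
    if (s[0] < 0x80) {                         /* plain ASCII */
        *cp = s[0];
        return 1;
    }
    if (s[0] >= 0xC0 && s[0] <= 0xDF) {        /* two-byte value */
        n = 2; v = s[0] & 0x1F;
    } else if (s[0] >= 0xE0 && s[0] <= 0xEF) { /* three-byte value */
        n = 3; v = s[0] & 0x0F;
    } else if (s[0] >= 0xF0 && s[0] <= 0xF4) { /* four-byte value */
        n = 4; v = s[0] & 0x07;
    } else {
        return 0;                              /* not a lead byte */
    }
    if (len < n)
        return 0;
    for (i = 1; i < n; i++) {
        if (s[i] < 0x80 || s[i] > 0xBF)        /* continuation: 0x80-0xBF */
            return 0;
        v = (v << 6) | (unsigned long)(s[i] & 0x3F);
    }
    *cp = v;
    return n;
}

/* ASSUMPTION: 256 code points per subfont; two hex digits up to U+FFFF,
   four hex digits above. */
void subfont_suffix(unsigned long cp, char *buf, size_t size)
{
    if (cp <= 0xFFFF)
        snprintf(buf, size, "%02lx", cp >> 8);
    else
        snprintf(buf, size, "%04lx", cp >> 8);
}
```

Under that assumption, the bytes 0xE4 0xB8 0xAD decode to U+4E2D and
select the subfont with suffix `4e', while U+20000 would select `0200'.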


Korean input
------------

There is already a package which handles Hangul and Hanja (but no other
CJK character sets): HLaTeX.

To use KS encoding, say

    \begin{CJK}{KS}{}
    ...
    \end{CJK}       .

These font switches are available inside the environment:

    hangul fonts from former hlatex (in the han font packages):

    *   \mj  MyoungJo   (default)
        \gt  Gothic
        \gs  BootGulssi
        \gr  Graphic
        \dr  Dinaru

    hangul fonts from former jhtex (in the han1 font packages):

    *   \hgt Hangul Gothic
    *   \hmj Hangul MyoungJo (MunHwaBu fonts)
    *   \hpg Hangul Pilgi
        \hol Hangul Outline (MyoungJo)


If a font is marked with a star, real bold series are available. All other
bold fonts are defined using poor man's boldface (see the section `Poor
man's boldface' below).

See the file INSTALL for how to obtain these fonts. Both the `han' and
`han1' packages contain bitmap fonts only (in PK format).

Note that the font switches are abbreviations for \CJKencfamily and not for
\CJKfamily.


For characters with the first byte in the ranges 0xA1-0xAF (except 0xA4) and
0xC9-0xFD (graphic characters, hanja, archaic hangul, etc.) fonts with the
encoding C60 are used. C61 is assigned to hangul fonts (for hangul elements
with the first byte 0xA4 and hangul characters in the range 0xB0-0xC8). This
enables the use of many hangul fonts and perhaps only one or two different
hanja fonts. If you want to use C60 encoding for hangul characters as
well, say \CJKhanja. The opposite command is \CJKhangul (of course this
works only if you have hangul characters in the C60 font).
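The rule for choosing between C60 and C61 can be summarised by a small C
function. This is a hypothetical sketch: `ks_font_encoding' is not part of
CJK, and its flag `hanja_mode' merely models the effect of \CJKhanja.

```c
#include <assert.h>

/* Returns 61 if a KS X 1001 character with first byte `lead` is taken
   from the C61 (hangul) fonts, 60 if it comes from the C60 fonts, and 0
   if `lead` lies outside the ranges listed above.  A non-zero
   `hanja_mode` models \CJKhanja, which sends hangul to C60 as well. */
int ks_font_encoding(unsigned char lead, int hanja_mode)
{
    int hangul = lead == 0xA4 || (lead >= 0xB0 && lead <= 0xC8);
    int other = (lead >= 0xA1 && lead <= 0xAF && lead != 0xA4)
                || (lead >= 0xC9 && lead <= 0xFD);

    if (!hangul && !other)
        return 0;
    if (hangul && !hanja_mode)
        return 61;
    return 60;
}
```

For instance, a hangul syllable with first byte 0xB0 is taken from C61 by
default and from C60 once \CJKhanja is active.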

Archaic hangul elements (KS X 1001 0xA4D5-0xA4FE) and the character
KS X 1001 0xA4D4 are only accessible if \CJKhanja is active.

You should convert your KS X 1001 hanja fonts using hbf2gf (or ttf2pk) as
described above.


To use HLaTeX fonts, say

  \begin{CJK}[HL]{KS}{}
  ...
  \end{CJK}         .

All HLaTeX fonts are PS fonts; these font switches are available inside the
environment (as defined in HLaTeX 1.0; this differs from older versions):

        \bm     Bom
    *   \dn     Dinaru
    *   \gr     Graphic
  +     \gs     Gungseo
  + *   \gt     Gothic
        \jgt    Jamo Gothic
        \jmj    Jamo Myoungjo
        \jnv    Jamo Novel
        \jsr    Jamo Sora
  + *   \mj     Myoungjo
    *   \pg     Pilgi
        \pga    Pilgia
        \ph     Pen Heulim
        \pn     Pen
  +     \sh     Shinmun Myoungjo
  +     \tz     Typewriter
        \vd     Vada
        \yt     Yetgul

If a font is marked with an asterisk, real bold series are available. All
other fonts are defined using poor man's boldface (see below). Only fonts
marked with a plus sign are available for hanja too; the other font families
are mapped to these six hanja families. For backwards compatibility, \ol and
\sm are defined also; both are now equivalent to \mj.

UN Koaung-Hi <koaunghi@kornet.net>, the author of HLaTeX, defines three
groups of fonts: hangul, hanja, and symbols. The CJK package needs three
internal encodings (C63 for hanja, C64 for symbols, and C65 for hangul) to
represent the font encoding scheme of HLaTeX.

HLaTeX options:

The option `hardbold' has been integrated into the FD files---I consider the
fact whether you have bold series available or not as a fundamental local
font setup decision which should be coded into the FD files and not into the
document. As a consequence you have to change your FD files to emulate the
`softbold' option with CJK's poor man's boldface. Example:

    \DeclareFontShape{C63}{gt}{bx}{n}{<-> CJK * wgtb}{}

should be changed to

    \DeclareFontShape{C63}{gt}{bx}{n}{<-> CJKb * wgt}{\CJKbold} .

Other bold font definitions should be changed similarly.

[Well, it is not really necessary to modify the FD files to emulate the
 `softbold' option: just insert the appropriate \DeclareFontShape and/or
 \DeclareFontFamily commands in the preamble of your document.]
 
Finally a warning: Please bear in mind that CJK does not emulate the
behaviour of HLaTeX, it only supports its fonts.


Big 5 encoding
--------------

See the section `Preprocessors' below for the preferred input method using
bg5conv.

The characters `\', `{', and `}' are used as second bytes in the Big 5
encoding. This collides with TeX. If you write Big 5 text mixed with other
encodings (and you can't or don't want to use Mule, Emacs, or bg5conv), you
should use the Bg5text environment, which changes the category codes of
these characters. The command prefix is then the forward slash `/', and
the grouping characters are `(' and `)', respectively.

An example:

            \begin{CJK}{Bg5}{song}
            \begin{Bg5text}
            ...
            /begin(center)
            ...
            /end(center)
            ...
            /end(Bg5text)
            \end{CJK}

To get the `/', `(', and `)' characters, write `//', `/(', and `/)' inside
the Bg5text environment.

This environment is ugly, and some commands like \newcommand don't work in
it. Starting with CJK version 3.0 it is also possible to use different
encodings in preprocessed mode, so this environment is almost obsolete.

Instead of using the Bg5text environment you can protect the offending
second bytes with a backslash, i.e., `\{', `\}', `\\' (using a non-Chinese
editor). This doesn't increase the readability of the Chinese text, but for
short texts it is perhaps more comfortable. Alas, it doesn't work in page
header commands because the macros `\{', etc., are not expanded.

Be careful not to use any commands inside the Bg5text environment which
write something into an external file (commands like \chapter, etc.).
 
If it is not possible to avoid Big 5 character codes with `\', `{', or `}'
outside of the Bg5text environment (e.g., having Big 5 text in a \chapter or
\section command), you can replace them with the \CJKchar macro manually:

    \section{This is a problematic Big 5 character: \CJKchar{169}{92}}

The parameters are the first and second byte of the Big 5 character code.
You can also use hexadecimal or octal notation. See commands.txt for a full
description of \CJKchar.
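If many such characters occur, the replacement can be automated. The
following C sketch copies Big 5 text and rewrites every two-byte character
whose second byte is `\', `{', or `}' into a \CJKchar call with decimal
arguments. It is hypothetical (`big5_protect' is not a CJK tool) and does
not check that a second byte is a valid Big 5 trail byte.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies Big 5 text from `in` to `out`, replacing each two-byte character
   whose second byte is '\\', '{', or '}' by \CJKchar{<first>}{<second>}.
   Returns the output length, or 0 if `out` is too small (an empty input
   also yields 0). */
size_t big5_protect(const unsigned char *in, size_t len, char *out, size_t size)
{
    size_t i = 0, o = 0;

    assert(out != NULL && size > 0);
    while (i < len) {
        unsigned char b = in[i];
        if (b >= 0xA1 && b <= 0xFE && i + 1 < len) {
            unsigned char t = in[i + 1];
            if (t == '\\' || t == '{' || t == '}') {
                int n = snprintf(out + o, size - o, "\\CJKchar{%u}{%u}",
                                 (unsigned)b, (unsigned)t);
                if (n < 0 || (size_t)n >= size - o)
                    return 0;
                o += (size_t)n;
            } else {
                if (o + 2 >= size)      /* two bytes plus the final NUL */
                    return 0;
                out[o++] = (char)b;
                out[o++] = (char)t;
            }
            i += 2;
        } else {
            if (o + 1 >= size)          /* one byte plus the final NUL */
                return 0;
            out[o++] = (char)b;
            i += 1;
        }
    }
    out[o] = '\0';
    return o;
}
```

With this sketch, the bytes 0xA9 0x5C from the \section example above
become \CJKchar{169}{92}. For whole documents, bg5conv (see `Preprocessors')
remains the better choice.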

An environment `HKtext' similar to `Bg5text' is defined for the `HK'
encoding; the same restrictions as explained above hold.


SJIS encoding
-------------

See the section `Preprocessors' below for the preferred input method using
sjisconv.

Shift-JIS encoding is widely used on PCs for Japanese. A special feature is
the simultaneous use of one-byte and two-byte encoded characters which arose
because of backwards compatibility. The two-byte encoded character set is
completely identical to the JIS X 0208 character set, even the ordering is
the same. Thus there is no need for special two-byte SJIS FD files; the font
definition files for JIS X 0208 are used.

The situation is different for one-byte SJIS characters, the so-called
`half-width' Katakana (encoding C49) from JIS X 0201. Usually you should use
full-width Katakana fonts too to get typographically correct output. The
exception is a typewriter font which should really have only the half width
of normal Kanji or Katakana to represent screen snapshots or similar things.
The use of C49 encoding can be controlled with the \CJKhwkatakana and
\CJKnohwkatakana macros (see commands.txt for more information).

Fonts in C49 encoding scheme must have the character glyphs at the code
points 0xA1-0xDF.
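The SJIS byte ranges (see the description of \CJK@SJISEncoding above) and
the claim that two-byte SJIS merely rearranges JIS X 0208 can be checked
with the following C sketch. `sjis_class' and `sjis_to_jis' are
hypothetical names; the conversion is the commonly used arithmetic
SJIS-to-JIS mapping, not code taken from CJK, and it assumes valid input.

```c
#include <assert.h>

/* Classifies an SJIS byte: 1 = one-byte half-width katakana (0xA1-0xDF,
   font encoding C49), 2 = lead byte of a two-byte JIS X 0208 character
   (0x81-0x9F or 0xE0-0xEF), 0 = anything else (e.g. ASCII). */
int sjis_class(unsigned char b)
{
    if (b >= 0xA1 && b <= 0xDF)
        return 1;
    if ((b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xEF))
        return 2;
    return 0;
}

/* Maps a valid two-byte SJIS code (c1, c2) to its JIS X 0208 code, e.g.
   0x889F -> 0x3021; OR-ing the result with 0x8080 gives the EUC form.
   Each SJIS lead byte covers two consecutive JIS rows; `adjust` is 1
   when c2 falls into the first (odd) row of the pair. */
unsigned int sjis_to_jis(unsigned char c1, unsigned char c2)
{
    int adjust = c2 < 0x9F;
    int row_offset = c1 < 0xA0 ? 0x70 : 0xB0;
    int cell_offset = adjust ? (c2 > 0x7F ? 0x20 : 0x1F) : 0x7E;
    int j1 = ((c1 - row_offset) << 1) - adjust;
    int j2 = c2 - cell_offset;

    return (unsigned int)((j1 << 8) | j2);
}
```

For example, SJIS 0x889F maps to JIS 0x3021 (EUC 0xB0A1); since the
row/cell order is preserved, the same JIS X 0208 subfonts serve both
encodings.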

An environment `SJIStext' similar to `Bg5text' is defined; the same
restrictions as explained in the previous section hold.


Big 5+ and GBK encodings
------------------------

See the section `Preprocessors' below for the preferred input method using
extconv.

These relatively new encodings are used in some older MS Windows versions in
Taiwan (Big 5+) and Mainland China (GBK). Both encodings implement the whole
CJK character repertoire of Unicode in the Basic Multilingual Plane
(U+4E00-U+9FFF, approx. 21000 characters) and a few other characters but
still try to be backwards compatible. All code points of Big 5 are identical
to the code points in Big 5+, and the same holds for GB 2312-1980 and GBK.
Note that the default CJK font encodings for Big 5+ and Big 5 are *not*
compatible. The same is true for GBK and GB 2312.
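The backwards compatibility of the byte ranges can be verified
mechanically. The C sketch below (with hypothetical helper names) encodes
the two-byte ranges given in the section `Some internals' for GB 2312 in
EUC, Big 5, and the extended GBK/Big 5+ layout, and counts code positions of
an older encoding that fall outside the newer one. It only checks byte
ranges, not which characters sit at those positions.

```c
#include <assert.h>

/* GB 2312 in EUC: both bytes in 0xA1-0xFE. */
int is_gb2312_pair(unsigned char b1, unsigned char b2)
{
    return b1 >= 0xA1 && b1 <= 0xFE && b2 >= 0xA1 && b2 <= 0xFE;
}

/* Big 5: first byte 0xA1-0xFE, second byte 0x40-0x7E or 0xA1-0xFE. */
int is_big5_pair(unsigned char b1, unsigned char b2)
{
    return b1 >= 0xA1 && b1 <= 0xFE
        && ((b2 >= 0x40 && b2 <= 0x7E) || (b2 >= 0xA1 && b2 <= 0xFE));
}

/* GBK and Big 5+: first byte 0x81-0xFE, second 0x40-0xFE except 0x7F. */
int is_extended_pair(unsigned char b1, unsigned char b2)
{
    return b1 >= 0x81 && b1 <= 0xFE && b2 >= 0x40 && b2 <= 0xFE && b2 != 0x7F;
}

/* Counts byte pairs accepted by `older` but rejected by `newer`. */
int count_outside(int (*older)(unsigned char, unsigned char),
                  int (*newer)(unsigned char, unsigned char))
{
    int b1, b2, n = 0;

    for (b1 = 0; b1 < 256; b1++)
        for (b2 = 0; b2 < 256; b2++)
            if (older((unsigned char)b1, (unsigned char)b2)
                && !newer((unsigned char)b1, (unsigned char)b2))
                n++;
    return n;
}
```

Both count_outside(is_gb2312_pair, is_extended_pair) and
count_outside(is_big5_pair, is_extended_pair) return 0, whereas the reverse
direction does not, since the extended layout adds code positions.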

Two further environments, `Bg5+text' and `GBKtext', similar to `Bg5text',
are also defined; the same restrictions as above hold.
 

CJK captions
------------

To use the supplied caption files you need the koma-script package. The main
reason why I chose these style files instead of the standard classes is the
fact that the author of koma-script is willing to support CJK. On the other
hand, the philosophy of the LaTeX 2e maintainers is not to add new features
to the standard classes.

The koma-script style files are maintained by Markus Kohm
(Markus.Kohm@gmx.de); they are available at the CTAN hosts.


If you say \CJKcaption{<caption>} inside a CJK environment, the file
<caption>.cpx is loaded (.cpx is a preprocessed version of .cap).

Example:

    \documentclass{scrbook}%    this is a KOMA-script class
    \usepackage{CJK}

    \begin{document}
    \begin{CJK*}{GB}{kai}
    \CJKcaption{GB}%            loading GB.cpx

    \chapter{blablabla}%        is formatted in Chinese

    ...

    \end{CJK*}    
    \end{document}


Note that for Korean three caption files are available: hanja.cap for
captions using hanja (this corresponds to HLaTeX's `hanja' option) and
two caption files (hangul.cap and hangul2.cap) using hangul.

For GBK encoding use the GB.cap file. Similarly, use Big5.cap for Big 5+
encoding.

In case you want to edit a CAP file, you must create its corresponding
CPX file too. After editing, preprocess the file with

  bg5conv < xxx.cap > xxx.cpx

(for caption files in SJIS encoding use sjisconv instead), then change
the file name identification strings in the CPX file accordingly.

In UTF-8 encoding, the following caption files are available.

    ja        Japanese
    ko-Hang   Korean using Hangul
    ko-Hang2  another version using Hangul
    ko-Hani   Korean using Hanja
    zh-Hans   Chinese simplified
    zh-Hant   Chinese traditional

Since these files have the same content as their encoding-specific
counterparts, only CPX versions are provided.


Underlining and other font effects
----------------------------------

Full support for Donald Arseneau's ulem.sty package (beginning with version
2000-05-26) is available by using CJKulem.sty (which loads ulem.sty
automatically). The interface of ulem is unchanged.

Even more font effects specific to CJK scripts can be found in CJKfntef.sty;
usage examples can be found in the file CJKfntef.tex .

A word of caution: Don't use \CJKfamily{...} or similar commands within the
argument to \uline and friends.


Poor man's boldface
-------------------

Most CJK fonts available in the public domain do not have bold series. To
emulate boldface by printing each character three times with slight
horizontal offsets, some special features are used:

    CJK uses \CJKsymbol internally instead of \symbol to access CJK
    characters (after the correct font has been selected). This macro
    honours the \ifCJK@bold@ flag; if set it emulates boldface. The default
    value of the horizontal offset is 0.015em; to change it you should
    redefine \CJKboldshift, the macro which holds this shift.

    \ifCJK@bold@ can be set and unset globally with the commands \CJKbold
    and \CJKnormal. These commands are intended to be used with
    \DeclareFontShape as follows:

        \DeclareFontShape{C00}{CNS}{m}{n}{<-> CJK * csso12}{}
        \DeclareFontShape{C00}{CNS}{bx}{n}{<-> CJKb * csso12}{\CJKbold}

    It should never be necessary to use \CJKnormal since \selectfont has
    been modified to always reset \ifCJK@bold@ and to call the
    loading-settings (i.e., the sixth parameter) of \DeclareFontShape if
    a CJK size function is in use.

    Additionally, new size functions (CJKb, sCJKb, CJKfixedb, sCJKfixedb,
    and others; see fonts.txt for details) have been introduced which are
    completely identical to their counterparts without the final `b'. The only
    reason to use them is, as shown in the above example, to make the fifth
    parameter of \DeclareFontShape for bold series different from the one
    for medium series (LaTeX 2e uses this parameter as a macro name to
    execute loading-settings, thus they must not be equal).


Embedding non-CJK words into CJK text
-------------------------------------

To enable line breaking you should separate non-CJK words and CJK characters
with horizontal space. But the ordinary interword space inserted by TeX,
based on the current non-CJK font, often looks bad because the surrounding
CJK characters are printed almost side by side (the non-stretched value of
\CJKglue is 0pt). Especially in extreme cases (e.g., in an underfull
\hbox), the default space distorts the CJK text too much.

If you say \CJKtilde, the active `~' character doesn't produce an
unbreakable space; instead, the following definition is used:

    \def~{\hspace{0.25em plus 0.125em minus 0.08em}}        .

This defines a space with a natural width of a quarter of a full-width (CJK)
space, i.e., 0.25em. See the file japanese/shibuaki.txt for further details.

Here is an example:

        ThisIsChineseText~test~ThisIsChineseText

                         ^^^^^^

Simply use tilde characters instead of spaces at the border between CJK and
non-CJK characters.

In BibTeX entries, you have to use `{~}' instead of `~'.

The original definition of `~' is available as \nbs (non-breakable space, a
shorthand for the LaTeX command \nobreakspace). To return to the standard
`~' macro definition say \standardtilde.

Note that the converse does not apply: to embed CJK words into non-CJK text,
an ordinary space is the best choice.

If you use Mule or Emacs 20, please consider using cjktilde.el in
utils/lisp. This small package defines a minor mode (cjk-tilde-mode) which
swaps the space key and the tilde key. It is convenient to bind this mode
to a key, e.g., C-insert.

For AUC TeX you can also use cjkspace.el which is similar (but not
identical) to cjktilde.el .


Preprocessors
-------------

Using the `XXXtext' environments like `Bg5text' is a mess. Thus three
preprocessors are provided to overcome the restrictions of the XXXtext
environments: bg5conv and sjisconv for Big 5 and SJIS encoding,
respectively, and extconv for characters in GBK and Big 5+ encoding.
Compile them with

  cc -O -s -o bg5conv bg5conv.c
  cc -O -s -o sjisconv sjisconv.c
  cc -O -s -o extconv extconv.c

and move the binaries to a location in your path, e.g., /usr/local/bin in
a Unix system. [`cc' is the C compiler.]

See the batch files bg5latex[.bat], etc., for examples of how to use them.

Each Big 5, Big 5+, or GBK character (and each two-byte encoded SJIS
character) `XY' is converted into the form `^^7fX^^7fZZZ^^7f'; ZZZ is the
decimal equivalent of Y, and ^^7f is a character with the hex value 0x7F.
The use of bg5conv/sjisconv/extconv is completely transparent; no changes to
your documents are necessary.
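
The conversion rule can be illustrated with a short C sketch. It is not
taken from bg5conv.c, sjisconv.c, or extconv.c; the function name
cjk_preprocess is hypothetical, and the sketch assumes (as is the case for
GBK and Big 5+) that every byte in the range 0x81-0xFE is the lead byte X
of a two-byte character, and that ZZZ is written without leading zeros.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the preprocessing rule; NOT the actual
 * bg5conv/extconv source.  Assumption: every byte 0x81-0xFE is a lead
 * byte X followed by a trail byte Y.  Each pair XY becomes
 * 0x7F X 0x7F <decimal value of Y> 0x7F; all other bytes are copied.
 * The output is NUL-terminated and truncated if `out' is too small.
 * Returns the number of bytes written (without the NUL). */
size_t cjk_preprocess(const unsigned char *in, size_t in_len,
                      char *out, size_t out_size)
{
    size_t i = 0, n = 0;

    assert(out != NULL && out_size > 0);
    while (i < in_len) {
        unsigned char x = in[i];

        if (x >= 0x81 && x <= 0xFE && i + 1 < in_len) {
            char digits[4];          /* a byte value has at most 3 digits */
            int d = snprintf(digits, sizeof digits, "%d", in[i + 1]);

            if (n + 3 + (size_t)d + 1 >= out_size)  /* keep room for NUL */
                break;
            out[n++] = 0x7F;
            out[n++] = (char)x;
            out[n++] = 0x7F;
            memcpy(out + n, digits, (size_t)d);
            n += (size_t)d;
            out[n++] = 0x7F;
            i += 2;
        } else {
            if (n + 1 >= out_size)
                break;
            out[n++] = (char)x;      /* ASCII and other bytes unchanged */
            i++;
        }
    }
    out[n] = '\0';
    return n;
}
```

For example, the Big 5 character 0xA4 0x40 becomes 0x7F 0xA4 0x7F `64'
0x7F. A Big 5 trail byte can be an ASCII character with a special meaning
in TeX, such as `\' (0x5C) or `{' (0x7B); in the converted form it only
shows up as decimal digits.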

It is possible to mix preprocessed and non-preprocessed data; simply use
\CJKenc to change the encoding; you can use \CJKinput and \CJKinclude to
load preprocessed data (see commands.txt for a detailed description).

If you use traditional Chinese characters within Mule or Emacs 20, it is not
necessary to call bg5conv after the use of *cjk-coding* output encoding (but
it is necessary if you write out the file in Big 5 encoding).

Note 1: The OS/2 script files bg5latex.cmd, etc., need REXX which you
        probably have to install first.

Note 2: With extconv, you can also preprocess encodings like GB or SJIS.
        This has the advantage that such data is robust against any changes
        of the uc/lccodes in the range 0xA1-0xFE. Only three encodings can't
        be preprocessed: UTF8, EUC-TW, and EUC-JP.
        

Customization
-------------

In case you want to add encodings, font encodings, and related things, or if
you must change or customize some CJK settings, you should use a
configuration file called `CJK.cfg' which is loaded (if it exists) by
CJK.sty just before the final \endinput command.


Caveats
-------

    o   You can of course use CJK environments inside of a CJK environment,
        but you may have to increase the so-called `save size' of TeX
        (with emTeX you can adjust this with -ms=...; web2c users can
        control it with the `save_size' parameter in texmf.cnf).

        The CJK package has optional arguments which control the scope of
        CJK environments:

            lowercase       Use this option if you want to use \lowercase
                            with encodings inside CJK environments. If
                            `lowercase' is not set, the `encapsulated'
                            option needs less save size. You must use
                            bg5conv (sjisconv) or cjk-enc.el to use Big 5
                            (SJIS) characters with this option.

                            Use this with caution! All \lccode values in the
                            range 0x80-0xFF are set to zero, thus disabling
                            TeX's hyphenation mechanism for words which
                            contain characters of this range in the *input
                            encoding* (e.g., Latin-1 encoded words with
                            accents). This is due to an unfortunate mangling
                            of the input and output encoding mechanism in
                            TeX itself.

            global          \lccode (if `lowercase' set), \uccode, \catcode
                            and the activation of the characters 0x81-0xFE
                            are globally modified (\lccode and \uccode reset
                            to 0). This is the most economical mode
                            concerning save size, but you can't have CJK
                            environments inside of CJK environments or other
                            environments which manipulate the character
                            range 0x81-0xFE.

                            All CJK font selection commands are global, too!

                            Packages which change some of the above values
                            only once (e.g., in the preamble) also don't
                            work after the first use of a CJK environment.

                            cjk-enc.el automatically selects this option.

            local           \lccode (if `lowercase' set) and \uccode
                            together with bindings are modified globally.
                            This is the default. You can stack CJK
                            environments.

            active          With this option, bindings are additionally
                            local. You need this option if you want to mix
                            preprocessed text with non-preprocessed text in
                            nested CJK environments. This can happen if you
                            merge texts in various encodings.

            encapsulated    If you want to access e.g., T1 fonts directly
                            (i.e., without the macros defined in t1enc.def)
                            or if you want to use a non-CJK LaTeX 2e input
                            encoding outside of the CJK environment (e.g.,
                            `latin1' for Western European, `latin2' for
                            Eastern European), you must use this option.
                            This also ensures that \uppercase and \lowercase
                            (together with \MakeUppercase and
                            \MakeLowercase) work correctly. All values
                            mentioned above are local, so you can stack
                            environments. This option probably causes an
                            overflow of the save size.

                            Note: All macro packages which access T1 fonts
                            with the macros defined in t1enc.def work in CJK
                            environments! E.g., the command `"s' of
                            german.sty works with \MakeUppercase too.


        Say
    
            \usepackage[<option>]{CJK}

        to activate <option>.

    o   There is another way to overcome the problem of stacked
        environments. CJK implements four CJK attribute switches: \CJKenc,
        \CJKfontenc, \CJKencfamily, and \CJKfamily; see commands.txt for a
        detailed description. If you need two different encodings/families
        on the same output line, you must use these macros.

        An example for \CJKfamily:

            \begin{CJK}{GB}{song}
            ... Text in GB song ...  \CJKenc{GBt}
            ... Text in GBt song ... \CJKfamily{kai}
            ... Text in GBt kai ...
            \end{CJK}

        An example for \CJKencfamily:

            \CJKencfamily{Bg5}{fs}%     fangsong
            \CJKencfamily{GB}{kai}

            \begin{CJK*}{}{}
            \CJKenc{Bg5} ... Text in Big 5 fangsong ...
            \CJKenc{GB}  ... Text in GB kai ...
            \end{CJK*}

        In contrast to \begin{CJK}{...}{...}, it is not necessary to start
        a new line in your TeX document file after \CJKenc.

    o   A similar command to \CJKchar is \Unicode{<byte1>}{<byte2>} to
        access Unicode characters (real Unicode values, not UTF-8 encoded
        Unicode) directly; the parameters are the first (high) and second
        (low) byte of the Unicode. \Unicode works only in UTF-8 encoding; in
        all other encodings you must use \CJKchar[UTF8]{<byte1>}{<byte2>}
        instead.

        For Unicode characters greater than U+FFFF, put the first two bytes
        into the first argument, and the third byte into the second
        argument. Examples are \Unicode{"25E}{"9A} and
        \CJKchar[UTF8]{"25E}{"9A} to represent U+25E9A.

    o   CJK disables \MakeUppercase (preserving the command as
        \CJKuppercase) if you select Big 5 or SJIS encoding without using
        bg5conv or sjisconv. This usually affects the headers of the LaTeX
        2e standard classes only.

    o   Because CJK.sty and MULEenc.sty insert glue between CJK (and Thai)
        characters, it is possible to get unwanted line breaks in verbatim
        environments if lines are too long. To avoid this, use the command
        \CJKverbatim in combination with the `verbatim' package. It installs
        a hook which disables \CJKglue and \Thaiglue in verbatim
        environments.
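
The splitting rule for the arguments of \Unicode and \CJKchar[UTF8]
described in the caveats above amounts to separating the lowest byte of the
code point from all higher bits. The following C sketch computes both
arguments and prints the command in TeX's hexadecimal notation (prefix
`"'); the function names are hypothetical and not part of the CJK package.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: split a Unicode code point into the two arguments
 * of \Unicode{<byte1>}{<byte2>}.  The second argument is the low byte;
 * the first one holds all higher bits, i.e. one byte for U+0000-U+FFFF
 * and "the first two bytes" for larger code points. */
void unicode_split(unsigned long cp, unsigned long *high,
                   unsigned long *low)
{
    assert(cp <= 0x10FFFFUL);          /* largest Unicode code point */
    *high = cp >> 8;
    *low = cp & 0xFFUL;
}

/* Print the corresponding \Unicode command using TeX's " hex notation. */
void print_unicode_command(FILE *f, unsigned long cp)
{
    unsigned long high, low;

    unicode_split(cp, &high, &low);
    fprintf(f, "\\Unicode{\"%lX}{\"%02lX}\n", high, low);
}
```

With this sketch, print_unicode_command(stdout, 0x25E9A) prints
\Unicode{"25E}{"9A}, and print_unicode_command(stdout, 0x4E00) prints
\Unicode{"4E}{"00}.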


Possible errors
---------------

    o   If you write Chinese (or Japanese) text, don't forget to suppress
        the linefeed character with a trailing `%' in the CJK environment,
        otherwise you get unwanted spaces in the output. Conversely, say
        `\ ' or something similar inside the CJK* environment to get a
        space after a CJK character.

    o   To suppress a line break before a CJK character, say \CJKkern. This
        command prevents the insertion of \CJKglue before the CJK character.

        You may wonder about the strange name: a small kern (2 sp) between
        two CJK characters signals that the first one is a punctuation
        character.

    o   If you get the error message: `\CJK... undefined' or other `...
        undefined ...' messages and you can't find an error, try inserting
        \newpage, \clearpage, or \cleardoublepage (the latter for two-column
        printing) before saying \end{CJK} or \end{CJK*}. This can happen if
        LaTeX 2e writes headers, footers, or index entries (both \index and
        \printindex) of a page containing CJK characters after closing the
        CJK environment.

        In case of footnotes with CJK characters which are split across
        pages, you have to close the CJK environment on the page on which
        the particular footnote ends (probably preceded by a \newpage
        command).

    o   A similar message to the one mentioned in the last item can be
        caused by using the \EveryShipout command from everyshi.sty; here
        the reason is exactly the opposite, namely the possible use of a
        non-CJK font within an implicit CJK environment.  For example, if
        you have

          \EveryShipout{
            \fontfamily{phv}%
            \selectfont
            ...
          }

        it can happen that LaTeX tries to use family `phv' for a `CXX'
        encoding.  The solution is to specify the encoding in \EveryShipout
        also:

          \EveryShipout{
            \fontfamily{phv}%
            \fontencoding{T1}%
            \selectfont
            ...
          }

    o   Some text editors insert a Byte Order Mark (BOM, U+FEFF) even when
        they emit UTF-8. In UTF-8 the BOM is the three-byte sequence 0xEF
        0xBB 0xBF; it is always found at the very beginning of a file and
        should be ignored.

        Unfortunately, there is no way to handle them automatically in the
        CJK package so that they don't produce output or warnings (or even
        error messages) -- it would be necessary to add a hack to the LaTeX
        kernel itself.  In other words, these three bytes must be removed
        before LaTeX is called.

    o   If you get overfull \hbox'es caused by CJK characters, try to
        increase \CJKglue. It defines the glue between CJK characters; the
        default definition is

            \newcommand{\CJKglue}{\hskip 0pt plus 0.08\baselineskip}  .

        \CJKglue is inserted by CJK.sty between CJK characters (except
        punctuation characters as defined in the punctuation tables; see
        CJK.enc for the lists). You should separate non-CJK text from CJK
        characters with spaces to enable hyphenation, or say \CJKtilde and
        then use `~' instead of spaces to embed non-CJK text into CJK
        text.

    o   If you get overfull \hbox'es caused by Hangul syllables, try to
        increase \CJKtolerance. The default definition is

            \newcommand{\CJKtolerance}{400}  .

        Alternatively, try to increase \emergencystretch (which is a TeX
        primitive), setting it to a reasonable value.

    o   Inside a verbatim environment you cannot switch to an encoding
        which has not been loaded before (CJK.sty emits an \input ...
        command, which causes the encoding file to be printed verbatim
        instead of being executed). In this case, write a proper
        \CJKenc{...} command before opening the verbatim environment.

        Example:

            \CJKenc{JIS} % this loads standard.enc and standard.chr

            \begin{verbatim}
            ...
            first time JIS characters appear
            ...
            \end{verbatim}


        cjk-enc.el does this automatically for you.

    o   If you get an error message which looks like this:


          ! Undefined control sequence.
          try@size@range ...extract@rangefontinfo font@info
                                                            <-*>@nil <@nnil


        then you are using an unknown family for a CJK encoding.

        Reason: If you declare an NFSS font encoding in the standard way the
        corresponding FD file for the default font is loaded. For the CJK
        package this would be almost 30 files, which is unacceptable. To
        avoid this overhead NFSS is faked with some rudimentary definitions
        just enough to pass the NFSS tests. Of course this has a
        disadvantage: An unknown CJK family causes the above error instead
        of switching to the fallback family usually defined with
        \DeclareFontSubstitution. Nevertheless, replacing an undefined
        series or shape works correctly.

        The CJK package's default family value is `song' for all encodings
        except KS; to avoid the error just described when you start an
        environment with an empty family parameter, the files `XXXsong.fd'
        for all encodings `XXX' (except for KS) are already provided.

    o   It is neither possible to use a CJK character in a \cite command of
        standard LaTeX, nor is it possible to use the `alpha' citation
        style. This is a limitation of LaTeX and not of the CJK package.

    o   Sometimes it is necessary to define or redefine a command or
        environment globally in the preamble, using CJK characters. Example:

          \newtheorem{Them}{some Chinese characters}[section]

        This doesn't work directly: the Chinese characters cause an error.
        The next idea is to use a CJK environment in the preamble:

          \begin{CJK}{...}{...}
          \newtheorem{Them}{some Chinese characters}[section]
          \end{CJK}

        Don't be surprised that this also fails! Most commands like
        \newtheorem expand to \def, which defines a macro locally only;
        consequently, the newly defined command is undefined again after
        leaving the CJK environment.

        The correct solution is to use a globally defined macro:

          \begin{CJK}{...}{...}
          \gdef\ChineseTheorem{some Chinese characters}
          \end{CJK}

          \newtheorem{Them}{\ChineseTheorem}[section]

    o   The \makelabels command of letter.sty needs special treatment if you
        have an address with CJK characters because it uses the
        \AtEndDocument hook to write out its data. Since \AtEndDocument is
        called by \end{document} after all environments have been closed
        already, a CJK environment must be explicitly inserted into the AUX
        file. Example:

          \documentclass{letter}

          \usepackage{CJK}

          \makeatletter
          \AtBeginDocument{%
            \if@filesw
              \immediate\write\@mainaux{\string\begin{CJK*}{...}{...}}%
            \fi}
          \makelabels
          \AtEndDocument{%
            \if@filesw
              \immediate\write\@mainaux{\string\end{CJK*}}%
            \fi}
          \makeatother


          \begin{CJK*}{...}{...}
          \address{An address\\
                   with some CJK characters}
          \signature{...}
          \end{CJK*}


          \begin{document}

          \begin{CJK*}{...}{...}

          \begin{letter}{Another address\\
                         with some CJK characters}
          \opening{...}

          Your letter text

          \closing{...}
          \end{letter}

          \end{CJK*}

          \end{document}

    o   A similar solution is needed if you use \bibliography and your
        bibliographic database contains author names with CJK characters.

          \makeatletter
          \AtBeginDocument{%
            \if@filesw
              \immediate\write\@mainaux{\string\begin{CJK*}{...}{...}}%
              \immediate\write\@mainaux{\string\makeatletter}%
            \fi}
          \AtEndDocument{%
            \if@filesw
              \immediate\write\@mainaux{\string\end{CJK*}}%
            \fi}
          \makeatother

    o   If you get strange error messages while using the hyperref package,
        add the `CJKbookmarks' option:

          \usepackage[CJKbookmarks]{hyperref}

    o   Some versions of fourier.sty cause the following error message:

          ! Undefined control sequence.
          \<->futr8t ->\SetFourierSpace

        A simple solution is to insert the line

          \providecommand{\SetFourierSpace}{}

        right before loading fourier.sty .
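
As stated in the BOM item above, the three bytes 0xEF 0xBB 0xBF have to be
removed before LaTeX sees the file. A minimal C sketch of such a filter
follows; the function names are hypothetical, and only a BOM at the very
start of the input is dropped, while all remaining bytes are copied
unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Number of bytes to skip at the start of `buf': 3 if it begins with the
 * UTF-8 encoded BOM (0xEF 0xBB 0xBF), 0 otherwise. */
size_t utf8_bom_length(const unsigned char *buf, size_t len)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };

    assert(buf != NULL || len == 0);
    return (len >= 3 && memcmp(buf, bom, 3) == 0) ? 3 : 0;
}

/* Copy `in' to `out', dropping a BOM at the very beginning only.
 * Returns 0 on success and -1 on a read or write error. */
int strip_utf8_bom(FILE *in, FILE *out)
{
    unsigned char buf[4096];
    size_t n = fread(buf, 1, sizeof buf, in);
    size_t skip = utf8_bom_length(buf, n);

    do {
        if (fwrite(buf + skip, 1, n - skip, out) != n - skip)
            return -1;
        skip = 0;                /* later blocks are copied verbatim */
    } while ((n = fread(buf, 1, sizeof buf, in)) > 0);

    return (ferror(in) || fflush(out) != 0) ? -1 : 0;
}
```

Called as strip_utf8_bom(stdin, stdout) from a small main function, this
could run as a filter in front of latex, similar to the preprocessors
described in section `Preprocessors'. On systems which distinguish text
and binary streams, the streams should be opened in binary mode.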


Author
------

Werner Lemberg <wl@gnu.org>

Kleine Beurhausstr. 1
D-44137 Dortmund
Germany

Tel. +49 231 165290

Please report any errors or suggestions to cjk-bug@ffii.org.


---End of CJK.txt---