  <div id="content">
    <h1 class="heading">Pygments</h1>
    <h2 class="subheading">Builtin Tokens</h2>
      <div class="toc">
        <ul class="contents">
          <li><a href="#keyword-tokens">Keyword Tokens</a></li>
          <li><a href="#name-tokens">Name Tokens</a></li>
          <li><a href="#literals">Literals</a></li>
          <li><a href="#operators">Operators</a></li>
          <li><a href="#punctuation">Punctuation</a></li>
          <li><a href="#comments">Comments</a></li>
          <li><a href="#generic-tokens">Generic Tokens</a></li>
        </ul>
      </div>
<p>Inside the <cite>pygments.token</cite> module, there is a special object called <cite>Token</cite>
that is used to create token types.</p>
<p>You can create a new token type by accessing an attribute of <cite>Token</cite>:</p>
<div class="syntax"><pre><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pygments.token</span> <span class="kn">import</span> <span class="n">Token</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">Token</span><span class="o">.</span><span class="n">String</span>
<span class="go">Token.String</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">Token</span><span class="o">.</span><span class="n">String</span> <span class="ow">is</span> <span class="n">Token</span><span class="o">.</span><span class="n">String</span>
<span class="go">True</span></pre></div>
<p>Note that tokens are singletons, so you can use the <tt class="docutils literal">is</tt> operator to compare
token types.</p>
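<p>A minimal sketch of the two points above, assuming Pygments is installed. The names <cite>Markup</cite>, <cite>Callout</cite> and <cite>Note</cite> are made up for illustration and are not standard token types:</p>

```python
from pygments.token import Token

# Accessing an attribute that does not exist yet creates the token type.
# "Markup", "Callout" and "Note" are hypothetical names chosen for this example.
callout = Token.Markup.Callout

# Accessing the same path again returns the very same object,
# so an identity comparison with "is" is safe.
same_object = callout is Token.Markup.Callout

# A different path yields a different token type.
different = callout is Token.Markup.Note

print(repr(callout))
print(same_object, different)
```

<p>Note that attribute names have to start with an uppercase letter to be treated as token types.</p>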
<p>As of Pygments 0.7 you can also use the <tt class="docutils literal">in</tt> operator to perform set tests:</p>
<div class="syntax"><pre><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pygments.token</span> <span class="kn">import</span> <span class="n">Comment</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">Comment</span><span class="o">.</span><span class="n">Single</span> <span class="ow">in</span> <span class="n">Comment</span>
<span class="go">True</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">Comment</span> <span class="ow">in</span> <span class="n">Comment</span><span class="o">.</span><span class="n">Multi</span>
<span class="go">False</span></pre></div>
<p>This can be useful in <a class="reference external" href="./filters.html">filters</a> and if you write lexers on your own without
using the base lexers.</p>
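<p>The following sketch shows how such a subtype test might look inside a filter. The class name <cite>DropComments</cite> and the sample source line are assumptions made for this example; it relies on <cite>Filter</cite>, <cite>add_filter()</cite> and <cite>lex()</cite> from Pygments:</p>

```python
from pygments import lex
from pygments.filter import Filter
from pygments.lexers import PythonLexer
from pygments.token import Comment


class DropComments(Filter):
    """Hypothetical filter that removes every comment token from the stream."""

    def filter(self, lexer, stream):
        for ttype, value in stream:
            # "in" is true for Comment itself and for all of its subtypes,
            # e.g. Comment.Single.
            if ttype in Comment:
                continue
            yield ttype, value


lexer = PythonLexer()
lexer.add_filter(DropComments())

source = "x = 1  # set x\n"
cleaned = "".join(value for _, value in lex(source, lexer))
print(repr(cleaned))
```

<p>Testing against <cite>Comment</cite> rather than a specific subtype keeps the filter independent of which comment subtypes a particular lexer emits.</p>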
<p>You can also split a token type into its hierarchy and get its parent:</p>
<div class="syntax"><pre><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pygments.token</span> <span class="kn">import</span> <span class="n">String</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">String</span><span class="o">.</span><span class="n">split</span><span class="p">()</span>
<span class="go">[Token, Token.Literal, Token.Literal.String]</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">String</span><span class="o">.</span><span class="n">parent</span>
<span class="go">Token.Literal</span></pre></div>
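<p>As a small illustration of <cite>split()</cite>, the sketch below reduces any token type to its top-level category. The helper name <cite>top_level</cite> is hypothetical and not part of Pygments:</p>

```python
from pygments.token import Literal, Name, String, Token


def top_level(ttype):
    """Return the first type below Token in the hierarchy (or Token itself)."""
    parts = ttype.split()  # e.g. [Token, Token.Literal, Token.Literal.String]
    return parts[1] if len(parts) > 1 else parts[0]


print(top_level(String.Double))
print(top_level(Name.Variable.Class))
print(top_level(Token))
```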
<p>In principle, you can create an unlimited number of token types, but nobody can
guarantee that a style defines rules for a given token type. For that
reason, Pygments proposes a set of global token types, defined in the
<cite>pygments.token.STANDARD_TYPES</cite> dict.</p>
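<p>Because a custom token type may have no style rule, one possible approach is to walk up the hierarchy until a standard type is found. This is only a sketch: <cite>nearest_standard_type</cite> is a hypothetical helper, not a Pygments function, and <cite>String.Shiny</cite> is an invented token type:</p>

```python
from pygments.token import STANDARD_TYPES, String, Token


def nearest_standard_type(ttype):
    """Return ttype, or its closest ancestor that is listed in STANDARD_TYPES."""
    # The root type Token has no parent, so the loop stops there at the latest.
    while ttype not in STANDARD_TYPES and ttype.parent is not None:
        ttype = ttype.parent
    return ttype


custom = String.Shiny  # invented subtype, not a standard type
fallback = nearest_standard_type(custom)
print(custom, "->", fallback)
```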
<p>For some tokens, aliases are already defined:</p>
<div class="syntax"><pre><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pygments.token</span> <span class="kn">import</span> <span class="n">String</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">String</span>
<span class="go">Token.Literal.String</span></pre></div>
<p>Inside the <cite>pygments.token</cite> module the following aliases are defined:</p>
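<table border="1" class="docutils">
<col width="17%" />
<col width="36%" />
<col width="47%" />
<tbody valign="top">
<tr><td><cite>Text</cite></td>
<td><cite>Token.Text</cite></td>
<td>for any type of text data</td>
</tr>
<tr><td><cite>Whitespace</cite></td>
<td><cite>Token.Text.Whitespace</cite></td>
<td>for specially highlighted whitespace</td>
</tr>
<tr><td><cite>Error</cite></td>
<td><cite>Token.Error</cite></td>
<td>represents lexer errors</td>
</tr>
<tr><td><cite>Other</cite></td>
<td><cite>Token.Other</cite></td>
<td>special token for data not
matched by a parser (e.g. HTML
markup in PHP code)</td>
</tr>
<tr><td><cite>Keyword</cite></td>
<td><cite>Token.Keyword</cite></td>
<td>any kind of keyword</td>
</tr>
<tr><td><cite>Name</cite></td>
<td><cite>Token.Name</cite></td>
<td>variable/function names</td>
</tr>
<tr><td><cite>Literal</cite></td>
<td><cite>Token.Literal</cite></td>
<td>any literal</td>
</tr>
<tr><td><cite>String</cite></td>
<td><cite>Token.Literal.String</cite></td>
<td>string literals</td>
</tr>
<tr><td><cite>Number</cite></td>
<td><cite>Token.Literal.Number</cite></td>
<td>number literals</td>
</tr>
<tr><td><cite>Operator</cite></td>
<td><cite>Token.Operator</cite></td>
<td>operators (<tt class="docutils literal">+</tt>, <tt class="docutils literal">not</tt>...)</td>
</tr>
<tr><td><cite>Punctuation</cite></td>
<td><cite>Token.Punctuation</cite></td>
<td>punctuation (<tt class="docutils literal">[</tt>, <tt class="docutils literal">(</tt>...)</td>
</tr>
<tr><td><cite>Comment</cite></td>
<td><cite>Token.Comment</cite></td>
<td>any kind of comment</td>
</tr>
<tr><td><cite>Generic</cite></td>
<td><cite>Token.Generic</cite></td>
<td>generic tokens (have a look at
the explanation below)</td>
</tr>
</tbody>
</table>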
<p>The <cite>Whitespace</cite> token type is new in Pygments 0.8. It is used only by the
<cite>VisibleWhitespaceFilter</cite> currently.</p>
<p>Normally you just create token types using the already defined aliases. For each
of those token aliases, a number of subtypes exist (excluding the special tokens
<cite>Token.Text</cite>, <cite>Token.Error</cite> and <cite>Token.Other</cite>).</p>
<p>The <cite>is_token_subtype()</cite> function in the <cite>pygments.token</cite> module can be used to
test if a token type is a subtype of another (such as <cite>Name.Tag</cite> and <cite>Name</cite>).
(This is the same as <tt class="docutils literal">Name.Tag in Name</tt>. The overloaded <cite>in</cite> operator was newly
introduced in Pygments 0.7; the function still exists for backwards
compatibility.)</p>
<p>With Pygments 0.7, it's also possible to convert strings to token types (for example
if you want to supply a token from the command line):</p>
<div class="syntax"><pre><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pygments.token</span> <span class="kn">import</span> <span class="n">String</span><span class="p">,</span> <span class="n">string_to_tokentype</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">string_to_tokentype</span><span class="p">(</span><span class="s">&quot;String&quot;</span><span class="p">)</span>
<span class="go">Token.Literal.String</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">string_to_tokentype</span><span class="p">(</span><span class="s">&quot;Token.Literal.String&quot;</span><span class="p">)</span>
<span class="go">Token.Literal.String</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">string_to_tokentype</span><span class="p">(</span><span class="n">String</span><span class="p">)</span>
<span class="go">Token.Literal.String</span></pre></div>
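<p>For the command-line case mentioned above, a sketch along the following lines might be used to turn a user-supplied, comma-separated string into token types and then test them with <cite>is_token_subtype()</cite>. The helper <cite>parse_token_list</cite> and the input format are assumptions made for this example:</p>

```python
from pygments.token import Name, is_token_subtype, string_to_tokentype


def parse_token_list(spec):
    """Convert a string such as 'Name.Tag, Comment.Single' into token types."""
    names = (part.strip() for part in spec.split(","))
    return [string_to_tokentype(name) for name in names if name]


# The string could come from a command-line option, for instance.
selected = parse_token_list("Name.Tag, Comment.Single")

for ttype in selected:
    # Equivalent to "ttype in Name".
    print(ttype, is_token_subtype(ttype, Name))
```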
<div class="section" id="keyword-tokens">
<h3>Keyword Tokens</h3>
<dl class="docutils">
<dt><cite>Keyword</cite></dt>
<dd>For any kind of keyword (especially if it doesn't match any of the
subtypes).</dd>
<dt><cite>Keyword.Constant</cite></dt>
<dd>For keywords that are constants (e.g. <tt class="docutils literal">None</tt> in future Python versions).</dd>
<dt><cite>Keyword.Declaration</cite></dt>
<dd>For keywords used for variable declaration (e.g. <tt class="docutils literal">var</tt> in some programming
languages like JavaScript).</dd>
<dt><cite>Keyword.Namespace</cite></dt>
<dd>For keywords used for namespace declarations (e.g. <tt class="docutils literal">import</tt> in Python and
Java and <tt class="docutils literal">package</tt> in Java).</dd>
<dt><cite>Keyword.Pseudo</cite></dt>
<dd>For keywords that aren't really keywords (e.g. <tt class="docutils literal">None</tt> in old Python
versions).</dd>
<dt><cite>Keyword.Reserved</cite></dt>
<dd>For reserved keywords.</dd>
<dt><cite>Keyword.Type</cite></dt>
<dd>For builtin types that can't be used as identifiers (e.g. <tt class="docutils literal">int</tt>,
<tt class="docutils literal">char</tt> etc. in C).</dd>
</dl>
</div>
<div class="section" id="name-tokens">
<h3>Name Tokens</h3>
<dl class="docutils">
<dt><cite>Name</cite></dt>
<dd>For any name (variable names, function names, classes).</dd>
<dt><cite>Name.Attribute</cite></dt>
<dd>For all attributes (e.g. in HTML tags).</dd>
<dt><cite>Name.Builtin</cite></dt>
<dd>Builtin names; names that are available in the global namespace.</dd>
<dt><cite>Name.Builtin.Pseudo</cite></dt>
<dd>Builtin names that are implicit (e.g. <tt class="docutils literal">self</tt> in Ruby, <tt class="docutils literal">this</tt> in Java).</dd>
<dt><cite>Name.Class</cite></dt>
<dd>Class names. Because no lexer can know whether a name is a class, a function
or something else, this token is meant for class declarations.</dd>
<dt><cite>Name.Constant</cite></dt>
<dd>Token type for constants. In some languages you can recognise a constant by the
way it's defined (the value after a <tt class="docutils literal">const</tt> keyword for example). In
other languages constants are uppercase by definition (Ruby).</dd>
<dt><cite>Name.Decorator</cite></dt>
<dd>Token type for decorators. Decorators are syntactic elements in the Python
language. Similar syntax elements exist in C# and Java.</dd>
<dt><cite>Name.Entity</cite></dt>
<dd>Token type for special entities (e.g. <tt class="docutils literal">&amp;nbsp;</tt> in HTML).</dd>
<dt><cite>Name.Exception</cite></dt>
<dd>Token type for exception names (e.g. <tt class="docutils literal">RuntimeError</tt> in Python). Some languages
define exceptions in the function signature (Java). You can highlight
the name of that exception using this token.</dd>
<dt><cite>Name.Function</cite></dt>
<dd>Token type for function names.</dd>
<dt><cite>Name.Label</cite></dt>
<dd>Token type for label names (e.g. in languages that support <tt class="docutils literal">goto</tt>).</dd>
<dt><cite>Name.Namespace</cite></dt>
<dd>Token type for namespaces (e.g. import paths in Java/Python, or names following
the <tt class="docutils literal">module</tt>/<tt class="docutils literal">namespace</tt> keyword in other languages).</dd>
<dt><cite>Name.Other</cite></dt>
<dd>Other names. Normally unused.</dd>
<dt><cite>Name.Tag</cite></dt>
<dd>Tag names (in HTML/XML markup or configuration files).</dd>
<dt><cite>Name.Variable</cite></dt>
<dd>Token type for variables. Some languages have prefixes for variable names
(PHP, Ruby, Perl). You can highlight them using this token.</dd>
<dt><cite>Name.Variable.Class</cite></dt>
<dd>Same as <cite>Name.Variable</cite> but for class variables (also static variables).</dd>
<dt><cite>Name.Variable.Global</cite></dt>
<dd>Same as <cite>Name.Variable</cite> but for global variables (used in Ruby, for
example).</dd>
<dt><cite>Name.Variable.Instance</cite></dt>
<dd>Same as <cite>Name.Variable</cite> but for instance variables.</dd>
</dl>
</div>
<div class="section" id="literals">
<h3>Literals</h3>
<dl class="docutils">
<dt><cite>Literal</cite></dt>
<dd>For any literal (if not further defined).</dd>
<dt><cite>Literal.Date</cite></dt>
<dd>For date literals (e.g. <tt class="docutils literal">42d</tt> in Boo).</dd>
<dt><cite>String</cite></dt>
<dd>For any string literal.</dd>
<dt><cite>String.Backtick</cite></dt>
<dd>Token type for strings enclosed in backticks.</dd>
<dt><cite>String.Char</cite></dt>
<dd>Token type for single characters (e.g. Java, C).</dd>
<dt><cite>String.Doc</cite></dt>
<dd>Token type for documentation strings (for example Python).</dd>
<dt><cite>String.Double</cite></dt>
<dd>Token type for double quoted strings.</dd>
<dt><cite>String.Escape</cite></dt>
<dd>Token type for escape sequences in strings.</dd>
<dt><cite>String.Heredoc</cite></dt>
<dd>Token type for &quot;heredoc&quot; strings (e.g. in Ruby or Perl).</dd>
<dt><cite>String.Interpol</cite></dt>
<dd>Token type for interpolated parts in strings (e.g. <tt class="docutils literal">#{foo}</tt> in Ruby).</dd>
<dt><cite>String.Other</cite></dt>
<dd>Token type for any other strings (for example <tt class="docutils literal">%q{foo}</tt> string constructs
in Ruby).</dd>
<dt><cite>String.Regex</cite></dt>
<dd>Token type for regular expression literals (e.g. <tt class="docutils literal">/foo/</tt> in JavaScript).</dd>
<dt><cite>String.Single</cite></dt>
<dd>Token type for single quoted strings.</dd>
<dt><cite>String.Symbol</cite></dt>
<dd>Token type for symbols (e.g. <tt class="docutils literal">:foo</tt> in LISP or Ruby).</dd>
<dt><cite>Number</cite></dt>
<dd>Token type for any number literal.</dd>
<dt><cite>Number.Float</cite></dt>
<dd>Token type for float literals (e.g. <tt class="docutils literal">42.0</tt>).</dd>
<dt><cite>Number.Hex</cite></dt>
<dd>Token type for hexadecimal number literals (e.g. <tt class="docutils literal">0xdeadbeef</tt>).</dd>
<dt><cite>Number.Integer</cite></dt>
<dd>Token type for integer literals (e.g. <tt class="docutils literal">42</tt>).</dd>
<dt><cite>Number.Integer.Long</cite></dt>
<dd>Token type for long integer literals (e.g. <tt class="docutils literal">42L</tt> in Python).</dd>
<dt><cite>Number.Oct</cite></dt>
<dd>Token type for octal literals.</dd>
</dl>
</div>
<div class="section" id="operators">
<h3>Operators</h3>
<dl class="docutils">
<dt><cite>Operator</cite></dt>
<dd>For any punctuation operator (e.g. <tt class="docutils literal">+</tt>, <tt class="docutils literal">-</tt>).</dd>
<dt><cite>Operator.Word</cite></dt>
<dd>For any operator that is a word (e.g. <tt class="docutils literal">not</tt>).</dd>
</dl>
</div>
<div class="section" id="punctuation">
<h3>Punctuation</h3>
<p><em>New in Pygments 0.7.</em></p>
<dl class="docutils">
<dt><cite>Punctuation</cite></dt>
<dd>For any punctuation which is not an operator (e.g. <tt class="docutils literal">[</tt>, <tt class="docutils literal">(</tt>...)</dd>
</dl>
</div>
<div class="section" id="comments">
<h3>Comments</h3>
<dl class="docutils">
<dt><cite>Comment</cite></dt>
<dd>Token type for any comment.</dd>
<dt><cite>Comment.Multiline</cite></dt>
<dd>Token type for multiline comments.</dd>
<dt><cite>Comment.Preproc</cite></dt>
<dd>Token type for preprocessor comments (also <tt class="docutils literal"><span class="pre">&lt;?php</span></tt>/<tt class="docutils literal">&lt;%</tt> constructs).</dd>
<dt><cite>Comment.Single</cite></dt>
<dd>Token type for comments that end at the end of a line (e.g. <tt class="docutils literal"># foo</tt>).</dd>
<dt><cite>Comment.Special</cite></dt>
<dd>Special data in comments. For example code tags, author and license
information, etc.</dd>
</dl>
</div>
<div class="section" id="generic-tokens">
<h3>Generic Tokens</h3>
<p>Generic tokens are for special lexers like the <cite>DiffLexer</cite>, which doesn't really
highlight a programming language but a patch file.</p>
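<p>To illustrate, the sketch below runs a small hunk through the <cite>DiffLexer</cite> and collects the lines it marks as deleted or inserted. The sample hunk and the helper name <cite>classify</cite> are invented for this example:</p>

```python
from pygments import lex
from pygments.lexers import DiffLexer
from pygments.token import Generic

# Made-up hunk used only for demonstration.
HUNK = "@@ -1 +1 @@\n-old line\n+new line\n"


def classify(diff_text):
    """Return (label, text) pairs for the deleted and inserted lines."""
    marks = []
    for ttype, value in lex(diff_text, DiffLexer()):
        text = value.rstrip("\n")
        if not text:
            continue  # skip tokens that only contain a newline
        if ttype in Generic.Deleted:
            marks.append(("deleted", text))
        elif ttype in Generic.Inserted:
            marks.append(("inserted", text))
    return marks


marks = classify(HUNK)
for label, text in marks:
    print(label, text)
```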
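<dl class="docutils">
<dt><cite>Generic</cite></dt>
<dd>A generic, unstyled token. Normally you don't use this token type.</dd>
<dt><cite>Generic.Deleted</cite></dt>
<dd>Marks the token value as deleted.</dd>
<dt><cite>Generic.Emph</cite></dt>
<dd>Marks the token value as emphasized.</dd>
<dt><cite>Generic.Error</cite></dt>
<dd>Marks the token value as an error message.</dd>
<dt><cite>Generic.Heading</cite></dt>
<dd>Marks the token value as a headline.</dd>
<dt><cite>Generic.Inserted</cite></dt>
<dd>Marks the token value as inserted.</dd>
<dt><cite>Generic.Output</cite></dt>
<dd>Marks the token value as program output (e.g. for the Python CLI lexer).</dd>
<dt><cite>Generic.Prompt</cite></dt>
<dd>Marks the token value as a command prompt (e.g. for the bash lexer).</dd>
<dt><cite>Generic.Strong</cite></dt>
<dd>Marks the token value as bold (e.g. for the reStructuredText lexer).</dd>
<dt><cite>Generic.Subheading</cite></dt>
<dd>Marks the token value as a subheadline.</dd>
<dt><cite>Generic.Traceback</cite></dt>
<dd>Marks the token value as part of an error traceback.</dd>
</dl>
</div>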

