<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
    <title>Stopword file format &mdash; Translate Toolkit 1.9.0 documentation</title>
    
    <link rel="stylesheet" href="../_static/basic.css" type="text/css" />
    <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
    <link rel="stylesheet" href="../_static/bootstrap.css" type="text/css" />
    <link rel="stylesheet" href="../_static/bootstrap-sphinx.css" type="text/css" />
    
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    '../',
        VERSION:     '1.9.0',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true
      };
    </script>
    <script type="text/javascript" src="../_static/jquery.js"></script>
    <script type="text/javascript" src="../_static/underscore.js"></script>
    <script type="text/javascript" src="../_static/doctools.js"></script>
    <script type="text/javascript" src="../_static/bootstrap.js"></script>
    <script type="text/javascript" src="../_static/bootstrap-sphinx.js"></script>
    <link rel="top" title="Translate Toolkit 1.9.0 documentation" href="../index.html" />
    <link rel="up" title="Converters" href="index.html" />
    <link rel="next" title="pocount" href="pocount.html" />
    <link rel="prev" title="poterminology" href="poterminology.html" /> 
  </head>
  <body>
  <div id="navbar" class="navbar navbar-fixed-top">
    <div class="navbar-inner">
      <div class="container-fluid">
        <a class="brand" href="../index.html">Translate Toolkit</a>
        <span class="navbar-text pull-left"><b>1.9.0</b></span>
          <ul class="nav">
            <li class="divider-vertical"></li>
            
              <li class="dropdown">
  <a href="#" class="dropdown-toggle" data-toggle="dropdown">Site <b class="caret"></b></a>
  <ul class="dropdown-menu globaltoc"><ul class="simple">
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../features.html">Features</a></li>
<li class="toctree-l1"><a class="reference internal" href="../installation.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="index.html">Converters</a></li>
<li class="toctree-l1"><a class="reference internal" href="index.html#tools">Tools</a></li>
<li class="toctree-l1"><a class="reference internal" href="index.html#scripts">Scripts</a></li>
<li class="toctree-l1"><a class="reference internal" href="../guides/index.html">Use Cases</a></li>
<li class="toctree-l1"><a class="reference internal" href="../formats/index.html">Supported formats</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../styleguide.html">Translate Styleguide</a></li>
<li class="toctree-l1"><a class="reference internal" href="../styleguide.html#documentation">Documentation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../development/building.html">Building</a></li>
<li class="toctree-l1"><a class="reference internal" href="../development/contributing.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../development/developers.html">Translate Toolkit Developers Guide</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../api/index.html">API</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../changelog.html">Important Changes</a></li>
<li class="toctree-l1"><a class="reference internal" href="../history.html">History of the Translate Toolkit</a></li>
<li class="toctree-l1"><a class="reference internal" href="../license.html">License</a></li>
</ul>
</ul>
</li>
              
<li class="dropdown">
  <a href="#" class="dropdown-toggle" data-toggle="dropdown">Page <b class="caret"></b></a>
  <ul class="dropdown-menu localtoc"><ul>
<li><a class="reference internal" href="#">Stopword file format</a><ul>
<li><a class="reference internal" href="#overview">Overview</a><ul>
<li><a class="reference internal" href="#case-mapping-specifiers">Case mapping specifiers</a></li>
<li><a class="reference internal" href="#stoplist-regular-expressions">Stoplist regular expressions</a></li>
<li><a class="reference internal" href="#stoplist-words">Stoplist words</a></li>
</ul>
</li>
<li><a class="reference internal" href="#default-file-example">Default file example</a></li>
</ul>
</li>
</ul>
</ul>
</li>
            
            
              
  <li><a href="poterminology.html"
         title="previous chapter">&laquo; poterminology</a></li>
  <li><a href="pocount.html"
         title="next chapter">pocount &raquo;</a></li>
            
            
              
            
          </ul>
          
            
<form class="navbar-search pull-right" action="../search.html" method="get">
  <input type="text" name="q" placeholder="Search" />
  <input type="hidden" name="check_keywords" value="yes" />
  <input type="hidden" name="area" value="default" />
</form>
          
        </div>
      </div>
    </div>

<div class="container content">
   
  <div class="section" id="stopword-file-format">
<span id="poterminology-stopword-file"></span><h1>Stopword file format<a class="headerlink" href="#stopword-file-format" title="Permalink to this headline">¶</a></h1>
<p class="versionadded">
<span class="versionmodified">New in version 1.2.</span></p>
<p>The default stopword file for <a class="reference internal" href="poterminology.html"><em>poterminology</em></a> describes the syntax of
these files and provides a good default for most applications using English
source text.  You can find the location of the default stopword file by looking
at the output of poterminology <tt class="docutils literal"><span class="pre">--help</span></tt>, or using the following command:</p>
<div class="highlight-python"><pre>poterminology --manpage | sed -n '/STOPFILE/s/.*(\(.*\)).*/\1/p'</pre>
</div>
<div class="section" id="overview">
<span id="poterminology-stopword-file-overview"></span><h2>Overview<a class="headerlink" href="#overview" title="Permalink to this headline">¶</a></h2>
<p>The basic syntax of this file is line-oriented, with the first character of
each line determining its function.  The order of the lines is generally not
significant (with one exception noted below), and the selection of function
characters was made so that an ASCII sort of the file would leave it in a
generally logical order (except for comment lines).</p>
<p>Apart from comment lines (which begin with &#8216;#&#8217;) and empty lines (which are also
ignored), there are three general types of lines, which may appear in any
order:</p>
<ul class="simple">
<li>case mapping specifiers</li>
<li>stoplist regular expressions</li>
<li>stoplist words</li>
</ul>
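<p>As a rough illustration of the line-oriented syntax described above, the following Python sketch sorts lines into these categories by looking only at their first character.  The function name <tt class="docutils literal"><span class="pre">classify_line</span></tt> and the category labels are invented for this example; they are not part of the Translate Toolkit API.</p>

```python
def classify_line(line):
    """Return the kind of stopword-file line, judged by its first character.

    Hypothetical helper for illustration only; not poterminology's own code.
    """
    text = line.rstrip("\n")
    if not text.strip():
        return "empty"          # empty lines are ignored
    first = text[0]
    if first == "#":
        return "comment"        # comment lines are ignored
    if first == "!":
        return "case-mapping"   # case mapping specifier
    if first == "/":
        return "regex"          # stoplist regular expression
    return "word"               # stoplist word entry (+ : < = > @)


sample = ["# a comment", "", "!F", "/..?", "=about"]
kinds = [classify_line(line) for line in sample]
print(kinds)
```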
<div class="section" id="case-mapping-specifiers">
<span id="poterminology-stopword-file-case-mapping-specifiers"></span><h3>Case mapping specifiers<a class="headerlink" href="#case-mapping-specifiers" title="Permalink to this headline">¶</a></h3>
<p>A line beginning with a &#8216;<strong>!</strong>&#8217; specifies upper-/lower-case mapping for words
or phrases before comparison with this stoplist (no mapping is applied to the
words or regular expressions in this file, only to the source messages).  The
second character on this line must be one of the following:</p>
<ul class="simple">
<li><strong>C</strong> no uppercase / lowercase mapping is performed</li>
<li><strong>F</strong> &#8220;Title Case&#8221; words / terms are folded to lower case (default)</li>
<li><strong>I</strong> all words are mapped to lowercase</li>
</ul>
<p>These correspond to the <tt class="docutils literal"><span class="pre">--preserve-case</span></tt> /
<tt class="docutils literal"><span class="pre">--fold-titlecase</span></tt> / <tt class="docutils literal"><span class="pre">--ignore-case</span></tt> options to poterminology, but
are completely independent of them and apply only to stoplist matching.  For example, if you run
poterminology with <tt class="docutils literal"><span class="pre">-I</span></tt> to map all terms to lowercase while the case
mapping specifier in the stopword file is &#8216;<strong>!C</strong>&#8217;, a stoplist entry for &#8220;pootle&#8221;
will not stop a term containing &#8220;Pootle&#8221;: the term passes the stoplist and is only
then mapped to &#8220;pootle&#8221;.</p>
<p>There should only be one case mapping specifier in a stoplist file; if more
than one are present, the last one will take precedence over the others, and
its mapping will apply to all entries.  If multiple stoplist files are used,
the last case mapping specifier processed will apply to all entries <strong>in all
files</strong>.</p>
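<p>The three mappings and the &#8220;last specifier wins&#8221; rule can be sketched in a few lines of Python.  This is an illustration under stated assumptions rather than poterminology&#8217;s actual code: the function names are hypothetical, and Python&#8217;s <tt class="docutils literal"><span class="pre">str.istitle()</span></tt> is assumed to be a reasonable test for a Title Case word.</p>

```python
def stop_map(word, mode="F"):
    """Map a source word before comparing it with the stoplist.

    mode is the second character of the '!' line: C, F or I.
    Hypothetical helper; str.istitle() as the Title Case test is an assumption.
    """
    if mode == "I" or (mode == "F" and word.istitle()):
        return word.lower()
    return word  # mode "C", or a word that is not in Title Case


def last_case_mode(lines, default="F"):
    """Return the mode of the last valid case mapping specifier seen."""
    mode = default
    for line in lines:
        if line.startswith("!") and len(line) > 1 and line[1] in "CFI":
            mode = line[1]
    return mode


print(stop_map("Who"))        # folded, so it can match a stopword "who"
print(stop_map("WHO"))        # left alone under the default F mapping
print(last_case_mode(["!C", "=about", "!I"]))
```

<p>Passing the lines of several stoplist files through <tt class="docutils literal"><span class="pre">last_case_mode</span></tt> in the order they are read gives the single mode that would then apply to every entry.</p>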
</div>
<div class="section" id="stoplist-regular-expressions">
<span id="poterminology-stopword-file-stoplist-regular-expressions"></span><h3>Stoplist regular expressions<a class="headerlink" href="#stoplist-regular-expressions" title="Permalink to this headline">¶</a></h3>
<p>Lines beginning with a &#8216;<strong>/</strong>&#8217; are regular expression patterns &#8211; any word that
matches will be ignored by itself, and any phrase containing it will be
excluded as well.  The regular expression consists of all characters on the
line following the initial &#8216;/&#8217; &#8211; these are extended regular expressions, so
grouping, alternation, and such are available.</p>
<p>Regular expression patterns are only checked if the word itself does not appear
in the stoplist file as a word entry.  The regular expression patterns are
always applied to individual words, not phrases, and must match the entire word
(i.e. they are anchored both at the start and end).</p>
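<p>A minimal sketch of these two rules, assuming Python&#8217;s <tt class="docutils literal"><span class="pre">re</span></tt> module and a hypothetical helper named <tt class="docutils literal"><span class="pre">regex_stops</span></tt>: patterns are consulted only for words without a word entry of their own, and <tt class="docutils literal"><span class="pre">re.fullmatch</span></tt> provides the anchoring at both ends.  The patterns are the ones from the default file example.</p>

```python
import re


def regex_stops(word, patterns, word_entries):
    """Return True if a stoplist regular expression matches the whole word.

    Patterns are skipped when the word has its own word entry.
    Hypothetical helper for illustration; not poterminology's own code.
    """
    if word in word_entries:
        return False  # the word entry decides, the patterns are not checked
    return any(re.fullmatch(pattern, word) for pattern in patterns)


# Patterns as written in the default file, without the leading '/'
patterns = ["..?", r".*\(.*", "[0-9,.]+(st|nd|rd|th)?"]

print(regex_stops("ok", patterns, set()))    # two characters: matches "..?"
print(regex_stops("off", patterns, set()))   # three letters: no pattern matches
print(regex_stops("no", patterns, {"no"}))   # has a word entry: patterns skipped
```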
<p>Use regular expressions sparingly, as evaluating them for every word in the
source files can be expensive.  In addition to stoplist regular expressions,
poterminology has precompiled patterns for C and Python format specifiers (e.g.
%d) and XML/HTML &lt;elements&gt; and &amp;entities; &#8211; these are removed before stoplist
processing and it is not possible to override this.</p>
</div>
<div class="section" id="stoplist-words">
<span id="poterminology-stopword-file-stoplist-words"></span><h3>Stoplist words<a class="headerlink" href="#stoplist-words" title="Permalink to this headline">¶</a></h3>
<p>All other lines should begin with one of the following characters, which
indicate whether the word is <strong>ignored</strong> when it appears alone, whether it is
<strong>disregarded</strong> inside a phrase (i.e. a phrase containing it is allowed, and the
word does not count against the <tt class="docutils literal"><span class="pre">--term-words</span></tt> length limit), and whether any
phrase containing it is <strong>excluded</strong>.</p>
<ul class="simple">
<li><strong>+</strong> allow word alone, allow phrases containing it</li>
<li><strong>:</strong> allow word alone, disregarded (for <tt class="docutils literal"><span class="pre">--term-words</span></tt>) inside
phrase</li>
<li><strong>&lt;</strong> allow word alone, but exclude any phrase containing it</li>
<li><strong>=</strong> ignore word alone, but allow phrases containing it</li>
<li><strong>&gt;</strong> ignore word alone, disregarded (for <tt class="docutils literal"><span class="pre">--term-words</span></tt>) inside
phrase</li>
<li><strong>&#64;</strong> ignore word alone, and exclude any phrase containing it</li>
</ul>
<p>Generally &#8216;+&#8217; is only needed for exceptions to regular expression patterns, but
it may also be used to override an entry in a previous stoplist if you are
using multiple stoplists.</p>
<p>Note that if a word appears multiple times in a stoplist file with different
function characters preceding it, the <em>last entry will take precedence</em> over
the others.  This is the only exception to the general rule that order is not
important in stopword files.</p>
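<p>The six function characters form a small table: each one says whether the word is kept as a term on its own, and how it is treated inside a phrase.  The sketch below encodes that table and shows the last entry for a word taking precedence.  The names <tt class="docutils literal"><span class="pre">MARKERS</span></tt> and <tt class="docutils literal"><span class="pre">parse_words</span></tt> are hypothetical and chosen only for this example.</p>

```python
# marker -> (keep the word as a term on its own?, treatment inside a phrase)
MARKERS = {
    "+": (True, "allow"),
    ":": (True, "disregard"),
    "<": (True, "exclude"),
    "=": (False, "allow"),
    ">": (False, "disregard"),
    "@": (False, "exclude"),
}


def parse_words(lines):
    """Collect stoplist word entries; a later entry replaces an earlier one."""
    entries = {}
    for line in lines:
        if line and line[0] in MARKERS:
            entries[line[1:]] = MARKERS[line[0]]
    return entries


rules = parse_words(["=no", ":on", "@are", "+no"])
print(rules["no"])   # the later "+no" entry has replaced "=no"
```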
</div>
</div>
<div class="section" id="default-file-example">
<span id="poterminology-stopword-file-default-file-example"></span><h2>Default file example<a class="headerlink" href="#default-file-example" title="Permalink to this headline">¶</a></h2>
<div class="highlight-python"><pre># apply title-case folding to words before comparing with this stoplist
!F</pre>
</div>
<p>The fold-titlecase setting is the default, so it would apply even if it were not explicitly
specified.  This allows capitalized words at the start of a sentence (e.g.
&#8220;Who&#8221;) to match a stopword &#8220;who&#8221; but allows acronyms like WHO (World Health
Organization) to be included in the terminology.  If you are using
poterminology with source files that contain large amounts of ALL UPPERCASE
TEXT you may find the ignore-case setting to be preferable.</p>
<div class="highlight-python"><div class="highlight"><pre><span class="c"># override regex match below for phrases with &#39;no&#39;</span>
<span class="o">+</span><span class="n">no</span>
</pre></div>
</div>
<p>The regular expression /..? below would normally match the word &#8216;no&#8217; and both
ignore it as a term and exclude any phrases containing it.  The above will
allow it to appear as a term and in phrases.</p>
<div class="highlight-python"><pre># ignore all one or two-character words (unless =word appears below)
/..?
# ignore words with parenthesis, typically function() calls and the like
/.*\(.*
# ignore numbers, both cardinal (e.g. 1,234.0) and ordinal (e.g. 1st, 22nd)
/[0-9,.]+(st|nd|rd|th)?</pre>
</div>
<p>These regular expressions ignore a lot of uninteresting terms that are
typically code or other things that shouldn&#8217;t be translated anyhow.  There are
many exceptions to the one or two-character word pattern in the default
stoplist file, not only with = like &#8216;=in&#8217; but also &#8216;+no&#8217; and &#8216;:on&#8217; and &#8216;&lt;ok&#8217;
and &#8216;&gt;of&#8217;.</p>
<div class="highlight-python"><pre># allow these words by themselves and don't count against length for phrases
:off
:on</pre>
</div>
<p>These prepositions are common as button text and thus useful to have as terms;
they also form an important part of phrases so are disregarded for term word
count to allow for slightly longer phrases including them.</p>
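<p>The effect of disregarding a word for the term word count can be pictured as follows.  This is only a sketch: how poterminology counts words internally is not shown here, and the helper names and the limit of 3 are assumptions made for the example.</p>

```python
def counted_length(phrase, disregarded):
    """Count the words in a phrase, skipping disregarded stoplist words."""
    return sum(1 for word in phrase.split() if word not in disregarded)


def within_limit(phrase, disregarded, limit):
    """Check a phrase against a --term-words style length limit."""
    return counted_length(phrase, disregarded) <= limit


disregarded = {"on", "off"}   # words marked with ':' in the lines above
phrase = "turn on the light"

print(counted_length(phrase, disregarded))    # "on" is not counted
print(within_limit(phrase, disregarded, 3))   # fits a limit of 3
print(within_limit(phrase, set(), 3))         # four counted words do not fit
```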
<div class="highlight-python"><pre># allow these words by themselves, but ignore any phrases containing them
&lt;first
&lt;hello
&lt;last</pre>
</div>
<p>These are words that are worth including in a terminology, as they are common
in applications, but which aren&#8217;t generally part of idiomatic phrases.</p>
<div class="highlight-python"><pre># ignore these words by themselves, but allow phrases containing them
=able
=about
=actually
=ad
=as
=at</pre>
</div>
<p>This is the largest category of stoplist words, and these are all just rather
common words.  The purpose of a terminology list is to provide specific
translation suggestions for the harder words or phrases, not provide a general
dictionary, so these words are not of interest by themselves, but may well be
part of an interesting phrase.</p>
<div class="highlight-python"><pre># ignore these words by themselves, but allow phrases containing them,   and
# don't count against length for phrases
#
# (possible additions to this list for multi-lingual text: &gt;di &gt;el &gt;le)
#
&gt;a
&gt;an
&gt;and</pre>
</div>
<p>These very common words aren&#8217;t of interest by themselves, but often form an
important part of phrases so are disregarded for term word count to allow for
slightly longer phrases including them.</p>
<div class="highlight-python"><pre># ignore these words and any phrases containing them
@ain't
@aint
@al
@are</pre>
</div>
<p>These are &#8220;junk&#8221; words that are not only uninteresting by themselves but also
generally contribute nothing to the phrases containing them.</p>
</div>
</div>


</div>
<hr>

<footer class="footer">
  <div class="container">
    <p class="pull-right"><a href="#">Back to top ↑</a></p>
    <ul class="unstyled muted">
      <li><small>
        &copy; 2012, Translate.org.za.<br/>
      </small></li>
      <li><small>
      Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.1.3.
      </small></li>
    </ul>
  </div>
</footer>
  </body>
</html>