<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
 <head>
  <meta http-equiv="content-type" content="text/html; charset=UTF-8">
  <title>Character classes</title>

 </head>
 <body><div class="manualnavbar" style="text-align: center;">
 <div class="prev" style="text-align: left; float: left;"><a href="regexp.reference.dot.html">Dot</a></div>
 <div class="next" style="text-align: right; float: right;"><a href="regexp.reference.alternation.html">Alternation</a></div>
 <div class="up"><a href="reference.pcre.pattern.syntax.html">PCRE regex syntax</a></div>
 <div class="home"><a href="index.html">PHP Manual</a></div>
</div><hr /><div id="regexp.reference.character-classes" class="section">
  <h2 class="title">Character classes</h2>
  <p class="para">
   An opening square bracket introduces a character class,
   terminated by a closing square bracket. A closing square
   bracket on its own is not special. If a closing square
   bracket is required as a member of the class, it should be
   the first data character in the class (after an initial
   circumflex, if present) or escaped with a backslash.
  </p>
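  <p class="para">
   The rules above can be sketched with <em>preg_match()</em>. This is a minimal
   illustration that assumes default PCRE options; the variable names are
   hypothetical and chosen only for this example.
  </p>

```php
<?php
// A closing bracket outside any class is an ordinary character.
$lone = preg_match('/a]/', 'a]');

// "]" placed first in the class is a literal member of the class.
$first = preg_match('/^[]abc]+$/', 'a]b');

// After an initial circumflex, a leading "]" is still a literal member,
// so this negated class rejects "]".
$negatedOk  = preg_match('/^[^]abc]+$/', 'xyz');
$negatedHit = preg_match('/^[^]abc]+$/', 'x]z');

// Escaped with a backslash, "]" may appear anywhere in the class.
$escaped = preg_match('/^[abc\]]+$/', 'c]a');
```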
  <p class="para">
   A character class matches a single character in the subject;
   the character must be in the set of characters defined by
   the class, unless the first character in the class is a
   circumflex, in which case the subject character must not be in
   the set defined by the class. If a circumflex is actually
   required as a member of the class, ensure it is not the
   first character, or escape it with a backslash.
  </p>
  <p class="para">
   For example, the character class [aeiou] matches any lower
   case vowel, while [^aeiou] matches any character that is not
   a lower case vowel. Note that a circumflex is just a
   convenient notation for specifying the characters that are in
   the class by enumerating those that are not. It is not an
   assertion: it still consumes a character from the subject
   string, and fails if the current matching position is at the
   end of the string.
  </p>
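  <p class="para">
   A short sketch of the difference between a class and its negation, and of
   the fact that a negated class still needs a character to consume. The
   subject strings are arbitrary examples.
  </p>

```php
<?php
// Count the lower case vowels and the non-vowels in "banana".
$vowels = preg_match_all('/[aeiou]/', 'banana');
$others = preg_match_all('/[^aeiou]/', 'banana');

// [^u] consumes the character after "q", so it is part of the match.
preg_match('/q[^u]/', 'qat', $m);
$consumed = $m[0];

// At the end of the subject there is nothing left to consume,
// so the negated class fails instead of matching "nothing".
$atEnd = preg_match('/q[^u]/', 'Iraq');
```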
  <p class="para">
   When case-insensitive (caseless) matching is set, any letters
   in a class represent both their upper case and lower case
   versions. For example, a caseless [aeiou] matches &quot;A&quot;
   as well as &quot;a&quot;, and a caseless [^aeiou] does not match
   &quot;A&quot;, whereas a case-sensitive (caseful) version would.
  </p>
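  <p class="para">
   The four cases described above, sketched with the <em>i</em> pattern
   modifier to enable caseless matching:
  </p>

```php
<?php
// Caseless: "A" is treated as a member of [aeiou].
$caselessPos = preg_match('/[aeiou]/i', 'A');
// Caseful: only the lower case vowels are members.
$casefulPos = preg_match('/[aeiou]/', 'A');

// Caseless negation: "A" counts as a vowel, so it is excluded.
$caselessNeg = preg_match('/[^aeiou]/i', 'A');
// Caseful negation: "A" is not in the set, so it matches.
$casefulNeg = preg_match('/[^aeiou]/', 'A');
```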
  <p class="para">
   The newline character is never treated in any special way in
   character classes, regardless of how the <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DOTALL</a>
   and <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_MULTILINE</a>
   options are set. A class such as [^a] will always match a newline.
  </p>
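  <p class="para">
   As an illustration, the sketch below contrasts a negated class with the
   dot metacharacter, whose treatment of newline does depend on the
   <em>s</em> (PCRE_DOTALL) modifier. The subject string is an arbitrary example.
  </p>

```php
<?php
$subject = "x\ny";

// The negated class matches the newline with no modifiers set.
$classDefault = preg_match('/^x[^a]y$/', $subject);
// Setting DOTALL and MULTILINE makes no difference to the class.
$classWithOptions = preg_match('/^x[^a]y$/sm', $subject);

// By contrast, the dot only matches a newline when DOTALL is set.
$dotDefault = preg_match('/^x.y$/', $subject);
$dotAll     = preg_match('/^x.y$/s', $subject);
```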
  <p class="para">
   The minus (hyphen) character can be used to specify a range
   of characters in a character class. For example, [d-m]
   matches any letter between d and m, inclusive. If a minus
   character is required in a class, it must be escaped with a
   backslash or appear in a position where it cannot be
   interpreted as indicating a range, typically as the first or last
   character in the class.
  </p>
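  <p class="para">
   A minimal sketch of a range and of the three ways to include a literal
   hyphen in a class (first, last, or escaped):
  </p>

```php
<?php
// Every letter of "film" lies between d and m, inclusive.
$inRange = preg_match('/^[d-m]+$/', 'film');
// "a" lies outside the range, so "dam" is rejected.
$outOfRange = preg_match('/^[d-m]+$/', 'dam');

// A hyphen in first or last position cannot form a range.
$hyphenFirst = preg_match('/^[-az]+$/', 'a-z');
$hyphenLast  = preg_match('/^[az-]+$/', 'a-z');

// An escaped hyphen is literal wherever it appears,
// so [a\-z] contains only "a", "-" and "z", not the letters in between.
$hyphenEscaped = preg_match('/^[a\-z]+$/', 'a-z');
$notARange     = preg_match('/^[a\-z]+$/', 'm');
```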
  <p class="para">
   It is not possible to have the literal character &quot;]&quot; as the
   end character of a range. A pattern such as [W-]46] is
   interpreted as a class of two characters (&quot;W&quot; and &quot;-&quot;)
   followed by the literal string &quot;46]&quot;, so it would match &quot;W46]&quot; or
   &quot;-46]&quot;. However, if the &quot;]&quot; is escaped with a backslash, it
   is interpreted as the end of the range, so [W-\]46] is
   interpreted as a single class containing a range followed by two
   separate characters. The octal or hexadecimal representation
   of &quot;]&quot; can also be used to end a range.
  </p>
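  <p class="para">
   The two readings can be compared directly. The sketch below assumes ASCII
   subjects, where the range W-] covers W, X, Y, Z, [, \ and ].
  </p>

```php
<?php
// Unescaped: class [W-] followed by the literal string "46]".
$literalW    = preg_match('/^[W-]46]$/', 'W46]');
$literalDash = preg_match('/^[W-]46]$/', '-46]');
$literalX    = preg_match('/^[W-]46]$/', 'X');

// Escaped: one class holding the range W-] plus "4" and "6",
// so it matches exactly one character.
$rangeX    = preg_match('/^[W-\]46]$/', 'X');
$rangeFour = preg_match('/^[W-\]46]$/', '4');
$rangeLong = preg_match('/^[W-\]46]$/', 'W46]');

// The hexadecimal code of "]" (5d) can end the range as well.
$hexEnd = preg_match('/^[W-\x5d46]$/', 'Z');
```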
  <p class="para">
   Ranges operate in ASCII collating sequence. They can also be
   used for characters specified numerically, for example
   [\000-\037]. If a range that includes letters is used when
   case-insensitive (caseless) matching is set, it matches the
   letters in either case. For example, [W-c] is equivalent to
   [][\^_`wxyzabc], matched case-insensitively, and if character
   tables for the &quot;fr&quot; locale are in use, [\xc8-\xcb] matches
   accented E characters in both cases.
  </p>
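  <p class="para">
   A sketch of a numeric range and of the caseless [W-c] example. The locale
   case is left out because it depends on the character tables in use.
  </p>

```php
<?php
// [\000-\037] covers the ASCII control characters; tab is octal 011.
$hasControl = preg_match('/[\000-\037]/', "tab\there");
$noControl  = preg_match('/[\000-\037]/', 'plain');

// Caseful, [W-c] spans W..Z, the punctuation [ \ ] ^ _ `, and a..c.
$casefulX = preg_match('/^[W-c]$/', 'x');
// Caseless, the letters in the range match in either case.
$caselessX = preg_match('/^[W-c]$/i', 'x');
$caselessB = preg_match('/^[W-c]$/i', 'B');
// Punctuation inside the range is still matched; "d" lies outside it.
$underscore = preg_match('/^[W-c]$/i', '_');
$caselessD  = preg_match('/^[W-c]$/i', 'd');
```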
  <p class="para">
   The character types \d, \D, \s, \S, \w, and \W may also
   appear in a character class, and add the characters that
   they match to the class. For example, [\dABCDEF] matches any
   hexadecimal digit. A circumflex can conveniently be used
   with the upper case character types to specify a more
   restricted set of characters than the matching lower case type.
   For example, the class [^\W_] matches any letter or digit,
   but not underscore.
  </p>
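  <p class="para">
   Both examples from the paragraph above, sketched with ASCII subjects:
  </p>

```php
<?php
// \d adds the decimal digits to the explicitly listed letters A-F.
$isHex  = preg_match('/^[\dABCDEF]+$/', '7F3A');
$notHex = preg_match('/^[\dABCDEF]+$/', '7G');

// \w accepts the underscore...
$wordChars = preg_match('/^\w+$/', 'abc_123');
// ...while [^\W_] means "not a non-word character and not underscore",
// which leaves only letters and digits.
$alnumOnly   = preg_match('/^[^\W_]+$/', 'abc123');
$rejectsUnderscore = preg_match('/^[^\W_]+$/', 'abc_123');
```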
  <p class="para">
   All non-alphanumeric characters other than \, -, ^ (at the
   start) and the terminating ] are non-special in character
   classes, but it does no harm if they are escaped. The pattern
   delimiter is always special and must be escaped when it is used
   within the pattern, including inside a character class.
  </p>
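  <p class="para">
   A sketch of these escaping rules, assuming &quot;/&quot; and &quot;#&quot; as delimiters.
   <em>preg_quote()</em> is shown as one way to escape a delimiter in text
   that is inserted into a pattern.
  </p>

```php
<?php
// Metacharacters such as . + * are ordinary inside a class...
$plain = preg_match('/^[.+*]+$/', '.+*');
// ...and escaping them anyway does no harm.
$escapedAnyway = preg_match('/^[\.\+\*]+$/', '.+*');

// With "/" as the delimiter, a slash inside the class must be escaped.
$slashEscaped = preg_match('/^[a-z\/]+$/', 'usr/bin');
// Choosing a different delimiter avoids the escape.
$otherDelimiter = preg_match('#^[a-z/]+$#', 'usr/bin');

// preg_quote() escapes the delimiter when it is passed as the second argument.
$quoted = preg_quote('a/b', '/');
```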
  <p class="para">
   Perl supports the POSIX notation for character classes. This uses names
   enclosed by <em>[:</em> and <em>:]</em> within the enclosing square brackets. PCRE also
   supports this notation. For example, <em>[01[:alpha:]%]</em>
   matches &quot;0&quot;, &quot;1&quot;, any alphabetic character, or &quot;%&quot;. The supported class
   names are:
   <table class="doctable table">
    <caption><strong>Character classes</strong></caption>
    
     <tbody class="tbody">
      <tr><td><em>alnum</em></td><td>letters and digits</td></tr>

      <tr><td><em>alpha</em></td><td>letters</td></tr>

      <tr><td><em>ascii</em></td><td>character codes 0 - 127</td></tr>

      <tr><td><em>blank</em></td><td>space or tab only</td></tr>

      <tr><td><em>cntrl</em></td><td>control characters</td></tr>

      <tr><td><em>digit</em></td><td>decimal digits (same as \d)</td></tr>

      <tr><td><em>graph</em></td><td>printing characters, excluding space</td></tr>

      <tr><td><em>lower</em></td><td>lower case letters</td></tr>

      <tr><td><em>print</em></td><td>printing characters, including space</td></tr>

      <tr><td><em>punct</em></td><td>printing characters, excluding letters and digits</td></tr>

      <tr><td><em>space</em></td><td>white space (not quite the same as \s)</td></tr>

      <tr><td><em>upper</em></td><td>upper case letters</td></tr>

      <tr><td><em>word</em></td><td>&quot;word&quot; characters (same as \w)</td></tr>

      <tr><td><em>xdigit</em></td><td>hexadecimal digits</td></tr>

     </tbody>
    
   </table>

   The <em>space</em> characters are HT (9), LF (10), VT (11), FF (12), CR (13),
   and space (32). Notice that this list includes the VT character (code
   11). This makes &quot;space&quot; different from <em>\s</em>, which does not include VT (for
   Perl compatibility).
  </p>
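  <p class="para">
   A brief sketch of the POSIX notation in use. It deliberately checks VT only
   against <em>[:space:]</em> and makes no claim about <em>\s</em>, whose
   handling of VT may depend on the PCRE version.
  </p>

```php
<?php
// [01[:alpha:]%] matches "0", "a" and "%" here, but not "2" or "-".
$posixCount = preg_match_all('/[01[:alpha:]%]/', '0a%2-');

// VT (code 11) is a member of [:space:].
$vtIsSpace = preg_match('/[[:space:]]/', "\x0B");

// [:blank:] is limited to space and tab, so a newline does not match.
$tabIsBlank     = preg_match('/[[:blank:]]/', "\t");
$newlineIsBlank = preg_match('/[[:blank:]]/', "\n");

// [:xdigit:] accepts hexadecimal digits.
$isHex = preg_match('/^[[:xdigit:]]+$/', 'c0ffee');
```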
  <p class="para">
   The name <em>word</em> is a Perl extension, and <em>blank</em> is a GNU extension
   from Perl 5.8. Another Perl extension is negation, which is indicated
   by a <em>^</em> character after the colon. For example,
   <em>[12[:^digit:]]</em> matches &quot;1&quot;, &quot;2&quot;, or any non-digit.
  </p>
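  <p class="para">
   The negated form can be sketched as follows; the subject string is an
   arbitrary example.
  </p>

```php
<?php
// [12[:^digit:]] matches "1", "2", or any character that is not a digit.
// In "123a" that is "1", "2" and "a"; "3" is a digit other than 1 or 2.
preg_match_all('/[12[:^digit:]]/', '123a', $m);
$matched = $m[0];

// [:word:] behaves like \w, so it accepts the underscore.
$underscoreIsWord = preg_match('/^[[:word:]]+$/', 'snake_case');
```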
  <p class="para">
   In UTF-8 mode, characters with values greater than 128 do not match any
   of the POSIX character classes.
  </p>
 </div></body></html>