Sophie

Sophie

distrib > Mageia > 4 > x86_64 > by-pkgid > f800694edefe91adea2624f711a41a2d > files > 11020

php-manual-en-5.5.7-1.mga4.noarch.rpm

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
 <head>
  <meta http-equiv="content-type" content="text/html; charset=UTF-8">
  <title>Performance</title>

 </head>
 <body><div class="manualnavbar" style="text-align: center;">
 <div class="prev" style="text-align: left; float: left;"><a href="regexp.reference.recursive.html">Recursive patterns</a></div>
 <div class="next" style="text-align: right; float: right;"><a href="reference.pcre.pattern.modifiers.html">Possible modifiers in regex patterns</a></div>
 <div class="up"><a href="reference.pcre.pattern.syntax.html">PCRE regex syntax</a></div>
 <div class="home"><a href="index.html">PHP Manual</a></div>
</div><hr /><div id="regexp.reference.performance" class="section">
  <h2 class="title">Performance</h2>
  <p class="para">
   Certain items that may appear in patterns are more efficient
   than  others.  It is more efficient to use a character class
   like [aeiou] than a set of alternatives such as (a|e|i|o|u).
   In  general,  the  simplest  construction  that provides the
   required behaviour is usually the  most  efficient.  Jeffrey
   Friedl&#039;s  book contains a lot of discussion about optimizing
   regular expressions for efficient performance.
  </p>
  <p class="para">
   When a pattern begins with .* and the <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DOTALL</a>  option  is
   set,  the  pattern  is implicitly anchored by PCRE, since it
   can match only at the start of a subject string. However, if
   <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DOTALL</a>   
   is not set, PCRE cannot make this optimization,
   because the . metacharacter does not then match  a  newline,
   and if the subject string contains newlines, the pattern may
   match from the character immediately following one  of  them
   instead of from the very start. For example, the pattern
   
   <em>(.*) second</em>
   
   matches the subject &quot;first\nand second&quot; (where \n stands for
   a newline character) with the first captured substring being
   &quot;and&quot;. In order to do this, PCRE  has  to  retry  the  match
   starting after every newline in the subject.
  </p>
  <p class="para">
   If you are using such a pattern with subject strings that do
   not  contain  newlines,  the best performance is obtained by
   setting <a href="reference.pcre.pattern.modifiers.html" class="link">PCRE_DOTALL</a>,
   or starting the  pattern  with  ^.*  to
   indicate  explicit anchoring. That saves PCRE from having to
   scan along the subject looking for a newline to restart at.
  </p>
  <p class="para">
   Beware of patterns that contain nested  indefinite  repeats.
   These  can  take a long time to run when applied to a string
   that does not match. Consider the pattern fragment
   
   <em>(a+)*</em>
  </p>
  <p class="para">
   This can match &quot;aaaa&quot; in 33 different ways, and this  number
   increases  very  rapidly  as  the string gets longer. (The *
   repeat can match 0, 1, 2, 3, or 4 times,  and  for  each  of
   those  cases other than 0, the + repeats can match different
   numbers of times.) When the remainder of the pattern is such
   that  the entire match is going to fail, PCRE has in principle
   to try every possible variation, and this  can  take  an
   extremely long time.
  </p>
  <p class="para">
   An optimization catches some of the more simple  cases  such
   as
   
   <em>(a+)*b</em>
   
   where a literal character follows. Before embarking  on  the
   standard matching procedure, PCRE checks that there is a &quot;b&quot;
   later in the subject string, and if there is not,  it  fails
   the  match  immediately. However, when there is no following
   literal this optimization cannot be used. You  can  see  the
   difference by comparing the behaviour of
   
   <em>(a+)*\d</em>
   
   with the pattern above. The former gives  a  failure  almost
   instantly  when  applied  to a whole line of &quot;a&quot; characters,
   whereas the latter takes an appreciable  time  with  strings
   longer than about 20 characters.
  </p>
 </div><hr /><div class="manualnavbar" style="text-align: center;">
 <div class="prev" style="text-align: left; float: left;"><a href="regexp.reference.recursive.html">Recursive patterns</a></div>
 <div class="next" style="text-align: right; float: right;"><a href="reference.pcre.pattern.modifiers.html">Possible modifiers in regex patterns</a></div>
 <div class="up"><a href="reference.pcre.pattern.syntax.html">PCRE regex syntax</a></div>
 <div class="home"><a href="index.html">PHP Manual</a></div>
</div></body></html>