<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN"> <html> <head> <title>Regular Expression Performance Comparison (gcc 3.2)</title> <meta name="generator" content="HTML Tidy, see www.w3.org"> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> <meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5"> <META content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot" name="Template"> <meta name="GENERATOR" content="Microsoft FrontPage Express 2.0"> </head> <body bgcolor="#ffffff" link="#0000ff" vlink="#800080"> <h2>Regular Expression Performance Comparison</h2> <p>The following tables provide comparisons between the following regular expression libraries:</p> <p><a href="http://www.boost.org/">The Boost regex library</a>.</p> <p><a href="http://www.gnu.org">The GNU regular expression library</a>.</p> <p>Philip Hazel's <a href="http://www.pcre.org">PCRE</a> library.</p> <h3>Details</h3> <p>Machine: Intel Pentium 4 2.8GHz PC.</p> <p>Compiler: GNU C++ version 3.2 20020927 (prerelease).</p> <p>C++ Standard Library: GNU libstdc++ version 20020927.</p> <p>OS: Cygwin.</p> <p>Boost version: 1.31.0.</p> <p>PCRE version: 4.1.</p> <p>As ever care should be taken in interpreting the results, only sensible regular expressions (rather than pathological cases) are given, most are taken from the Boost regex examples, or from the <a href="http://www.regxlib.com/">Library of Regular Expressions</a>. In addition, some variation in the relative performance of these libraries can be expected on other machines - as memory access and processor caching effects can be quite large for most finite state machine algorithms. In each case the first figure given is the relative time taken (so a value of 1.0 is as good as it gets), while the second figure is the actual time taken.</p> <h3>Averages</h3> <p>The following are the average relative scores for all the tests: the perfect regular expression library would score 1, in practice anything less than 2 is pretty good.</p> <table border="1" cellspacing="1"> <tr> <td><strong>Boost</strong></td> <td><strong>Boost + C++ locale</strong></td> <td><strong>POSIX</strong></td> <td><strong>PCRE</strong></td> </tr> <tr> <td>1.4503</td> <td>1.49124</td> <td>108.372</td> <td>1.56255</td> </tr> </table> <br> <br> <h3>Comparison 1: Long Search</h3> <p>For each of the following regular expressions the time taken to find all occurrences of the expression within a long English language text was measured (<a href="ftp://ibiblio.org/pub/docs/books/gutenberg/etext02/mtent12.zip">mtent12.txt</a> from <a href="http://promo.net/pg/">Project Gutenberg</a>, 19Mb). </p> <table border="1" cellspacing="1"> <tr> <td><strong>Expression</strong></td> <td><strong>Boost</strong></td> <td><strong>Boost + C++ locale</strong></td> <td><strong>POSIX</strong></td> <td><strong>PCRE</strong></td> </tr> <tr> <td><code>Twain</code></td> <td>3.49<br> (0.205s)</td> <td>4.09<br> (0.24s)</td> <td>65.2<br> (3.83s)</td> <td><font color="#008000">1<br> (0.0588s)</font></td> </tr> <tr> <td><code>Huck[[:alpha:]]+</code></td> <td>3.86<br> (0.203s)</td> <td>4.52<br> (0.238s)</td> <td>100<br> (5.26s)</td> <td><font color="#008000">1<br> (0.0526s)</font></td> </tr> <tr> <td><code>[[:alpha:]]+ing</code></td> <td><font color="#008000">1.01<br> (1.23s)</font></td> <td><font color="#008000">1<br> (1.22s)</font></td> <td>4.95<br> (6.04s)</td> <td>4.67<br> (5.71s)</td> </tr> <tr> <td><code>^[^ ]*?Twain</code></td> <td><font color="#008000">1<br> (0.31s)</font></td> <td><font color="#008000">1.05<br> (0.326s)</font></td> <td>NA</td> <td>3.32<br> (1.03s)</td> </tr> <tr> <td><code>Tom|Sawyer|Huckleberry|Finn</code></td> <td><font color="#008000">1.02<br> (0.125s)</font></td> <td><font color="#008000">1<br> (0.123s)</font></td> <td>165<br> (20.3s)</td> <td><font color="#008000">1.08<br> (0.133s)</font></td> </tr> <tr> <td><code> (Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn)</code></td> <td><font color="#008000">1<br> (0.345s)</font></td> <td><font color="#008000">1.03<br> (0.355s)</font></td> <td>NA</td> <td>1.71<br> (0.59s)</td> </tr> </table> <br> <br> <h3>Comparison 2: Medium Sized Search</h3> <p>For each of the following regular expressions the time taken to find all occurrences of the expression within a medium sized English language text was measured (the first 50K from mtent12.txt). </p> <table border="1" cellspacing="1"> <tr> <td><strong>Expression</strong></td> <td><strong>Boost</strong></td> <td><strong>Boost + C++ locale</strong></td> <td><strong>POSIX</strong></td> <td><strong>PCRE</strong></td> </tr> <tr> <td><code>Twain</code></td> <td>1.8<br> (0.000519s)</td> <td>2.14<br> (0.000616s)</td> <td>9.08<br> (0.00262s)</td> <td><font color="#008000">1<br> (0.000289s)</font></td> </tr> <tr> <td><code>Huck[[:alpha:]]+</code></td> <td>3.65<br> (0.000499s)</td> <td>4.36<br> (0.000597s)</td> <td><font color="#008000">1<br> (0.000137s)</font></td> <td>1.43<br> (0.000196s)</td> </tr> <tr> <td><code>[[:alpha:]]+ing</code></td> <td><font color="#008000">1<br> (0.00258s)</font></td> <td><font color="#008000">1<br> (0.00258s)</font></td> <td>5.28<br> (0.0136s)</td> <td>5.63<br> (0.0145s)</td> </tr> <tr> <td><code>^[^ ]*?Twain</code></td> <td><font color="#008000">1<br> (0.000929s)</font></td> <td><font color="#008000">1.03<br> (0.000957s)</font></td> <td>NA</td> <td>2.82<br> (0.00262s)</td> </tr> <tr> <td><code>Tom|Sawyer|Huckleberry|Finn</code></td> <td><font color="#008000">1<br> (0.000812s)</font></td> <td><font color="#008000">1<br> (0.000812s)</font></td> <td>60.1<br> (0.0488s)</td> <td>1.28<br> (0.00104s)</td> </tr> <tr> <td><code> (Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn)</code></td> <td><font color="#008000">1.02<br> (0.00178s)</font></td> <td><font color="#008000">1<br> (0.00174s)</font></td> <td>242<br> (0.421s)</td> <td>1.3<br> (0.00227s)</td> </tr> </table> <br> <br> <h3>Comparison 3: C++ Code Search</h3> <p>For each of the following regular expressions the time taken to find all occurrences of the expression within the C++ source file <a href="../../../boost/crc.hpp"> boost/crc.hpp</a> was measured. </p> <table border="1" cellspacing="1"> <tr> <td><strong>Expression</strong></td> <td><strong>Boost</strong></td> <td><strong>Boost + C++ locale</strong></td> <td><strong>POSIX</strong></td> <td><strong>PCRE</strong></td> </tr> <tr> <td><code> ^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?(class|struct)[[:space:]]*(\<\w+\>([ ]*\([^)]*\))?[[:space:]]*)*(\<\w*\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?(\{|:[^;\{()]*\{)</code></td> <td><font color="#008000">1.04<br> (0.000144s)</font></td> <td><font color="#008000">1<br> (0.000139s)</font></td> <td>862<br> (0.12s)</td> <td>4.56<br> (0.000636s)</td> </tr> <tr> <td><code>(^[ ]*#(?:[^\\\n]|\\[^\n_[:punct:][:alnum:]]*[\n[:punct:][:word:]])*)|(//[^\n]*|/\*.*?\*/)|\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\>|('(?:[^\\']|\\.)*'|"(?:[^\\"]|\\.)*")|\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned|using|virtual|void|volatile|wchar_t|while)\></code></td> <td><font color="#008000">1<br> (0.0139s)</font></td> <td><font color="#008000">1.01<br> (0.0141s)</font></td> <td>NA</td> <td>1.55<br> (0.0216s)</td> </tr> <tr> <td><code>^[ ]*#[ ]*include[ ]+("[^"]+"|<[^>]+>)</code></td> <td><font color="#008000">1.04<br> (0.000332s)</font></td> <td><font color="#008000">1<br> (0.000318s)</font></td> <td>130<br> (0.0413s)</td> <td>1.72<br> (0.000547s)</td> </tr> <tr> <td><code>^[ ]*#[ ]*include[ ]+("boost/[^"]+"|<boost/[^>]+>)</code></td> <td><font color="#008000">1.02<br> (0.000323s)</font></td> <td><font color="#008000">1<br> (0.000318s)</font></td> <td>150<br> (0.0476s)</td> <td>1.72<br> (0.000547s)</td> </tr> </table> <br> <h3></h3> <H3>Comparison 4: HTML Document Search </H3> <p>For each of the following regular expressions the time taken to find all occurrences of the expression within the html file <a href="../../libraries.htm">libs/libraries.htm</a> was measured. </p> <table border="1" cellspacing="1"> <tr> <td><strong>Expression</strong></td> <td><strong>Boost</strong></td> <td><strong>Boost + C++ locale</strong></td> <td><strong>POSIX</strong></td> <td><strong>PCRE</strong></td> </tr> <tr> <td><code>beman|john|dave</code></td> <td><font color="#008000">1.03<br> (0.000367s)</font></td> <td><font color="#008000">1<br> (0.000357s)</font></td> <td>47.4<br> (0.0169s)</td> <td>1.16<br> (0.000416s)</td> </tr> <tr> <td><code><p>.*?</p></code></td> <td>1.25<br> (0.000459s)</td> <td><font color="#008000">1<br> (0.000367s)</font></td> <td>NA</td> <td><font color="#008000">1.03<br> (0.000376s)</font></td> </tr> <tr> <td><code> <a[^>]+href=("[^"]*"|[^[:space:]]+)[^>]*></code></td> <td><font color="#008000">1<br> (0.000509s)</font></td> <td><font color="#008000">1.02<br> (0.000518s)</font></td> <td>305<br> (0.155s)</td> <td><font color="#008000">1.1<br> (0.000558s)</font></td> </tr> <tr> <td><code> <h[12345678][^>]*>.*?</h[12345678]></code></td> <td><font color="#008000">1.04<br> (0.00025s)</font></td> <td><font color="#008000">1<br> (0.00024s)</font></td> <td>NA</td> <td>1.16<br> (0.000279s)</td> </tr> <tr> <td><code> <img[^>]+src=("[^"]*"|[^[:space:]]+)[^>]*></code></td> <td>2.22<br> (0.000489s)</td> <td>1.69<br> (0.000372s)</td> <td>148<br> (0.0326s)</td> <td><font color="#008000">1<br> (0.00022s)</font></td> </tr> <tr> <td><code> <font[^>]+face=("[^"]*"|[^[:space:]]+)[^>]*>.*?</font></code></td> <td>1.71<br> (0.000371s)</td> <td>1.75<br> (0.000381s)</td> <td>NA</td> <td><font color="#008000">1<br> (0.000218s)</font></td> </tr> </table> <br> <br> <h3>Comparison 3: Simple Matches</h3> <p>For each of the following regular expressions the time taken to match against the text indicated was measured. </p> <table border="1" cellspacing="1"> <tr> <td><strong>Expression</strong></td> <td><strong>Text</strong></td> <td><strong>Boost</strong></td> <td><strong>Boost + C++ locale</strong></td> <td><strong>POSIX</strong></td> <td><strong>PCRE</strong></td> </tr> <tr> <td><code>abc</code></td> <td>abc</td> <td>1.36<br> (2.15e-07s)</td> <td>1.36<br> (2.15e-07s)</td> <td>2.76<br> (4.34e-07s)</td> <td><font color="#008000">1<br> (1.58e-07s)</font></td> </tr> <tr> <td><code>^([0-9]+)(\-| |$)(.*)$</code></td> <td>100- this is a line of ftp response which contains a message string</td> <td>1.55<br> (7.26e-07s)</td> <td>1.51<br> (7.07e-07s)</td> <td>319<br> (0.000149s)</td> <td><font color="#008000">1<br> (4.67e-07s)</font></td> </tr> <tr> <td><code>([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4}</code></td> <td>1234-5678-1234-456</td> <td>1.96<br> (9.54e-07s)</td> <td>1.96<br> (9.54e-07s)</td> <td>44.5<br> (2.17e-05s)</td> <td><font color="#008000">1<br> (4.87e-07s)</font></td> </tr> <tr> <td><code> ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$</code></td> <td>john@johnmaddock.co.uk</td> <td>1.22<br> (1.51e-06s)</td> <td>1.23<br> (1.53e-06s)</td> <td>162<br> (0.000201s)</td> <td><font color="#008000">1<br> (1.24e-06s)</font></td> </tr> <tr> <td><code> ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$</code></td> <td>foo12@foo.edu</td> <td>1.28<br> (1.47e-06s)</td> <td>1.3<br> (1.49e-06s)</td> <td>104<br> (0.00012s)</td> <td><font color="#008000">1<br> (1.15e-06s)</font></td> </tr> <tr> <td><code> ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$</code></td> <td>bob.smith@foo.tv</td> <td>1.28<br> (1.47e-06s)</td> <td>1.3<br> (1.49e-06s)</td> <td>113<br> (0.00013s)</td> <td><font color="#008000">1<br> (1.15e-06s)</font></td> </tr> <tr> <td><code>^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$</code></td> <td>EH10 2QQ</td> <td>1.38<br> (4.68e-07s)</td> <td>1.41<br> (4.77e-07s)</td> <td>13.5<br> (4.59e-06s)</td> <td><font color="#008000">1<br> (3.39e-07s)</font></td> </tr> <tr> <td><code>^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$</code></td> <td>G1 1AA</td> <td>1.28<br> (4.35e-07s)</td> <td>1.25<br> (4.25e-07s)</td> <td>11.7<br> (3.97e-06s)</td> <td><font color="#008000">1<br> (3.39e-07s)</font></td> </tr> <tr> <td><code>^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$</code></td> <td>SW1 1ZZ</td> <td>1.32<br> (4.53e-07s)</td> <td>1.31<br> (4.49e-07s)</td> <td>12.2<br> (4.2e-06s)</td> <td><font color="#008000">1<br> (3.44e-07s)</font></td> </tr> <tr> <td><code> ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$</code></td> <td>4/1/2001</td> <td>1.16<br> (3.82e-07s)</td> <td>1.2<br> (3.96e-07s)</td> <td>13.9<br> (4.59e-06s)</td> <td><font color="#008000">1<br> (3.29e-07s)</font></td> </tr> <tr> <td><code> ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$</code></td> <td>12/12/2001</td> <td>1.38<br> (4.49e-07s)</td> <td>1.38<br> (4.49e-07s)</td> <td>16<br> (5.2e-06s)</td> <td><font color="#008000">1<br> (3.25e-07s)</font></td> </tr> <tr> <td><code>^[-+]?[[:digit:]]*\.?[[:digit:]]*$</code></td> <td>123</td> <td>1.19<br> (7.64e-07s)</td> <td>1.16<br> (7.45e-07s)</td> <td>7.51<br> (4.81e-06s)</td> <td><font color="#008000">1<br> (6.4e-07s)</font></td> </tr> <tr> <td><code>^[-+]?[[:digit:]]*\.?[[:digit:]]*$</code></td> <td>+3.14159</td> <td>1.32<br> (8.97e-07s)</td> <td>1.31<br> (8.88e-07s)</td> <td>14<br> (9.48e-06s)</td> <td><font color="#008000">1<br> (6.78e-07s)</font></td> </tr> <tr> <td><code>^[-+]?[[:digit:]]*\.?[[:digit:]]*$</code></td> <td>-3.14159</td> <td>1.32<br> (8.97e-07s)</td> <td>1.31<br> (8.88e-07s)</td> <td>14<br> (9.48e-06s)</td> <td><font color="#008000">1<br> (6.78e-07s)</font></td> </tr> </table> <br> <br> <hr> <p>Copyright John Maddock April 2003, all rights reserved.</p> </body> </html>