<html> <head> <title>MHonArc: Performance Tips</title></head> <link rel="stylesheet" type="text/css" href="../docstyles.css"> <body> <h1>MHonArc: Performance Tips</h1> <!--X-TOC-Start--> <ul> <li><a href="#intro">Introduction</a> <li><a href="#dos">DOs</a> <ul> <li><small><a href="#periods">Break up a large archive into a set of smaller ones</a></small> <li><small><a href="#pagelayout">Minimize page layout settings</a></small> <li><small><a href="#fasttempfiles">Use FASTTEMPFILES</a></small> <li><small><a href="#fieldorder">Use FIELDORDER</a></small> <li><small><a href="#maxsize">Use MAXSIZE</a></small> <li><small><a href="#mimeincs">Use MIMEINCS and/or MIMEEXCS</a></small> <li><small><a href="#quiet">Use QUIET</a></small> <li><small><a href="#tslice">Set TSLICE to smallest range possible</a></small> </ul> <li><a href="#donts">DON'Ts</a> <ul> <li><small><a href="#realtime">Don't do real-time archive updates</a></small> <li><small><a href="#mesg_spec">Don't use message specifications in resource variables</a></small> <li><small><a href="#definederived">Don't use DEFINEDERIVED</a></small> <li><small><a href="#fieldstore">Don't use FIELDSTORE</a></small> <li><small><a href="#folrefs">Don't use FOLREFS</a></small> <li><small><a href="#mailto">Don't use MAILTO</a></small> <li><small><a href="#mimefilters">Don't use certain MIMEFILTERS filter options</a></small> <li><small><a href="#modifybodyaddresses">Don't use MODIFYBODYADDRESSES</a></small> <li><small><a href="#modtime">Don't use MODTIME</a></small> <li><small><a href="#multipg">Don't use MULTIPG</a></small> <li><small><a href="#otherindexes">Don't use OTHERINDEXES</a></small> <li><small><a href="#quiet">Don't use SAVERESOURCES if you specify resource settings every time</a></small> <li><small><a href="#subjectthreads">Don't use SUBJECTTHREADS</a></small> <li><small><a href="#var_tslice">Don't use $TSLICE$</a></small> <li><small><a href="#usinglastpg">Don't use USINGLASTPG</a></small> </ul> <li><a 
href="#charsets">Character Encodings</a> <ul> <li><small><a href="#avoidchars">Avoid conversion if you do not need it</a></small> <li><small><a href="#latestperl">Use the latest version of Perl</a></small> <li><small><a href="#textencode">Use TEXTENCODE</a></small> </ul> </ul> <!--X-TOC-End--> <!-- ===================================================================== --> <hr> <h2><a name="intro">Introduction</a></h2> <p>This document is a guide to improving the performance of <a href="http://www.mhonarc.org/">MHonArc</a>. </p> <p>The first two sections of this document cover the <a href="#dos">DOs</a> and <a href="#donts">DON'Ts</a>. The DOs list things you can do to improve the performance of mhonarc. The DON'Ts list things you should avoid doing because they decrease the performance of mhonarc. You are not required to follow all, or even any, of the DOs and DON'Ts. Depending on your needs, accepting a loss in performance is sometimes required to achieve a particular goal. </p> <!-- ===================================================================== --> <hr> <h2><a name="dos">DOs</a></h2> <!-- .................................................................... --> <h3><a name="periods">Break up a large archive into a set of smaller ones</a></h3> <p>MHonArc performance degrades as an archive gets larger. Therefore, a common practice for improving performance is to use a sequence of MHonArc archives that together make up the complete archive of a mailing list. The smaller archives are generally organized by time period, such as by month. </p> <p>An example of this practice is provided by <a href="http://www.mhonarc.org/mharc/"><b>mharc</b></a>, where archives are organized by monthly or yearly time periods to avoid performance problems. </p> <!-- .................................................................... 
--> <h3><a name="pagelayout">Minimize page layout settings</a></h3> <p>The more resource variables you use in page layout settings, the more processing time is required to render the layout. Avoid unnecessary uses of resource variables. </p> <!-- .................................................................... --> <h3><a name="fasttempfiles">Use FASTTEMPFILES</a></h3> <p>Use <a href="../resources/fasttempfiles.html">FASTTEMPFILES</a>. Make sure to take note of the security implications before enabling this resource. </p> <!-- .................................................................... --> <h3><a name="fieldorder">Use FIELDORDER</a></h3> <p>Use the <a href="../resources/fieldorder.html">FIELDORDER</a> resource to define which header fields you want to show in message pages. Avoid using the special field value "<tt>-extra-</tt>". Only essential header fields should be specified. </p> <!-- .................................................................... --> <h3><a name="maxsize">Use MAXSIZE</a></h3> <p>Use <a href="../resources/maxsize.html">MAXSIZE</a> to set a limit on the size of your archive. As mentioned earlier, MHonArc performance degrades as an archive gets larger. </p> <p>If you need to keep older messages around, see <a href="#periods"><cite>Break up a large archive into a set of smaller ones</cite></a>. Also see the <a href="../resources/keeponrmm.html">KEEPONRMM</a> resource. </p> <!-- .................................................................... --> <h3><a name="mimeincs">Use MIMEINCS and/or MIMEEXCS</a></h3> <p> <a href="../resources/mimeincs.html">MIMEINCS</a> and <a href="../resources/mimeexcs.html">MIMEEXCS</a> allow you to explicitly define which media-types you will allow in your archive. Excluding media-types helps reduce message processing overhead, and it can improve the security of your archive. </p> <!-- .................................................................... 
--> <h3><a name="quiet">Use QUIET</a></h3> <p><a href="../resources/quiet.html">QUIET</a> disables informational diagnostics when processing. Error and warning diagnostics will still get printed. </p> <!-- .................................................................... --> <h3><a name="tslice">Set TSLICE to smallest range possible</a></h3> <p>If you do not use <a href="#var_tslice"><tt>$TSLICE$</tt></a>, set the <a href="../resources/tslice.html">TSLICE</a> resource to "<tt class="icode">0:0:0</tt>" to avoid unnecessary page edits when messages are added to an archive. </p> <p>If you do choose to use <tt>$TSLICE$</tt>, set <a href="../resources/tslice.html">TSLICE</a> to the smallest range you plan on using to minimize the amount of processing overhead. </p> <p><b>See Also</b>: <a href="#var_tslice">Don't use $TSLICE$</a>. </p> <!-- ===================================================================== --> <hr> <h2><a name="donts">DON'Ts</a></h2> <!-- .................................................................... --> <h3><a name="realtime">Don't do real-time archive updates</a></h3> <p>It is tempting to update an archive right when a message arrives, but if mail traffic volume is high, this can cause a bottleneck, with multiple mhonarc processes queuing up while they wait to lock the archive. Even if general traffic is not high, a burst of incoming mail can cause problems. </p> <p>It is recommended to update an archive on a well-defined periodic basis. It avoids lock queuing problems and minimizes the overhead of invoking the perl interpreter for each incoming message. Facilities like <b><tt>cron</tt></b> (standard on Unix-like systems) can be used to invoke mhonarc on a periodic basis. </p> <p>Cron-like invocation also makes archive administration easier, since archive updates can simply be disabled while administration tasks are carried out. Raw, incoming messages can still be queued up until administrative tasks are complete in order to avoid message bounces. 
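</p>
<p>As a rough sketch of the periodic approach described above, a
<tt>crontab</tt> entry along the following lines could update an archive
every 15 minutes. The paths, the resource file name, and the idea of
collecting incoming mail in a spool mailbox are assumptions made for
illustration, and the entry assumes <tt>mhonarc</tt> is on the search
path used by cron. The <tt>-add</tt>, <tt>-quiet</tt>, <tt>-rcfile</tt>,
and <tt>-outdir</tt> options are regular mhonarc options.
</p>
<pre class="code">
# Hypothetical crontab entry: all paths and file names are placeholders.
# At minutes 0, 15, 30 and 45 of every hour, add the messages collected
# in the spool mailbox to the archive, printing only warnings and errors.
0,15,30,45 * * * * mhonarc -add -quiet -rcfile /path/to/list.mrc -outdir /path/to/archive /path/to/spool/incoming.mbox
</pre>
<p>How the spool mailbox is rotated or emptied after each run is left
out of this sketch and depends on your mail delivery setup.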
</p> <!-- .................................................................... --> <h3><a name="mesg_spec">Don't use message specifications in resource variables</a></h3> <p>Many message content-related resource variables (e.g. <tt>$SUBJECT$</tt>, <tt>$FROM$</tt>, <tt>$DATE$</tt>) can take a <a href="../rcvars.html#mesg_spec">message specification</a> argument. If a specification is provided that references a message that is not the current message, MHonArc must resolve the specification to reference the proper message information to expand the variable. </p> <p>Some message specifications are more costly than others. They include: <tt>TEND</tt>, <tt>TNEXTTOP</tt>, <tt>TPARENT</tt>, <tt>TPREVTOP</tt>, <tt>TTOP</tt>. </p> <!-- .................................................................... --> <h3><a name="definederived">Don't use DEFINEDERIVED</a></h3> <p><a href="../resources/definederived.html">DEFINEDERIVED</a> allows you to define extra files to generate for each message. The MHonArc documentation shows how this resource can be used to provide frame-based navigation. Although frame-based navigation seems "cool", avoid it. It normally does not provide any effective improvement in archive navigation, and it can be a barrier in some cases, such as for non-frame-aware browsers and people with disabilities. </p> <!-- .................................................................... --> <h3><a name="fieldstore">Don't use FIELDSTORE</a></h3> <p>Only use <a href="../resources/fieldstore.html">FIELDSTORE</a> if you have a real need for it. </p> <p>By default, this resource is nil. </p> <!-- .................................................................... --> <h3><a name="folrefs">Don't use FOLREFS</a></h3> <p>Disable <a href="../resources/folrefs.html">FOLREFS</a> unless you find it useful. The normal next and previous thread links may be sufficient. </p> <p>Also, FOLREFS is not subject-aware like the thread links are. 
Therefore, FOLREFS can confuse readers when it lists no follow-ups but the thread index shows follow-ups (due to the same message subject). </p> <table class="note" width="100%"> <tr valign="baseline"> <td><strong>NOTE:</strong></td> <td width="100%"><p>If you do decide to use <a href="#var_tslice"><tt>$TSLICE$</tt></a> in message pages, then you should definitely disable FOLREFS since it would be redundant, and probably inconsistent. </p> </td> </tr> </table> <!-- .................................................................... --> <h3><a name="mailto">Don't use MAILTO</a></h3> <p>Disabling <a href="../resources/mailto.html">MAILTO</a> may provide little to negligible performance gain, but if you do not care about email address linking, there is no need to keep this resource enabled. </p> <!-- .................................................................... --> <h3><a name="mimefilters">Don't use certain MIMEFILTERS filter options</a></h3> <p> <a href="../resources/mimefilters.html">MIMEFILTERS</a> is used to register filters for media-types. Many of the filters provided with MHonArc support a myriad of options to customize their behavior. However, some of these options increase processing overhead. The following highlights filter options to avoid, or consider, from a performance perspective: </p> <p><b>Filter options to avoid:</b> The following options will degrade performance:</p> <ul> <li><a href="../resources/mimefilters.html#m2h_text_plain" ><tt>m2h_text_plain::filter</tt></a>: <tt>fancyquote</tt>, <tt>htmlcheck</tt>, <tt>keepspace</tt>, <tt>maxwidth</tt>, <tt>quote</tt>, <tt>uudecode</tt>. </ul> <p><b>Filter options to consider:</b> The following options will improve performance:</p> <ul> <li><a href="../resources/mimefilters.html#m2h_text_html" ><tt>m2h_text_html::filter</tt></a>: <tt>disablerelated</tt>. <li><a href="../resources/mimefilters.html#m2h_text_plain" ><tt>m2h_text_plain::filter</tt></a>: <tt>disableflowed</tt>, <tt>nourl</tt>. 
</ul> <!-- .................................................................... --> <h3><a name="modifybodyaddresses">Don't use MODIFYBODYADDRESSES</a></h3> <p>Disabling <a href="../resources/modifybodyaddresses.html">MODIFYBODYADDRESSES</a> may not be an option if you are protecting your archive from address harvesters. </p> <!-- .................................................................... --> <h3><a name="modtime">Don't use MODTIME</a></h3> <p><a href="../resources/modtime.html">MODTIME</a>, when enabled, causes each message file modification time to be equal to the date of the message. By default, this resource is disabled. </p> <table class="note" width="100%"> <tr valign="baseline"> <td><strong>NOTE:</strong></td> <td width="100%"><p>If you use a search indexer on your archive, enabling MODTIME may actually improve overall performance. Some search indexers key off a file's modification time to determine if the file needs to be re-indexed. With MODTIME enabled, if a message file is edited by MHonArc (due to new messages being added or EDITIDX), the file will not be unnecessarily re-indexed. </p> <p>Also, some search indexers may key off the file modification time for purposes of date ordering in search results. If this type of functionality is desired, you will need to enable MODTIME. </p> </td> </tr> </table> <!-- .................................................................... --> <h3><a name="multipg">Don't use MULTIPG</a></h3> <p><a href="../resources/multipg.html">MULTIPG</a> causes MHonArc indexes to be printed across multiple pages, which requires more processing than a single page. If you follow the advice in "<a href="#periods"><cite>Break up a large archive into a set of smaller ones</cite></a>", using MULTIPG is generally not necessary. </p> <!-- .................................................................... 
--> <h3><a name="otherindexes">Don't use OTHERINDEXES</a></h3> <p><a href="../resources/otherindexes.html">OTHERINDEXES</a> provides the ability to generate alternate indexes. Unless the users of your archive have a real need for indexes beyond the main and thread indexes already provided, avoid using this resource. </p> <p>For example, many believe an author index is useful, along with the date and thread indexes. However, this may reflect a subjective perception rather than knowledge of the real reading habits of archive readers. If there are definite needs for alternate navigational services, sometimes a search engine (if you are already using one) can indirectly provide these services. </p> <!-- .................................................................... --> <h3><a name="quiet">Don't use SAVERESOURCES if you specify resource settings every time</a></h3> <p>It is common practice to specify resource settings (especially RCFILE) each time mhonarc is invoked. This is generally done to make administration easier, since the same invocation can then be used both when an archive is first created and when it is updated. </p> <p>If you specify resource settings every time, disable <a href="../resources/saveresources.html">SAVERESOURCES</a> to avoid the unnecessary saving of resource settings to the <a href="../resources/dbfile.html">database</a>. </p> <!-- .................................................................... --> <h3><a name="subjectthreads">Don't use SUBJECTTHREADS</a></h3> <p>When <a href="../resources/subjectthreads.html">SUBJECTTHREADS</a> is enabled (the default), MHonArc will examine message subjects when computing threads. This is useful because some mail composition software still does not include a message-id reference when a user replies to a message. </p> <p>Subject-based detection adds extra processing overhead during thread computation. 
If you know that messages in your archive define <tt>References</tt> and/or <tt>In-Reply-To</tt> header fields for message replies, then disable SUBJECTTHREADS. </p> <!-- .................................................................... --> <h3><a name="var_tslice">Don't use $TSLICE$</a></h3> <p>The <a href="../rcvars.html#TSLICE"><tt>$TSLICE$</tt></a> resource variable generates a slice of the thread relative to the current message. <tt>$TSLICE$</tt> is not part of MHonArc's default resource values, but many users like to include it within message pages as an additional (useful) navigational aid. </p> <p>The following is an example of what a thread slice may look like: </p> <blockquote> <ul> <li><b><a href="javascript:void(0)">Stripping signature / tagline / adline</a></b>, <i>East Coast Coder</i> <ul> <li><b><a href="javascript:void(0)">Re: Stripping signature / tagline / adline</a></b>, <i>Earl Hood</i> <ul> <li><span class="sliceCur"><strong>Re: Stripping signature / tagline / adline</strong>, <em>Jym Dyer</em></span> <b>&lt;=</b> <ul> <li><b><a href="javascript:void(0)">Re: Stripping signature / tagline / adline</a></b>, <i>East Coast Coder</i></li> <li><b><a href="javascript:void(0)">Re: Stripping signature / tagline / adline</a></b>, <i>Jym Dyer</i></li> </ul> </li> </ul> </li> </ul> </li> </ul> </blockquote> <p>The "next" and "previous" thread links are already provided within message pages, which may be sufficient for your needs. </p> <p><b>See Also</b>: <a href="#tslice">Set TSLICE to smallest range possible</a>. </p> <!-- .................................................................... --> <h3><a name="usinglastpg">Don't use USINGLASTPG</a></h3> <p>If you have disabled <a href="#multipg">MULTIPG</a>, then you do not need to worry about <a href="../resources/usinglastpg.html">USINGLASTPG</a>. If you choose to use MULTIPG, then disable USINGLASTPG if you can. 
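</p>
<p>For illustration, a resource file fragment along the following lines
would keep multi-page indexes while disabling USINGLASTPG. This is a
sketch only: it assumes the element names follow MHonArc's usual
<tt>NAME</tt>/<tt>NONAME</tt> convention for on/off resources, so check
the linked resource pages for the exact syntax before relying on it.
</p>
<pre class="code">
&lt;!-- Sketch (element names assumed): keep MULTIPG, disable USINGLASTPG --&gt;
&lt;MultiPg&gt;
&lt;NoUsingLastPg&gt;
</pre>
<p>As noted above, this setting only matters when MULTIPG is enabled.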
</p> <p>If you need links to the last page of an index listing, alternatives to using <a href="../rcvars.html#PG"><tt>$PG(LAST)$</tt></a> can be implemented. For example, under Unix, a post-processing task can create or update a fixed-named symbolic link to the last index page, with archive pages referencing the symbolic link instead of using <tt>$PG(LAST)$</tt>. </p> <!-- ===================================================================== --> <hr> <h2><a name="charsets">Character Encodings</a></h2> <p>MHonArc provides robust support for dealing with a variety of character encodings. For an overview of how textual data is processed by MHonArc, see the <a href="../resources/textencode.html#vscharsetconverters">TEXTENCODE</a> resource. </p> <table class="note" width="100%"> <tr valign="baseline"> <td><strong>NOTE:</strong></td> <td width="100%"><p>With respect to email, the term <em>character sets</em> (or <em>charsets</em> for short) is used when discussing character encodings. Both terms will be used interchangeably since the technical differences between the two terms are not relevant for this document. </p> </td> </tr> </table> <p>Some charsets may incur a greater cost to performance. If your archive consists only of English messages (US-ASCII charset), then there are no performance issues. But if your archive has non-English messages, especially Asian-based encodings, there can be noticeable performance hits. </p> <p>The following are suggestions you may follow to minimize the performance impact of charset processing: </p> <!-- .................................................................... --> <h3><a name="avoidchars">Avoid conversion if you do not need it</a></h3> <p>If your archive will only contain messages in a single encoding, then avoid unnecessary conversion processing. 
The following resource settings define the absolute minimum in text processing and cause archive messages to be rendered in the default locale of the web browser: </p> <pre class="code"> <!-- DECODEHEADS can be used to improve resource variable expansion. See <a href="../resources/decodeheads.html">DECODEHEADS</a> resource for more information. --> <b><a href="../resources/decodeheads.html"><DecodeHeads></a></b> <!-- Only convert HTML specials --> <b><a href="../resources/charsetconverters.html"><CharsetConverters override></a></b> plain; mhonarc::htmlize; default; -decode- <b></CharsetConverters></b> </pre> <p>If your locale is a non-English, non-Latin-1 one, you may need to specify the locale explicitly in archive pages; the default locale of web browsers may not match the locale of the archive. For example, if your locale is Polish (ISO-8859-2), then something like the following resource settings can be used: </p> <pre class="code"> <!-- The following resource settings are just the default settings for each resource but with the appropriate <meta http-equiv> tag added. 
--> <b><a href="../resources/idxpgbegin.html"><DefineVar chop></a></b> HTTP-EQUIV <span class="highlight"><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2"></span> <b></DefineVar></b> <b><a href="../resources/idxpgbegin.html"><IdxPgBegin></a></b> <html> <head> <title><b><a href="../rcvars.html#IDXTITLE">$IDXTITLE$</a></b></title> <span class="highlight">$HTTP-EQUIV$</span> </head> <body> <h1><b><a href="../rcvars.html#IDXTITLE">$IDXTITLE$</a></b></h1> <b></IdxPgBegin></b> <b><a href="../resources/tidxpgbegin.html"><TIdxPgBegin></a></b> <html> <head> <title><b><a href="../rcvars.html#TIDXTITLE">$TIDXTITLE$</a></b></title> <span class="highlight">$HTTP-EQUIV$</span> </head> <body> <h1><b><a href="../rcvars.html#TIDXTITLE">$TIDXTITLE$</a></b></h1> <b></TIdxPgBegin></b> <b><a href="../resources/msgpgbegin.html"><MsgPgBegin></a></b> <html> <head> <title><b><a href="../rcvars.html#SUBJECTNA">$SUBJECTNA$</a></b></title> <link rev="made" href="mailto:<b><a href="../rcvars.html#FROMADDR">$FROMADDR$</a></b>"> <span class="highlight">$HTTP-EQUIV$</span> </head> <body> <b></MsgPgBegin></b> </pre> <!-- .................................................................... --> <h3><a name="latestperl">Use the latest version of Perl</a></h3> <p>Perl 5.8, and later, provides built-in support for character encodings, with UTF-8 supported internally and the <tt>Encode</tt> module providing character encoding conversion facilities. MHonArc will leverage such features if available and applicable to improve performance. </p> <p>If using older versions of Perl, MHonArc still provides robust character encoding support, but performance is not as good. </p> <!-- .................................................................... --> <h3><a name="textencode">Use TEXTENCODE</a></h3> <p>If your archive will contain data in multiple encodings, consider using the <a href="../resources/textencode.html">TEXTENCODE</a> resource. 
The <a href="../resources/textencode.html">TEXTENCODE</a> resource allows you to convert all message data into a single encoding, simplifying subsequent processing done by MHonArc. The most common usage of <a href="../resources/textencode.html">TEXTENCODE</a> is to normalize all message data to UTF-8 (Unicode). See the <tt><a href="../rcfileexs/utf-8-encode.mrc.html">utf-8-encode.mrc</a></tt> example resource file on how to encode all text to UTF-8. </p> <table class="caution" width="100%"> <tr valign="baseline"> <td><strong style="color: red;">CAUTION:</strong></td> <td width="100%"><p>Although most modern web browsers support UTF-8, not all search engines do. If you use a search engine, or plan to use one, verify that UTF-8 is supported. </p> </td> </tr> </table> <p> </p> <!-- ===================================================================== --> <hr> <address> $Date: 2011/01/03 06:42:38 $ <br> <img align="top" src="../monicon.png" alt=""> <a href="http://www.mhonarc.org/" ><strong>MHonArc</strong></a><br> Copyright © 2005, <a href="http://www.earlhood.com/" >Earl Hood</a>, <a href="mailto:mhonarc%40mhonarc.org" >mhonarc<!-- -->@<!-- -->mhonarc.org</a><br> </address> </body> </html>