Customization of notification messages and log entries
======================================================
  Mark Martinec <Mark.Martinec@ijs.si>, 2002, 2004, 2006, 2007, 2008

Since March 2002 amavisd-new provides a way to customize e-mail
notification messages that are sent in response to a virus (and spam)
detection, without having to resort to modifications of Perl code.

Three types of messages are generated:

- administrator notifications are sent to administrator;
  (two types: virus, spam)

- sender (non)delivery notifications may be sent to the mail originator;
  (three types: virus, spam, rejects by outgoing MTA)

- recipient warnings may be sent to envelope recipients of the e-mail
  containing a virus. These notifications are normally disabled since
  they are more of a nuisance than benefit.
  (only for viruses)

Besides the three types of e-mail notifications, the same customization
principle is applied to customize the most common and useful log entry
(present even at log level 0) that is generated when an e-mail containing
unwanted content is detected.

Default template texts are glued to the end of the 'amavisd' file,
separated one from another by __DATA__ lines. Please see comments
in these templates.

These template texts are assigned to variables:
  $log_templ,
  $notify_sender_templ,
  $notify_virus_sender_templ,
  $notify_virus_admin_templ,
  $notify_virus_recips_templ,
  $notify_spam_sender_templ,
  $notify_spam_admin_templ

The value of these variables may be overridden by assignment or by reading
into them in the amavisd.conf file, which is run at amavisd startup time.

If assigning to variables, care must be taken to properly quote certain
special characters (like backslash), as required by Perl quoting rules.
Text read from the amavisd file or from external files is not subject to
Perl quoting rules.

Template text is subject to simple run-time macro expansion as described
in the next section.

Macro expansion is performed by the routine expand(), which receives
substitution text of simple macros from 'amavisd' program. expand() takes
a template string as its argument, performs macro expansion, and returns
resulting multiline string back to 'amavisd', which uses it to send mail
notifications or to write log entries.

The substitution text for the following simple macros is built-in:

- to be used in forming a mail header
  (properly quoted addresses as required by RFC2822):

  f  administrator's e-mail address
     (typically used in 'From:' header of notification messages);
  T  a list of recipients to be used in 'To:' header of the notification;
  C  a list of recipients to be used in 'Cc:' header of the notification;
  B  a list of recipients to be used in 'Bcc:' header of the notification;
     (the T, C and B lists are determined by each notification subroutine)

- to be used in forming a notification mail body or log entry:

  p  the current policy bank name
     (or empty if a built-in policy bank is still in place);
  h  dns name of this host, or configurable name (variable $myhostname)
  HOSTNAME
     same as h
  n  amavis internal task id (also called message id, am_id)
     as shown in the log and by amavisd-nanny, e.g.
     58725-05-2
  b  message digest of mail body (MD5, hex)
  date_unix_utc
     timestamp of the message reception - Unix time
     (seconds since 1970-01-01T00:00Z as a decimal integer)
  date_iso8601_utc
     timestamp of the message reception - ISO 8601 (EN 28601) UTC date-time
  date_iso8601_local
     timestamp of the message reception - ISO 8601 local date-time
  date_rfc2822_local
     timestamp of the message reception - RFC 2822 local date-time format
  week_iso8601
     returns an ISO 8601 week number (between 1 and 53) corresponding
     to date_iso8601_local
  partition_tag
     gives the current value of a partition_tag attribute which will be
     stored in SQL records when SQL quarantining or logging is enabled;
     the value is derived from a setting $sql_partition_tag (global or
     per-policy bank); a typical usage is to use ISO 8601 week number
     as a partition tag;
  P  same as partition_tag
  u  same as date_unix_utc
  d  same as date_rfc2822_local
  U  same as date_iso8601_utc
  y  elapsed time in ms for this message (not including final cleanup)
  s  original envelope sender, rfc2821-quoted and enclosed in angle brackets
  rfc2822_sender
     an e-mail address from a Sender header field, or empty
  rfc2822_from
     an e-mail address from a From header field (possibly more than one)
  rfc2822_resent_sender
     e-mail address from all Resent-Sender header fields
  rfc2822_resent_from
     e-mail address from all Resent-From header fields
  tls_in
     returns TLS ciphers in use by a SMTP session if mail came to amavisd
     through a TLS-encrypted session, otherwise result is empty
  t  first entry in the 'Received' trace of the mail header
  a  original SMTP session client IP address
     (empty if unknown, e.g. no XFORWARD)
  g  original SMTP session client DNS name (empty if unknown, e.g.
     no XFORWARD)
  e  best guess of the originator IP address collected from the
     Received trace
  l  (letter ell, suggesting 'local') is true if a variable 'originating'
     is true, and is an empty string otherwise; the boolean variable
     'originating' is under policy bank control, and usually corresponds
     to a sending host (SMTP client's IP address) matching @mynetworks_maps,
     or the client being an authenticated roaming user;
  o  best attempt at determining true sender of the virus - normally
     same as %s
  S  address that will get sender notification; this is normally
     a one-entry list containing sender address (same as %s), but may be
     unmangled/reconstructed in an attempt to undo the address forging
     done by some viruses; in case of unknown (e.g. forged) sender address,
     the result is empty.
  R  a list of original envelope recipients
     (for use in notification body, not headers)
  D  a list of recipients with successful delivery status (will get mail)
  O  a list of recipients with unsuccessful delivery status
     (will not get mail)
  N  a list of recipients with UNsuccessful delivery status
     (will NOT get mail) with included short per-recipient delivery reports
     as used in the free format first MIME part of delivery status
     notifications
  j  'Subject' header field body
  m  'Message-ID' header field body
  r  first 'Resent-Message-ID' header field body
  header_field
     field body of the header field specified in the argument;
     can only obtain header field stored in $msginfo->orig_header_fields,
     and only its first occurrence; currently these are:
       from, to, cc, sender, subject, received, message-id,
       resent-message-id, in-reply-to, references, precedence, list-id,
       user-agent, x-mailer, dkim-signature, domainkey-signature,
       authentication-results;
     argument is case-insensitive
  useragent
     returns 'User-Agent: ...' or 'X-Mailer: ...' header field (whichever
     is present); an optional argument specifies whether an entire field
     is to be returned (empty or unrecognized argument), or just a field
     name (argument: 'name'), e.g.
     'X-Mailer'; or just a field body (argument 'body'),
     e.g. 'Thunderbird_1.5.0.9';
  x-mailer
     'X-Mailer' header field body (just body, without header field name),
     deprecated, use macros header_field or useragent instead;
  Q  MTA queue ID of the message if available
     (Courier, sendmail milter/AM.PDP)
  H  a list of all header lines (field may be wrapped over more than one
     line); this does not include the 'Return-Path:' or 'Delivered-To:'
     header fields, which would have been added (or will be added later)
     by the local delivery agent if mail would have been delivered to
     a mailbox.
  z  original mail size (in bytes)
  i  long-term unique mail_id on this system, possibly used in log and
     in quarantine names (also in releasing from a quarantine),
     e.g. jaUETfyBMJHG
  q  list of quarantine mailbox names, or empty if not quarantined
  ccat_maj
     (deprecated, use [:ccat|major])
     a major category number of the blocking or a main contents category,
     see constants CC_* in file amavisd
  ccat_min
     (deprecated, use [:ccat|minor])
     a minor category number of the blocking or a main contents category,
     usually 0 unless more specific information is available
     (e.g. details about bad header or tag3 spam level)
  ccat_name
     (deprecated, use [:ccat|name])
     a display name of the blocking or main contents category
     best describing mail contents
  ccat
     a new general-purpose macro providing access to information about
     a mail contents category.

     Macro 'ccat' takes two optional fixed-string arguments, which are
     interpreted case-insensitively. In their absence it expands to
     a string "(maj,min)" which shows a major and a minor contents
     category number of a blocking ccat for a blocked message, and of
     a main contents category for a passed message.

     The first argument specifies which attribute of a ccat is to be
     provided, the second argument specifies whether a main or a blocking
     contents category is to be consulted:

     The first argument may be any of the following strings:
       name ...
          provide a human-readable name of a ccat (%ccat_display_names)
       major ...
          provide a number: a major contents category,
          values correspond to CC_* constants in the program
       minor ...
          provide a number: a minor contents category, often a 0
       <empty>...
          empty argument (also a default) results in a string "(maj,min)"
       is_blocking ...
          '1' if blocking_ccat is true (message is being blocked),
          or an empty string when a message is being passed;
       is_nonblocking ..
          the opposite: '1' if blocking_ccat false, '' otherwise
       is_blocked_by_nonmain ..
          '1' if blocking_ccat is true _and_ is different from
          a main contents category;

     The second argument may be any of the following strings:
       main ...
          provide information on main contents category when asked
          for name/major/minor/<empty>
       blocking..
          provide information on blocking contents category if it exists,
          otherwise it falls back to providing info on main ccat;
          this is also a default in the absence of this argument;

  v  output of the (last) virus checking program
  V  a list of virus names found; contains at least one entry (possibly
     empty) if a virus was found, otherwise a null list
  banning_rule_key
     a lookup key (a regexp) of a banning rule which matched, e.g.:
       (?-xism:^\\.(exe-ms|dll)$(?# rule #9))
  banning_rule_comment
     a comment from a regexp in banning_rule_key, or the whole
     banning_rule_key if it does not contain a comment, e.g.:
       rule #9
  banning_rule_rhs
     right-hand side of a banning rule which matched, often just a '1',
     but can be any string which evaluates to true
  banned_parts
     a list of banned parts, with their full path in a MIME/archive e.g.:
       multipart/mixed | application/octet-stream,.exe,.exe-ms,videos.exe
  F  just the first leaf node from banned_parts, with a prepended
     rule comment (if any), e.g.:
       rule #9:application/octet-stream,.exe,.exe-ms,videos.exe
  W  a list of av scanner names detecting a virus
  X  a list of header syntax violations
  A  a list of SpamAssassin report lines for body report,
     similar to macro SUMMARY, but provides a list instead
     of a single string
  c  spam level/hits (mnemonic: sCore) as provided by SpamAssassin,
     including a per-recipient boost (shown as explicit sum);
     see also: SCORE
  1  above tag level for any recipient?
  2  above tag2 level for any recipient?
  k  any recipient declared the message be killed?
  T  list of triggered SA tests (only in $log_templ and $log_recip_templ)
  wrap
     with arguments: width, prefix, indent, string
     will wrap a string to a multiline string of the specified width;
     for details see comments before the 'sub wrap_string' in file amavisd;
  lc lowercases arguments and concatenates them to a single string
  uc uppercases arguments and concatenates them to a single string
  substr
     same as Perl function substr: returns a substring of the first
     argument, starting at character position specified by argument 2,
     limited to arg3 characters unless arg3 is not specified;
     character positions start at 0
  index
     same as Perl function index: locates substring arg2 within arg1,
     returns -1 if not found, or a character position of the first match
  len
     returns string length of its first argument
  incr
     returns arg1 incremented by 1 (non-numeric argument interpreted as 0);
     in the presence of arguments beyond arg1, their sum is added to arg1
  decr
     returns arg1 decremented by 1 (non-numeric argument interpreted as 0);
     in the presence of arguments beyond arg1, their sum is subtracted
     from arg1
  min
     returns the smallest number from its arguments,
     all-whitespace arguments are ignored
  max
     returns the largest number from its arguments,
     all-whitespace arguments are ignored
  sprintf
     (just like a Perl function sprintf): formats its arguments, after
     the first, under control of the format, which is the first argument;
     the format is a character string which contains three types of
     objects: plain characters, which are simply copied to resulting
     string, character escape sequences which are converted and copied
     to the resulting string, and format specifications, each of which
     takes the next successive argument and
     inserts its formatted representation to the resulting string;
     Note that to get a percent character it needs to be doubled, to avoid
     its special meaning as a macro call introducer, e.g.
       [:sprintf|%%s %%.1f|text|[:SCORE]]
  join
     (just like a Perl function join): the first argument is a separator
     string, remaining arguments are strings to be concatenated, with
     a separator string inserted at every concatenation point;
  dquote
     encloses its argument in double quotes and doubles existing double
     quotes within a string (suitable to sanitize Subject header field,
     e.g. ab"oh"cd -> "ab""oh""cd");
     provisional - exact interpretation may change (and has changed:
     prior to 2.4.5 double quotes were replaced by \", which made parsing
     tricky as backslashes themselves were not escaped);
  uquote
     replaces one or more consecutive space or tab characters by '_',
     but does not protect existing underlines, which makes it a lossy
     transformation (suitable for logging of From or To header fields);
     provisional - exact interpretation may change;
  limit
     takes two arguments: a string size limit and some string,
     returning the string from the second argument unchanged if its size
     is below the limit, or cropped to the size limit, with '[...]'
     appended;
  supplementary_info
     gives access to some additional information provided by content
     scanners, such as that provided by SpamAssassin API routine get_tag.
     The macro takes two arguments, the first is a tag name (a name of
     some attribute which is expected to provide an associated value),
     the second argument is a sprintf format string and is optional,
     if missing a %s is assumed. Currently the only available attributes
     are AUTOLEARN, AUTOLEARNSCORE, SC, SCRULE, SCTYPE, RELAYCOUNTRY,
     and LANGUAGES. These are nonempty only when an associated
     SpamAssassin plugin or function is enabled.
  report_format
     gives one of the: dsn, arf, attach, plain, resend,
     according to a report or notification format being generated;
  feedback_type
     is expected to give one of the ARF (draft) strings:
       abuse, fraud, miscategorized, not-spam, opt-out, opt-out-list,
       virus, other,
     as supplied in a call to delivery_status_notification();
  dkim
     takes one argument and returns the requested information about valid
     DKIM or DomainKeys signatures found in a mail header:
       any (or an empty argument)
          returns true if there was at least one valid signature present;
       author
          is nonempty if a valid author signature is present (matching
          a From header field), actual value happens to be a signing
          domain;
       sender
          is nonempty if a valid sender signature is present (signature
          matching a Sender header field, or matching a From if Sender
          is not present), the actual value happens to be a signing
          domain;
       thirdparty
          is nonempty if there is no valid author signature but at least
          one valid signature is present, the actual value happens to be
          a signing domain;
       envsender
          is nonempty if a valid signature is present which is matching
          the envelope sender address, the actual value happens to be
          a signing domain;
       identity
          returns a comma-separated list of signing identities ('i' tag)
          found in all valid signatures;
       domain
          returns a comma-separated list of signing domains ('d' tag)
          found in all valid signatures;
       any other value (typically containing an '@')
          is interpreted as a 'verifier acceptable identity'
          ('mailbox@domain', or just '@domain'), and the result is true
          if there exists a valid signature whose signing identity matches
          the supplied verifier acceptable identity, otherwise the result
          is false; if mailbox@domain form is given, it must be exactly
          equal to the signing identity for a match; when only a @domain
          form is given, only the domain must match (mailbox is ignored),
          allowing a partial match where a domain of a signing identity
          is a subdomain of the verifier acceptable identity;

  A couple of SpamAssassin
  look-alike macros with the same names and arguments as in SA,
  see 'man Mail::SpamAssassin::Conf' for details:

  AUTOLEARN  autolearn status (deprecated, use supplementary_info instead)
  DATE       same as date_rfc2822_local
  SCORE      similar to macro 'c', but returns a single number (sum of
             SA score and boost), and allows padding as per SA
             documentation. In a per-message log ($log_templ) when
             a message has multiple recipients, a minimum value across
             all recipients is given;
  STARS      score as in macro SCORE, but represented as a bar of
             characters
  REPORT     a SA terse report of tests hit (for header reports)
  SUMMARY    similar to macro A, but provides a single multiline string,
             a SA summary of tests hit for standard body reports
  REMOTEHOSTADDR  IP address of your connecting MTA (often 127.0.0.1)
  REMOTEHOSTNAME  your connecting MTA (often [127.0.0.1] or localhost)
  TESTS      similar to %T in logging, but without scores
  TESTSSCORES  similar to %T in logging, but allows to specify a separator
  YESNO      similar to macro '2', but provides Yes/No instead of 1/0
  YESNOCAPS  similar to macro '2', but provides YES/NO instead of 1/0
  REQD       minimal tag2 level of all recipients
  HEADER     same as a header_field macro

  and two additional SA-lookalikes, but with no counterparts
  in SpamAssassin:

  LOGID      log id (a.k.a. am_id) e.g. 58725-05-2, synonym for macro n
  MAILID     mail_id as in quarantine e.g. jaUETfyBMJHG, synonym for
             macro i

- when $log_recip_templ is expanded (by-recipient log entry), certain
  macros keep their general semantics but reflect a value for that
  recipient:

  %.
      value is a recipient counter (starting at 1) when $log_recip_templ
      is expanded, and is undef when other templates are expanded;
  %R  current recipient email address
  %D  recipient email address if mail will be delivered, otherwise empty
  %O  recipient email address if mail will NOT be delivered,
      otherwise empty
  %N  short DSN if mail will not be delivered for this recip,
      otherwise empty
  %c  spam level/hits including a per-recipient boost
  %0  recipient email address belongs to local_domains_maps: L or 0
  %1  above tag level for this recipient: Y or 0
  %2  above tag2 level for this recipient: Y or 0
  %k  above kill level for this recipient: Y or 0
  REQD       recipient's tag2_level
  ccat_maj   major category number, takes into account per-recip bypass_*
  ccat_min   minor category number, takes into account per-recip bypass_*
  ccat_name  display name of the c.cat, takes into account per-recip
             bypass_*
  remote_mta
      MTA to which a message was forwarded
  remote_mta_smtp_response
      MTA's SMTP response on accepting forwarded message
  smtp_response
      either an MTA's SMTP response for forwarded mail, or internally
      generated SMTP response for mail that was not forwarded
  score_boost
      internally generated score points to be added to a SA score

The choice of capital letters for lists, and lower case letters
for simple strings is purely a convention and is not enforced,
neither do all built-in macros adhere to the convention.

Further built-in macros are easy to add if special need arises,
just append new key/value pairs to the hash which is passed to expand().

Besides a simple string or an array reference, a hash value may also be
a subroutine reference which will be called later during macro expansion.
This way one can provide a method for obtaining information which is not
yet available during initial construction of the hash (such as AV scanner
results), or provide a lazy evaluation for more expensive calculations.
Subroutine will be evaluated in scalar context. It may return a string
or an array reference.
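
The three kinds of hash values described above can be modelled outside
of amavisd. The following Python sketch is only an illustration of the
idea, not amavisd code: the names macros, resolve, virus_names and
scanner_calls are hypothetical, and the real subroutine would also
receive arguments from the macro call, which this sketch leaves out.

```python
from typing import Callable, Dict, List, Union

Expansion = Union[str, List[str]]
MacroValue = Union[str, List[str], Callable[[], Expansion]]

# Records each call so that lazy evaluation can be observed (illustrative).
scanner_calls: List[str] = []


def virus_names() -> List[str]:
    """Stand-in for an expensive lookup such as AV scanner results."""
    scanner_calls.append("scan")
    return ["Eicar-Test-Signature"]


# A macro table: a plain string, a list, and a deferred (callable) value.
macros: Dict[str, MacroValue] = {
    "h": "mail.example.com",
    "R": ["a@example.com", "b@example.com"],
    "V": virus_names,
}


def resolve(table: Dict[str, MacroValue], name: str) -> Expansion:
    """Return the value of a macro, calling it first if it is deferred."""
    value = table[name]
    if callable(value):
        value = value()  # evaluated only when the macro is actually used
    return value
```

The deferred entry costs nothing until a template refers to it, which
is the point of supplying a subroutine reference instead of a value.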
The rest of this text explains what expand() does with its arguments.


EXPAND

This is a simple, yet fully fledged macro processor with proper lexical
analysis, call stack, quoting levels, user supplied and builtin macros,
three builtin flow-control macros: selector, regexp selector and iterator,
plus a macro #, which discards input tokens until newline (like 'dnl'
in m4). Also recognized are the usual \c and \nnn forms for specifying
special characters, where c can be any of: r, n, f, b, e, a, t.

Lexical analysis of the input string is performed only once, macro result
values are not in danger of being lexically re-parsed and are treated as
plain characters, losing any special meaning they might have. New macros
(with arguments) can be defined by a built-in macro [= name | body ].
Macro calls can be neutral (not re-evaluating the result) or active
(pushing result back to input for re-evaluation).

Simple caller-provided macros can evaluate to a string (possibly empty
or undef), or an array of strings. It can also be a subroutine reference,
in which case the subroutine will be called whenever macro is evaluated,
supplying it with arguments from the macro call (first argument is a macro
name). The subroutine must return a scalar: a string, or an array
reference. The result will be treated as if it were specified directly.

Two simple forms of macro calls are known: %x and %#x (where x is
a single letter macro name, i.e. a key in a user-supplied associative
array):

  %x   evaluates to the hash value associated with the name x;
       if the value is an array ref, the result is a single concatenated
       string of values separated with comma-space pairs and is not
       re-evaluated (i.e. it is a form of a neutral macro call)

  %#x  evaluates to a number: if a macro value is a scalar, returns 0
       for all-whitespace value, and 1 otherwise. If a value is an array
       ref, evaluates to the number of elements in the array.
       Neutral call.

A literal percent character can be produced by %% or \%.
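
The rules for %x and %#x can be written down as two small functions.
This is a Python sketch of the behaviour as described above, not the
Perl implementation; the function names are hypothetical, and treating
an undefined macro as an empty value is an assumption of the sketch.

```python
from typing import Dict, List, Optional, Union

Value = Optional[Union[str, List[str]]]


def expand_percent(name: str, macros: Dict[str, Value]) -> str:
    """Model of %x: lists are joined with comma-space pairs."""
    value = macros.get(name)
    if value is None:  # assumption: undef expands to an empty string
        return ""
    if isinstance(value, list):
        return ", ".join(value)
    return value


def count_percent(name: str, macros: Dict[str, Value]) -> int:
    """Model of %#x: list length, or 0/1 for a scalar."""
    value = macros.get(name)
    if isinstance(value, list):
        return len(value)
    if value is None or value.strip() == "":  # all-whitespace counts as 0
        return 0
    return 1


sample = {"V": ["Virus-A", "Virus-B"], "q": "", "s": "<user@example.com>"}
print(expand_percent("V", sample), count_percent("V", sample))
```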
To take away any special meaning of other characters they can be quoted
by a backslash, e.g. \[ or \\ or \"] .

A more general form of a macro call provides an ability to call macros
with longer names, to provide arguments to a call, and to specify whether
the call is a neutral or an active call:

  [@ name |arg1|arg2|...|argn]
     is an active macro call, i.e. the result is pushed back to input
     and evaluated like usual; there may be any number of actual arguments
     supplied, argument 0 is a macro name; Neither a macro name nor
     arguments are implicitly quoted, so if it is desired to prevent their
     evaluation before a call, they should be quoted by enclosing them
     in a pair of [" and "], e.g.:
       [@["name"]|["arg1"]|["arg2"]| ... |["argn"]]

  [: name |arg1|arg2|...|argn]
     is a neutral macro call, i.e. the result is NOT pushed back to input
     for evaluation, but goes directly to destination (either straight
     to output string, or collected as argument if this is a nested
     macro call);

Note that whitespace around macro name is allowed (is optional), and is
automatically removed, but space within arguments receives no special
treatment, and is passed on to a macro like other characters.

There is a further simple form of a neutral macro call which emulates
SpamAssassin syntax:

  _NAME_  or  _NAME(...)_

where a macro name must be composed of capital letters only, and
an optional string in () is passed to a macro as its first argument
(besides argument 0, which is a macro name). Commas within (...) are not
special, calls like _TESTS(,)_ and _SPAMMYTOKENS(2,short)_ still provide
a single argument: "," or "2,short" respectively, to accommodate
SA peculiarity. These all-capitals macros can still be called by a normal
general-purpose form of a macro call for greater flexibility,
as described above.

A macro is evaluated only in non-quoted context. Enclosing strings between
tokens [" and "] prevents its evaluation. Quoting may be nested, quote
tokens must be balanced.
Evaluating a quoted input strips off one level of quotes.

As a special feature two built-in macros (selector and iterator) provide
implicit quoting of their arguments to keep clutter in these two most
common macros calls down to a minimum. Unfortunately this causes some
complications in the code, but the feature is kept for backwards
compatibility.

Built-in macros selector, regexp selector and iterator have the following
syntax:

  [? arg1 | arg2 | ... ]    a selector
  [~ arg1 | arg2 | ... ]    a regexp selector
  [  arg1 | arg2 | ... ]    an iterator

where [, [?, [~, | and ] are required tokens. Arguments are arbitrary
text, possibly multiline, whitespace counts. Nested macro calls are
permitted, proper bracket nesting must be observed.

SELECTOR lets its first argument be evaluated immediately, and implicitly
quotes remaining arguments. The evaluated first argument chooses which
of the remaining arguments is selected as a result value. The chosen
result is only then evaluated (i.e. this is an active macro call),
non-selected arguments are discarded without evaluation. The first
argument is usually a number (with optional leading and trailing
whitespace). If it is a non-numeric string, it is treated as 0 for
all-whitespace and as 1 otherwise. Value 0 selects the very next (second)
argument, value 1 selects the one after it, etc. If the value is greater
than the number of available arguments, the last one (unless it is
the only one) is selected. If there is only one (the first) alternative
available but the value is greater than 0, an empty string is returned.

Examples:
  [? 2   | zero | one | two | three ]  -> two
  [? foo | none | any | two | three ]  -> any
  [? 24  | 0 | one | many ]            -> many
  [? 2   |No recipients]               -> (empty string)
  [? %#R |No recipients|One recipient|%#R recipients]
  [? %q  |No quarantine|Quarantined as %q]

Note that a selector macro call can be considered a form of if-then-else,
except that the 'then' and 'else' parts are swapped!
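
The choice made by a selector can be modelled as a short function.
The Python sketch below follows the rules stated above and is not the
Perl implementation; the name select is hypothetical, only unsigned
decimal integers are treated as numeric (an assumption), and the chosen
alternative is returned as is, without the re-evaluation an active
macro call would perform.

```python
import re
from typing import List


def select(first: str, alternatives: List[str]) -> str:
    """Pick one alternative according to the evaluated first argument."""
    text = first.strip()
    if re.fullmatch(r"\d+", text):
        index = int(text)          # a number selects by position
    elif text == "":
        index = 0                  # all-whitespace counts as 0
    else:
        index = 1                  # any other string counts as 1
    if index < len(alternatives):
        return alternatives[index]
    if len(alternatives) <= 1:
        return ""                  # a lone alternative is never a fallback
    return alternatives[-1]        # out of range: use the last alternative


print(select(" 24 ", ["0", "one", "many"]))
```

With %#R as the first argument this gives the usual none/one/many
wording: a count of 0 picks the first alternative, 1 the second, and
any larger count falls through to the last one.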

ITERATOR in its full form takes three arguments (and ignores any extra
arguments after that):

  [ %x | body-usually-containing-%x | separator ]

All iterator's arguments are implicitly quoted, iterator performs its own
substitutions on provided arguments, as described below.

The result of an iterator call is a body (the second argument) repeated
as many times as there are elements in the array denoted by the first
argument. In each instance of a body all occurrences of a token %x in
the body are replaced with each consecutive element of the array.
Resulting body instances are then glued together with a string given
as the third argument. The result is finally pushed back to input
(active macro call) for possible further expansion.

As a somewhat ugly hack (upwards compatible), it is possible to iterate
on built-in macros with names longer than one character:

  [ long-macro-name | body-usually-containing-%x | separator ]

This only works in a full three-argument form of iterator call, and
the iterator variable name becomes a hard-wired literal x.

There are two simplified forms of iterator call:

  [ body | separator ]
or
  [ body ]

where missing separator is considered a null string, and a missing formal
argument name is obtained by looking for the first token of the form %x
in the body. If there is no formal argument specified (neither explicitly
nor in the body), the result is an empty string, which is potentially
useful as a null lexical separator.

Examples:

  [%V| ]     a space-separated list of virus names

  [%V|\n]    a newline-separated list of virus names

  [%V|
  ]          same thing: a newline-separated list of virus names

  [
      %V]    a list of virus names, each preceded by NL and spaces

  [ %R |%s --> <%R>|, ]
             a comma-space separated list of sender/recipient name pairs
             where recipient is iterated over the list of recipients.
             (Only the (first) token %x in the first argument is
             significant, other characters are ignored.)

  [%V|[%R|%R + %V|, ]|; ]
             produce all combinations of %R + %V elements

A combined example:

  [? %#C |#|Cc: [<%C>|, ]
  ]
  [? %#C ||Cc: [<%C>|, ]\n]#    ... same thing

evaluates to an empty string if there are no elements in the %C array,
otherwise it evaluates to a line: Cc: <addr1>, <addr2>, ...\n
Whitespace (including newlines) around the first argument %#C of selector
call is ignored and can be used for clarity.

These all produce the same result:

  To: [%T|%T|, ]
  To: [%T|, ]
  To: %T

The '#' removes input characters until and including newline after it.
It can be used for clarity to allow newlines be placed in the source text
but not resulting in empty lines in the expanded text. In the second
example above, a backslash at the end of the line would achieve the same
result, although the method is different: \NEWLINE is removed during
initial lexical analysis, while # is an internal macro which, when called,
actively discards tokens following it, until a NEWLINE (or end of input)
is encountered.

Further built-in macros are regexp selector and a macro-defining macro.

REGEXP SELECTOR macro is somewhat similar to a plain SELECTOR, but it
performs a regular expression match on its first argument. The syntax is:

  [~string|regexp1|then1|regexp2|then2|...|regexpN|thenN|else]

It compares string to each regexp in turn, and if a match is found
the macro expands to the corresponding 'then' argument, otherwise it
expands to the 'else' argument (or empty if the 'else' is missing).
For example:

  [~string|^s.*$|["matches"]|["no match"]]

Unlike SELECTOR, arguments are not implicitly quoted, quoting must be
explicit if desired (there was no backwards-compatibility need for this
newer macro). This is an active macro call, results are pushed back
to input for re-evaluation. Tokens %1, %2, ... %9 in arg3 and arg4 are
replaced by captured subexpressions (in parentheses) of a regexp,
and %0 is replaced by arg1.
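
The matching order of a regexp selector can be modelled as follows.
This Python sketch is an illustration only: the name regexp_select is
hypothetical, Python's re syntax differs from Perl's in some details,
quoting and re-evaluation of the result are left out, and replacing
%1 ... %9 in the 'else' part by empty strings is an assumption.

```python
import re
from typing import Optional


def regexp_select(string: str, *args: str) -> str:
    """Try regexp/then pairs in order; fall back to a trailing 'else'."""
    pairs = list(args)
    # An odd number of remaining arguments means a trailing 'else' part.
    else_part = pairs.pop() if len(pairs) % 2 == 1 else ""

    def fill(text: str, match: Optional[re.Match]) -> str:
        def token(m: re.Match) -> str:
            n = int(m.group(1))
            if n == 0:
                return string              # %0 is the string being matched
            if match is None or n > match.re.groups:
                return ""                  # no such capture group
            return match.group(n) or ""    # %n is the n-th captured group
        return re.sub(r"%(\d)", token, text)

    for i in range(0, len(pairs), 2):
        match = re.search(pairs[i], string)
        if match:
            return fill(pairs[i + 1], match)
    return fill(else_part, None)


print(regexp_select("string", r"^s.*$", "matches", "no match"))
```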

DEFINE macro call allows new macros to be dynamically defined:

  [= name |body]

It creates a new macro with a specified name (whitespace around name
is trimmed), giving it the argument as a macro body string. No arguments
are implicitly quoted, quoting must be explicit (for most uses at least
the body string should be quoted in [" ... "] ).

Body string may contain tokens %0, %1, %2, ... %9 as formal arguments.
These will be substituted with actual arguments (or empty strings for
missing arguments) at the time of a call.

In most respects these dynamically defined macros behave just like other
pre-defined macros. One distinction is that they can only produce a scalar
string, they can not produce an array.
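
The substitution of formal arguments can be modelled in a few lines.
This Python sketch is not the Perl implementation: the names define and
call are hypothetical, quoting is left out, and %0 is taken to be the
macro name, following the earlier statement that argument 0 of a macro
call is the macro name.

```python
import re
from typing import Dict


def define(table: Dict[str, str], name: str, body: str) -> None:
    """Store a macro body under a name, trimming whitespace around it."""
    table[name.strip()] = body


def call(table: Dict[str, str], name: str, *args: str) -> str:
    """Expand a defined macro, replacing %0 ... %9 by actual arguments."""
    key = name.strip()
    actual = [key, *args]  # argument 0 is the macro name (see lead-in)

    def token(m: re.Match) -> str:
        n = int(m.group(1))
        return actual[n] if n < len(actual) else ""  # missing -> empty

    return re.sub(r"%(\d)", token, table[key])


user_macros: Dict[str, str] = {}
define(user_macros, " greet ", "Hello %1, from %2!")
print(call(user_macros, "greet", "Alice", "amavis"))
```

The result is always a single string, which matches the restriction
that dynamically defined macros can not produce an array.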