Sophie: amavisd-new-2.5.4-1mdv2008.1 noarch

amavisd-new-2.5.4-1mdv2008.1.noarch.rpm

Some random notes and considerations on providing
a SNMP agent for amavisd-new.




Two tasks need to be addressed:
- devising and specifying a content-filtering MIB
- providing access to the MIB (a SNMP agent)


Devising and specifying a content-filtering MIB
-----------------------------------------------

While designing a MIB, the following should be taken as a reference:

- collect ideas from Mail Monitoring MIB (RFC 2789, aka MTA-MIB)

    As amavisd-new behaves in many respects as a true MTA, a
    subquestion is whether a SNMP agent should provide two subtrees:
    one a pure RFC 2789, the other a specialized amavisd-new MIB,
    or would a non-standard specialized amavisd-new MIB suffice?
    Would clients benefit from a MTA-MIB view of amavisd?

- see what variables are currently collected by amavisd-new

    * by running amavisd-agent, or
    * by directly browsing a database by: db_dump -p /var/amavis/db/snmp.db
    * by checking the source for variables which only rarely show;

- a MIB description must be compliant to the SMIv2 (which can be
  automatically converted to SMIv1 if needed).

- consider potential future needs and broader use when designing a
  content-filtering MIB, so that it might be useful to other content
  filtering applications;

The currently provided variables collected and provided by amavisd
are mostly of types Counter32 and octet string. It is quite likely that
the following types will be used in the future: integer, Gauge32, TimeTicks,
and IpAddress. It it unlikely that Counter64 would be needed. The type
of each variable is already provided in a database.

Most variables are simple non-aggregate objects. Besides these there
is a set of counters to be represented as a table, i.e. a table of
encountered virus names and associated counts.

Furthermore there is a set of counters of a form OpsDecType-xxx, where xxx
could be one or more 'short content types', e.g. OpsDecType-audio.mpa.mp3.
It is not obvious what would be a best representation of these.
Some aggregate representation would probaby be desired.

There are currently no variables of type Gauge32, although it is
not hard to imagine that such could be useful in the near future.
An example may be a current size of a quarantine or tempdir area,
It is also easy to imagine that variables of a TimeInterval type
would be useful (e.g. LastOutboundActivity, LastConnectToSQL).

It is unlikely that traps would be added in a near future, and even
less likely that write or create access would be needed for a variable.

It is unclear whether a concept of 'groups' from MTA-MIB would be useful
in the context of amavisd-new. Perhaps considering groups like (spam, virus,
etc.) or groups according to a policy bank path selected? Would that be
too far from the spirit of MTA-MIB groups?

The view of amavisd-new variables should represent a running instance
of amavisd-new as a single entity, aggregating information from all
child processes into one set of variables. This aggregation is already
provided by using a Berkeley db, so this has already been taken care of.
This need for aggregation is also one of the reasons why a SNMP agent
or a subagent can not be implemented in each child process.

When amavisd is restarted, all counters in a database are reset, and the
sysUpTime timeticks variable is also reset. This may be shown to SNMP
clients unchanged, or a SNMP agent may decide to implement cumulative
counters which survive across daemon restarts (proper unsigned wraparounds
would need to be done correctly). If the agent also provides other MIBs
of the system and supplies a 'system' MIB reflecting the machine uptime,
it may be more logical to hide amavisd restarts to clients unless when
this specific information is being sought.


A SNMP agent - providing SNMP access to the MIB
-----------------------------------------------

Writing a SNMP agent from scratch would be a nightmare. SNMP is a can of
worms when one looks behind simple variable associations and descends
into details. As far as I'm aware there isn't any Perl implementation
or significant support for writing a SNMP agent (although there are several
modules to support client-side). It also appears that the only noteworthy
community efforts are concentrated around the Net-SNMP project, which has
good and active tradition and offers a support for a range of Unix platforms.

Net-SNMP
  http://www.net-snmp.org/
  http://www.net-snmp.org/docs/FAQ.html

It supports the SNMPv1, SNMPv2c and SNMPv3, which is what we need. It
provides a library, a set of tools, a client and an extensible agent.

Extending an existing agent (compared to writing one from scratch) hides
us from the intricacies of the protocol and differences in its versions,
access controls, MIB views, BER encodings, ...

When deciding what mechanism should be used to extent the Net-SNMP agent
a need to decouple Berkeley db accesses from SNMP client requests must be
kept in view. Access to amavisd-new Berkeley database snmp.db requires
careful attention to locking: on accessing a database a cursor lock must
be requested, then whatever access is needed must be done fairly quickly
and in bulk, then lock must be released. Locks must be relatively infrequent
and not at a mercy of a frequency of SNMP client requests. Failing to do so
can result in lock contentions between amavisd-new child processes and the
agent and could slow down or block mail processing. Example of a proper
way to access the database should be seen in the 'amavisd-agent' sample
utility, including a proper way of masking signals while holding a lock.

Net-SNMP offers several mechanisms to extend its SNMP agent:
- calling external scripts;
- adding static C code;
- loading dynamic modules (dlmod);
- embedded perl;
- SMUX (RFC 1227)
- Agent Extensibility (AgentX) Protocol (RFC 2741)

Calling scripts is only useful for simple tasks. Adding static C code and
recompiling is too impractical. Loading dynamic modules is more promising,
although I see a drawback with ensuring the above mentioned decoupling
between db access and SNMP requests. Embedded perl sounds nice but suffers
from a similar problem, and is at the moment rather experimental.
Nevertheless, the decoupling could be achieved by forking a subprocess and
separating the two tasks. Multi-threading in Net-SNMP is not recommended.

It seems that separating SNMP agent from the process which is accessing
the amavisd-new database is needed one way or another. Besides forking
a subprocess and implementing some kind of IPC, there are already two
standard mechanisms provided: SMUX (RFC 1227) and AgentX (RFC 2741), of
which the RFC 1227 has a status of historic, so what remains is AgentX.

The idea is to split SNMP agent into a master agent, dealing with clients
and the SNMP protocol, and one or more subagents, which are only concerned
with implementing a MIB (binds OIDs within registered MIB to amavisd-new
variables accessed through Berkeley database) and talking a simpler
protocol to the master agent, while being shielded from dangers and
intricacies of the 'real world'. A subagent is not concerned with BER
encodings, SNMP protocol, views, access controls, etc.

There is a wealth of documentation and examples on the subject.
Here are the more relevant documents:

  RFC 2741 (includes introductory material)
  http://www.net-snmp.org/tutorial/tutorial-5/
  http://www.net-snmp.org/tutorial/tutorial-5/demon/
  http://www.net-snmp.org/tutorial/tutorial-5/toolkit/demon/
  AGENT.txt
  README.agentx
  http://www.scguild.com/agentx/  - IETF Agentx Working Group 


Caching, consistent view of variables
-------------------------------------

When deciding when to access the amavisd-new snmp.d database the following
should be considered:

SNMP requests for reading variables are often independent, possibly in
short bursts. The agent has no control on how often and in what manner
clients poll the MIB. There may also be one or several independent
management stations, each doing its polling schedule. It could be
unacceptable if each SNMP read request would require its independent
lock/read/unlock access to a database. It is more efficient to access
the database on controlled intervals and collect the whole set of variables
within a single lock/unlock. This also insures that all variable values
are consistent, as the amavisd daemon goes to great lengths to only do
atomic updates and always maintain the database in a consistent state,
reflecting state before/after each message processing, and never let
clients view an intermediate state.

The following strategy seems appropriate:
- when short interval (say ten or twenty seconds) has passed since the last
  access to the database, agent supplies the requester a cached view;
- when cached data expires, the next requests causes the database
  reading to be done, refreshing the whole cached MIB.

This also gives a SNMP client a possibility to obtain a consistent snapshot
of a databases despite using a stateless protocol and sending independent
stream of get and get-next requests. The Net-SMTP does provide support
for caching, and it remains to be checked whether the supplied mechanism
is appropriate for our needs or has to be re-implemented.