.. _initial_configuration:

Initial configuration steps on all platforms
============================================


.. _initial_configuration_file_tree:

Creating a file tree
--------------------

If you haven't got a file tree yet, you should create it now.

Make the directory for the file tree and fill it::

  mkdir /srv/mytree
  rsync .... /srv/mytree

Note that this file tree is a necessary prerequisite to running MirrorBrain,
even though MirrorBrain intercepts the requests to those files and redirects
them to mirrors.

Having said that, there *is* a way to get by without local files: use the
:program:`null-rsync` tool (found `in the source tree
<http://svn.mirrorbrain.org/viewvc/mirrorbrain/trunk/tools/null-rsync?view=markup>`_)
instead of :program:`rsync` to pull the files. :program:`null-rsync` is used
exactly like rsync, but it creates a pseudo file tree that requires very little
local space. However, since those files are filled with zeroes (!), it is
important to make sure that MirrorBrain *never* delivers content from them.
That is achieved by using the ``MirrorBrainFallback`` directive to define some
mirrors that are *always* available and are guaranteed to have *all* those
files. (The directive can be configured individually per directory in the
Apache config.) See the `2.11.0 release notes`_ for details.

Note that if you *do* have the *real* files locally, you can automatically maintain
cryptographic hashes of them in the database; running with pseudo files means
giving up some very useful features. In addition, real local files can always
be delivered directly, which is a good fallback behaviour for files that are
not mirrored at all, files that have not arrived on any mirror just yet, and so
on. Of course, you can also make sure that files are never delivered from the
redirector (in other words, that it always redirects).

.. note::
   In summary: a tree with real files is required if you want to serve any
   hashes, zsync, or torrents. But you can make sure that the content is always
   redirected.  The "fake tree" that you can create with null-rsync is good
   *only* for pure redirection (and Metalinks without hashes). The server
   doesn't know any content then; only file path, size and mtime, nothing else.

.. _`2.11.0 release notes`: http://mirrorbrain.org/docs/changes/#release-2-11-0-r7896-dec-2-2009



Creating mirrorbrain.conf
-------------------------

Create a configuration file named :file:`/etc/mirrorbrain.conf` with the content below::

    [general]
    instances = main
    
    [main]
    dbuser = mirrorbrain
    dbpass = 12345
    dbdriver = postgresql
    dbhost = 127.0.0.1
    # optional: dbport = ...
    dbname = mirrorbrain
    
    [mirrorprobe]
    # logfile = /var/log/mirrorbrain/mirrorprobe.log
    # loglevel = INFO
  

.. note::
   The database password in the above template is only a placeholder and you
   need to edit it: change it to the actual password, the one that you gave
   when you ran PostgreSQL's :program:`createuser` command. Likewise, make sure
   that the username matches the one you created.

Since the file contains the database password, restrict its permissions and
ownership::

  sudo chmod 0640 /etc/mirrorbrain.conf 
  sudo chown root:mirrorbrain /etc/mirrorbrain.conf

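To double-check the result, a small shell function can compare the file's mode
with the value set above. This is only a minimal sketch and not part of
MirrorBrain: the function name ``check_conf_perms`` is made up for
illustration, and it assumes GNU :program:`stat` (for the ``-c`` format
option)::

  # Hypothetical helper, not shipped with MirrorBrain.
  # Succeeds only if FILE has exactly the given octal mode.
  # Assumes GNU stat, which supports "-c FORMAT".
  check_conf_perms() {
      file=$1
      want_mode=$2
      have_mode=$(stat -c '%a' "$file") || return 2
      if [ "$have_mode" = "$want_mode" ]; then
          echo "OK: $file has mode $have_mode"
      else
          echo "WARNING: $file has mode $have_mode, expected $want_mode" >&2
          return 1
      fi
  }

It could be called as ``check_conf_perms /etc/mirrorbrain.conf 640``.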


Other possible options per MirrorBrain instance are:

.. describe:: scan_top_include

   Directory names separated by spaces. Meaning: Scan only these directories,
   and ignore all other directories at the top level.

.. describe:: scan_exclude_rsync

   Exclude list for rsync scans (the same rules as for rsync's ``--exclude``
   option apply). Meaning: Ignore all directories or path names that match,
   everywhere in the tree.

.. describe:: scan_exclude

   Exclude list for FTP scans. Meaning: Ignore all directories or path names
   that match, everywhere in the tree.




Testing the database admin tool
-------------------------------

At this point, you should be able to run the following command without getting
an error::

  mb list
  
It produces no output, but exits with status 0. If it gives an error, something
is wrong.

.. note:: 
   Do this to verify that the previous steps have been completed successfully.

Likewise, the following command should not return any error, but rather
display its usage info. If so, the installation is most likely fine::

  mb help


Also, the following should work (depending on your system, the database may
live elsewhere, for instance in :file:`/usr/share/GeoIP`)::

  % geoiplookup_continent -f /var/lib/GeoIP/GeoIP.dat www.slashdot.org
  NA

The ``NA`` stands for North America and indicates that the GeoIP lookup works
correctly.

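The checks in this section can be wrapped in a small helper that runs a command
quietly and reports whether it exited with status 0. This is a minimal sketch;
the function name ``run_check`` is made up for illustration and is not part of
MirrorBrain::

  # Hypothetical helper: run a command with its output discarded and
  # report whether it exited with status 0.
  run_check() {
      if "$@" >/dev/null 2>&1; then
          echo "PASS: $*"
      else
          status=$?
          echo "FAIL (exit $status): $*"
          return 1
      fi
  }

For instance, ``run_check mb list`` and ``run_check mb help`` should both
report ``PASS`` on a working installation.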

Creating some mirrors
---------------------

Collect a list of mirrors (their HTTP baseurl, and their rsync or FTP baseurl
for scanning). For example::

  http://ftp.isr.ist.utl.pt/pub/MIRRORS/ftp.suse.com/projects/
  rsync://ftp.isr.ist.utl.pt/suse/projects/

  http://ftp.kddilabs.jp/Linux/distributions/ftp.suse.com/projects/
  rsync://ftp.kddilabs.jp/suse/projects/



Now enter the mirrors into the database using the :program:`mb` tool (see
``mb help new`` for the full list of options)::

  mb new ftp.isr.ist.utl.pt \
         --http http://ftp.isr.ist.utl.pt/pub/MIRRORS/ftp.suse.com/projects/ \
         --rsync rsync://ftp.isr.ist.utl.pt/suse/projects/

  mb new ftp.kddilabs.jp \
         --http http://ftp.kddilabs.jp/Linux/distributions/ftp.suse.com/projects/ \
         --rsync rsync://ftp.kddilabs.jp/suse/projects/

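With a longer list of mirrors, the ``mb new`` commands can be generated instead
of typed by hand. The following is a minimal sketch that only *prints* the
commands so that they can be reviewed first. The helper name ``print_mb_new``
and the input format (one mirror per line: identifier, HTTP base URL, rsync
base URL, separated by whitespace) are assumptions made for this example; only
the ``--http`` and ``--rsync`` options shown above are used::

  # Hypothetical helper: read "identifier http-url rsync-url" lines on
  # stdin and print one "mb new" command per mirror. Nothing is executed.
  print_mb_new() {
      while read -r id http rsync; do
          # skip blank lines and comment lines
          case $id in ''|'#'*) continue ;; esac
          printf 'mb new %s --http %s --rsync %s\n' "$id" "$http" "$rsync"
      done
  }

  printf '%s\n' \
      'ftp.isr.ist.utl.pt http://ftp.isr.ist.utl.pt/pub/MIRRORS/ftp.suse.com/projects/ rsync://ftp.isr.ist.utl.pt/suse/projects/' \
      'ftp.kddilabs.jp http://ftp.kddilabs.jp/Linux/distributions/ftp.suse.com/projects/ rsync://ftp.kddilabs.jp/suse/projects/' \
      | print_mb_new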


The tool automatically figures out the GeoIP location of each mirror, but you
can also specify it on the command line.

If you want to edit a mirror later, use::

  mb edit <identifier>

To simply display a mirror, you could use ``mb show kddi``, for instance.

Finally, each mirror needs to be scanned and enabled::

  mb scan --enable <identifier>


See the output of :program:`mb help` for more commands. Refer to
:ref:`maintaining_the_mirror_database` for the full reference documentation of
the :program:`mb` tool.



Setting up required cron jobs
-----------------------------

Setting up mirror monitoring
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Mirror monitoring needs to be set up to run automatically. The following cron
job checks which mirrors are reachable: it probes the mirrors at short
intervals and marks them online or offline in the database. Put this into
:file:`/etc/crontab`::

  * * * * *                 mirrorbrain   mirrorprobe

Setting up mirror scanning
~~~~~~~~~~~~~~~~~~~~~~~~~~

Configure mirror scanning::

  45 * * * *                mirrorbrain   mb scan --quiet --jobs 4 --all

Use more parallel scanners (``-j|--jobs ...``) if you have a beefy machine.

The ``--quiet`` option can be used twice (e.g. as ``-qq``), which will totally
silence the scanner, except for error messages. This means that you get a mail
only when there is something wrong.


Maintenance
~~~~~~~~~~~

Another cron job is useful to remove unreferenced files from the database::

  # Monday: database clean-up day...
  30 1 * * mon              mirrorbrain   mb db vacuum


Keeping the GeoIP database uptodate
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The GeoIP database is updated at least once a month, so a new copy should be
downloaded regularly::

  # update GeoIP database on Mondays
  31 2 * * mon              root  sleep $(($RANDOM/1024)); /usr/bin/geoip-lite-update

(The ``sleep`` adds a short random delay. That way you can copy the line
without adjusting the time, and the GeoIP servers still do not get hit by many
clients at exactly the same moment.)

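As a quick sanity check of the delay in the cron line above: bash's ``$RANDOM``
expands to an integer between 0 and 32767, so dividing by 1024 gives a delay of
0 to 31 seconds. The sketch below merely reproduces that arithmetic. It assumes
bash (``$RANDOM`` is not part of POSIX sh), and the variable names are made up
for illustration::

  # Bounds of the delay, assuming bash's RANDOM range of 0..32767.
  min_delay=$((0 / 1024))
  max_delay=$((32767 / 1024))
  echo "delay range: ${min_delay}..${max_delay} seconds"

  # One sample draw, computed the same way as in the cron line.
  delay=$((RANDOM / 1024))
  echo "this run would sleep ${delay} seconds"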


Testing
-------

TODO: describe how to test that the install was successful
    (When testing, consider any excludes that you configured, and which might
    introduce confusion.)

* Many HTTP clients can be used for testing, but `cURL`_ is particularly
  helpful. Here are some examples.

  Show the HTTP response code and the Location header pointing to the new location::

    curl -sI <url>

  Display the metalink::

    curl -s <url>.metalink

  Show an HTML list with the available mirrors::

    curl -s <url>?mirrorlist

.. _`cURL`: http://curl.haxx.se/

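When checking many URLs, it can help to reduce the ``curl -sI`` output to just
the redirect target. The following is a minimal sketch; the helper name
``location_of`` is made up for illustration. It reads HTTP response headers on
standard input and prints the value of the first ``Location`` header::

  # Hypothetical helper: print the value of the first Location header
  # found in HTTP response headers read on stdin.
  location_of() {
      # strip the carriage returns that end HTTP header lines, then match
      # the header name case-insensitively
      tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
  }

It would be used as ``curl -sI <url> | location_of``. If nothing is printed,
the response did not contain a redirect.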

.. _initial_configuration_logging_setup:

Setting up logging
------------------

You may want to log more details than Apache normally logs into the access_log
file. You can define a new log format that gives you an access_log, with
details from MirrorBrain added::

  LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \
  want:%{WANT}e give:%{GIVE}e r:%{MB_REALM}e %{X-MirrorBrain-Mirror}o \
  %{MB_CONTINENT_CODE}e:%{MB_COUNTRY_CODE}e ASN:%{ASN}e P:%{PFX}e \
  size:%{MB_FILESIZE}e %{Range}i" combined_redirect



This defines a new log format called "combined_redirect", which you can use in
your virtual hosts with the CustomLog directive. 

Instead of::

  CustomLog /var/log/apache2/myhost/access_log combined

you would use::

  CustomLog /var/log/apache2/myhost/access_log combined_redirect

.. TODO: describe a good logging setup with cronolog

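Once the ``combined_redirect`` format is in use, the extra fields can be
summarised with standard tools. The following is a minimal sketch that counts
how often each ``give:`` value occurs in a log. The helper name ``count_give``
is made up for illustration, and which values actually appear depends on your
setup::

  # Hypothetical helper: count how often each "give:" value occurs in
  # combined_redirect log lines read on stdin, most frequent first.
  # Note: this also matches "give:" if it happens to appear in a request
  # or User-Agent string.
  count_give() {
      grep -o 'give:[^ ]*' | sort | uniq -c | sort -rn
  }

It would be used as ``count_give < /var/log/apache2/myhost/access_log``.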


.. _creating_hashes:

Creating hashes
---------------

First, add some configuration::

  MirrorBrainMetalinkPublisher "openSUSE" http://download.opensuse.org

You need to create a directory in which to store the hashes, for instance
:file:`/srv/hashes/srv/opensuse`. Note that the full pathname of the file tree
(``/srv/opensuse``) is part of this target path.

Make the directory owned by the ``mirrorbrain`` user.

Now, create the hashes with the following command. It is best run as an
unprivileged user (``mirrorbrain``)::

  mb makehashes /srv/opensuse -t /srv/hashes/srv/opensuse

Add the hashing command to :file:`/etc/crontab` to be run every few hours.
Alternatively, run it whenever the file tree changes, for instance from
whatever trigger updates the tree.

(This command was called ``metalink-hasher`` in previous releases of
MirrorBrain.)

.. TODO: show how to run this command (and others) under withlock

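The naming rule for the hash directory (a base directory followed by the full
path of the file tree) can be expressed as a small helper. This is a minimal
sketch; the function name ``hash_dir_for`` is made up for illustration, and
``/srv/hashes`` is simply the base directory from the example above::

  # Hypothetical helper: derive the hash directory for a file tree by
  # appending the tree's absolute path to a base directory.
  hash_dir_for() {
      base=${1%/}    # strip one trailing slash from the base, if any
      tree=$2
      case $tree in
          /*) printf '%s%s\n' "$base" "$tree" ;;
          *)  echo "tree path must be absolute: $tree" >&2; return 1 ;;
      esac
  }

  hash_dir_for /srv/hashes /srv/opensuse    # prints /srv/hashes/srv/opensuse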

Optional things you might want
------------------------------

* further things that you might want to configure:

  * mod_autoindex_mb, a replacement for the standard module mod_autoindex.
    Swap the modules, then add the ``IndexOptions`` line to your Apache
    configuration::

      a2dismod autoindex
      a2enmod autoindex_mb

      # in the Apache configuration:
      IndexOptions Metalink Mirrorlist
      # or IndexOptions +Metalink +Mirrorlist, depending on your config

  * add a link to a CSS stylesheet for mirror lists::

      MirrorBrainMirrorlistStylesheet "http://static.opensuse.org/css/mirrorbrain.css"

    and for the autoindex::

      IndexStyleSheet "http://static.opensuse.org/css/mirrorbrain.css"



Configuring GeoIP
-----------------

.. note::
   It is better to use the larger `GeoLiteCity
   <http://www.maxmind.com/app/geolitecity>`_ database, instead of the minimal
   GeoIP database that contains only country information. With the more
   detailed info in the former database, a better mirror selection is achieved
   in many cases.

Edit :file:`/etc/apache2/conf.d/mod_geoip.conf`::

  <IfModule mod_geoip.c>
     GeoIPEnable On
     GeoIPDBFile /var/lib/GeoIP/GeoLiteCity.dat.updated
     #GeoIPOutput [Notes|Env|All]
     GeoIPOutput Env
  </IfModule>

(If your file contains ``GeoIPOutput All``, change it to ``GeoIPOutput Env``.)

Note that a caching mode like ``MMapCache`` needs to be used when Apache runs
with the worker MPM. In this case, use::

  <IfModule mod_geoip.c>
     GeoIPEnable On
     GeoIPDBFile /var/lib/GeoIP/GeoLiteCity.dat.updated MMapCache
     GeoIPOutput Env
  </IfModule>



.. configure GeoIP database updates

Setting up automatic updates of the GeoIP database
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

New versions of the GeoIP database are released each month. You can set up a
cron job to automatically fetch new updates as follows. If you do that, make
sure to set the GeoIPDBFile path (see above) to
:file:`/var/lib/GeoIP/GeoLiteCity.dat.updated`::

  # update GeoIP database on Mondays
  31 2 * * mon   root    sleep $(($RANDOM/1024)); /usr/bin/geoip-lite-update




Creating a virtual host
-----------------------

If needed, create a DNS alias for your web host.

.. note::
   A complete reference of all Apache directives can be found `here
   <http://svn.mirrorbrain.org/viewvc/mirrorbrain/trunk/mod_mirrorbrain/mod_mirrorbrain.conf?view=markup>`_.

The following snippet would create a new site as a virtual host::

  sudo sh -c "cat > /etc/apache2/sites-available/mirrorbrain << EOF
  <VirtualHost 127.0.0.1>
      ServerName mirrors.example.org
      ServerAdmin webmaster@example.org
      DocumentRoot /var/www/downloads
      ErrorLog     /var/log/apache2/mirrors.example.org/error.log
      CustomLog    /var/log/apache2/mirrors.example.org/access.log combined
      <Directory /var/www/downloads>
          MirrorBrainEngine On
          MirrorBrainDebug Off
          FormGET On
          MirrorBrainHandleHEADRequestLocally Off
          MirrorBrainMinSize 2048
          MirrorBrainExcludeUserAgent rpm/4.4.2*
          MirrorBrainExcludeUserAgent *APT-HTTP*
          MirrorBrainExcludeMimeType application/pgp-keys
          Options FollowSymLinks Indexes
          AllowOverride None
          Order allow,deny
          Allow from all
      </Directory>
  </VirtualHost>
  EOF
  "

Another example::

  <VirtualHost your.host.name:80>
      ServerName samba.mirrorbrain.org
  
      ServerAdmin webmaster@example.org
  
      DocumentRoot /srv/samba/pub/projects
  
      ErrorLog     /var/log/apache/samba.mirrorbrain.org/logs/error_log
      CustomLog    /var/log/apache/samba.mirrorbrain.org/logs/access_log combined

      <Directory /srv/samba/pub/projects>
          MirrorBrainEngine On
          MirrorBrainDebug Off
          FormGET On
          MirrorBrainHandleHEADRequestLocally Off
          MirrorBrainMinSize 2048
          MirrorBrainExcludeUserAgent rpm/4.4.2*
          MirrorBrainExcludeUserAgent *APT-HTTP*
          MirrorBrainExcludeMimeType application/pgp-keys

          Options FollowSymLinks Indexes
          AllowOverride None
          Order allow,deny
          Allow from all
      </Directory>
  
  </VirtualHost>



Make the log directory for the virtual host::

  sudo mkdir /var/log/apache2/mirrors.example.org/


Enable the site::

  sudo a2ensite mirrorbrain

Restart Apache, ideally while watching the error log::

  sudo tail -f /var/log/apache2/error.log &
  sudo /etc/init.d/apache2 restart