\chapter[\pxmcommon]{\pxmcommon: The Configuration and Data Files
  Hierarchy}
\label{chap:polyxmass-common}

The \pxm\ software suite is designed to be compatible with any polymer
chemistry that the user cares to define.  To be that flexible, \pxm
has to be able to store polymer chemistry definition files and related
data files in a very clearly-designed ``file-system''. The structure
of this ``file-system'' is what this chapter is all about.

When the user defines a polymer chemistry definition (typically using
\pxd), the contents of that definition are saved in a text file (using
the \fileformat{xml} format). Once a polymer chemistry definition is
saved and registered in the \pxm system, the user can create a new
polymer sequence of that polymer chemistry:\footnote{Editing of
  polymer sequences is typically performed in \pxe.} when entering
monomer codes at the keyboard, the user sees monomer icons (small
graphical images) being displayed in the sequence editor (we call that
process the ``graphical rendering'' of the polymer sequence).  So,
here are a number of typical questions:

\begin{itemize}
\item Where is the correspondence defined between a monomer code (as
  keyed in during a polymer sequence editing session) and the monomer
  icon\footnote{We call that icon a ``monicon.''} that is displayed in
  the \pxe sequence editor?
\item Where are all the graphics files located that are used to
  graphically render a sequence in the editor? And the sound files
  used to let a polymer sequence speak itself out?
\item Where is the atom definition located that should be used with
  this specific polymer chemistry definition, and where is the
  correspondence defined between a polymer chemistry and the atom
  definition file that it requires?
\end{itemize}

\noindent Within \pxm, there are two different kinds of
data/configuration data files:

\begin{itemize}
\item Compulsory data/configuration files that \emph{must} be on the
  system at precise locations whatever the chemical definitions being
  used on that system;
\item Optional data/configuration files that are installed by users
  (or system administrators) so as to comply with requirements
  specific to each installation.\footnote{For example, a synthetic
    polymer lab will almost certainly not install data packages about
    proteins or nucleic acids, while a biochemistry lab will almost
    certainly not install packages about poly-methylmethacrylate\dots}
\end{itemize}

\noindent The present chapter is about the first item in the above
bulleted list: compulsory data/configuration files that are all
shipped with the \pxmcommon package. We will review the locations
where data/configuration files are installed and the mechanics that
make \pxm work on any kind of polymer chemistry.

\renewcommand{\sectitle}{Overview Of The Files Installed}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

Let us first review all the files that are installed by the \pxmcommon
package:

\begin{alltt}
  \begin{footnotesize}
    /usr/share/polyxmass/atom-defs/atoms.xml

    /usr/share/polyxmass/polchem-defs/protein
    /usr/share/polyxmass/polchem-defs/protein/acetyl.png
    /usr/share/polyxmass/polchem-defs/protein/acetyl.svg
    /usr/share/polyxmass/polchem-defs/protein/acetyl-text.svg
    /usr/share/polyxmass/polchem-defs/protein/alanine.png
    /usr/share/polyxmass/polchem-defs/protein/alanine.svg
    /usr/share/polyxmass/polchem-defs/protein/alanine-text.svg

    /usr/share/polyxmass/polchem-defs/protein/monicons.dic
    /usr/share/polyxmass/polchem-defs/protein/sounds/alanine.ogg
    /usr/share/polyxmass/polchem-defs/protein/sounds/a.ogg
    /usr/share/polyxmass/polchem-defs/protein/sounds/methyl.ogg
    /usr/share/polyxmass/polchem-defs/protein/sounds/sounds.dic

    /usr/share/polyxmass/polchem-defs/protein/chempad.conf

    /usr/share/polyxmass/polchem-defs/protein/acidobasic.xml

    /usr/share/polyxmass/polchem-defs/protein/cursor.svg

    /usr/share/polyxmass/polchem-defs/protein/protein.xml
    /usr/share/polyxmass/polchem-defs/protein/peptide.xml

    /usr/share/polyxmass/pol-seqs/protein-sample.pxm
    /usr/share/polyxmass/pol-seqs/protein-fragments-sample.pxm

    /etc/polyxmass/atom-defs/polyxmass-common-atom-defs-cat
    /etc/polyxmass/polchem-defs/polyxmass-common-polchem-defs-cat
    /etc/polyxmass/polchem-defs/polyxmass-common-polchem-defs-atom-defs-dic

    /etc/polyxmass/chempad.conf

    /usr/share/doc/polyxmass-common/AUTHORS
    /usr/share/doc/polyxmass-common/COPYING
    /usr/share/doc/polyxmass-common/INSTALL
    /usr/share/doc/polyxmass-common/NEWS
    /usr/share/doc/polyxmass-common/README
    /usr/share/doc/polyxmass-common/TODO
    /usr/share/doc/polyxmass-common/THANKS

    /usr/share/man/man7/polyxmass-common.7

    /usr/lib/pkgconfig/polyxmass-common.pc
  \end{footnotesize}
\end{alltt}

All the text above is the output (edited for clarity) of the
\command{make install} command that is performed to install the
\pxmcommon package on the system. It is taken for granted that the
user kept the \option{---sysconfdir=/etc} option of the
\command{configure} script and passed the following option to
that same \command{configure} script: \option{---prefix=/usr}. If the
\pxmcommon package is installed as a binary package, then the user
need not worry: the packager will have chosen suitable installation
directories.  Let us review the installed files one by one,
stating what each is meant for:

\begin{itemize}

\item Files located in \filename{/etc/polyxmass}:
  
  \begin{itemize}
    
  \item \filename{atom-defs/polyxmass-common-atom-defs-cat}
    {\footnotesize This file is the catalog file corresponding to the
      \pxmcommon package. It contains the list of the atom definition
      files that are brought by the \pxmcommon package.}
    
  \item \filename{polchem-defs/polyxmass-common-polchem-defs-cat}
    {\footnotesize This file is the catalog file corresponding to the
      \pxmcommon package. It contains the list of the polymer
      chemistry definition files that are brought by the \pxmcommon
      package.}
    
  \item
    \filename{polchem-defs/polyxmass-common-polchem-defs-atom-defs-dic}
    {\footnotesize This file is the dictionary file corresponding to
      the \pxmcommon package. It contains the relations between each
      polymer chemistry definition file shipped with the package and
      its cognate atom definition file.}

  \item \filename{chempad.conf} {\footnotesize This file describes the
      layout of the chemical pad of the \pxc module, in case the
      polymer chemistry definition does not have one and the user does
      not have one either. This file can thus be called the
      ``default'' layout definition file for the \pxc's chemical pad.}

  \end{itemize}

\item Files located in \filename{/usr/share}:
  
  \begin{itemize}
    
  \item \filename{polyxmass/atom-defs/atoms.xml} {\footnotesize This
      file is the \emph{``basic''} atom definition file. The \pxm
      software suite mandates that one atom definition file be present
      in the system.}
    
  \item \filename{polyxmass/polchem-defs/protein/acetyl.png}
    {\footnotesize This is one of the raster files that are used in
      the polymer chemistry definition to graphically render the
      ``Acetylation'' chemical modification. Note that a file by the
      same name, but with the extension \filename{.svg} instead of
      \filename{.png}, is also installed. That file is the scalable
      vector graphics version from which the \filename{.png} file was
      generated.}
    
  \item \filename{polyxmass/polchem-defs/protein/alanine.png}
    {\footnotesize One of the graphics files that are used to render
      graphically the monomers defined in the polymer chemistry
      definition (in this case the monomer is ``alanine''). Same remark
      as above for the \filename{.svg} extension file.}
    
  \item \filename{polyxmass/polchem-defs/protein/chempad.conf}
    {\footnotesize This file describes the layout of the chemical pad
      of the \pxc module. Each polymer chemistry definition might have
      a \filename{chempad.conf} file associated to it. This file is
      optional.}
    
  \item \filename{polyxmass/polchem-defs/protein/acidobasic.xml}
    {\footnotesize This file describes the chemistry of all the
      monomers and modifications in the polymer chemistry definition
      that might bring charges. The data contained in this file are
      used by the functions that compute either the charge level of a
      polymer sequence at a given pH value, or the isoelectric point
      of a polymer sequence (that is, the pH value at which the net
      charge of the polymer is zero).}
    
  \item \filename{polyxmass/polchem-defs/protein/monicons.dic}
    {\footnotesize This file is the one that lists the correspondence
      between the monomer codes/modifications and the files used to
      render the monomers/modifications graphically in the sequence
      editor. The lines in this file look like:\\
      \verb+monomer;A=alanine.svg|alanine.png+ for a monomer, and like:\\
      \verb+modif;Phosphorylation%T%phospho.svg|phospho.png+ for a
      modification.  The latter line indicates that when a monomer is
      modified using the ``Phosphorylation'' modification, the icon of
      the to-be-modified monomer gets modified by transparently
      pasting onto it the monicon contained in the file
      \filename{phospho.svg} (see the
      \verb|%T%|).}
    
    \item \filename{polyxmass/polchem-defs/protein/sounds/sounds.dic}
      {\footnotesize This file is the one that lists the
        correspondence between the monomer codes (or names) and their
        corresponding sound files. The same is true for modifications.
        The file contains lines in the form:\\
        \verb+monomer;Y=tyrosine.ogg|y.ogg+ for monomers and in the
        form:\\ \verb+modif;Phosphorylation=phospho.ogg+ for
        modifications. The ``monomer;'' line indicates that the
        monomer `Y' has its name vocalized in the
        \filename{tyrosine.ogg} file, while its code is vocalized
        in the \filename{y.ogg} file. The ``modif;'' line indicates
        that the ``Phosphorylation'' modification is vocalized in
        the \filename{phospho.ogg} file.}
    
  \item \filename{polyxmass/polchem-defs/protein/cursor.svg}
    {\footnotesize This file is a graphics file that describes how the
      cursor should be rendered graphically in the polymer sequence
      editor. Each polymer chemistry definition must provide this
      file.}
    
  \item \filename{polyxmass/polchem-defs/protein/protein.xml}
    {\footnotesize This is the actual polymer chemistry definition
      file.  This file is a text file formatted according to the
      \fileformat{xml} standard. It contains a description of all the
      chemical entities that make up the polymer chemistry
      definition.}
    
  \item \filename{polyxmass/pol-seqs/protein-sample.pxm}
    {\footnotesize This is an example polymer sequence file. It can be
      used by the user to learn how to use the \pxe module. This
      protein sequence is of polymer chemistry definition ``protein'',
      that is defined in the file that we described above
      (\filename{protein.xml}).}
    
  \item \filename{man/man7/polyxmass-common.7} {\footnotesize This
      file is the manual page that accompanies the \pxmcommon
      package.}
    
  \end{itemize}
  
\item Files located in \filename{/usr/lib}:
  
  \begin{itemize}
    
  \item \filename{pkgconfig/polyxmass-common.pc} {\footnotesize This
      file is the \software{pkg-config} configuration file that will
      allow other packages to check if \pxmcommon is installed
      correctly and what is its version.}

  \end{itemize}

  % \item \filename{}
  %   {\footnotesize .}

\end{itemize}

\renewcommand{\sectitle}{Detailed Explanations About Installed Files}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

\noindent Now that we have an overview of what each one of the files
that get installed does, we may want to take a closer look at some of
the files that were listed above.


\subsection*{File polyxmass-common-atom-defs-cat}

Each package that brings atom definition files should list, in a
similar file (ending with the \filename{atom-defs-cat} suffix), all
the atom definitions that are made available to the system.  The
\pxmcommon package installs

\filename{polyxmass-common-atom-defs-cat} in

\filename{/etc/polyxmass/atom-defs}. Its contents are:

\begin{mynoindent}
  \begin{alltt}
    basic=/usr/share/polyxmass/atom-defs/atoms.xml
  \end{alltt}
\end{mynoindent}

Thus, we see that \pxmcommon brings one atom definition file
(\filename{atoms.xml}), installed in
\filename{/usr/share/polyxmass/atom-defs} and made available to the
\pxm system under the name ``basic''. This latter name is the one used
by polymer chemistry definitions to specify which atom definition file
they should work with. We'll see that later.
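
To make the catalogue format concrete, here is a minimal Python sketch
that reads such a file into a name-to-path table. It is only an
illustration of the \verb|name=path| line format shown above, not the
code that \pxm actually uses; the function name
\verb|parse_atom_defs_catalog| is hypothetical.

```python
def parse_atom_defs_catalog(text):
    """Map each atom definition name to the absolute path of its file.

    Assumes one 'name=path' entry per line, as in the catalogue shown
    in this chapter. Blank lines are ignored.
    """
    catalog = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        name, _, path = line.partition("=")
        catalog[name] = path
    return catalog


sample = "basic=/usr/share/polyxmass/atom-defs/atoms.xml\n"
atom_defs = parse_atom_defs_catalog(sample)
print(atom_defs["basic"])
```

With the single-line catalogue shipped by \pxmcommon, the table
contains one entry, keyed by the name ``basic''.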


\subsection*{File polyxmass-common-polchem-defs-cat}

Each package that brings polymer chemistry definition files should
list, in a similar file (ending with the \filename{polchem-defs-cat}
suffix), all the polymer chemistry definitions that are made
available to the system. The \pxmcommon package installs

\filename{polyxmass-common-polchem-defs-cat} in

\filename{/etc/polyxmass/polchem-defs}. Its contents are (each polymer
chemistry definition name and its corresponding data \emph{must} be on
a single line without space; here, for clarity the line was broken, as
symbolised with the ``\verb|\\|'' characters that are absent in the
file):

\begin{mynoindent}
  \begin{alltt}
    protein=/usr/share/polyxmass/polchem-defs/protein/protein.xml\verb|\\|
    \%/usr/share/polyxmass/polchem-defs/protein
    peptide=/usr/share/polyxmass/polchem-defs/protein/peptide.xml\verb|\\|
    \%/usr/share/polyxmass/polchem-defs/protein
  \end{alltt}
\end{mynoindent}

Thus, we see that \pxmcommon brings two polymer chemistry definition
files (\filename{protein.xml} and \filename{peptide.xml}), installed
in \filename{/usr/share/polyxmass/polchem-defs/protein} and made
available to the \pxm system under the names ``protein'' and
``peptide'', respectively.  As can be seen from the example above, the
polymer chemistry definition file names are absolute file names (that
is, they are preceded by the whole path leading to the file in
question) and are separated, by a \verb|%| character, from the
absolute name of the directory where all their corresponding data
reside.

Thus, the ``protein'' polymer chemistry definition file is
\filename{protein.xml}, which is located in

\begin{mynoindent}
  \filename{/usr/share/polyxmass/polchem-defs/protein}.
\end{mynoindent}

The directory where all the ``protein'' polymer chemistry
definition-related data are located is:

\begin{mynoindent}
  \filename{/usr/share/polyxmass/polchem-defs/protein}.
\end{mynoindent}

We will see later how this catalogue file is used to create a main
catalogue file, which is in turn used to read the proper polymer
chemistry definition file when the user asks, for example, that a
``protein'' sequence be displayed in \pxe.
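
The line format of the polymer chemistry definition catalogue can be
sketched in Python as follows. The sketch splits each entry at the
\verb|=| and \verb|%| separators described above; the names
\verb|PolChemDefEntry| and \verb|parse_polchem_defs_catalog| are
hypothetical, and the code is an illustration of the file format
rather than the parser used by \pxm.

```python
from typing import NamedTuple


class PolChemDefEntry(NamedTuple):
    definition_file: str  # absolute path of the XML definition file
    data_dir: str         # absolute path of the directory holding its data


def parse_polchem_defs_catalog(text):
    """Parse 'name=definition_file%data_dir' entries, one per line."""
    catalog = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        name, _, rest = line.partition("=")
        definition_file, _, data_dir = rest.partition("%")
        catalog[name] = PolChemDefEntry(definition_file, data_dir)
    return catalog


sample = (
    "protein=/usr/share/polyxmass/polchem-defs/protein/protein.xml"
    "%/usr/share/polyxmass/polchem-defs/protein\n"
    "peptide=/usr/share/polyxmass/polchem-defs/protein/peptide.xml"
    "%/usr/share/polyxmass/polchem-defs/protein\n"
)
polchem_defs = parse_polchem_defs_catalog(sample)
print(polchem_defs["protein"].definition_file)
print(polchem_defs["protein"].data_dir)
```

Each entry is written on a single line in the sample string, as the
real file requires.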


\subsection*{File polyxmass-common-polchem-defs-atom-defs-dic}

Each package that brings polymer chemistry definition files should
list, in a similar file (ending with the
\filename{polchem-defs-atom-defs-dic} suffix), all the relations
that govern the use of a specific atom definition file by any given
polymer chemistry definition.  The \pxmcommon package installs

\filename{polyxmass-common-polchem-defs-atom-defs-dic} in

\filename{/etc/polyxmass/polchem-defs}. Its contents are:

\begin{mynoindent}
  \begin{alltt}
    protein=basic
    peptide=basic
  \end{alltt}
\end{mynoindent}

\noindent The first line of this file stipulates that
\textsl{``When working on polymer sequences of polymer chemistry
  `protein', the atom definition to be used is the one named
  `basic'.''}  Since the system knows what actual file corresponds to
the atom definition ``basic'', as we have already seen above, it is
not difficult to load that specific atom definition file from disk.
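
The two-step lookup just described (polymer chemistry name to atom
definition name, then atom definition name to file) can be sketched in
Python as below. The two tables are written out by hand from the file
contents shown in this chapter, and the function name
\verb|atom_definition_file| is hypothetical; this is not the actual
\pxm code.

```python
def parse_pairs(text):
    """Parse simple 'key=value' lines into a dictionary."""
    pairs = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line:
            key, _, value = line.partition("=")
            pairs[key] = value
    return pairs


def atom_definition_file(polchem_name, polchem_to_atom_def, atom_def_catalog):
    """Return the atom definition file to use for a polymer chemistry."""
    atom_def_name = polchem_to_atom_def[polchem_name]  # e.g. 'basic'
    return atom_def_catalog[atom_def_name]             # e.g. '.../atoms.xml'


# Contents of the dictionary file and of the atom definition catalogue.
polchem_to_atom_def = parse_pairs("protein=basic\npeptide=basic\n")
atom_def_catalog = parse_pairs("basic=/usr/share/polyxmass/atom-defs/atoms.xml\n")

print(atom_definition_file("protein", polchem_to_atom_def, atom_def_catalog))
```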

\subsection*{File chempad.conf}

This file is responsible for governing the chemical pad layout in the
\pxc module. Each polymer chemistry definition may have one such file
in its directory (for example, for the ``protein'' polymer chemistry
definition, we have the

\filename{/usr/share/polyxmass/polchem-defs/protein/chempad.conf}
file).

When a polymer chemistry definition with no \filename{chempad.conf} is
used in \pxc, the program automatically tries to read the file from
the user's \filename{.polyxmass/chempad.conf} file. If that user's
file is not found, the last resort is to read the

\filename{/etc/polyxmass/chempad.conf} file.
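
The search order described above can be summarized by the following
Python sketch. It assumes that the user's \filename{.polyxmass}
directory sits in the user's home directory, which the text does not
state explicitly; the function name \verb|find_chempad_conf| and its
parameters are hypothetical, and the real \pxc lookup may differ in
its details.

```python
from pathlib import Path


def find_chempad_conf(polchem_dir, home_dir,
                      system_dir="/etc/polyxmass", exists=Path.is_file):
    """Return the first chempad.conf found, or None.

    Candidates are tried in this order: the polymer chemistry
    definition directory, the user's .polyxmass directory (assumed to
    be in the home directory), then the system-wide directory.
    The 'exists' predicate is a parameter so the search order can be
    exercised without touching the disk.
    """
    candidates = [
        Path(polchem_dir) / "chempad.conf",
        Path(home_dir) / ".polyxmass" / "chempad.conf",
        Path(system_dir) / "chempad.conf",
    ]
    for candidate in candidates:
        if exists(candidate):
            return candidate
    return None


# Example: pretend that only the system-wide default file exists.
only_default = lambda p: p == Path("/etc/polyxmass/chempad.conf")
print(find_chempad_conf("/usr/share/polyxmass/polchem-defs/protein",
                        "/home/user", exists=only_default))
```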

This file contains lines like the following:

\begin{mynoindent}
  \begin{alltt}
    chempad_columns\$3
    
    chempadkey=protonate\%+H1\%adds a proton
    chempadkey=hydrate\%+H2O1\%adds a water molecule
    chempadkey=0H-ylate\%+O1H1\%adds an hydroxyl group
    chempadkey=acetylate\%-H1+C2H3O1\%adds an acetyl group
    chempadkey=protonate\%+H1\%adds a proton
    chempadkey=hydrate\%+H2O1\%adds a water molecule
  \end{alltt}
\end{mynoindent}

The first line specifies that the chemical pad buttons should be laid
out in three columns. Each following line configures one button that
will sit on the chemical pad. The syntax of such a line is the following:

\verb|chempadkey=button_label%action-formula%button_tooltip|

\medskip
\noindent The first button-defining line, for example, configures the
creation of a button with the label \guival{``protonate''} which
---when mouse-clicked--- will elicit the addition of the contents of
the action-formula \guival{``+H1''} in the \pxc module. The string
\guival{``adds a proton''} is the text that will appear as a tooltip
when the mouse cursor sits on the button.
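
As an illustration of this syntax, the following Python sketch reads
the column count and the button definitions from the text of a
\filename{chempad.conf} file. It relies only on the two line formats
shown above; the function name \verb|parse_chempad_conf| is
hypothetical and the sketch is not the parser used by \pxc.

```python
def parse_chempad_conf(text):
    """Return (columns, buttons) from the text of a chempad.conf file.

    'columns' comes from the 'chempad_columns$N' line; each button is a
    (label, action_formula, tooltip) tuple taken from a
    'chempadkey=label%action-formula%tooltip' line.
    """
    columns = None
    buttons = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("chempad_columns$"):
            columns = int(line.split("$", 1)[1])
        elif line.startswith("chempadkey="):
            fields = line[len("chempadkey="):].split("%", 2)
            label, action_formula, tooltip = fields
            buttons.append((label, action_formula, tooltip))
    return columns, buttons


sample = """chempad_columns$3

chempadkey=protonate%+H1%adds a proton
chempadkey=hydrate%+H2O1%adds a water molecule
chempadkey=acetylate%-H1+C2H3O1%adds an acetyl group
"""
columns, buttons = parse_chempad_conf(sample)
print(columns)
print(buttons[0])
```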


\subsection*{File acidobasic.xml}

This file contains all the pKa data about all the different chemical
groups borne either by monomers or by modifications defined in the
polymer chemistry definition. This file is used when computations
of the net charge of a polymer sequence at a given pH value are
requested.  It is also used when isoelectric point calculations are
performed. See section~\vref{sect:acido-basic-calculations}.


\subsection*{File monicons.dic}

This file is obligatory for each polymer chemistry definition. So, for
our example of the ``protein'' polymer chemistry, it would be found in
that polymer chemistry definition directory:
\filename{/usr/share/polyxmass/polchem-defs/protein/monicons.dic}.

See below for detailed explanations of its contents.

\subsection*{File atoms.xml}

This file, that is located at

\filename{/usr/share/polyxmass/atom-defs},

is obligatory for \pxm to operate normally. Indeed, if there were no
atom definitions, we would be unable to compute masses for any
chemical entity that is represented by its formula (or
action-formula). There might be other atom definition files, located
in that same directory, but with other names. As we have seen above,
there is one atom definition, called ``basic'', that is used by the
``protein'' and ``peptide'' polymer chemistry definitions. This
``basic'' atom definition is actually this \filename{atoms.xml} file.

In more detail: this file is an atom definition file, where atoms are
defined by listing their individual data. An atom is the resultant of
the isotope(s) that it comprises. Some atoms only have one
isotope, other atoms have as many as seven or eight different
isotopes. An isotope is characterized by its mass and its abundance.
Hence the structure of an atom definition in this file:

\begin{mynoindent}
  \begin{alltt}
  <atom>
    <name>Carbon</name>
    <symbol>C</symbol>
    <isotope>
      <mass>12.0000000000</mass>
      <abund>98.9300000000</abund>
    </isotope>
    <isotope>
      <mass>13.0033548390</mass>
      <abund>1.0700000000</abund>
    </isotope>
  </atom>
  \end{alltt}
\end{mynoindent}

\noindent There might be as many such atom definitions in this atom
definition file as required for the polymer chemistry definition
with which it is to be used. Indeed, we have already mentioned that
any polymer chemistry definition must specify the atom definition with
which it must work for things to behave properly. For the polymer
chemistry definitions brought by the \pxmcommon package, that
association is specified in the

\filename{polyxmass-common-polchem-defs-atom-defs-dic} file (see
above).
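
To show how the isotope data of an atom definition can be used, here
is a small Python sketch that reads the carbon definition shown above
and derives two masses from it. It assumes that the average mass is
the abundance-weighted mean of the isotope masses and that the
monoisotopic mass is the mass of the most abundant isotope; these are
the usual conventions, but this chapter does not say how \pxm itself
performs the computation. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_XML = """
<atom>
  <name>Carbon</name>
  <symbol>C</symbol>
  <isotope>
    <mass>12.0000000000</mass>
    <abund>98.9300000000</abund>
  </isotope>
  <isotope>
    <mass>13.0033548390</mass>
    <abund>1.0700000000</abund>
  </isotope>
</atom>
"""


def read_isotopes(atom_element):
    """Return a list of (mass, abundance) pairs for one <atom> element."""
    return [
        (float(iso.findtext("mass")), float(iso.findtext("abund")))
        for iso in atom_element.findall("isotope")
    ]


def average_mass(isotopes):
    """Abundance-weighted mean of the isotope masses."""
    total_abundance = sum(abund for _, abund in isotopes)
    return sum(mass * abund for mass, abund in isotopes) / total_abundance


def monoisotopic_mass(isotopes):
    """Mass of the most abundant isotope."""
    return max(isotopes, key=lambda pair: pair[1])[0]


carbon = read_isotopes(ET.fromstring(ATOM_XML.strip()))
print(round(average_mass(carbon), 4))
print(monoisotopic_mass(carbon))
```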


\subsection*{Directory protein}

This directory is where the example ``protein'' polymer chemistry
definition data are located
(\filename{/usr/share/polyxmass/polchem-defs/protein}). Indeed,
\pxmcommon comes with a full polymer chemistry definition; that is, a
polymer chemistry definition file (\filename{protein.xml}) and all the
data files that permit a polymer sequence of that polymer chemistry to
be rendered graphically in the \pxe editor module. The ``protein''
polymer chemistry data also come with a \filename{chempad.conf} file
that describes, for this specific polymer chemistry, how to lay
out the chemical pad used in the \pxc module. Let's review all the
files that make up the ``protein'' polymer chemistry definition as a
functional set of data.


\subsubsection*{File protein.xml}

This file is located in the ``protein'' polymer chemistry
definition directory:

\filename{/usr/share/polyxmass/polchem-defs/protein}.

It is the file where the ``protein'' polymer chemistry definition is
detailed. Its contents look like this (omitting the DTD of the
\fileformat{xml}-format file):

\begin{mynoindent}
  \begin{alltt}
    <polchemdefdata>
    <type>protein</type>
    <leftcap>+H</leftcap>
    <rightcap>+OH</rightcap>
    <codelen>1</codelen>
    <ionizerule>
    <actform>+H</actform>
    <charge>1</charge>
    <level>1</level>
    </ionizerule>
    <monomers>
    <mnm>
    <name>Glycine</name>
    <code>G</code>
    <formula>C2H3NO</formula>
    </mnm>
    <mnm>
    <name>Alanine</name>
    <code>A</code>
    <formula>C3H5NO</formula>
    </mnm>
    \vdots
    </monomers>
    <modifs>
    <mdf>
    <name>Phosphorylation</name>
    <actform>-H+H2PO3</actform>
    </mdf>
    <mdf>
    <name>Acetylation</name>
    <actform>-H+C2H3O</actform>
    </mdf>
    <mdf>
    <name>Amidation</name>
    <actform>-OH+NH2</actform>
    </mdf>
    </modifs>
    <cleavespecs>
    <cls>
    <name>CyanogenBromide</name>
    <pattern>M/</pattern>
    <clr>
    <re-mnm-code>M</re-mnm-code>
    <re-actform>-CH2S+O</re-actform>
    </clr>
    </cls>
    \vdots
    <cls>
    <name>Trypsin</name>
    <pattern>K/;R/;-K/P</pattern>
    </cls>
    </cleavespecs>
    <fragspecs>
    <fgs>
    <name>a</name>
    <end>LE</end>
    <actform>-C1O1</actform>
    <fgr>
    <name>a-fgr-1</name>
    <actform>+H200</actform>
    <prev-mnm-code>E</prev-mnm-code>
    <this-mnm-code>D</this-mnm-code>
    <next-mnm-code>F</next-mnm-code>
    <comment>comment here!</comment>
    </fgr>
    </fgs>
    <fgs>
    <name>z</name>
    <end>RE</end>
    <actform>-N1H1</actform>
    <comment>Not in CID high En. frag</comment>
    </fgs>
    \vdots
    <fgs>
    <name>imm</name>
    <end>NE</end>
    <actform>-C1O1+H1</actform>
    </fgs>
    </fragspecs>
    </polchemdefdata>
  \end{alltt}
\end{mynoindent}

\noindent As can be seen, the chemical entities that make up the
``protein'' polymer chemistry definition are listed here in a very
structured way. This file is written by the \pxd module, described in
another chapter of this manual. Note that some data shown here are
fake, as far as the ``protein'' polymer chemistry is concerned,
and are only listed as examples of the fine granularity with which
chemical data can be defined in this file.

When a polymer sequence is either loaded from disk or created
\textit{ex nihilo}, the \pxm program determines which polymer
chemistry definition it belongs to. Once it knows what polymer
chemistry definition is involved for the polymer sequence at hand, the
program loads the corresponding file from disk if it has not already
done so: no polymer chemistry definition file is read from disk more
than once, which keeps the memory footprint of the whole \pxm software
suite as small as possible.
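
The ``read each definition only once'' behaviour can be pictured with
the following Python sketch of a small cache. The class name
\verb|PolChemDefCache| and the \verb|loader| callable are
hypothetical; the sketch only illustrates the idea and says nothing
about how \pxm actually stores its loaded definitions.

```python
class PolChemDefCache:
    """Load each polymer chemistry definition at most once."""

    def __init__(self, loader):
        # 'loader' is any callable that takes a definition name and
        # returns the loaded definition (for example by reading its file).
        self._loader = loader
        self._loaded = {}

    def get(self, name):
        if name not in self._loaded:
            self._loaded[name] = self._loader(name)  # first request only
        return self._loaded[name]


# Example with a stand-in loader that records how often it is called.
calls = []

def fake_loader(name):
    calls.append(name)
    return {"type": name}

cache = PolChemDefCache(fake_loader)
first = cache.get("protein")
second = cache.get("protein")
print(first is second, calls)
```

Two sequences of the same polymer chemistry thus share a single
in-memory definition.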


\subsubsection*{Files alanine.svg and alanine.png}

These two files are located in the ``protein'' polymer chemistry
definition directory:

\filename{/usr/share/polyxmass/polchem-defs/protein}.

There are two such files for any monomer that is defined in the
polymer definition file (for our example that is the
\filename{protein.xml} file). 

These two files are responsible for the graphical rendering ---in the
polymer sequence editor, the \pxe module--- of the monomers that
constitute a polymer sequence. For each monomer in a polymer sequence,
its graphical representation is performed by the graphical rendering
of a ``monomer icon'' file (that we call ``monicon''). These two files
are ``monicon files'' (see the chapter~\vref{chap:polyxedit}).

It should be noted right now that the user may ask that the rendering
of the monomers in a polymer sequence be performed at a given size (in
pixel units). Thus the size of the monicons has to be adjustable,
preferably without loss of resolution. We will now see how this is
achieved.

Of these two files, the first has a name ending with the
\filename{.svg} extension: it is a \emph{scalable vector graphics} file
(\fileformat{svg}-format file) that describes vector-graphically how
the corresponding monomer should be displayed in the \pxe sequence
editor.  The \fileformat{svg} format is interesting because it makes
it possible to render the monicon in the editor at any size asked by
the user without losing image resolution.

The second of these two files has a \filename{.png} extension: it is a
\emph{portable network graphics} file (\fileformat{png}-format file)
that describes raster-graphically how the corresponding monomer should
be displayed in the \pxe sequence editor. Since this file describes
the rendering of a monomer icon in a ``static'' raster/bitmap graphics
format, it cannot scale properly without loss of resolution.

It is noteworthy that in theory, if all the scalable vector graphics
files (\fileformat{svg} files) were correctly interpreted by the
polymer sequence editor, the raster graphics files
(\fileformat{png} files) would be totally redundant and useless.
However, the \fileformat{png} file-reading libraries are much more
robust than the \fileformat{svg} file-reading libraries
(\fileformat{svg} is a rather recent standard). This is why it is
required to always provide the polymer sequence editor with a
fall-back solution in the form of a raster graphics \fileformat{png}
file to be used in case the monicon rendering from the scalable vector
graphics file fails.

Finally, we should mention that because users may draw these small
graphics files themselves, the graphical rendering of a polymer
sequence is totally customizable. As guidance for this
process, I would simply mention that the \fileformat{svg} files were
all drawn using the \software{sodipodi} software program, and that the
raster \fileformat{png} files were obtained using the ``export''
function in this same program.

We will see later how the correspondence between a monomer in the
polymer chemistry definition and its corresponding graphics files is
established, so that when the user edits a polymer sequence ---by
typing the monomer codes at the keyboard--- the proper monicon is
displayed in the \pxe sequence editor.

\subsubsection*{Files acetyl.svg and acetyl.png}

By the same token as above for monomer icon files, these
two files are respectively the \fileformat{svg} and \fileformat{png}
versions of the file that is used to graphically render the
``Acetylation'' monomer modification. These files have a
transparent background, and the small red ``Ac'' text that appears on
them is the only graphical element that will be visible when the files
are used for compositing their contents \emph{onto} the monomer icon
file that is used to render the monomer being chemically modified
using the ``Acetylation'' modification.

We will see later how the correspondence between a chemical
modification and its graphics file is established, so that when the
user selects a monomer in the sequence editor and modifies it, the
proper graphical modification of its monicon is performed in order to
give the user proper feedback that the monomer has effectively been
modified.


\subsubsection*{File cursor.svg}

This file is responsible for the representation of the editing cursor
in the \pxe module. Depending on the color of the monicons, it might
be necessary to modify the graphical rendering of the cursor in the
polymer sequence editor. This is necessary so that the graphical
rendering of polymer chemistries during polymer sequence editing can
be totally themeable. The cursor graphics file is necessarily a
\fileformat{svg} file because it \emph{has to scale up/down properly}
when the user changes the dimension of the monicons that render the
polymer sequence in the editor. The cursor always scales with the
monomer icons and adopts the same dimensions as theirs.


\subsubsection*{File chempad.conf}

We have already explained what this file is for. It might exist in the
polymer chemistry definition directory, in which case it is used to
lay out the chemical pad in the \pxc module. Note that this file is
used only if \pxc is run with a polymer chemistry definition
specified for loading.


\subsubsection*{File monicons.dic}

\label{subsect:monicons.dic}

This file, also located in the ``protein'' polymer chemistry
definition directory, contains critical correspondences between
monomer codes and the graphics files used to render these monomers in
the sequence editor. Also, this file lists the correspondences between
the chemical modifications that might be applied to monomers and the
graphical operations to perform so that the user is provided with
visual feedback. Its contents are:

\begin{alltt}
monomer;A=alanine.svg|alanine.png
monomer;C=cysteine.svg|cysteine.png
monomer;D=aspartate.svg|aspartate.png
monomer;E=glutamate.svg|glutamate.png
monomer;F=phenylalanine.svg|phenylalanine.png 
\vdots
modif;Phosphorylation\%T\%phospho.svg|phospho.png
modif;Acetylation\%T\%acetyl.svg|acetyl.png
modif;AmidationAsp\%O\%asparagine.svg|asparagine.png
modif;AmidationGlu\%O\%glutamine.svg|glutamine.png
\end{alltt}


The first line of this file is saying ---\textsl{``Whenever the user
  wants to insert ---in the polymer sequence--- a monomer by keying-in
  \emph{`A'}, that monomer should be rendered using the
  \filename{alanine.svg} file or, if that rendering fails, using the
  \filename{alanine.png} file''.} The same wording is true for all the
monomers in the polymer chemistry definition.

The sixth line indicates that the monomers that are chemically
modified using a modification called ``Phosphorylation'' should have
their monicon graphically altered by compositing
\verb|%T%|ransparently (onto the monicon of the monomer being
modified) either the transparent scalable vector graphics
\filename{phospho.svg} file or, if something is wrong with this
file, the raster \filename{phospho.png} file (see the
chapter~\vref{chap:polyxedit}).

The eighth line shows another graphical compositing rule.  The rule is
not \verb|%T%|ransparency, but involves an \verb|%O%|paque
graphical compositing. This line says that when a monomer is modified
using an ``AmidationAsp'' modification, its monomer icon should be
\emph{replaced} using a monomer icon rendered \textit{ex novo} by
reading either the scalable vector graphics file
\filename{asparagine.svg}, or ---if something is wrong with this
file--- by using the raster graphics file \filename{asparagine.png}.
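
The two kinds of rules described above can be sketched in Python as a
small parser. This is only an illustration of the line format shown in
the listing, not the parser that the \pxm\ programs actually use; the
function name \texttt{parse\_monicons\_line} and the layout of the
returned dictionary are hypothetical.

```python
def parse_monicons_line(line):
    """Parse one monicons.dic rule into a dictionary (hypothetical layout)."""
    kind, _, rest = line.strip().partition(";")
    if kind == "monomer":
        # monomer;<code>=<vector file>|<raster fallback>
        code, _, files = rest.partition("=")
        vector, _, raster = files.partition("|")
        return {"kind": kind, "code": code, "files": (vector, raster)}
    if kind == "modif":
        # modif;<name>%<T or O>%<vector file>|<raster fallback>
        name, mode, files = rest.split("%", 2)
        vector, _, raster = files.partition("|")
        compositing = {"T": "transparent", "O": "opaque"}[mode]
        return {"kind": kind, "name": name, "compositing": compositing,
                "files": (vector, raster)}
    raise ValueError(f"unrecognized rule: {line!r}")


rules = [
    "monomer;A=alanine.svg|alanine.png",
    "modif;Phosphorylation%T%phospho.svg|phospho.png",
    "modif;AmidationAsp%O%asparagine.svg|asparagine.png",
]
for rule in rules:
    print(parse_monicons_line(rule))
```

In each rule the first file is the preferred vector graphics file and
the second one is the raster fallback, which is why the sketch keeps
them as an ordered pair.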


\subsubsection*{File sounds.dic}

\label{subsect:sounds.dic}

This file, located in the ``protein'' polymer chemistry definition
directory (in the \filename{sounds} sub-directory), contains critical
correspondences between monomers' code/name or modifications' names
and their corresponding sound files. The format of the file is very
simple, as shown below:

\begin{alltt}
silence-sound\$silence.ogg

monomer;A=alanine.ogg|a.ogg
monomer;C=cysteine.ogg|c.ogg
monomer;D=aspartate.ogg|d.ogg
monomer;E=glutamate.ogg|e.ogg
monomer;F=phenylalanine.ogg|f.ogg
\vdots
modif;Phosphorylation=phospho.ogg
modif;AmidationAsp=amidation.ogg
modif;Acetylation=acetyl.ogg
modif;AmidationGlu=amidation.ogg

\end{alltt}


The first line of this file is saying ---\textsl{``Whenever the user
  wants to insert ---in the polymer sequence self-speak playlist--- a
  silent delay, the file \filename{silence.ogg} is to be used''.}

The second line indicates that when a sequence that is speaking itself
out encounters a monomer of code `A', then the file to be used should
be either:
\begin{itemize}
  \item \filename{alanine.ogg} if the user asks that the monomer names
    be vocalized in the playlist;
    \item \filename{a.ogg} if the user asks that the monomer codes be
      vocalized in the playlist.
\end{itemize}

The same reading applies to all the other monomers in the polymer
chemistry definition (see chapter~\vref{chap:polyxedit}).

The seventh line states that if modifications are to speak themselves
out, the ``Phosphorylation'' modification should use the sound file
\filename{phospho.ogg}.
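
As a rough illustration of the three kinds of lines found in
\filename{sounds.dic}, the following Python sketch reads them into
simple dictionaries. It only mirrors the format shown in the listing;
the function name \texttt{parse\_sounds\_dic} and the \texttt{"name"}
and \texttt{"code"} keys are hypothetical and do not come from the
\pxm\ sources.

```python
def parse_sounds_dic(text):
    """Return (silence_file, monomer_sounds, modif_sounds) from sounds.dic text."""
    silence, monomers, modifs = None, {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no rule
        if line.startswith("silence-sound$"):
            silence = line.split("$", 1)[1]
        elif line.startswith("monomer;"):
            code, _, files = line[len("monomer;"):].partition("=")
            # First file vocalizes the monomer name, second one its code.
            name_file, _, code_file = files.partition("|")
            monomers[code] = {"name": name_file, "code": code_file}
        elif line.startswith("modif;"):
            name, _, sound = line[len("modif;"):].partition("=")
            modifs[name] = sound
    return silence, monomers, modifs


sample = """silence-sound$silence.ogg

monomer;A=alanine.ogg|a.ogg
modif;Phosphorylation=phospho.ogg
"""
silence, monomers, modifs = parse_sounds_dic(sample)
print(silence)
print(monomers["A"]["name"], monomers["A"]["code"])
print(modifs["Phosphorylation"])
```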


\subsection*{Polymer Sequence Sample Files}

There are two polymer sequence sample files that are shipped with
\pxmcommon. We'll detail one of the two in this section.

\subsubsection*{File protein-sample.pxm}

This file is a sample ``protein''-polymer chemistry polymer sequence.
It is shipped with \pxmcommon in order to let the user experiment with
the \pxm software package right after installation. This polymer
sequence file is of polymer chemistry definition ``protein'' as can be
seen from part of its contents:

\begin{mynoindent}
  \begin{alltt}
    <polseqdata>
    <polseqinfo>
    <type>protein</type>
    <name>Sample</name>
    <code>SP2003</code>
    <author>rusconi</author>
    <date>
    <year>2004</year>
    <month>01</month>
    <day>19</day>
    </date>
    </polseqinfo>
    <polseq>
    <monomer>
    <code>M</code>
    <prop>
    <name>MODIF</name>
    <data>Acetylation</data>
    </prop>
    </monomer>
    <codes>EFEEDF</codes>
    \vdots
    <monomer>
    <code>V</code>
    <prop>
    <name>NOTE</name>
    <data>SAMPLE-NOTE</data>
    <data type="str">This monomer belongs to the KPVV 
                     peptide [30-->33]</data>
    </prop>
    </monomer>
    </polseq>
    <prop>
    <name>NOTE</name>
    <data>COMMENT</data>
    <data type="str">this polymer is partly membranous.</data>
    </prop>
    <prop>
    <name>LEFT_END_MODIF</name>
    <data>Acetylation</data>
    </prop>
    <prop>
    <name>NOTE</name>
    <data>COMMENT</data>
    <data type="str">This protein is responsible 
                     for the multi-drug resistance effect.</data>
    </prop>
    </polseqdata>
  \end{alltt}
\end{mynoindent}

As can be seen here, one \fileformat{xml} element, tagged ``<type>'',
contains the datum ``protein'', which tells \progname{\pxmng} that, when
this polymer sequence file is loaded, it should ensure that the
``protein'' polymer chemistry definition file is used to interpret its
data.

Other data follow that detail what the user has put in this polymer
sequence (monomer modifications ---see element
``<name>MODIF</name>''---, polymer modifications ---see element
``<name>LEFT\_END\_MODIF</name>''---, etc.).
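
The elements discussed above can be read programmatically. The sketch
below uses Python's standard \texttt{xml.etree.ElementTree} module on
an abridged, hand-written excerpt of the sample (the element names are
those of the listing; the excerpt itself and the variable names are
only illustrative). It is not how the \pxm\ programs load sequences.

```python
import xml.etree.ElementTree as ET

# Abridged excerpt modelled on the sample file shown above.
SAMPLE = """<polseqdata>
  <polseqinfo>
    <type>protein</type>
    <name>Sample</name>
  </polseqinfo>
  <polseq>
    <monomer>
      <code>M</code>
      <prop>
        <name>MODIF</name>
        <data>Acetylation</data>
      </prop>
    </monomer>
    <codes>EFEEDF</codes>
  </polseq>
  <prop>
    <name>LEFT_END_MODIF</name>
    <data>Acetylation</data>
  </prop>
</polseqdata>"""

root = ET.fromstring(SAMPLE)
polchem_type = root.findtext("polseqinfo/type")

# Monomer-level modifications: (monomer code, modification name) pairs.
monomer_modifs = [
    (monomer.findtext("code"), prop.findtext("data"))
    for monomer in root.iter("monomer")
    for prop in monomer.findall("prop")
    if prop.findtext("name") == "MODIF"
]

# Polymer-level properties are direct children of <polseqdata>.
left_end_modifs = [
    prop.findtext("data")
    for prop in root.findall("prop")
    if prop.findtext("name") == "LEFT_END_MODIF"
]
print(polchem_type, monomer_modifs, left_end_modifs)
```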


\renewcommand{\sectitle}{Example Of A New Atom Definition}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

The \pxm package has to ensure that users can either develop their own
polymer chemistry definitions or install packages that ship polymer
chemistry definition files (along with their configuration files and
data files; the whole set of files is collectively called the
``polymer chemistry definition''). To achieve that goal, the
\pxm software suite needs to be able to scan catalogue files on the
system in search of these atom/polymer chemistry definitions.

\begin{center}
  \fbox{\parbox{0.9\textwidth}{The user willing to understand the
      process that leads to the creation of a polymer chemistry
      definition package can study the \pxmdata package that is part
      of the \pxm software suite. This package brings a number of new
      polymer chemistry definitions, like the ``dna'', ``rna'' and
      ``saccharide'' polymer chemistry definitions. When installed,
      this package makes sure that catalogue files are installed
      in the configuration directory of the \pxm software suite. This
      way, when \pxmng is executed, it can parse these catalogue files
      in search of all the available polymer chemistry/atom
      definitions. The \pxmdata package is an excellent tutorial for
      the user willing to learn how to package polymer chemistry
      definitions.}}
\end{center}

\medskip
\noindent Packages can bring atom definition files, polymer
chemistry definition files, or both. In each case, different catalogue
files are to be installed in different configuration directories. In
this section we explore how to install a new atom
definition. Users are once again invited to peruse the chapter
about \pxm customization for detailed instructions about creating and
installing new polymer chemistry definition packages.

\bigskip

When an atom definition package brings one or more atom definition
files to the \pxm software suite, it should bring the equivalent of
the 

\filename{polyxmass-common-atom-defs-cat} 

file that is brought by the \pxmcommon package.

Let's see an example where a new atom definition file might be of
great use. Imagine that we are using mass spectrometry to fully
characterize bacterially-synthesized polypeptides for use in nuclear
magnetic resonance studies. These polypeptides were \emph{almost fully}
[$\mathrm{^{15}}$N]-labelled by growing the bacteria in
[$\mathrm{^{15}}$N]-saturated culture medium. Of course, the way
masses should be computed is very different from the usual way,
because the isotopic $\frac{[\mathrm{^{14}N}]}{[\mathrm{^{15}N}]}$
ratio of the nitrogen element has changed dramatically from the
naturally-occurring one.

How would this situation be dealt with in \pxm? The first action would
be to create a new atom definition file, say
\filename{atoms-n-nmr.xml}, for example. This
\filename{atoms-n-nmr.xml} atom definition file would list ---amongst
all the other atoms--- only one isotope for the nitrogen atom, the
[$\mathrm{^{15}}$N] isotope: its mass would thus be 15.0001089780 and
its abundance would be set to 100. We may give that new atom
definition the following name: ``n-nmr'' (see below).
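
To get a feeling for the size of the effect, the short Python sketch
below compares the abundance-weighted average mass of nitrogen in the
usual case and in the fully labelled case. The
[$\mathrm{^{15}}$N] mass is the one quoted above; the
[$\mathrm{^{14}}$N] mass and the natural abundances are approximate
literature values supplied here as assumptions, and the function name
\texttt{average\_mass} is hypothetical.

```python
# (mass, abundance in percent) pairs per isotope.
# The 14N mass and the natural abundances are approximate literature
# values supplied here as assumptions; the 15N mass is the one quoted
# in the text.
NATURAL_N = [(14.0030740, 99.63), (15.0001089780, 0.37)]
LABELLED_N = [(15.0001089780, 100.0)]


def average_mass(isotopes):
    """Abundance-weighted mean of the isotope masses."""
    total = sum(abundance for _, abundance in isotopes)
    return sum(mass * abundance for mass, abundance in isotopes) / total


natural = average_mass(NATURAL_N)
labelled = average_mass(LABELLED_N)
print(f"natural N:  {natural:.4f}")
print(f"labelled N: {labelled:.4f}")
print(f"shift per nitrogen atom: {labelled - natural:.4f}")
```

Under these assumptions the average mass rises by roughly one mass
unit per nitrogen atom, which adds up quickly over a whole
polypeptide.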

How do we let \pxm know that we want to use this new atom definition
file? We would make a package, put that file into it, and name the package
in a \pxm-consistent way, like ``polyxmass-n-nmr'' for example. We also
would have to put in that package a file listing the name of the atom
definition that should be correlated to the shipped atom definition
file. This file should look like the file that we already described
earlier, which is shipped with \pxmcommon:
\noindent\filename{polyxmass-common-atom-defs-cat} (see above about
the requirement that atom definition catalogues \emph{must} have a
filename ending with the \filename{atom-defs-cat} suffix. Typically,
the prefix should be the name of the package that brings that
catalogue file, such as \filename{polyxmass-common}, which thus yields
the \filename{polyxmass-common-atom-defs-cat} catalogue name). Thus we
may ship a catalogue file that should be named
\filename{polyxmass-n-nmr-atom-defs-cat} listing these contents:

\begin{mynoindent}
  \begin{alltt}
    n-nmr=/usr/share/polyxmass/atom-defs/polyxmass-n-nmr-atom-def.xml
  \end{alltt}
\end{mynoindent}

\noindent This \filename{polyxmass-n-nmr-atom-defs-cat} catalogue file
should be installed in the 

\filename{/etc/polyxmass/atom-defs} directory, thus its absolute file
name should be:

\filename{/etc/polyxmass/atom-defs/polyxmass-n-nmr-atom-defs-cat}.

When our atom definition package is installed, and \pxmng is executed,
its catalogue file will be parsed and the atom definition will
automatically be made available for use in the whole \pxm software
suite.
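
The catalogue lookup can be pictured with the following Python sketch,
which collects every \texttt{name=path} line from the files whose name
ends with the \filename{atom-defs-cat} suffix. It is a simplified
illustration and not the actual \pxmng\ code: the function name
\texttt{load\_atom\_def\_catalogues} is hypothetical, and a temporary
directory stands in for \filename{/etc/polyxmass/atom-defs}.

```python
import os
import tempfile


def load_atom_def_catalogues(directory):
    """Map atom definition names to file paths from all *atom-defs-cat files."""
    catalogue = {}
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith("atom-defs-cat"):
            continue  # only files with the required suffix are catalogues
        with open(os.path.join(directory, filename), encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line and "=" in line:
                    name, _, path = line.partition("=")
                    catalogue[name] = path
    return catalogue


# Demonstration in a throw-away directory standing in for
# /etc/polyxmass/atom-defs.
with tempfile.TemporaryDirectory() as confdir:
    cat_path = os.path.join(confdir, "polyxmass-n-nmr-atom-defs-cat")
    with open(cat_path, "w", encoding="utf-8") as handle:
        handle.write(
            "n-nmr=/usr/share/polyxmass/atom-defs/polyxmass-n-nmr-atom-def.xml\n"
        )
    atom_defs = load_atom_def_catalogues(confdir)

print(atom_defs)
```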

\bigskip

At this point, we have to make sure that this new atom definition is
used to compute masses when we are working on the polypeptides of
interest (the ones that are [$\mathrm{^{15}N}$]-rich); that is, we
must let \pxm know that there exists a new notion of a polymer
chemistry definition, say ``n-nmr-protein'', for example.  As the
system administrator, we can create a new polymer chemistry catalogue
file, like the one we described earlier:
\filename{polyxmass-common-polchem-defs-cat}, but naming it this way,
for example:

\noindent\filename{polyxmass-n-nmr-polchem-defs-cat}. These files are
located in

\noindent\filename{/etc/polyxmass/polchem-defs}. The new file should contain
this line (each polymer chemistry definition name and its
corresponding data \emph{must} be on a single line without any space;
here, for clarity, the line was broken, as symbolised with the
``\verb|\\|'' characters that are absent from the file):

\begin{mynoindent}
  \begin{alltt}
    n-nmr-protein=/usr/share/polyxmass/polchem-defs/protein/protein.xml\verb|\\|
    % /usr/share/polyxmass/polchem-defs/protein
  \end{alltt}
\end{mynoindent}

What this line says is that there now exists a new polymer chemistry
definition, named ``n-nmr-protein'', that uses a pre-existing polymer
chemistry definition named ``protein''.

When our new polymer chemistry definition catalogue file is installed
and \pxmng is run, it will parse that catalogue file along with
all the other ones and will thus acknowledge that the ``n-nmr-protein''
polymer chemistry definition should use the polymer chemistry
definition data located in the directory mentioned on the line above.
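
A polymer chemistry catalogue line can be taken apart as sketched
below in Python. The sketch assumes, from the example line alone, that
``='' separates the definition name from its data and that ``\%''
separates the definition file from the data directory; the function
name \texttt{parse\_polchem\_cat\_line} is hypothetical.

```python
def parse_polchem_cat_line(line):
    """Split 'name=definition-file%data-directory' into its three parts."""
    name, _, data = line.strip().partition("=")
    definition_file, _, data_dir = data.partition("%")
    if not (name and definition_file and data_dir):
        raise ValueError(f"malformed catalogue line: {line!r}")
    return name, definition_file, data_dir


# The catalogue line of this section, written on a single line
# (the string is only split here to keep the source readable).
line = ("n-nmr-protein="
        "/usr/share/polyxmass/polchem-defs/protein/protein.xml"
        "%/usr/share/polyxmass/polchem-defs/protein")
print(parse_polchem_cat_line(line))
```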

Now comes the really interesting configuration: we have to let \pxm
know that, whenever a polymer chemistry definition ``n-nmr-protein'' is
used, the atom definition ``n-nmr'' is to be used in order to compute
masses.

To do that we have to create, as root, a new dictionary file, similar
to the one that we have described earlier:

\filename{polyxmass-common-polchem-defs-atom-defs-dic}, but naming it
this way, for example:

\filename{n-nmr-protein-polchem-defs-atom-defs-dic}.  These files are
located in 

\filename{/etc/polyxmass/polchem-defs}.  That file should contain this
line:

\begin{mynoindent}
  \begin{alltt}
    n-nmr-protein=n-nmr
  \end{alltt}
\end{mynoindent}

When our new polymer chemistry definition/atom definition dictionary
file is installed, this new dictionary file will be parsed by \pxmng
and it will thus be known that when using the ``n-nmr-protein''
polymer chemistry definition, the atom definition ``n-nmr'' should be
used for any computation.
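
Put together, the dictionary file and the atom definition catalogue
form a two-step lookup: polymer chemistry definition name, then atom
definition name, then atom definition file. The Python sketch below
illustrates that chain with the file contents used in this section;
the helper names \texttt{parse\_pairs} and
\texttt{atom\_def\_file\_for} are hypothetical and the code is not
taken from \pxmng.

```python
def parse_pairs(text):
    """Parse 'key=value' lines into a dictionary, skipping blank lines."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if line:
            key, _, value = line.partition("=")
            pairs[key] = value
    return pairs


# Contents of the dictionary and catalogue files discussed in this section.
polchem_to_atom_def = parse_pairs("n-nmr-protein=n-nmr\n")
atom_def_files = parse_pairs(
    "n-nmr=/usr/share/polyxmass/atom-defs/polyxmass-n-nmr-atom-def.xml\n"
)


def atom_def_file_for(polchem_name):
    """Follow polymer chemistry name -> atom definition name -> file path."""
    atom_def_name = polchem_to_atom_def[polchem_name]
    return atom_def_files[atom_def_name]


print(atom_def_file_for("n-nmr-protein"))
```

A misspelt name on either side of the ``='' sign breaks the chain (in
this sketch it raises a \texttt{KeyError}), so the names used in the
catalogue and dictionary files have to match exactly.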

The last action that we should take in order to automatically compute
the masses in the ``n-nmr''-specialized way we want, is to tell the
polypeptide sequences we are working on that they are of polymer
chemistry definition ``n-nmr-protein''. To that end, make a copy of
the polymer sequence of interest and change, using a text editor, the
contents of the \texttt{<type>} element (that is ``protein'') to
``n-nmr-protein''. Open that new sequence file in a freshly started
\pxm program, and the masses should be computed with the new atom
definition.
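
For users who prefer a script to a text editor, the same change can be
sketched in Python with the standard \texttt{xml.etree.ElementTree}
module. The function name \texttt{retarget\_sequence} is hypothetical
and the input below is a minimal hand-written excerpt, not a complete
sequence file.

```python
import xml.etree.ElementTree as ET


def retarget_sequence(xml_text, new_type):
    """Return the sequence XML with the <type> element set to new_type."""
    root = ET.fromstring(xml_text)
    type_element = root.find("polseqinfo/type")
    if type_element is None:
        raise ValueError("no <type> element found under <polseqinfo>")
    type_element.text = new_type
    return ET.tostring(root, encoding="unicode")


original = (
    "<polseqdata><polseqinfo><type>protein</type>"
    "<name>Sample</name></polseqinfo>"
    "<polseq><codes>EFEEDF</codes></polseq></polseqdata>"
)
converted = retarget_sequence(original, "n-nmr-protein")
print(converted)
```

A real sequence file may start with header material (an XML
declaration or a document type declaration) that this simple round
trip does not preserve, so the script should only ever be run on a
copy, and editing the copy by hand remains the safest route.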

That's the end of the story here. 


\renewcommand{\sectitle}{Conclusion}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

In this chapter, we have described what file-system hierarchy governs
the \pxm understanding of different polymer chemistries. The described
set of data/configuration files (and directories) is the minimal set
of information that is required for \pxm to operate.

The user willing to learn how to create brand new packages that bring
new atom definitions and/or new polymer chemistry
definitions is invited to carefully study the \pxmdata package, which
is an optional part of the \pxm software suite.

The data/configuration files that are brought by both packages
(\pxmcommon and \pxmdata) are installed ---by default--- in system
directories, like \filename{/usr}, \filename{/etc},
\filename{/usr/local}\dots\ and are thus available to all the users
who have access to the system. \pxm also provides a space where the
individual user can add atom definitions and/or polymer chemistry
definitions for their own use \emph{exclusively}.

The next chapter describes in detail the process leading to the
customization of the \pxm software suite, by going through the example
of adding the ``saccharide'' polymer chemistry definition to the \pxm
software suite for individual use (not globally available to the
system).




\cleardoublepage



%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "polyxmass"
%%% End: