\chapter[\pxmcommon]{\pxmcommon: The Configuration and Data Files Hierarchy}
\label{chap:polyxmass-common}

The \pxm\ software suite is designed to be compatible with any polymer chemistry that the user cares to define. To be that flexible, \pxm\ has to store polymer chemistry definition files and related data files in a clearly designed ``file-system''. The structure of this ``file-system'' is the subject of this chapter.

When the user defines a polymer chemistry (typically using \pxd), the contents of that definition are saved in a text file (using the \fileformat{xml} format). Once a polymer chemistry definition is saved and registered in the \pxm\ system, the user can create a new polymer sequence of that polymer chemistry:\footnote{Editing of polymer sequences is typically performed in \pxe.} when entering monomer codes at the keyboard, the user sees monomer icons (small graphical images) being displayed in the sequence editor (we call that process the ``graphical rendering'' of the polymer sequence). This raises a number of typical questions:

\begin{itemize}
\item Where is the correspondence defined between a monomer code (as keyed in during a polymer sequence editing session) and the monomer icon\footnote{We call that icon a ``monicon.''} that is displayed in the \pxe\ sequence editor?
\item Where are all the graphics files located that are used to render a sequence graphically in the editor? And where are the sound files located that are used to let a polymer sequence speak itself out?
\item Where is the atom definition located that should be used with a specific polymer chemistry definition, and where is the correspondence defined between a polymer chemistry and the atom definition file that it requires?
\end{itemize}

\noindent Within \pxm, there are two different kinds of data/configuration files:

\begin{itemize}
\item Compulsory data/configuration files that \emph{must} be present on the system at precise locations, whatever the chemical definitions being used on that system;
\item Optional data/configuration files that are installed by users (or system administrators) to meet the requirements specific to each installation.\footnote{For example, a synthetic polymer lab will almost certainly not install data packages about proteins or nucleic acids, while a biochemistry lab will almost certainly not install packages about poly(methyl methacrylate)\dots}
\end{itemize}

\noindent The present chapter is about the first item in the above list: the compulsory data/configuration files, all of which are shipped with the \pxmcommon\ package. We will review the locations where these files are installed and the mechanics that make \pxm\ work with any kind of polymer chemistry.

\renewcommand{\sectitle}{Overview Of The Files Installed}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

Let us first review all the files that are installed by the \pxmcommon\ package:

\begin{alltt}
\begin{footnotesize}
/usr/share/polyxmass/atom-defs/atoms.xml
/usr/share/polyxmass/polchem-defs/protein
/usr/share/polyxmass/polchem-defs/protein/acetyl.png
/usr/share/polyxmass/polchem-defs/protein/acetyl.svg
/usr/share/polyxmass/polchem-defs/protein/acetyl-text.svg
/usr/share/polyxmass/polchem-defs/protein/alanine.png
/usr/share/polyxmass/polchem-defs/protein/alanine.svg
/usr/share/polyxmass/polchem-defs/protein/alanine-text.svg
/usr/share/polyxmass/polchem-defs/protein/monicons.dic
/usr/share/polyxmass/polchem-defs/protein/sounds/alanine.ogg
/usr/share/polyxmass/polchem-defs/protein/sounds/a.ogg
/usr/share/polyxmass/polchem-defs/protein/sounds/methyl.ogg
/usr/share/polyxmass/polchem-defs/protein/sounds/sounds.dic
/usr/share/polyxmass/polchem-defs/protein/chempad.conf
/usr/share/polyxmass/polchem-defs/protein/acidobasic.xml
/usr/share/polyxmass/polchem-defs/protein/cursor.svg
/usr/share/polyxmass/polchem-defs/protein/protein.xml
/usr/share/polyxmass/polchem-defs/protein/peptide.xml
/usr/share/polyxmass/pol-seqs/protein-sample.pxm
/usr/share/polyxmass/pol-seqs/protein-fragments-sample.pxm
/etc/polyxmass/atom-defs/polyxmass-common-atom-defs-cat
/etc/polyxmass/polchem-defs/polyxmass-common-polchem-defs-cat
/etc/polyxmass/polchem-defs/polyxmass-common-polchem-defs-atom-defs-dic
/etc/polyxmass/chempad.conf
/usr/share/doc/polyxmass-common/AUTHORS
/usr/share/doc/polyxmass-common/COPYING
/usr/share/doc/polyxmass-common/INSTALL
/usr/share/doc/polyxmass-common/NEWS
/usr/share/doc/polyxmass-common/README
/usr/share/doc/polyxmass-common/TODO
/usr/share/doc/polyxmass-common/THANKS
/usr/share/man/man7/polyxmass-common.7
/usr/lib/pkgconfig/polyxmass-common.pc
\end{footnotesize}
\end{alltt}

The listing above is the output (edited for clarity) of the \command{make install} command that installs the \pxmcommon\ package on the system. It assumes that the user kept the \option{-{}-sysconfdir=/etc} option of the \command{configure} script unchanged and passed the \option{-{}-prefix=/usr} option to that same script. If the \pxmcommon\ package is installed as a binary package, the user need not worry: the packager has chosen the appropriate installation directories.

Let us review the installed files one by one, explaining what each one is meant for:

\begin{itemize}
\item Files located in \filename{/etc/polyxmass}:
\begin{itemize}
\item \filename{atom-defs/polyxmass-common-atom-defs-cat} {\footnotesize This file is the catalog file corresponding to the \pxmcommon\ package.
It contains the list of the atom definition files that are brought by the \pxmcommon\ package.}
\item \filename{polchem-defs/polyxmass-common-polchem-defs-cat} {\footnotesize This file is the catalog file corresponding to the \pxmcommon\ package. It contains the list of the polymer chemistry definition files that are brought by the \pxmcommon\ package.}
\item \filename{polchem-defs/polyxmass-common-polchem-defs-atom-defs-dic} {\footnotesize This file is the dictionary file corresponding to the \pxmcommon\ package. It contains the relations between each polymer chemistry definition file shipped with the package and its cognate atom definition file.}
\item \filename{chempad.conf} {\footnotesize This file describes the layout of the chemical pad of the \pxc\ module, for use when neither the polymer chemistry definition nor the user provides one. This file can thus be called the ``default'' layout definition file for the chemical pad of \pxc.}
\end{itemize}
\item Files located in \filename{/usr/share}:
\begin{itemize}
\item \filename{polyxmass/atom-defs/atoms.xml} {\footnotesize This file is the \emph{``basic''} atom definition file. The \pxm\ software suite requires this atom definition file to be present on the system.}
\item \filename{polyxmass/polchem-defs/protein/acetyl.png} {\footnotesize This is one of the raster files that are used in the polymer chemistry definition to render graphically the ``Acetylation'' chemical modification. Note that a file with the same name but with the \filename{.svg} extension, instead of \filename{.png}, is also installed. That file is the Scalable Vector Graphics version from which the \filename{.png} file was generated.}
\item \filename{polyxmass/polchem-defs/protein/alanine.png} {\footnotesize One of the graphics files that are used to render graphically the monomers defined in the polymer chemistry definition (in this case the monomer is ``alanine'').
Same remark as above for the \filename{.svg} extension file.}
\item \filename{polyxmass/polchem-defs/protein/chempad.conf} {\footnotesize This file describes the layout of the chemical pad of the \pxc\ module. Each polymer chemistry definition may have a \filename{chempad.conf} file associated with it. This file is optional.}
\item \filename{polyxmass/polchem-defs/protein/acidobasic.xml} {\footnotesize This file describes the chemistry of all the monomers and modifications of the polymer chemistry definition that may bear charges. The data contained in this file are used by the functions that compute either the net charge of a polymer sequence at a given pH value, or the isoelectric point of a polymer sequence (that is, the pH value at which the net charge of the polymer is zero).}
\item \filename{polyxmass/polchem-defs/protein/monicons.dic} {\footnotesize This file lists the correspondences between the monomer codes/modifications and the files used to render the monomers/modifications graphically in the sequence editor. Its lines look like:\\
\verb+monomer;A=alanine.svg|alanine.png+ for a monomer, and like:\\
\verb+modif;Phosphorylation%T%phospho.svg|phospho.png+ for a modification. The latter line indicates that when a monomer is modified using the ``Phosphorylation'' modification, its monicon is altered by transparently compositing onto it the image contained in the file \filename{phospho.svg} (hence the \verb|%T%|).}
\item \filename{polyxmass/polchem-defs/protein/sounds/sounds.dic} {\footnotesize This file lists the correspondences between the monomer codes (or names) and their corresponding sound files. It does the same for modifications. The file contains lines in the form:\\
\verb+monomer;Y=tyrosine.ogg|y.ogg+ for monomers and in the form:\\
\verb+modif;Phosphorylation=phospho.ogg+ for modifications.
The ``monomer;'' line indicates that, for monomer `Y', the monomer name is vocalized in the \filename{tyrosine.ogg} file, while the monomer code is vocalized in the \filename{y.ogg} file. The ``modif;'' line indicates that the ``Phosphorylation'' modification is vocalized in the \filename{phospho.ogg} file.}
\item \filename{polyxmass/polchem-defs/protein/cursor.svg} {\footnotesize This graphics file describes how the cursor should be rendered in the polymer sequence editor. Each polymer chemistry definition must provide this file.}
\item \filename{polyxmass/polchem-defs/protein/protein.xml} {\footnotesize This is the actual polymer chemistry definition file. It is a text file formatted according to the \fileformat{xml} standard, and it describes all the chemical entities that make up the polymer chemistry definition.}
\item \filename{polyxmass/pol-seqs/protein-sample.pxm} {\footnotesize This is an example polymer sequence file, which the user can use to learn how to work with the \pxe\ module.
This protein sequence is of polymer chemistry definition ``protein'', which is defined in the file described above (\filename{protein.xml}).}
\item \filename{man/man7/polyxmass-common.7} {\footnotesize This file is the manual page that accompanies the \pxmcommon\ package.}
\end{itemize}
\item Files located in \filename{/usr/lib}:
\begin{itemize}
\item \filename{pkgconfig/polyxmass-common.pc} {\footnotesize This file is the \software{pkg-config} configuration file that allows other packages to check whether \pxmcommon\ is installed correctly and which version is installed.}
\end{itemize}
\end{itemize}

\renewcommand{\sectitle}{Detailed Explanations About Installed Files}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

\noindent Now that we have an overview of what each installed file does, let us take a closer look at some of them.

\subsection*{File polyxmass-common-atom-defs-cat}

Each package that brings atom definition files should list, in a similar file (whose name ends with the \filename{atom-defs-cat} suffix), all the atom definitions that it makes available to the system. The \pxmcommon\ package installs \filename{polyxmass-common-atom-defs-cat} in \filename{/etc/polyxmass/atom-defs}. Its contents are:

\begin{mynoindent}
\begin{alltt}
basic=/usr/share/polyxmass/atom-defs/atoms.xml
\end{alltt}
\end{mynoindent}

Thus, \pxmcommon\ brings one atom definition file (\filename{atoms.xml}), installed in \filename{/usr/share/polyxmass/atom-defs} and made available to the \pxm\ system under the name ``basic''. Polymer chemistry definitions use this name to specify which atom definition file they should work with, as explained below.
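To make the \verb|name=path| format concrete, here is a minimal Python sketch of how a program could read such a catalogue file into a dictionary that maps each atom definition name to its file. It is only an illustration, not the \pxm\ implementation. The function name \verb|parse_atom_defs_cat| is hypothetical. Skipping blank lines and lines that start with \verb|#| is an assumption, because the format described above does not mention comments.

```python
from pathlib import Path


def parse_atom_defs_cat(text: str) -> dict[str, Path]:
    """Map each atom definition name (e.g. "basic") to its absolute file path.

    Assumed format: one "name=path" entry per line. Blank lines and lines
    starting with "#" are skipped (an assumption, not documented).
    """
    catalogue: dict[str, Path] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only: name on the left, file path on the right.
        name, sep, path = line.partition("=")
        if not sep or not name or not path:
            raise ValueError(f"malformed catalogue line: {line!r}")
        catalogue[name] = Path(path)
    return catalogue


if __name__ == "__main__":
    contents = "basic=/usr/share/polyxmass/atom-defs/atoms.xml\n"
    for name, path in parse_atom_defs_cat(contents).items():
        print(name, "->", path)
```

Returning a dictionary keeps the look-up by name (for example ``basic'') trivial for the code that later needs the actual file.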
\subsection*{File polyxmass-common-polchem-defs-cat}

Each package that brings polymer chemistry definition files should list, in a similar file (whose name ends with the \filename{polchem-defs-cat} suffix), all the polymer chemistry definitions that it makes available to the system. The \pxmcommon\ package installs \filename{polyxmass-common-polchem-defs-cat} in \filename{/etc/polyxmass/polchem-defs}. Its contents are shown below. Each polymer chemistry definition name and its corresponding data \emph{must} be on a single line, without spaces. Here, for clarity, each line was broken where the ``\verb|\\|'' characters appear (these characters are absent from the actual file):

\begin{mynoindent}
\begin{alltt}
protein=/usr/share/polyxmass/polchem-defs/protein/protein.xml\verb|\\|
\%/usr/share/polyxmass/polchem-defs/protein
peptide=/usr/share/polyxmass/polchem-defs/protein/peptide.xml\verb|\\|
\%/usr/share/polyxmass/polchem-defs/protein
\end{alltt}
\end{mynoindent}

Thus, \pxmcommon\ brings two polymer chemistry definition files (\filename{protein.xml} and \filename{peptide.xml}), installed in \filename{/usr/share/polyxmass/polchem-defs/protein} and made available to the \pxm\ system under the names ``protein'' and ``peptide'', respectively. As the example above shows, the polymer chemistry definition file names are absolute file names (that is, they include the whole path leading to the file in question). A \verb|%| character separates each of them from the absolute name of the directory where all their corresponding data reside. Thus, the ``protein'' polymer chemistry definition file is \filename{protein.xml}, which is located in
\begin{mynoindent}
\filename{/usr/share/polyxmass/polchem-defs/protein}.
\end{mynoindent}
The directory where all the ``protein'' polymer chemistry definition-related data are located is the same:
\begin{mynoindent}
\filename{/usr/share/polyxmass/polchem-defs/protein}.
\end{mynoindent}

We will see later how this catalogue file is used to build a main catalogue file. That main catalogue is in turn used to read the proper polymer chemistry definition file when the user asks, for example, that a ``protein'' sequence be displayed in \pxe.

\subsection*{File polyxmass-common-polchem-defs-atom-defs-dic}

Each package that brings polymer chemistry definition files should list, in a similar file (whose name ends with the \filename{polchem-defs-atom-defs-dic} suffix), the relations that specify which atom definition file is to be used by each polymer chemistry definition. The \pxmcommon\ package installs \filename{polyxmass-common-polchem-defs-atom-defs-dic} in \filename{/etc/polyxmass/polchem-defs}. Its contents are:

\begin{mynoindent}
\begin{alltt}
protein=basic
peptide=basic
\end{alltt}
\end{mynoindent}

\noindent The first line of this file stipulates: \textsl{``When working on polymer sequences of polymer chemistry `protein', the atom definition to be used is the one named `basic'.''} As we have already seen above, the system knows which actual file corresponds to the atom definition ``basic'', so it can easily load that specific atom definition file from disk.

\subsection*{File chempad.conf}

This file governs the layout of the chemical pad in the \pxc\ module. Each polymer chemistry definition may have one such file in its directory (for example, for the ``protein'' polymer chemistry definition, we have the \filename{/usr/share/polyxmass/polchem-defs/protein/chempad.conf} file). When a polymer chemistry definition with no \filename{chempad.conf} file is used in \pxc, the program automatically tries to read the \filename{.polyxmass/chempad.conf} file in the user's home directory. If that file is not found either, the last resort is to read the \filename{/etc/polyxmass/chempad.conf} file.
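The look-up order just described can be sketched in Python as follows. This is a hypothetical illustration of the documented search order, not code taken from \pxc. The function name \verb|find_chempad_conf| and its parameters are invented for the example. The sketch also assumes that the user's configuration directory is \filename{.polyxmass} in the home directory. Pass \verb|None| as the first argument when no polymer chemistry definition is loaded: only the user and system-wide files are then considered.

```python
from pathlib import Path
from typing import Optional


def find_chempad_conf(polchem_dir: Optional[Path],
                      home: Optional[Path] = None,
                      sysconf_dir: Path = Path("/etc/polyxmass")) -> Optional[Path]:
    """Return the chempad.conf file to use, or None if none exists.

    Search order (as described in the text):
      1. the polymer chemistry definition directory, if a definition is loaded;
      2. the user's ~/.polyxmass/chempad.conf;
      3. the system-wide default, /etc/polyxmass/chempad.conf.
    """
    if home is None:
        home = Path.home()
    candidates = []
    if polchem_dir is not None:
        candidates.append(polchem_dir / "chempad.conf")
    candidates.append(home / ".polyxmass" / "chempad.conf")
    candidates.append(sysconf_dir / "chempad.conf")
    # The first existing file wins; later candidates are only fall-backs.
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

For the ``protein'' definition, a call would look like \verb|find_chempad_conf(Path("/usr/share/polyxmass/polchem-defs/protein"))|.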
This file contains lines like the following:

\begin{mynoindent}
\begin{alltt}
chempad_columns\$3
chempadkey=protonate\%+H1\%adds a proton
chempadkey=hydrate\%+H2O1\%adds a water molecule
chempadkey=0H-ylate\%+O1H1\%adds an hydroxyl group
chempadkey=acetylate\%-H1+C2H3O1\%adds an acetyl group
\end{alltt}
\end{mynoindent}

The first line specifies that the chemical pad buttons should be laid out in three columns. Each subsequent line configures one button of the chemical pad, according to the following syntax:

\verb|chempadkey=button_label%action-formula%button_tooltip|

\medskip

\noindent The first button-defining line, for example, configures the creation of a button labelled \guival{``protonate''}. When clicked, this button adds the action-formula \guival{``+H1''} in the \pxc\ module. The string \guival{``adds a proton''} is the text that appears as a tooltip when the mouse cursor hovers over the button.

\subsection*{File acidobasic.xml}

This file contains the pKa data of all the chemical groups borne by the monomers or modifications defined in the polymer chemistry definition. It is used when the net charge of a polymer sequence at a given pH value is computed, and when isoelectric point calculations are performed. See section~\vref{sect:acido-basic-calculations}.

\subsection*{File monicons.dic}

This file is mandatory for each polymer chemistry definition. For our example of the ``protein'' polymer chemistry, it is found in that polymer chemistry definition directory: \filename{/usr/share/polyxmass/polchem-defs/protein/monicons.dic}. See below for detailed explanations of its contents.

\subsection*{File atoms.xml}

This file, which is located in \filename{/usr/share/polyxmass/atom-defs}, is mandatory for \pxm\ to operate normally.
Indeed, without atom definitions, it would be impossible to compute the masses of the chemical entities that are represented by their formula (or action-formula). There might be other atom definition files, with other names, located in that same directory. As we have seen above, one atom definition, called ``basic'', is used by the ``protein'' and ``peptide'' polymer chemistry definitions. This ``basic'' atom definition is precisely this \filename{atoms.xml} file.

In more detail, this file is an atom definition file in which each atom is defined by its individual data. An atom is characterized by the isotope(s) of which it is composed. Some atoms have only one isotope, while others have as many as ten (tin, for example). An isotope is characterized by its mass and its abundance. Hence the structure of an atom definition in this file:

\begin{mynoindent}
\begin{alltt}
<atom>
  <name>Carbon</name>
  <symbol>C</symbol>
  <isotope>
    <mass>12.0000000000</mass>
    <abund>98.9300000000</abund>
  </isotope>
  <isotope>
    <mass>13.0033548390</mass>
    <abund>1.0700000000</abund>
  </isotope>
</atom>
\end{alltt}
\end{mynoindent}

\noindent This atom definition file may contain as many such atom definitions as required by the polymer chemistry definitions with which it is to be used. Indeed, we have already mentioned that each polymer chemistry definition must specify the atom definition it works with. For the polymer chemistry definitions brought by the \pxmcommon\ package, that association is specified in the \filename{polyxmass-common-polchem-defs-atom-defs-dic} file (see above).

\subsection*{Directory protein}

This is the directory where the data of the example ``protein'' polymer chemistry definition are located (\filename{/usr/share/polyxmass/polchem-defs/protein}).
Indeed, \pxmcommon\ comes with a full polymer chemistry definition. It consists of a polymer chemistry definition file (\filename{protein.xml}) and all the data files that allow a polymer sequence of that polymer chemistry to be rendered graphically in the \pxe\ editor module. The ``protein'' polymer chemistry data also include a \filename{chempad.conf} file that describes, for this specific polymer chemistry, how to lay out the chemical pad used in the \pxc\ module. Let us review all the files that make up the ``protein'' polymer chemistry definition as a functional set of data.

\subsubsection*{File protein.xml}

This file is located in the ``protein'' polymer chemistry definition directory: \filename{/usr/share/polyxmass/polchem-defs/protein}. It is the file where the ``protein'' polymer chemistry definition is detailed. Its contents look like this (omitting the DTD of the \fileformat{xml}-format file):

\begin{mynoindent}
\begin{alltt}
<polchemdefdata>
  <type>protein</type>
  <leftcap>+H</leftcap>
  <rightcap>+OH</rightcap>
  <codelen>1</codelen>
  <ionizerule>
    <actform>+H</actform>
    <charge>1</charge>
    <level>1</level>
  </ionizerule>
  <monomers>
    <mnm>
      <name>Glycine</name>
      <code>G</code>
      <formula>C2H3NO</formula>
    </mnm>
    <mnm>
      <name>Alanine</name>
      <code>A</code>
      <formula>C3H5NO</formula>
    </mnm>
    \vdots
  </monomers>
  <modifs>
    <mdf>
      <name>Phosphorylation</name>
      <actform>-H+H2PO3</actform>
    </mdf>
    <mdf>
      <name>Acetylation</name>
      <actform>-H+C2H3O</actform>
    </mdf>
    <mdf>
      <name>Amidation</name>
      <actform>-OH+NH2</actform>
    </mdf>
  </modifs>
  <cleavespecs>
    <cls>
      <name>CyanogenBromide</name>
      <pattern>M/</pattern>
      <clr>
        <re-mnm-code>M</re-mnm-code>
        <re-actform>-CH2S+O</re-actform>
      </clr>
    </cls>
    \vdots
    <cls>
      <name>Trypsin</name>
      <pattern>K/;R/;-K/P</pattern>
    </cls>
  </cleavespecs>
  <fragspecs>
    <fgs>
      <name>a</name>
      <end>LE</end>
      <actform>-C1O1</actform>
      <fgr>
        <name>a-fgr-1</name>
        <actform>+H2O</actform>
        <prev-mnm-code>E</prev-mnm-code>
        <this-mnm-code>D</this-mnm-code>
        <next-mnm-code>F</next-mnm-code>
        <comment>comment here!</comment>
      </fgr>
    </fgs>
    <fgs>
      <name>z</name>
      <end>RE</end>
      <actform>-N1H1</actform>
      <comment>Not in CID high En. frag</comment>
    </fgs>
    \vdots
    <fgs>
      <name>imm</name>
      <end>NE</end>
      <actform>-C1O1+H1</actform>
    </fgs>
  </fragspecs>
</polchemdefdata>
\end{alltt}
\end{mynoindent}

\noindent As can be seen, the chemical entities that make up the ``protein'' polymer chemistry definition are listed here in a very structured way. This file is written by the \pxd\ module, described in another chapter of this manual. Note that some of the data shown here are fake as far as the ``protein'' polymer chemistry is concerned. They are listed only as examples of the fine grain with which chemical data can be defined in this file.

When a polymer sequence is either loaded from disk or created \textit{ex nihilo}, the \pxm\ program determines which polymer chemistry definition it belongs to. It then loads the corresponding file from disk, unless it has already done so: no polymer chemistry definition file is read from disk more than once, so as to keep the memory footprint of the whole \pxm\ software suite as small as possible.

\subsubsection*{Files alanine.svg and alanine.png}

These two files are located in the ``protein'' polymer chemistry definition directory: \filename{/usr/share/polyxmass/polchem-defs/protein}. There are two such files for each monomer that is defined in the polymer chemistry definition file (in our example, the \filename{protein.xml} file). These two files are responsible for the graphical rendering, in the polymer sequence editor (the \pxe\ module), of the monomers that constitute a polymer sequence. Each monomer in a polymer sequence is represented graphically by rendering a ``monomer icon'' file (which we call a ``monicon'').
These two files are ``monicon files'' (see chapter~\vref{chap:polyxedit}). It should be noted right away that the user may ask that the monomers of a polymer sequence be rendered at a given size (in pixel units). Thus the size of the monicons has to be adjustable, preferably without loss of resolution. We will now see how this is achieved.

Of these two files, the first has a name ending with the \filename{.svg} extension. It is a \emph{Scalable Vector Graphics} file (\fileformat{svg}-format file) that describes, in vector form, how the corresponding monomer should be displayed in the \pxe\ sequence editor. The \fileformat{svg} format is interesting because it makes it possible to render the monicon in the editor at any size requested by the user without losing image resolution.

The second of these two files has a \filename{.png} extension. It is a \emph{Portable Network Graphics} file (\fileformat{png}-format file) that describes, in raster form, how the corresponding monomer should be displayed in the \pxe\ sequence editor. Since this file describes the monomer icon in a ``static'' raster (bitmap) format, it cannot be scaled without loss of resolution.

In theory, if all the Scalable Vector Graphics files (\fileformat{svg} files) were correctly interpreted by the polymer sequence editor, the raster graphics files (\fileformat{png} files) would be totally redundant. However, the \fileformat{png} file-reading libraries are much more robust than the \fileformat{svg} file-reading libraries (\fileformat{svg} is a rather recent standard). This is why the polymer sequence editor must always be provided with a fall-back solution in the form of a raster \fileformat{png} file. That file is used if rendering the monicon from the \fileformat{svg} file fails.
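The fall-back strategy can be sketched in Python as follows. The sketch does not rely on any particular graphics library. \verb|load_svg| and \verb|load_png| are hypothetical callbacks, assumed to take a file name and a size in pixels and to raise an exception when they cannot read their file. A real editor would plug in its own \fileformat{svg} renderer and \fileformat{png} decoder there.

```python
from typing import Callable, TypeVar

Image = TypeVar("Image")


def load_monicon(svg_file: str, png_file: str, size: int,
                 load_svg: Callable[[str, int], Image],
                 load_png: Callable[[str, int], Image]) -> Image:
    """Render a monicon at `size` pixels, preferring the vector image.

    `load_svg` and `load_png` are hypothetical loader callbacks; each is
    assumed to raise an exception when it cannot read its file.
    """
    try:
        # Preferred path: the SVG image scales to any size without loss.
        return load_svg(svg_file, size)
    except Exception:
        # Fall-back path: the PNG image is read by a more robust library,
        # but scaling it from its fixed resolution may blur it.
        return load_png(png_file, size)
```

Catching every exception is deliberately broad in this sketch. A real implementation would catch only the errors raised by its \fileformat{svg} library.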
Finally, because users may draw these small graphics files themselves, the graphical rendering of a polymer sequence is totally customizable. To guide the user in this process, I would simply mention that the \fileformat{svg} files were all drawn using the \software{sodipodi} program, and that the raster \fileformat{png} files were obtained using the ``export'' function of that same program. We will see later how the correspondence between a monomer in the polymer chemistry definition and its graphics files is established. This correspondence ensures that when the user edits a polymer sequence by typing monomer codes at the keyboard, the proper monicon is displayed in the \pxe\ sequence editor.

\subsubsection*{Files acetyl.svg and acetyl.png}

By the same token as for the monomer icon files above, these two files are, respectively, the \fileformat{svg} and \fileformat{png} versions of the file used to render graphically the ``Acetylation'' monomer modification. These files have a transparent background. As a result, the small red ``Ac'' text is the only graphical element that remains visible when their contents are composited \emph{onto} the icon of the monomer being modified with the ``Acetylation'' modification. We will see later how the correspondence between a chemical modification and its graphics file is established. When the user selects a monomer in the sequence editor and modifies it, its monicon is altered accordingly, which gives the user visual feedback that the monomer has effectively been modified.

\subsubsection*{File cursor.svg}

This file is responsible for the representation of the editing cursor in the \pxe\ module. Depending on the color of the monicons, it might be necessary to modify the graphical rendering of the cursor in the polymer sequence editor.
This is necessary so that the graphical rendering of polymer chemistries during polymer sequence editing can be fully themeable. The cursor graphics file is necessarily an \fileformat{svg} file because it \emph{has to scale up/down properly} when the user changes the size of the monicons that render the polymer sequence in the editor: the cursor always scales with the monomer icons and adopts the same dimensions.

\subsubsection*{File chempad.conf}

We have already explained what this file is for. If it exists in the polymer chemistry definition directory, it is used to lay out the chemical pad in the \pxc\ module. Note that this file is used only if \pxc\ is run with a polymer chemistry definition loaded.

\subsubsection*{File monicons.dic}
\label{subsect:monicons.dic}

This file, also located in the ``protein'' polymer chemistry definition directory, contains the critical correspondences between monomer codes and the graphics files used to render these monomers in the sequence editor. It also lists the correspondences between the chemical modifications that may be applied to monomers and the graphical operations to perform so that the user gets visual feedback.
Its contents are:

\begin{alltt}
monomer;A=alanine.svg|alanine.png
monomer;C=cysteine.svg|cysteine.png
monomer;D=aspartate.svg|aspartate.png
monomer;E=glutamate.svg|glutamate.png
monomer;F=phenylalanine.svg|phenylalanine.png
\vdots
modif;Phosphorylation\%T\%phospho.svg|phospho.png
modif;Acetylation\%T\%acetyl.svg|acetyl.png
modif;AmidationAsp\%O\%asparagine.svg|asparagine.png
modif;AmidationGlu\%O\%glutamine.svg|glutamine.png
\end{alltt}

The first line of this file says: \textsl{``Whenever the user inserts a monomer in the polymer sequence by keying in \emph{`A'}, that monomer should be rendered using the \filename{alanine.svg} file or, if that rendering fails, using the \filename{alanine.png} file''.} The same reasoning applies to all the monomers in the polymer chemistry definition.

The first ``modif;'' line indicates how to render monomers that are chemically modified using a modification called ``Phosphorylation''. Their monicon should be graphically altered by compositing \verb|%T%|ransparently onto it either the transparent Scalable Vector Graphics file \filename{phospho.svg} or, if something is wrong with this file, the raster file \filename{phospho.png} (see chapter~\vref{chap:polyxedit}).

The third ``modif;'' line shows another graphical compositing rule. This rule is not \verb|%T%|ransparent but \verb|%O%|paque compositing. This line says that when a monomer is modified using an ``AmidationAsp'' modification, its monomer icon should be \emph{replaced} by a monomer icon rendered \textit{ex novo}. That new icon is read from the Scalable Vector Graphics file \filename{asparagine.svg} or, if something is wrong with this file, from the raster graphics file \filename{asparagine.png}.
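As an illustration of the format just described, here is a minimal Python sketch that splits the contents of a \filename{monicons.dic} file into a table of monomer icons and a table of modification rendering rules. The function and class names are hypothetical. The sketch assumes that the file contains only the two kinds of lines shown above, plus blank lines. Any other line is rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Monicon:
    svg: str  # preferred, scalable image file
    png: str  # raster fall-back image file


@dataclass(frozen=True)
class ModifRendering:
    mode: str      # "T": composite transparently onto the monicon; "O": replace it
    icon: Monicon


def parse_monicons_dic(text: str) -> tuple[dict[str, Monicon], dict[str, ModifRendering]]:
    """Split monicons.dic contents into monomer and modification tables."""
    monomers: dict[str, Monicon] = {}
    modifs: dict[str, ModifRendering] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        kind, _, rest = line.partition(";")
        if kind == "monomer":
            # e.g. monomer;A=alanine.svg|alanine.png
            code, _, files = rest.partition("=")
            svg, _, png = files.partition("|")
            monomers[code] = Monicon(svg, png)
        elif kind == "modif":
            # e.g. modif;Phosphorylation%T%phospho.svg|phospho.png
            name, mode, files = rest.split("%", 2)
            if mode not in ("T", "O"):
                raise ValueError(f"unknown compositing mode {mode!r}: {line!r}")
            svg, _, png = files.partition("|")
            modifs[name] = ModifRendering(mode, Monicon(svg, png))
        else:
            raise ValueError(f"unrecognized line: {line!r}")
    return monomers, modifs
```

Keeping the compositing mode next to the file pair lets the editor decide, for each modification, whether to paste the image over the existing monicon or to replace the monicon altogether.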
\subsubsection*{File sounds.dic}
\label{subsect:sounds.dic}

This file is located in the \filename{sounds} subdirectory of the ``protein'' polymer chemistry definition directory. It contains the critical correspondences between monomer codes or names, or modification names, and their corresponding sound files. The format of the file is very simple, as shown below:

\begin{alltt}
silence-sound\$silence.ogg
monomer;A=alanine.ogg|a.ogg
monomer;C=cysteine.ogg|c.ogg
monomer;D=aspartate.ogg|d.ogg
monomer;E=glutamate.ogg|e.ogg
monomer;F=phenylalanine.ogg|f.ogg
\vdots
modif;Phosphorylation=phospho.ogg
modif;AmidationAsp=amidation.ogg
modif;Acetylation=acetyl.ogg
modif;AmidationGlu=amidation.ogg
\end{alltt}

The first line of this file says: \textsl{``Whenever a silent delay is to be inserted in the self-speak playlist of the polymer sequence, the \filename{silence.ogg} file is to be used''.} The second line indicates which file should be used when a sequence that is speaking itself out encounters a monomer of code `A':
\begin{itemize}
\item \filename{alanine.ogg} if the user asks that the monomer names be vocalized in the playlist;
\item \filename{a.ogg} if the user asks that the monomer codes be vocalized in the playlist.
\end{itemize}
The same reasoning applies to all the other monomers in the polymer chemistry definition (see chapter~\vref{chap:polyxedit}). The first ``modif;'' line states that if modifications are to speak themselves out, the ``Phosphorylation'' modification should use the sound file \filename{phospho.ogg}.

\subsection*{Polymer Sequence Sample Files}

Two polymer sequence sample files are shipped with \pxmcommon. We will detail one of them in this section.

\subsubsection*{File protein-sample.pxm}

This file is a sample polymer sequence of the ``protein'' polymer chemistry. It is shipped with \pxmcommon\ to let the user experiment with the \pxm\ software package right after installation.
This polymer sequence file belongs to the ``protein'' polymer chemistry definition, as can be seen from part of its contents:
\begin{mynoindent}
\begin{alltt}
<polseqdata>
  <polseqinfo>
    <type>protein</type>
    <name>Sample</name>
    <code>SP2003</code>
    <author>rusconi</author>
    <date>
      <year>2004</year>
      <month>01</month>
      <day>19</day>
    </date>
  </polseqinfo>
  <polseq>
    <monomer>
      <code>M</code>
      <prop>
        <name>MODIF</name>
        <data>Acetylation</data>
      </prop>
    </monomer>
    <codes>EFEEDF</codes>
    \vdots
    <monomer>
      <code>V</code>
      <prop>
        <name>NOTE</name>
        <data>SAMPLE-NOTE</data>
        <data type="str">This monomer belongs to the KPVV peptide [30-->33]</data>
      </prop>
    </monomer>
  </polseq>
  <prop>
    <name>NOTE</name>
    <data>COMMENT</data>
    <data type="str">this polymer is partly membranous.</data>
  </prop>
  <prop>
    <name>LEFT_END_MODIF</name>
    <data>Acetylation</data>
  </prop>
  <prop>
    <name>NOTE</name>
    <data>COMMENT</data>
    <data type="str">This protein is responsible for the multi-drug resistance effect.</data>
  </prop>
</polseqdata>
\end{alltt}
\end{mynoindent}
As can be seen here, the \fileformat{xml} element tagged ``<type>'' contains the datum ``protein'', which tells \progname{\pxmng} that, when this polymer sequence file is loaded, the ``protein'' polymer chemistry definition file must be used to interpret its data. The other elements detail what the user has put in this polymer sequence: monomer modifications (see the element ``<name>MODIF</name>''), polymer modifications (see the element ``<name>LEFT\_END\_MODIF</name>''), notes, and so on.

\renewcommand{\sectitle}{Example Of A New Atom Definition}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
The \pxm package has to ensure that users can either develop their own polymer chemistry definitions or install packages that ship polymer chemistry definition files (along with their configuration and data files; the whole set of files is collectively called the ``polymer chemistry definition'').
To achieve that goal, the \pxm software suite needs to be able to scan catalogue files on the system in search of these atom/polymer chemistry definitions.
\begin{center}
\fbox{\parbox{0.9\textwidth}{The user willing to understand the process that leads to the creation of a polymer chemistry definition package can study the \pxmdata package that is part of the \pxm software suite. This package brings a number of new polymer chemistry definitions, such as the ``dna'', ``rna'' and ``saccharide'' polymer chemistry definitions. When installed, this package puts catalogue files in the configuration directory of the \pxm software suite. This way, when \pxmng is executed, it can parse these catalogue files in search of all the available polymer chemistry/atom definitions. The \pxmdata package is an excellent tutorial for the user willing to learn how to package polymer chemistry definitions.}}
\end{center}
\medskip
\noindent Packages can bring atom definition files, polymer chemistry definition files, or both. In each case, different catalogue files are to be installed in different configuration directories. In this section we explore the way to install a new atom definition. Users are, once again, invited to peruse the chapter about \pxm customization for detailed instructions about creating and installing new polymer chemistry definition packages.
\bigskip
When an atom definition package brings one or more atom definition files to the \pxm software suite, it should also bring a catalogue file equivalent to the \filename{polyxmass-common-atom-defs-cat} file that is brought by the \pxmcommon package. Let us see an example where a new atom definition file might be of great use. Imagine that we are using mass spectrometry to fully characterize bacterially synthesized polypeptides for use in nuclear magnetic resonance studies.
These polypeptides were \emph{almost fully} [$^{15}$N]-labelled by growing the bacteria in [$^{15}$N]-saturated culture medium. Of course, masses must then be computed very differently from the usual way, because the isotopic $[^{14}\mathrm{N}]/[^{15}\mathrm{N}]$ ratio of the nitrogen element differs dramatically from the naturally occurring one. How would this situation be dealt with in \pxm?

The first action would be to create a new atom definition file, say \filename{atoms-n-nmr.xml}. This atom definition file would list (amongst all the other atoms) only one isotope for the nitrogen atom, the [$^{15}$N] isotope: its mass would thus be 15.0001089780 and its abundance would be set to 100. We may give that new atom definition the name ``n-nmr'' (see below).

How do we let \pxm know that we may want to use this new atom definition file? We would make a package, put that file into it, and name the package in a \pxm-consistent way, like ``polyxmass-n-nmr''. We would also have to put in that package a catalogue file associating the name of the new atom definition with the shipped atom definition file. This catalogue file should look like the one shipped with \pxmcommon that we described earlier, \filename{polyxmass-common-atom-defs-cat}. Recall that atom definition catalogues \emph{must} have a filename ending with the \filename{atom-defs-cat} suffix. Typically, the prefix is the name of the package that brings the catalogue file: the \filename{polyxmass-common} prefix thus yields the \filename{polyxmass-common-atom-defs-cat} catalogue name.
Thus we may ship a catalogue file named \filename{polyxmass-n-nmr-atom-defs-cat}, with the following contents:
\begin{mynoindent}
\begin{alltt}
n-nmr=/usr/share/polyxmass/atom-defs/atoms-n-nmr.xml
\end{alltt}
\end{mynoindent}
\noindent This catalogue file should be installed in the \filename{/etc/polyxmass/atom-defs} directory; its absolute file name is thus \filename{/etc/polyxmass/atom-defs/polyxmass-n-nmr-atom-defs-cat}. Once our atom definition package is installed, \pxmng parses this catalogue file when it is executed, and the ``n-nmr'' atom definition automatically becomes available to the whole \pxm software suite.
\bigskip

At this point, we have to make sure that this new atom definition is used to compute masses when we are working on the polypeptides of interest (the [$^{15}$N]-rich ones). That is, we must let \pxm know that there exists a new polymer chemistry definition, say ``n-nmr-protein''. As the system administrator, we can create a new polymer chemistry catalogue file similar to the \filename{polyxmass-common-polchem-defs-cat} file described earlier, but named, for example, \filename{polyxmass-n-nmr-polchem-defs-cat}. These files are located in \filename{/etc/polyxmass/polchem-defs}.
The new file should contain the line below. Each polymer chemistry definition name and its corresponding data \emph{must} be on a single line without spaces; for clarity, the line has been broken here, as symbolised by the ``\verb|\\|'' characters, which are absent from the file:
\begin{mynoindent}
\begin{alltt}
n-nmr-protein=/usr/share/polyxmass/polchem-defs/protein/protein.xml\verb|\\|
%/usr/share/polyxmass/polchem-defs/protein
\end{alltt}
\end{mynoindent}
This line states that there now exists a new polymer chemistry definition, named ``n-nmr-protein'', that reuses the definition file and the data directory of the pre-existing ``protein'' polymer chemistry definition. When our new polymer chemistry definition catalogue file is installed and \pxmng is run, it parses that catalogue file along with all the other ones and thus acknowledges that the ``n-nmr-protein'' polymer chemistry definition should use the polymer chemistry definition data located in the directory mentioned on the line above.

Now comes the really interesting configuration: we have to let \pxm know that, whenever the ``n-nmr-protein'' polymer chemistry definition is used, the ``n-nmr'' atom definition is to be used to compute masses. To do that, we have to create, as root, a new dictionary file similar to the \filename{polyxmass-common-polchem-defs-atom-defs-dic} file described earlier, but named, for example, \filename{n-nmr-protein-polchem-defs-atom-defs-dic}. These files are located in \filename{/etc/polyxmass/polchem-defs}. That file should contain this line:
\begin{mynoindent}
\begin{alltt}
n-nmr-protein=n-nmr
\end{alltt}
\end{mynoindent}
Once this new polymer chemistry definition/atom definition dictionary file is installed, \pxmng parses it and thus knows that, when the ``n-nmr-protein'' polymer chemistry definition is used, the ``n-nmr'' atom definition should be used for any computation.
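The chain of look-ups described above (polymer chemistry definition name to atom definition name through the dictionary, then atom definition name to atom definition file through the atom definition catalogues) can be sketched in a few lines of Python. This is only an illustration, not the code of \pxmng: all function names are hypothetical, the single \verb|%| separator between the definition file and the data directory is inferred from the catalogue listing above, and merging catalogue files in alphabetical order (later entries overriding earlier ones) is an assumption.

```python
from __future__ import annotations

from pathlib import Path


def parse_key_value_lines(text: str) -> dict[str, str]:
    """Parse 'name=value' lines (one entry per line, no spaces)."""
    entries: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed entry: {line!r}")
        entries[name] = value
    return entries


def load_catalogues(directory: Path, suffix: str) -> dict[str, str]:
    """Merge every file in `directory` whose name ends with `suffix`.

    Assumption: files are read in alphabetical order and a later entry
    with the same name overrides an earlier one."""
    merged: dict[str, str] = {}
    for path in sorted(directory.glob("*" + suffix)):
        merged.update(parse_key_value_lines(path.read_text()))
    return merged


def split_polchem_entry(value: str) -> tuple[str, str]:
    """Split 'definition-file%data-directory' (assumed '%' separator)."""
    def_file, sep, data_dir = value.partition("%")
    if not sep:
        raise ValueError(f"missing '%' separator in: {value!r}")
    return def_file, data_dir


def resolve_atom_def_file(polchem: str,
                          polchem_to_atom: dict[str, str],
                          atom_catalogue: dict[str, str]) -> str:
    """Polymer chemistry name -> atom definition name -> atom definition file."""
    atom_def_name = polchem_to_atom[polchem]  # e.g. "n-nmr-protein" -> "n-nmr"
    return atom_catalogue[atom_def_name]       # e.g. "n-nmr" -> ".../atoms-n-nmr.xml"


# The entries discussed in this section, given as in-memory text
# instead of being read from /etc/polyxmass.
atom_catalogue = parse_key_value_lines(
    "n-nmr=/usr/share/polyxmass/atom-defs/atoms-n-nmr.xml\n")
polchem_catalogue = parse_key_value_lines(
    "n-nmr-protein=/usr/share/polyxmass/polchem-defs/protein/protein.xml"
    "%/usr/share/polyxmass/polchem-defs/protein\n")
polchem_to_atom = parse_key_value_lines("n-nmr-protein=n-nmr\n")

def_file, data_dir = split_polchem_entry(polchem_catalogue["n-nmr-protein"])
atom_file = resolve_atom_def_file("n-nmr-protein", polchem_to_atom, atom_catalogue)
```

On an installed system, \verb|load_catalogues| would be pointed at \filename{/etc/polyxmass/atom-defs} with the \filename{atom-defs-cat} suffix, and at \filename{/etc/polyxmass/polchem-defs} with the \filename{polchem-defs-cat} and \filename{polchem-defs-atom-defs-dic} suffixes (assuming suffix rules analogous to the one stated for atom definition catalogues). With the in-memory entries used here, \verb|atom_file| resolves to \filename{/usr/share/polyxmass/atom-defs/atoms-n-nmr.xml}.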
The last action needed to compute masses automatically in the ``n-nmr''-specialized way is to declare that the polypeptide sequences we are working on are of polymer chemistry definition ``n-nmr-protein''. To that end, make a copy of the polymer sequence file of interest and, using a text editor, change the contents of its \texttt{<type>} element from ``protein'' to ``n-nmr-protein''. Open that new sequence file in a freshly started \pxm program: the masses should now be computed with the new atom definition. That is the end of the story.

\renewcommand{\sectitle}{Conclusion}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}
In this chapter, we have described the file-system hierarchy that governs how \pxm understands different polymer chemistries. The described set of data/configuration files (and directories) is the minimal set of information that \pxm requires to operate. The user willing to learn how to create brand new packages that bring new atom definitions and/or new polymer chemistry definitions is invited to carefully study the \pxmdata package, which is optional in the \pxm software suite.

The data/configuration files that are brought by both packages (\pxmcommon and \pxmdata) are installed, by default, in system directories like \filename{/usr}, \filename{/etc}, \filename{/usr/local}\dots\ and are thus available to all the users of the system. \pxm also manages a personal space where each individual user can add atom definitions and/or polymer chemistry definitions for his or her \emph{exclusive} use. The next chapter describes in detail the customization of the \pxm software suite, through the example of adding the ``saccharide'' polymer chemistry definition for individual use (not globally available to the system).
\cleardoublepage

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "polyxmass"
%%% End: