
\chapter[Basics in Polymer Chemistry]{Basics in\\ Polymer Chemistry}

\label{chap:basics-polymer-chemistry}

This chapter will introduce the basics of polymer chemistry. The way
this topic is going to be covered is admittedly biased towards mass
spectrometry and biological polymers. Moreover, the aim of this
chapter is to provide the reader with the specialized words that will
later be used to describe and explain the (inner) workings of the
\pxm\ program. This manual is not a ``crash course'' in biochemistry!

\renewcommand{\sectitle}{Polymers? Where? Everywhere!}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

Indeed, polymers are everywhere. If you ask somebody to show you
something polymeric, he or she will probably point to the first plastic
object in the vicinity. Right: many plastic materials are made of
hydrocarbon polymers. But our own bodies contain many different
polymers. Proteins are polymers, complex sugars are polymers, and DNA
(the so-called ``molecule of heredity'') is a \emph{huge} polymer.
There are polymers in wine, in wood\dots\ Where? Everywhere!

\bigskip The \textsl{Oxford Advanced Learner's Dictionary of Current
  English} gives for \emph{polymer} the following definition:
\textit{\textbf{natural or artificial compound made up of large
    molecules which are themselves made from combinations of small
    simple molecules}}.

\bigskip A polymer is indeed made by covalently linking small simple
molecules together. These small simple molecules are called
\emph{monomer}s, and it follows directly that a \emph{polymer} is made
of a number of monomers. The general term for the process that leads
to the formation of a polymer is \emph{polymerization}. It should be
noted that there are many ways to polymerize monomers together. For
example, a polymer might be either linear or branched. A polymer is
linear if each of the polymerized monomers can be joined to other
monomers at most two times. The first junction links the monomer to an
elongating polymer (thus making it the new end of the elongating
polymer which, by the way, is now one unit longer than before) and the
second junction links the new end of the elongating polymer to another
monomer. This process goes on until the reaction is stopped, at which
point the polymer reaches its \emph{finished state}. A branched
polymer is a polymer in which at least one monomer is able to form
more than two bonds with other monomers. Thus, a single monomer linked
three times to other monomers will yield a ``T-structure'', which is
nothing but a branched structure.
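The linear/branched criterion above can be checked mechanically by
counting how many inter-monomer bonds each monomer forms. The following
Python sketch assumes, purely for illustration, that a polymer is
described as a number of monomers plus a list of bonds between monomer
indices; this representation and the function
\texttt{classify\_polymer} are hypothetical and do not reflect how
\pxm\ stores polymers.

```python
from __future__ import annotations

from collections import Counter


def classify_polymer(n_monomers: int, bonds: list[tuple[int, int]]) -> str:
    """Return "linear" if every monomer forms at most two inter-monomer
    bonds, "branched" otherwise. Monomers are numbered 0..n_monomers-1."""
    degree = Counter()
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1
    # A monomer bonded three or more times is a branch point
    # (the "T-structure" described in the text).
    if any(degree[i] > 2 for i in range(n_monomers)):
        return "branched"
    return "linear"


print(classify_polymer(4, [(0, 1), (1, 2), (2, 3)]))  # linear: 0-1-2-3
print(classify_polymer(4, [(0, 1), (1, 2), (1, 3)]))  # branched: 1 has 3 bonds
```

The sketch checks only the bond-count criterion stated in the text; it
does not verify that the bonds form a single connected chain.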

In the following sections we'll describe a number of different kinds
of polymers. For each kind, we will first detail the structure of its
constituent monomers and then describe how the polymer is formed. At
each step we shall try to set forth the polymer's characteristics in
such a manner as to introduce the way \pxm' ``thinks polymers'' and
the specialized terminology it uses.

Once the basic chemistries of the different polymers have all been
described, we will enter a more complex subject that is of enormous
importance to the mass spectrometry specialist: polymer chain
disrupting chemistry. We shall see that this term actually covers two
kinds of chemistries: cleavage on the one hand and fragmentation on
the other.

Although \pxm\ is primarily oriented toward linear, single-stranded
polymer chemistries, it can also be used to simulate highly complex
polymer chemistries. Biological polymers are the main focus of this
manual; however, all the concepts described here apply, with little or
no modification, to synthetic polymer chemistries.

Well, the time has come to take a ``biochemical polymers'' tour. The
reader who feels at home with biopolymers may joyfully skip the next
sections. However, the section pertaining to polymer cleavage and
fragmentation should be of interest even to the expert, because it is
the opportunity to introduce a ``funny'' terminology that is not
encountered anywhere else (have you ever heard of
\emph{``leftrightrules''} or of \emph{``fragrules''}?!).
 

\renewcommand{\sectitle}{Various Biopolymer Structures}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

Biopolymers are amongst the most sophisticated and complex polymers on
Earth, and it is certainly not a mistake to take them as examples of
how monomers (complex or not) can assemble covalently into
life-enabling polymers. In this section we will visit three different
polymers encountered in the living world: proteins, nucleic acids and
polysaccharides. We shall be concerned with 1) the monomers'
structure, 2) the polymerization reaction and 3) the final capping
reaction responsible for putting the polymer in its \emph{finished
  state}.

\subsection*{Proteins}



These biopolymers are made of amino acids. Proteins are built from
twenty standard amino acids, and each protein is made of a number of
these amino acids. The possible combinations are practically
unlimited, which provides the living world with an enormous diversity
of proteins.

A protein is a polar polymer: it has a left end and a right end. This
means that the polymerization process is ordered, from left to right.

Figure~\ref{fig:peptbond-formation} shows that the chemical
reaction at the basis of protein synthesis is a \emph{condensation}. A
protein is the result of the condensation of amino acids with each
other in an orderly polar fashion. A protein has a left end (called
\emph{N terminus; amino terminal end}) and a right end (called \emph{C
  terminus; carboxyl terminal end}). The left end is an amino group
($\mathrm{H_2N}$--) corresponding to the unreacted amino group of the
first amino acid. Upon condensation of a new amino acid onto the
first one, the carboxyl group of the first amino acid reacts with the
amino group of the second amino acid. A water molecule is released,
and the formation of a bond between the two amino acids yields a
dipeptide. The right end of the dipeptide (and, of course, of a
polypeptide, \textit{i.e.} of a protein) is a carboxyl group
(--COOH) corresponding to the unreacted carboxyl group of the last
amino acid to have ``polymerized in''.

The bond formed by condensation of two amino acids is an amide bond,
called in protein chemistry a \emph{peptidic bond} (or peptide bond).
The elongation of the protein is a simple repetition of the
condensation reaction shown in Figure~\ref{fig:peptbond-formation},
given that the elongation \emph{always} proceeds in the described
direction (a new monomer is added at the right end of the elongating
polymer, and elongation proceeds from left to right).

\begin{figure}
  \begin{center}
    \includegraphics[scale=0.25]{figures/raster/peptbond-formation.png}
  \end{center}
  \caption[Peptidic bond formation]{\textbf{Peptidic bond formation by
      condensation.} The left end monomer $\mathrm{R_1}$ is condensed
    with the right end monomer $\mathrm{R_2}$ to yield a peptidic bond.
    A water molecule is lost during the process.}
  \label{fig:peptbond-formation}
\end{figure}

Now we should point out a terminology issue specific to protein
chemistry: we have seen that a protein is a polymer made of a number of
monomers, called amino acids. In protein chemistry, there is a
subtlety: once a monomer is polymerized into a protein it is no longer
called a monomer; it is called a \emph{residue}. We could say that a
residue is an amino acid less a water molecule.
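The statement that a residue is an amino acid less a water molecule is
simple formula arithmetic. A minimal Python sketch, assuming plain
formulae without parentheses or charges (the helper names are
hypothetical and not part of \pxm):

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Parse a simple formula such as "C2H5NO2" into element counts.
    Parentheses and charges are not supported in this sketch."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def format_formula(counts: Counter) -> str:
    """Write counts back as a formula: C, H, N, O first, then alphabetical."""
    order = ["C", "H", "N", "O"]
    elements = [e for e in order if counts[e]] + sorted(
        e for e in counts if e not in order and counts[e])
    return "".join(f"{e}{counts[e] if counts[e] != 1 else ''}" for e in elements)


WATER = parse_formula("H2O")


def residue_formula(amino_acid: str) -> str:
    """A residue is the free amino acid less one molecule of water."""
    counts = parse_formula(amino_acid)
    counts.subtract(WATER)
    return format_formula(counts)


print(residue_formula("C2H5NO2"))   # glycine  -> C2H3NO
print(residue_formula("C3H7NO2"))   # alanine  -> C3H5NO
print(residue_formula("C9H11NO3"))  # tyrosine -> C9H9NO2
```

These results match the residual formulae listed in
Table~\ref{tab:three-biopolym-exples}, which simply writes O before N.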

From what we have seen until now, we could define a protein this way:
\textsl{``A protein is a chain of residues linked together in an
  orderly polar fashion, with the residues being numbered starting
  from 1 and ending at n, from the first residue on the left end to
  the last one on the right end''}. This definition is still partly
inexact, however. Indeed, as shown in
Figure~\ref{fig:prot-polymer}, there is still a problem with the
extremities of the polymer chain: what about the amino group on the
left end of a protein (borne by the first residue of the protein), and
what about the carboxyl group on the right end of a protein (borne by
the last residue of the protein)? These two groups have, in fact, not
reacted. If we followed the new ``residue-based'' definition of a
protein polymer, we would say that there is a proton in \emph{excess}
on the left end and a hydroxyl in \emph{excess} on the right end.
However, these two chemical groups are not actually in \emph{excess};
they are called (in \pxm) the \emph{cappings} or \emph{caps} of the
polymer (this terminology is also used in polymer science). They
ensure that the polymer is in a \emph{finished state}, which means that
it cannot be elongated anymore, at either end. The proton is the
\emph{left cap} of the protein polymer and the hydroxyl is the
\emph{right cap} of the protein polymer.
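The capping model can be written down directly: the composition of a
protein in its finished state is the left cap, plus every residue, plus
the right cap. The Python sketch below assumes a tiny, hypothetical
residue table (the names \texttt{RESIDUES}, \texttt{LEFT\_CAP} and
\texttt{RIGHT\_CAP} are illustrative, not \pxm's actual data
structures) and uses standard monoisotopic atomic masses:

```python
from collections import Counter

# Residual compositions (amino acid less H2O) for a few residues.
RESIDUES = {
    "G": Counter(C=2, H=3, N=1, O=1),   # glycyl
    "A": Counter(C=3, H=5, N=1, O=1),   # alanyl
    "Y": Counter(C=9, H=9, N=1, O=2),   # tyrosyl
}
LEFT_CAP = Counter(H=1)         # N-terminal proton
RIGHT_CAP = Counter(O=1, H=1)   # C-terminal hydroxyl

# Monoisotopic masses (Da) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def protein_composition(sequence: str) -> Counter:
    """Finished-state composition: left cap + residue chain + right cap."""
    total = Counter(LEFT_CAP)
    for code in sequence:
        total.update(RESIDUES[code])
    total.update(RIGHT_CAP)
    return total


def monoisotopic_mass(composition: Counter) -> float:
    return sum(MONO[element] * n for element, n in composition.items())


gly_ala = protein_composition("GA")
print(sorted(gly_ala.items()))               # [('C', 5), ('H', 10), ('N', 2), ('O', 3)]
print(round(monoisotopic_mass(gly_ala), 4))  # 146.0691
```

Note that the single-residue case reduces to the free amino acid:
applying \texttt{protein\_composition} to the code G yields
$\mathrm{C_2H_5NO_2}$, i.e. glycine itself.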

\begin{figure}
  \begin{center}
    \includegraphics[scale=0.25]{figures/raster/prot-polymer.png}
   \end{center}
  \caption[A protein is a capped residue chain]{\textbf{End capping
      chemistry of the protein polymer.} A protein is made of a chain
    of residues and of two caps. The left cap is the N-terminal proton
    and the right cap is the C-terminal hydroxyl. Altogether, the
    residual chain (enclosed here in the blue polygon) and both
    red-colored caps (H and OH) do form a complete protein polymer.}
  \label{fig:prot-polymer}
\end{figure}

Now comes the question of unambiguously defining the structure of a
protein. It is commonly accepted that the simple ordered sequence of
the residue codes in the protein, from left to right, constitutes an
unambiguous description of the protein's \emph{primary structure}. Of
course, proteins have three-dimensional structures, but these are of no
interest to a program like \pxm, which is aimed at calculating masses
of polymers. To state the \emph{sequence} of a protein unambiguously,
you would use a notation like this:

\begin{mynoindent}
  {\footnotesize using the 3-letter code of the amino acids:}\\
  Ala Gly Trp Tyr Glu Gly Lys\\
  {\footnotesize or, using the 1-letter code of the amino acids:}\\
  A G W Y E G K\\
  Alanine is thus residue 1 and Lysine is the last residue
  ($\mathrm{n = 7}$).
\end{mynoindent}
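Going from the 3-letter to the 1-letter notation is a plain lookup,
and residue numbering simply starts at 1 on the left end. A minimal
Python sketch, using the standard IUPAC codes of the twenty common
amino acids (the function names are hypothetical):

```python
from __future__ import annotations

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_sequence: str) -> str:
    """Convert a space-separated 3-letter sequence to 1-letter codes."""
    return "".join(THREE_TO_ONE[code] for code in three_letter_sequence.split())


def numbered(sequence: str) -> list[tuple[int, str]]:
    """Pair each residue with its position, 1 (left end) to n (right end)."""
    return list(enumerate(sequence, start=1))


seq = to_one_letter("Ala Gly Trp Tyr Glu Gly Lys")
print(seq)                # AGWYEGK
print(numbered(seq)[0])   # (1, 'A')
print(numbered(seq)[-1])  # (7, 'K')
```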

This primer in protein chemistry should be sufficient for the moment.
Let us now see how nucleic acids differ from proteins (and they differ
quite a lot).


\subsection*{Nucleic Acids}



These biopolymers are more complex than proteins. This is mainly
because nucleic acids are composed of monomers that have three
different parts, and because some of those parts differ between DNA
and RNA. Nucleic acids are made of \emph{nucleotide}s. A nucleotide is
the nucleic acid's building block: \emph{a nucleotide consists of a
  nitrogenous base combined with a ribose/deoxyribose sugar and with a
  phosphate group}. There are two different kinds of nucleic acids:
deoxyribonucleic acid, also known as DNA (the sugar is a deoxyribose)
and ribonucleic acid, also known as RNA (the sugar is a ribose). DNA
is most often found in double-stranded form, while RNA is most often
found in single-stranded form. Each has four nitrogenous bases:
Adenine, Thymine, Guanine and Cytosine for DNA; in RNA only one of
these bases changes: Thymine is replaced by Uracil.

A nucleic acid is a polar polymer: it has a left end and a right end
(as for proteins, remember?). This means that the polymerization
process is ordered, from left to right (in certain vertical
representations, found mainly in textbooks, left is up and right is
down).

This manual is not meant to teach biochemistry, which is why I am not
going to describe the structure of the monomers in atomic detail.
However, since it is important to understand how the polymerization
occurs, I drew Figure~\ref{fig:nucacbond-formation}, which shows the
mechanism of the polymerization reaction between two nucleotides,
yielding a dinucleotide.

Figure~\ref{fig:nucacbond-formation} shows that the chemical
reaction at the basis of nucleic acid synthesis is an
\emph{esterification}. A nucleic acid has a left end (called \emph{5'
  end; often this end is phosphorylated}) and a right end (called
\emph{3' end; hydroxyl end}\/). The reaction is the nucleophilic
attack of the 3'OH of the right end of the elongating nucleotidic
chain on the $\alpha$ (innermost) phosphorus of the incoming
(deoxy)nucleoside triphosphate. Upon esterification, an
\emph{inorganic pyrophosphate} ($\mathrm{PP_i}$) is released, and the
formation of a phosphodiester bond between the two nucleotides yields
a dinucleotide. The elongation of the nucleic acid polymer is a simple
repetition of this esterification reaction, so that the chain always
grows in the 5'$\Longrightarrow$3' direction. In living cells, this is
achieved by what is called the \emph{5'$\Longrightarrow$3' polymerase
  enzymatic activity}.

The conventional representation of a nucleic acid involves showing the
5' end on the left, and the 3' end on the right, horizontally.
Sometimes, to clearly indicate that the left end is phosphorylated,
while the right end is not, the ends are indicated as ``5'P'' and
``3'OH''.

\begin{figure}
  \begin{center}
    \includegraphics[scale=0.2]{figures/raster/nucacbond-formation.png}
   \end{center}
  \caption[Phosphodiester bond formation]{\textbf{Phosphodiester bond
      formation by esterification.} The arriving monomer (on the
    right) carries a triphosphate on the 5' carbon of its sugar; this
    triphosphate is esterified by nucleophilic attack of its first
    phosphorus by the alcohol function borne by the 3' carbon of the
    (deoxy)ribose sugar ring of the left monomer. The bond that is
    formed is a phosphodiester bond, with release of a pyrophosphate
    group ($\mathrm{PP_i}$). Note that the sugars and nitrogenous
    bases are schematically represented in this figure.}
  \label{fig:nucacbond-formation}
\end{figure}

Figure~\ref{fig:nucac-polymer} shows a simple way to formalize what a
nucleic acid polymer is. The formula on the left represents the
``monomer'', in the sense that the polymer is made of a number of these
monomers (if you put the proper nitrogenous base and the proper sugar,
ribose or deoxyribose, into the presented formula, you will get the
nucleotide of your choice). We have seen previously that, in the
specific case of protein chemistry, the monomer is called a residue
once it is polymerized into the polymer chain. In the case of nucleic
acids, there is no such specific term: we just call the monomeric unit
a nucleotide. The formula on the left of
Figure~\ref{fig:nucac-polymer} shows the repetitive element of a
nucleic acid polymer, exactly as we showed the residue formula in the
protein chemistry section. As with proteins, the formula on the right
of Figure~\ref{fig:nucac-polymer} illustrates that the nucleic acid
polymer needs to be set to a \emph{finished state}. The atoms shown in
red (outside the boxed repetitive elements) are the nucleic acid
\emph{caps}. Thus, we see clearly that, in the case of nucleic acid
polymers, the left cap is a hydroxyl and the right cap is a proton.
Incidentally, this is the exact converse of what we described earlier
for proteins.

\begin{figure}
  \begin{center}
    \includegraphics[scale=0.25]{figures/raster/nucac-polymer.png}
      \end{center}
  \caption[A nucleic acid is a capped nucleotide chain]{\textbf{End
      capping chemistry of the nucleic acid polymer.} A nucleic acid
    is made of a chain of nucleotides (left formula) and of two caps.
    The left cap is the hydroxyl group that belongs to the terminal
    phosphate of the 5' carbon of the sugar. The right cap is the
    proton that belongs to the hydroxyl group of the 3' carbon of the
    sugar ring (right formula). Altogether, a finished nucleic acid
    polymer is made of the nucleotidic chain (enclosed here in the
    blue polygon), made of the repetitive elements (one of which is
    shown on the left), and of the two caps (red-colored OH and H, out
    of the box on the right).}
  \label{fig:nucac-polymer}
\end{figure}
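The caps model can be cross-checked against the esterification
chemistry described above. In this Python sketch (a hypothetical
illustration, not \pxm\ code), the dinucleotide AC is built once from
the residual deoxynucleotide formulae plus the OH and H caps, and once
by extending a 5'-phosphorylated dAMP with dCTP and removing the
released pyrophosphate ($\mathrm{H_4P_2O_7}$); both routes should give
the same composition. The residue formulae are those of the DNA
(deoxyribose) series; RNA residues would carry one more oxygen each.

```python
from collections import Counter

# Residual deoxynucleotide formulae (dNMP less H2O), DNA series.
RESIDUES = {
    "A": Counter(C=10, H=12, N=5, O=5, P=1),
    "C": Counter(C=9, H=12, N=3, O=6, P=1),
}
LEFT_CAP = Counter(O=1, H=1)   # OH on the 5'-terminal phosphate
RIGHT_CAP = Counter(H=1)       # H on the 3'-terminal hydroxyl


def dna_composition(sequence: str) -> Counter:
    """Finished-state composition: left cap + nucleotide chain + right cap."""
    total = Counter(LEFT_CAP)
    for base in sequence:
        total.update(RESIDUES[base])
    total.update(RIGHT_CAP)
    return total


# Same dinucleotide from the esterification chemistry:
# 5'-phosphorylated dAMP + dCTP -> pdApdC + pyrophosphate (H4P2O7).
dAMP = Counter(C=10, H=14, N=5, O=6, P=1)
dCTP = Counter(C=9, H=16, N=3, O=13, P=3)
PPI = Counter(H=4, P=2, O=7)
by_esterification = dAMP + dCTP - PPI

print(dna_composition("AC") == by_esterification)  # True
print(sorted(by_esterification.items()))
# [('C', 19), ('H', 26), ('N', 8), ('O', 12), ('P', 2)]
```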

Now comes the question of unambiguously defining the structure of a
nucleic acid. It is commonly accepted that the simple ordered sequence
of the named nitrogenous bases in the nucleic acid, from left (5' end)
to right (3' end), constitutes an unambiguous description of the
nucleic acid sequence. To state the sequence of a gene, you would use
a notation like this:

\begin{mynoindent}
  {\footnotesize for a DNA, using the 1-letter code of the nitrogenous
    bases:} A T G C A G T C\\
  {\footnotesize for an RNA, using the 1-letter code of the
    nitrogenous bases:} A U G C A G U C\\
  Adenine is thus base 1 and Cytosine is the last base
  ($\mathrm{n = 8}$).
\end{mynoindent}
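As a small illustration of this notation, the Python sketch below
checks a sequence against the base alphabet of its chemistry and
rewrites a DNA-style sequence with the RNA alphabet by replacing T with
U. This is only a change of notation mirroring the two example lines
above; it is \emph{not} a model of biological transcription, which
produces the complementary strand. The function names are
hypothetical.

```python
DNA_BASES = frozenset("ACGT")
RNA_BASES = frozenset("ACGU")


def validate(sequence: str, alphabet: frozenset) -> str:
    """Remove spaces and check every base code against the alphabet."""
    bases = "".join(sequence.split())
    unknown = sorted(set(bases) - alphabet)
    if unknown:
        raise ValueError(f"unknown base code(s): {', '.join(unknown)}")
    return bases


def dna_letters_to_rna_letters(dna: str) -> str:
    """Same base order; Thymine (T) is simply written as Uracil (U)."""
    return validate(dna, DNA_BASES).replace("T", "U")


rna = dna_letters_to_rna_letters("A T G C A G T C")
print(rna, len(rna))                 # AUGCAGUC 8
print(validate(rna, RNA_BASES)[-1])  # C, the last base (n = 8)
```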


\subsection*{Polysaccharides}



These biopolymers are almost certainly amongst the most complex in the
living world. This is mainly due to the fact that saccharides are
usually heavily modified in living cells: a huge variety of chemical
modifications occurs on these biopolymers. Furthermore, branching of
the polymer structure is the rule rather than the exception.
Interestingly, these molecules are often thought of primarily as the
cell's ``fuel'', which is certainly not nonsense, but their structural
role is clearly extremely important. Their ability to form complex
structures has been exploited in living systems for recognition
processes. There are a number of complex sugars on cell surfaces\dots

Nonetheless, the general picture is not that complex if we only think
of the way monomers are polymerized together. As far as we are
concerned, the polymerization mechanism is in fact a simple
condensation. In this respect, the situation closely resembles that of
proteins, and some people do use the same terminology: a monomer sugar
becomes a residue once polymerized into the saccharidic chain.

There are two main kinds of sugars: \emph{pentoses} (five carbons,
$\mathrm{C_5}$) and \emph{hexoses} (six carbons, $\mathrm{C_6}$); it
should be noted, however, that there is a variety of other common
molecules, like \emph{sialic acids}, \emph{heptoses}\dots

A saccharidic polymer is polar: it has a left end and a right end
(as for proteins and nucleic acids, remember?). This means that the
chain has a defined orientation, from left to right. The terminology
regarding the ends of a saccharidic polymer is rather unexpected at
first sight: the left end is said to be the \emph{non-reducing end},
while the right end is said to be the \emph{reducing end}.
Historically, this was observed with monosaccharides (also called
\emph{monoses}), which reduce cupric ($\mathrm{Cu^{2+}}$) ions and are
themselves oxidized on the carbonyl (when in the open-chain aldehydic
form). In a linear polysaccharide, only the residue at the right end
has a free anomeric carbon that can open to this form, hence the name
reducing end; every other residue, including the left-end one, has its
anomeric carbon engaged in an osidic bond.

Figure~\ref{fig:sacchbond-formation} shows the polymerization reaction
between two sugars (two glucose monomers, in fact) that yields the
disaccharide maltose. The polymerization mechanism is a simple
condensation. The elongation of the polysaccharidic polymer is a
simple repetition of this condensation reaction, so that the chain
always has the same orientation, from the non-reducing end to the
reducing end.

The conventional representation of a polysaccharide involves showing
the non-reducing end on the left, and the reducing end on the right,
horizontally.

\begin{figure}
  \begin{center}
    \includegraphics[scale=0.2]{figures/raster/sacchbond-formation.png}
  \end{center}
  \caption[Osidic bond formation]{\textbf{Osidic bond formation by
      condensation.} The two monomers are subject to condensation with
    loss of one molecule of water.}
  \label{fig:sacchbond-formation}
\end{figure}

Figure~\ref{fig:sacch-polymer} shows a simple way to formalize what a
saccharidic polymer is. The top formula represents the ``monomer'', in
the sense that the polymer is made of a number of these monomers. The
bottom formula represents a polysaccharide, with the repetitive
elements boxed (there are n monomers polymerized). The atoms shown in
red (outside the boxed repetitive elements) are the saccharidic
polymer \emph{caps}. Thus, we see clearly that, in the case of
polysaccharides, the left cap is a proton and the right cap is a
hydroxyl. Incidentally, this is identical to the protein case and the
exact converse of what we described previously for nucleic acids.
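The same bookkeeping applies to polysaccharides. In the Python sketch
below (hypothetical names, not \pxm\ code), a glucan of n glucose
residues ($\mathrm{C_6H_{10}O_5}$ each, i.e. glucose less water) is
capped with H on the non-reducing end and OH on the reducing end; for
n = 2 this reproduces maltose, two glucose molecules less one water:

```python
from collections import Counter

HEXOSE_RESIDUE = Counter(C=6, H=10, O=5)   # glucose (C6H12O6) less H2O
LEFT_CAP = Counter(H=1)                    # non-reducing end
RIGHT_CAP = Counter(O=1, H=1)              # reducing end


def glucan_composition(n: int) -> Counter:
    """Finished-state composition of a chain of n glucose residues."""
    total = Counter(LEFT_CAP)
    for _ in range(n):
        total.update(HEXOSE_RESIDUE)
    total.update(RIGHT_CAP)
    return total


glucose = Counter(C=6, H=12, O=6)
water = Counter(H=2, O=1)
maltose = glucan_composition(2)
print(maltose == glucose + glucose - water)  # True
print(sorted(maltose.items()))               # [('C', 12), ('H', 22), ('O', 11)]
```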

\begin{figure}
  \begin{center}
    \includegraphics[scale=0.25]{figures/raster/sacch-polymer.png}
  \end{center}
  \caption[A saccharidic polymer is a capped osidic residue
  chain]{\textbf{End capping chemistry of the polysaccharidic
      polymer.} A polysaccharide is made of a chain of osidic residues
    (blue-boxed formula) and of two caps (red-colored atoms). The left
    cap is the proton that belongs to the non-reducing end of
    the polymer. The right cap is the hydroxyl group that belongs to
    the reducing end of the polymer.}
  \label{fig:sacch-polymer}
\end{figure}

Now comes the question of unambiguously defining the structure of a
saccharidic polymer. It is commonly accepted that the simple ordered
sequence of the named monoses in the saccharidic polymer, from left
(non-reducing end) to right (reducing end), constitutes an unambiguous
description of the glycan sequence. To state the sequence of a glycan,
you would use a notation like this:

\begin{mynoindent}
  {\footnotesize using a 3-letter code:}\\
  Ara Gal Xyl Glc Hep Man Fru\\
  Arabinose is thus monose 1 and Fructose is the last monose
  ($\mathrm{n = 7}$).
\end{mynoindent}

Incidentally, this is where the ability of \pxm\ to handle monomer
codes of arbitrary length comes in handy!
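One way to support codes of arbitrary length is to let each code start
with an uppercase letter followed by any number of lowercase letters,
so that sequences can be parsed even without separators. This
convention is an assumption made for the following Python sketch; it
is not taken from \pxm's documentation, and \texttt{tokenize} is a
hypothetical name.

```python
from __future__ import annotations

import re

# Assumed convention: one uppercase letter followed by any number of
# lowercase letters ("Ara", "Glc", "A"), so codes need no separator.
CODE = re.compile(r"[A-Z][a-z]*")


def tokenize(sequence: str) -> list[str]:
    """Split a sequence into monomer codes, ignoring whitespace."""
    compact = "".join(sequence.split())
    codes = CODE.findall(compact)
    if "".join(codes) != compact:  # some characters matched no code
        raise ValueError(f"cannot parse {sequence!r}")
    return codes


glycan = tokenize("Ara Gal Xyl Glc Hep Man Fru")
print(glycan[0], glycan[-1], len(glycan))  # Ara Fru 7
print(tokenize("AraGalXyl"))               # ['Ara', 'Gal', 'Xyl']
print(tokenize("AGWYEGK"))                 # one-letter codes also fit the rule
```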

\renewcommand{\sectitle}{To Sum Up}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

We have now rapidly surveyed the three major polymers of the living
world. A great many other polymers exist around us.

Table~\ref{tab:three-biopolym-exples} on
page~\pageref{tab:three-biopolym-exples} summarizes the information
gathered so far. Note that the formulae given for the monomers are the
``residual'' ones. For example, the formula of the glycyl residue
corresponds to the formula of glycine (the free amino acid) less one
molecule of water.

\begin{table}
    \begin{small}
      \begin{tabular}{c|ccccc}\hline
        polymer     &   name & code  &    formula                &  left cap  & right cap \\ 
        \hline
        protein     &   &      &                           &      H         &           OH       \\
        & Glycine   &   G     & $\mathrm{C_2H_3O_1N_1}$   &                &                    \\
        & Alanine   &   A     & $\mathrm{C_3H_5O_1N_1}$   &                &                    \\
        & Tyrosine  &   Y     & $\mathrm{C_9H_9O_2N_1}$   &                &                    \\
        nucleic acid (DNA) &   &      &                    &      OH        &            H       \\
        & Adenine   &   A     & $\mathrm{C_{10}H_{12}O_5N_5P_1}$ &         &                    \\
        & Cytosine  &   C     & $\mathrm{C_9H_{12}O_6N_3P_1}$    &         &                    \\
        saccharide  &   &      &                           &      H         &            OH      \\
        & Arabinose &   Ara   & $\mathrm{C_5H_8O_4}$      &                &                    \\
        & Heptose   &   Hep   & $\mathrm{C_7H_{12}O_6}$   &                &                    \\
        \hline
        \multicolumn{6}{c}{Note: monomer formulae are residual (monomer less $\mathrm{H_2O}$)}\\
        \hline
      \end{tabular}
      \caption[Comparison of three common biopolymers]{\textbf{Quick comparison of three biopolymers with examples of monomers}}\label{tab:three-biopolym-exples}
    \end{small}
\end{table}

Many synthetic polymers are much simpler than the ones we have just
reviewed, and it should be clear that, if \pxm\ can deal with the
complex biopolymers described so far, it will certainly be very
proficient with less complex synthetic polymers. Describing the
formation of polymers is one thing, but we also have to describe how
polymers are disrupted. This is what we shall do in the next section.

\renewcommand{\sectitle}{Polymer Chain Disrupting Chemistry}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

\label{sect:pol-chain-disrupt-chem}

When we first spoke of ``polymer chain disrupting chemistry'', we said
that it was a complex subject, of \emph{enormous} importance to the
mass spectrometrist. This is why we will treat this subject in a
fairly thorough manner.

First of all, we should stress that chemically modifying a polymer
does not necessarily perturb its chain structure. Here, however, we are
concerned specifically with the chemical modifications that do perturb
the polymer chain, namely \emph{cleavage} and \emph{fragmentation}:

\begin{itemize}
\item \textsc{A cleavage is a chemical process} by which a molecule
  acts directly on the polymer, splitting it into at least two
  separate pieces (the \emph{oligomers}). As a result of the cleavage
  reaction, groups originating in the cleaving molecule remain
  attached to the polymer at the precise cleavage location;
\item \textsc{A fragmentation is a chemical process} by which the
  polymer structure is disrupted into separate pieces (the
  \emph{fragments}) mainly because of energy-dependent electron-pair
  rearrangements leading to bond breakage.
\end{itemize}

Here are the details pertaining to each one of these two very
different processes:

\subsection*{Polymer Cleavage}



We said above that, upon cleavage of a polymer, the cleaving molecule
reacts with it and, by doing so, directly or indirectly
``\emph{dissolves}'' an inter-monomer bond. A polymer cleavage always
occurs in such a way as to generate a set of \emph{true} polymers
(evidently smaller than the parent polymer, which is why they are
called \emph{oligomers}). Let us take the example shown in
Figure~\ref{fig:prot-cleavage}, where a tripeptide (a very small
protein, containing a methionyl residue at position 2) is submitted
either to a water-mediated cleavage (hydrolysis, upper panel) or to a
cyanogen bromide-mediated cleavage (lower panel). The two cases
presented in this figure are similar in some respects but differ in
others:

\begin{itemize}
\item in both cases the bond that is cleaved is the inter-monomer bond
  (in protein chemistry this is a peptidic bond);
\item Oligomer 2 has the same structure in both cases;
\item in the first case the molecule that is responsible for the
  cleavage is water, while in the second case it is cyanogen bromide;
\item the structures of the Oligomer 1 species differ when produced
  using water or cyanogen bromide as the cleaving molecule.
\end{itemize}

\begin{figure}
  \begin{center}
    \includegraphics[scale=0.3]{figures/raster/prot-cleavage.png}
  \end{center}
  \caption[Protein cleavage by water and cyanogen
  bromide]{\textbf{Protein cleavage by water and cyanogen bromide.} A
    tripeptide (a very small protein) is cleaved at position 1 either
    by hydrolysis (top) or by cyanogen bromide (bottom). Cyanogen
    bromide cleaves specifically on the right of a methionine
    monomer.}
  \label{fig:prot-cleavage}
\end{figure}

The difference between hydrolysis and cyanogen bromide cleavage lies
in the Oligomer 1 species: cyanogen bromide cleavage has the side
effect of generating a homoserine as the right end monomer of
Oligomer 1, while hydrolysis leaves a genuine methionine monomer. This
is because water exactly reverses what polymerization did (hydrolysis
is the converse of condensation), while cyanogen bromide chemically
modifies the Oligomer 1 species that it generates.

Nonetheless, the reader might have noted that, interestingly, all
four oligomers do carry their left cap (a proton) and their right cap
(a hydroxyl). This means that in both water- and cyanogen
bromide-mediated cleavage, all the generated oligomers are indeed true
polymers in the sense that: 1) they are a chain of monomers (modified
or not) and 2) they are correctly capped (\textit{i.e.} they are
polymers in their finished state). This is important because it is
the basis on which we shall distinguish a cleavage process from a
fragmentation process.

Thus, the \pxm\ definition of an oligomer might be: \emph{an oligomer
  is a polymer (of at least one monomer) in its finished state that
  was generated upon cleavage of a longer polymer}.


When the polymer cleavage reaction precisely reverses the reaction
that was performed for the same polymer's synthesis, there is no
special difficulty. But when the cleavage reaction modifies the
substrate, this should be carefully modelled. How? To answer this
question we might start by comparing the two different Oligomer 1
species that were yielded upon the water-mediated and the cyanogen
bromide-mediated cleavage reactions: ``the hydrolysis-generated
Oligomer 1 is equal to the cyanogen bromide-generated Oligomer 1 +S1
+C1 +H2 -O1''; this is a big difference! The observations made so
far can be worded this way:

\textsl{Whenever a protein undergoes a cyanogen bromide-mediated
  cleavage, the ``-C1H2S1+O1'' chemical reaction should be applied to
  the resulting oligomers \textit{if and only if} they have a
  methionine monomer at their right end}. This logical condition is
called, in \pxm' jargon, a \emph{leftrightrule}, and will be
described later (see page~\pageref{sect:cleavespecif}).

Well, this sounds reasonable. But what about the ``normal'' case, when
the cleavage is done using water? Nothing special: the mass of the
oligomer is calculated by summing the masses of all the monomers in
the oligomer (since the monomers are not modified, this is easily
done) and the masses corresponding to both the left and right caps
(these are defined in the polymer chemistry definition; in our present
case they would be a proton on the left end and a hydroxyl on the
right end). In this way, the oligomer complies with its definition,
which states that it is a true polymer made of monomers and that it is
in its finished state.

Yes, but then how will \pxm\ manage to calculate the mass of a
modified oligomer, like our Oligomer 1 in the case of the cyanogen
bromide-mediated cleavage? Simply enough: in a first step it proceeds
exactly as for the unmodified oligomer. Next, each oligomer is checked
for the presence or absence of a methionine residue on its right end.
If a methionine is found, the mass corresponding to the
``-C1H2S1+O1'' chemical reaction is applied. And that's it!

In the cyanogen bromide example, the logical condition involved the
identity of the oligomers' right end monomer. Other examples may
involve the left end monomer instead, namely when some chemical
modification occurs to the monomer sitting to the right of the
cleavage location. In this case the user would have to analyse the
situation and provide \pxm\ with the proper chemical reaction by
stating something analogous to: \textsl{\textit{if and only if} they
  have a Xyz monomer at their left end} (note the partial analogy with
the case described above).
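The two-step procedure described above (caps plus monomers first, then
conditional chemistry depending on an end monomer) can be sketched in
Python. Everything here is hypothetical: \texttt{LeftRightRule} merely
mimics the idea of a leftrightrule, the residue table is reduced to
Gly, Ala and Met, and the rule is applied literally as worded in the
text, i.e. to any oligomer ending with Met. A hypothetical tripeptide
GMA, cut on the right of Met, reproduces the hydrolysis and cyanogen
bromide cases:

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass

# Residual formulae (amino acid less H2O) of the residues used below.
RESIDUES = {
    "G": Counter(C=2, H=3, N=1, O=1),
    "A": Counter(C=3, H=5, N=1, O=1),
    "M": Counter(C=5, H=9, N=1, O=1, S=1),
}
CAPS = Counter(H=2, O=1)  # left cap (H) + right cap (OH)


def parse_actions(actions: str) -> Counter:
    """Turn a signed formula such as "-C1H2S1+O1" into element deltas."""
    delta = Counter()
    for sign, group in re.findall(r"([+-])([A-Za-z0-9]+)", actions):
        for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", group):
            n = int(number) if number else 1
            delta[element] += n if sign == "+" else -n
    return delta


@dataclass
class LeftRightRule:
    """Apply `delta` iff the monomer at `end` ("left"/"right") is `code`."""
    end: str
    code: str
    delta: Counter


def cleave(sequence: str, after: str) -> list[str]:
    """Cut on the right of every `after` monomer (never past the last one)."""
    pieces, start = [], 0
    for i, code in enumerate(sequence[:-1]):
        if code == after:
            pieces.append(sequence[start:i + 1])
            start = i + 1
    pieces.append(sequence[start:])
    return pieces


def oligomer_composition(oligomer: str, rules: list[LeftRightRule]) -> Counter:
    total = Counter(CAPS)                    # step 1: caps + monomers
    for code in oligomer:
        total.update(RESIDUES[code])
    for rule in rules:                       # step 2: conditional chemistry
        terminal = oligomer[-1] if rule.end == "right" else oligomer[0]
        if terminal == rule.code:
            total.update(rule.delta)         # update() also adds negatives
    return +total                            # drop elements whose count is 0


cnbr = [LeftRightRule("right", "M", parse_actions("-C1H2S1+O1"))]
for oligo in cleave("GMA", after="M"):
    print(oligo, "hydrolysis:", dict(oligomer_composition(oligo, [])),
          "CNBr:", dict(oligomer_composition(oligo, cnbr)))
```

Under cyanogen bromide, GM comes out as $\mathrm{C_6H_{12}N_2O_4}$,
i.e. glycyl-homoserine, while A is identical in both cases. An empty
rule list corresponds to plain hydrolysis, and a rule whose end is set
to left gives the left-end variant discussed in the previous
paragraph.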

For the moment, this is enough abstraction about polymer cleavage, as
the rest of the cleavage specification definition is thoroughly
detailed on page~\pageref{sect:cleavespecif}.


\subsection*{Polymer Fragmentation}

\label{sect:polymer-fragmentation}



In a fragmentation process, the bond that is broken is not necessarily
the inter-monomer bond. Indeed, fragmentations are often high-energy
chemical processes that can affect bonds belonging to the monomers'
internal structure. This is one of the reasons why fragmentations
differ from cleavages: they are specific to the polymer type in which
they occur. Hydrolyzing a protein or an oligosaccharide is the same
process, from a chemical point of view. But fragmenting a protein and
fragmenting an oligosaccharide are truly different processes, because
the way the fragmentation happens along the polymer sequence depends
so much on the nature of each of its monomers.

Another peculiarity of fragmentations, compared with the cleavages
described above, is that no cleaving molecule initiates the process.
Instead, a fragmentation is often initiated by an intramolecular
electron-pair rearrangement that propagates, to a greater or lesser
extent, through the polymer structure until it eventually breaks it.
Fragmentation is mainly a gas-phase process, not a reaction that
occurs in solution when the polymer is brought into contact with some
reagent. Precisely because no cleaving molecule is involved, the
fragments are not necessarily capped like a normal polymer should be;
this is another really important difference between cleavage and
fragmentation.

Let us illustrate these concepts through two examples: proteins and
nucleic acids.

\subsubsection*{Protein Fragmentation}

Peptide fragmentation can generate a fairly large number of different
kinds of fragments. We are going to detail the most common ones; the
user is invited to use the \pxm\ fragmentation-specification grammar
to add less frequent (or newly discovered) fragmentation types.

\begin{figure}
  \begin{center}
    \includegraphics[scale=0.2]{figures/raster/prot-fragmentation.png}
  \end{center}
  \caption[Protein fragmentation]{\textbf{Protein fragmentation
      patterns most widely encountered.} A hexapeptide is fragmented
    in the seven most common ways, generating a, b, c, x, y, z and
    immonium fragment ions. The figure illustrates the position of the
    cleavage for each kind of fragment (exemplified using the smallest
    possible fragment) and describes the mass calculation method for
    each fragment kind; each fragment is assumed to bear only
    \emph{one positive} charge.}
  \label{fig:prot-fragmentation}
\end{figure}

As can be seen from Figure~\ref{fig:prot-fragmentation},
fragmentation generates fragments of three categories: those that
include the left end of the precursor polymer (a, b, c), those that
include the right end of the precursor polymer (x, y, z), and finally
the special case in which the fragment is an \emph{internal
  fragment}, like the immonium ions. Looking at the fragmentations
described in the figure, it becomes immediately clear why a
fragmentation cannot be mistaken for a cleavage: the ionization of the
fragment is not necessarily due to the capture of a proton by the
fragment. Furthermore, a fragmentation is not a cleavage because the
generated fragment is \emph{not necessarily} what we call a polymer,
in the sense that it might not be capped the same way as the
precursor polymer is (in its finished state).

These two observations should make clear that calculating fragment
masses is more difficult than calculating oligomer masses as described
above. Indeed, while it was simple to calculate the mass of an
oligomer (by adding the masses of its constituent monomers, plus the
left and right caps, plus ionization), here no chemical formalism is
generally applicable to all the fragment types. This is why specifying
the fragmentation is left to the user.

Looking at Figure~\ref{fig:prot-fragmentation}, the reader will have
noticed that the fragment naming scheme takes into account whether the
fragment bears the left end, the right end, or neither end of the
precursor polymer. Indeed, fragments holding the left end of the
precursor polymer sequence are numbered starting from the left end,
while fragments holding the right end are numbered starting from the
right end. Thus the third fragment of series \emph{a} (\emph{a3})
involves monomers [1$\rightarrow$3], while the third fragment of
series \emph{y} (\emph{y3}) involves monomers [6$\rightarrow$4] (in
the figure, these left-to-right and right-to-left directions are
symbolized by arrows). It is therefore important, when specifying a
fragmentation, to indicate clearly from which end of the precursor
polymer the fragment is generated (in \pxm\ jargon, ``LE'' for left
end, ``RE'' for right end and ``NE'' for no end). \pxm\ knows what
action to take when it encounters each of these three specifications;
for example, if an ``LE'' specification is found in a given
fragmentation specification, \pxm\ adds the mass of the precursor
polymer's left cap to the fragment's mass.
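The numbering rules and the LE/RE/NE distinction can be illustrated
with a small Python sketch. The helpers \texttt{fragment\_span} and
\texttt{caps\_for} are hypothetical, written only for this
illustration (they are not \pxm\ functions), and a sequence is assumed
to be a string of one-letter monomer codes.

```python
def fragment_span(sequence, end, i):
    """Monomers covered by the i-th fragment (1-based) of a series.

    end is "LE" (left end), "RE" (right end) or "NE" (no end).
    """
    n = len(sequence)
    if not 1 <= i <= n:
        raise ValueError(f"fragment index {i} out of range 1..{n}")
    if end == "LE":
        return sequence[:i]        # monomers 1 -> i
    if end == "RE":
        return sequence[n - i:]    # monomers n -> n-i+1, kept in sequence order
    if end == "NE":
        return sequence[i - 1]     # internal fragment: monomer i only
    raise ValueError(f"unknown end specification: {end!r}")


def caps_for(end):
    """Which precursor caps contribute to the fragment's mass."""
    return {"LE": ("left cap",), "RE": ("right cap",), "NE": ()}[end]


peptide = "PEPTID"  # a hexapeptide, as in the figure
print(fragment_span(peptide, "LE", 3), caps_for("LE"))  # a3: prints PEP ('left cap',)
print(fragment_span(peptide, "RE", 3), caps_for("RE"))  # y3: prints TID ('right cap',)
```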

Now that the stage is set we can start rationalizing fragment
specifications, and thus mass calculations.

\paragraph{\emph{a} fragment series} For the \emph{a} fragment
series, Figure~\ref{fig:prot-fragmentation} indicates that the
fragments include the left end and that their last monomer lacks its
carbonyl group (note, at the top of Figure~\ref{fig:prot-fragmentation},
that the \emph{a1} arrow goes between the C$\alpha$H and the CO of
monomer 1). Each fragment of the \emph{a} series should thus be
subjected to the following chemical treatments: 1) addition of the
mass corresponding to the left cap (proton), 2) removal of the mass
corresponding to the missing CO group. This yields the mass of
fragment \emph{a1}. For fragment \emph{a4}, we would sum the masses of
monomers 1 to 4, add the mass of the left cap, and finally remove the
mass of a CO; that's it. The mass calculation is thus expressed as
\[a_i = LC + \sum_{k=1}^{i} M_k - \mathrm{CO}\]

\paragraph{\emph{b} fragment series} Similarly, the mass calculation
is mathematically expressed \[b_i = LC + \sum_{k=1}^{i} M_k\]

\paragraph{\emph{c} fragment series} The mass calculation is
mathematically expressed \[c_i = LC + \sum_{k=1}^{i} M_k + \mathrm{NH_3}\]

\paragraph{\emph{x} fragment series} For this series of fragments we
no longer add the left cap; we add the right cap instead, since the
fragments hold the right end of the precursor polymer. Note also that
in the following expressions the monomers are numbered from right to
left (contrary to the \emph{a, b, c} fragment series); all the
fragments that hold the precursor polymer's right end are numbered
this way, so this applies to the \emph{x, y, z} fragments. The mass
calculation is expressed as \[x_i = RC + \sum_{k=1}^{i} M_k +
\mathrm{CO}\]

\paragraph{\emph{y} fragment series} The calculation is mathematically
expressed \[y_i = RC + \sum_{k=1}^{i} M_k + \mathrm{H_2}\]

\paragraph{\emph{z} fragment series} In low-energy CID, the \emph{z}
fragments are expressed as \[z_i = RC + \sum_{k=1}^{i} M_k -
\mathrm{NH}\] which is equivalent to \emph{y}$\,-\,\mathrm{NH_3}$; in
high-energy CID, an additional hydrogen is often measured: \[z_i = RC
+ \sum_{k=1}^{i} M_k - \mathrm{NH} + \mathrm{H}\]

\paragraph{\emph{immonium} fragment series} These fragments are
internal fragments in the sense that they hold neither of the two ends
of the precursor polymer. \pxm\ understands that the user is referring
to this kind of fragment when the ``from which end'' datum of the
fragmentation specification states ``NE'' instead of ``LE'' or ``RE''
(see page~\pageref{sect:fragspecif}). The mass calculation for these
fragments does not take into account the monomers surrounding the one
for which the calculation is done: the mass of an immonium ion at
position \emph{i} in the precursor polymer is the mass of the monomer
at position \emph{i}, less the mass of a CO, plus the mass of a
proton. The mass calculation for these special internal fragments is
expressed as \[\mathit{imm}_i = M_i + \mathrm{H} - \mathrm{CO}\]
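The protein fragment formulas above translate almost directly into
code. The following Python sketch is a hypothetical illustration, not
the \pxm\ implementation: the one-letter residue table, the
\texttt{PROTEIN\_SERIES} dictionary and \texttt{fragment\_mass} are
assumptions made for this example, and the caps are the hydrogen
(left) and hydroxyl (right) used for proteins. Following the formulas
literally, the extra hydrogens are added as neutral atoms; a true
[M+H]$^+$ value would be lower by one electron mass (about
0.00055~u).

```python
import re
from collections import Counter

ATOMIC_MASS = {"H": 1.00782503207, "C": 12.0, "N": 14.0030740048, "O": 15.99491461956}

# Hypothetical residue table (amino acid minus one water).
RESIDUES = {
    "G": {"C": 2, "H": 3, "N": 1, "O": 1},
    "A": {"C": 3, "H": 5, "N": 1, "O": 1},
    "F": {"C": 9, "H": 9, "N": 1, "O": 1},
    "K": {"C": 6, "H": 12, "N": 2, "O": 1},
}

# Cap added for each "from which end" specification.
CAPS = {"LE": "+H1", "RE": "+O1H1", "NE": ""}

# series -> (end, chemical reaction), transcribing the formulas above.
PROTEIN_SERIES = {
    "a": ("LE", "-C1O1"),
    "b": ("LE", ""),
    "c": ("LE", "+N1H3"),
    "x": ("RE", "+C1O1"),
    "y": ("RE", "+H2"),
    "z": ("RE", "-N1H1"),          # low-energy CID
    "z+H": ("RE", "-N1H1+H1"),     # high-energy CID variant
    "imm": ("NE", "+H1-C1O1"),
}


def parse_action(action):
    """Turn a reaction string such as "+H1-C1O1" into signed element counts."""
    delta = Counter()
    for sign, group in re.findall(r"([+-])((?:[A-Z][a-z]?\d*)+)", action):
        factor = 1 if sign == "+" else -1
        for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", group):
            delta[element] += factor * int(count or 1)
    return delta


def fragment_mass(sequence, series, i):
    """Mass of fragment i (1-based) of the given series."""
    end, action = PROTEIN_SERIES[series]
    n = len(sequence)
    if end == "LE":
        monomers = sequence[:i]        # numbered from the left end
    elif end == "RE":
        monomers = sequence[n - i:]    # numbered from the right end
    else:
        monomers = sequence[i - 1]     # immonium: monomer i alone, no cap
    composition = Counter()
    for code in monomers:
        composition.update(RESIDUES[code])
    composition.update(parse_action(CAPS[end]))
    composition.update(parse_action(action))
    return sum(ATOMIC_MASS[el] * k for el, k in composition.items())


peptide = "GAFK"
for s in ("a", "b", "c", "x", "y", "z", "z+H"):
    print(s, [round(fragment_mass(peptide, s, i), 4) for i in range(1, 4)])
print("imm F", round(fragment_mass(peptide, "imm", 3), 4))
```

Keeping each series as a (end, reaction) pair mirrors the way the text
separates the capping decision (LE/RE/NE) from the series-specific
chemistry.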


\subsubsection*{Nucleic Acid Fragmentation}


Nucleic acids can be fragmented in numerous ways, and describing these
fragmentations fully is more complicated than for proteins. The main
reason is the large number of fragmentation combinations arising from
the loss of nitrogenous bases from the backbone. The mechanisms of
this loss are fairly complex, and I am not going to detail any of
them. Figure~\ref{fig:dna-fragmentation} shows the most common
fragmentations (without taking into account the potential loss of
bases). An example fragment is given for each fragment series (much as
we did for proteins). Note that the fragment representations are
meant to help the reader figure out what the product ion is; they do
not show where the negative charge lies on the fragment, since this
charge can float around among all the deprotonatable groups. All the
fragments shown bear one, and only one, negative charge.

The reader might have noticed, at the bottom of the figure, that
provision is made for the case where the fragmented molecular species
are not 5'-phosphorylated but 5'-hydroxylated. Indeed, the canonical
monomer is such that, upon polymerization and left capping, the 5' end
is phosphorylated. However, oligonucleotides are often synthesized
chemically without the 5'-phosphate group, thus ending in a hydroxyl.
This special case should be accounted for by applying the following
chemical reaction to all the fragments that bear the left end of the
precursor polymer: $-\mathrm{HPO_3}$. This chemical reaction should be
applied \emph{in addition} to the chemical reaction that yields the
fragment \emph{per se}.

\begin{figure}
  \begin{center}
    \includegraphics[scale=0.2]{figures/raster/dna-fragmentation.png}
  \end{center}
  \caption[DNA fragmentation]{\textbf{DNA fragmentation patterns most
      widely encountered.} A short DNA sequence is fragmented in the
    eight most common ways, generating a, b, c, d, w, x, y and z
    fragment ions. The figure illustrates the position of the cleavage
    for each kind of fragment (exemplified using the smallest possible
    fragment) and describes the mass calculation method for each
    fragment kind; each fragment is assumed to bear one, and only one,
    negative charge ($-1$).}
  \label{fig:dna-fragmentation}
\end{figure}

As for the protein fragments, the mathematical expressions used to
calculate the masses of the different series of nucleic acid
fragments are given below; in these calculations we assume that the
left end of the precursor polymer is phosphorylated (5'-P), and the
reader should bear in mind that this very phosphate might itself be
expelled by the fragmentation. The considerations made above about the
protein fragment naming scheme (left-to-right or, conversely,
right-to-left) apply here in an identical manner.

\paragraph{\emph{a} fragment series}
These fragments most often appear with base loss. \[a_i = LC +
\sum_{k=1}^{i} M_k - \mathrm{O}\]

\paragraph{\emph{b} fragment series}
\[b_i = LC + \sum_{k=1}^{i} M_k\]

\paragraph{\emph{c} fragment series}
\[c_i = LC + \sum_{k=1}^{i} M_k - \mathrm{HPO_2}\]

\paragraph{\emph{d} fragment series}
\[d_i = LC + \sum_{k=1}^{i} M_k - \mathrm{HPO_3}\]

\paragraph{\emph{w} fragment series}
\[w_i = RC + \sum_{k=1}^{i} M_k + \mathrm{O}\]

\paragraph{\emph{x} fragment series}
\[x_i = RC + \sum_{k=1}^{i} M_k\]

\paragraph{\emph{y} fragment series}
\[y_i = RC + \sum_{k=1}^{i} M_k - \mathrm{HPO_2}\]

\paragraph{\emph{z} fragment series}
\[z_i = RC + \sum_{k=1}^{i} M_k - \mathrm{HPO_3}\]

There is also a variety of fragments in which a base is lost, but we
cannot describe them all!

\subsubsection*{More Complex Patterns Of Fragmentation}


Before finishing with fragmentations, we need to describe a powerful
feature of the fragmentation specification grammar available in
\pxm. This feature was required for the fragmentation of
oligosaccharides, and sometimes also for proteins. When the
fragmentation (the bond breakage reaction itself) occurs at the level
of certain monomers, it might be necessary to specify some particular
chemistry occurring on the monomer in question.

We have seen in the cleavage documentation that, upon cleavage of a
protein sequence with cyanogen bromide, for example, a particular
chemical reaction had to be applied to the oligomers having a
methionine as their right-end monomer. In a fragmentation
specification, it is possible to apply comparable chemical reactions,
but in a more elaborate manner. Indeed, while in the cleavage it was
possible to say something like ``\textsl{apply a given chemical
  reaction to the oligomer if its right-end monomer is Xyz}'', in the
fragmentation the logical condition can be bound not only to the
identity of the currently fragmented monomer, but also (optionally) to
the identity of the previous and/or next monomer in the precursor
polymer sequence. For example: ``\textsl{apply a given chemical
  reaction if fragmentation occurs at the level of an Xyz monomer, but
  only if it is preceded by a Yxz monomer and followed by a Zyx
  monomer}''.

These logical conditions are called \emph{fragrules}. A
\emph{fragspecif} can hold as many \emph{fragrules} as necessary. A
fragmentation specification is thus a multi-part specification, in
which a \emph{fragspecif} optionally integrates \emph{fragrule}
objects. All of this is described in great detail on
page~\pageref{sect:fragspecif}.
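The logical condition carried by a fragrule (the fragmented monomer,
plus an optional previous and/or next monomer) can be modelled as
follows. This is a hypothetical Python sketch: the \texttt{FragRule}
class, its field names and the \texttt{-H2O1} reaction are
illustrative assumptions, not the actual fragrule syntax, which is the
one described on page~\pageref{sect:fragspecif}.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FragRule:
    """Hypothetical model of a fragrule."""
    current: str                      # monomer at which fragmentation occurs
    action: str                       # chemical reaction to apply
    prev_code: Optional[str] = None   # required previous monomer, if any
    next_code: Optional[str] = None   # required next monomer, if any

    def matches(self, sequence: Sequence[str], index: int) -> bool:
        """index is the 0-based position of the fragmented monomer."""
        if sequence[index] != self.current:
            return False
        if self.prev_code is not None and (
            index == 0 or sequence[index - 1] != self.prev_code
        ):
            return False
        if self.next_code is not None and (
            index == len(sequence) - 1 or sequence[index + 1] != self.next_code
        ):
            return False
        return True


def applicable_actions(sequence, index, rules):
    """A fragspecif may hold any number of fragrules; collect every match."""
    return [rule.action for rule in rules if rule.matches(sequence, index)]


rules = [FragRule("Xyz", "-H2O1", prev_code="Yxz", next_code="Zyx")]
print(applicable_actions(["Yxz", "Xyz", "Zyx"], 1, rules))  # ['-H2O1']
print(applicable_actions(["Abc", "Xyz", "Zyx"], 1, rules))  # []
```

Leaving \texttt{prev\_code} and \texttt{next\_code} unset reproduces
the simpler, cleavage-like condition bound to the fragmented monomer
only.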

\subsubsection*{To Sum Up}


To sum up what we have seen so far about polymer chain-disrupting
chemistries:

\begin{itemize}
\item A polymer sequence gets cleaved into oligomers when a chemical
  reaction occurs in it at the level of one or more inter-monomer
  bond(s); monomer-specific chemical reactions can be modelled into
  the cleavage specification using at most one leftrighrule;
\item A polymer sequence gets fragmented into fragments when a bond
  breakage occurs, without the help of any exterior molecule, at any
  level of the polymer structure, with no limitation to the
  inter-monomer bond; monomer-specific chemical reactions can be
  modelled into the fragmentation specification using any number of
  fragrules;
\item Oligomers are automatically capped --\emph{on both ends}--
  using the rules described in the precursor polymer's definition;
\item Fragments are automatically capped only \emph{on the end they
    hold, if any}, using the rules described in the precursor
  polymer's definition;
\item Oligomers are automatically ionized (if required by the user)
  using the rules described in the precursor polymer's definition;
\item Fragments are never ionized automatically; ionization (gain/loss
  of a charged group) is necessarily integrated in the fragmentation
  specification.
\end{itemize}


\cleardoublepage


%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "polyxmass"
%%% End: