.. _contentModel:

Content Model
=============

PyXB's content model completes the link between the :ref:`componentModel`
and the :ref:`bindingModel`.  The content model classes are the ones that:

- determine which Python class attribute stores each XML element or
  attribute;
- distinguish those elements that can occur at most once from those that
  require an aggregation; and
- ensure that the ordering and occurrence constraints imposed by the XML
  `model group <http://www.w3.org/TR/xmlschema-1/#Model_Groups>`_ are
  satisfied when XML is converted to Python instances and vice versa.

The classes involved in the content model are in the
:api:`pyxb.binding.content` module, and their relationships are displayed in
the following diagram.

.. image:: Images/ContentModel.jpg

Associating XML and Python Objects
----------------------------------

In the standard code generation template, both element and attribute values
are stored in Python class fields.  As noted in
:ref:`binding_deconflictingNames`, when an attribute and an element have the
same name in their containing complex type, they must be given distinct names
in the Python class corresponding to that type.  Use information for each of
these is maintained in the type class.  This use information comprises:

- the original :api:`name <pyxb.binding.content.AttributeUse.name>` of the element/attribute in the XML
- its :api:`deconflicted name <pyxb.binding.content.AttributeUse.id>` in Python
- the private name by which the value is stored in the Python instance dictionary

Other information is specific to the kind of use.  The
:api:`pyxb.binding.basis.complexTypeDefinition` class retains maps from each
component's name to the attribute use or element use instance corresponding
to that component's use.
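
As an illustration of the use information listed above, the following minimal sketch (hypothetical names throughout, not PyXB's own classes) records it for an attribute and an element that are both named ``lang`` in one complex type.  The convention of deconflicting the element by appending an underscore, and the ``_attr_``/``_elt_`` private keys, are assumptions made for the example; expanded names are modelled as ``(namespace URI, local name)`` tuples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseInfo:
    """Hypothetical record of the information kept for one attribute or element use."""
    name: tuple  # expanded XML name: (namespace URI, local name)
    id: str      # deconflicted name of the Python class attribute
    key: str     # private key under which the value lives in the instance dictionary


# An attribute and an element that share the local name "lang" in one complex type.
# Assumption: the element's Python name is deconflicted by appending "_".
LANG = (None, "lang")
attribute_map = {LANG: UseInfo(LANG, "lang", "_attr_lang")}
element_map = {LANG: UseInfo(LANG, "lang_", "_elt_lang")}


def python_names(expanded_name):
    """Return the Python names used for the attribute and the element with this XML name."""
    return attribute_map[expanded_name].id, element_map[expanded_name].id


print(python_names(LANG))  # ('lang', 'lang_')
```

Keeping one map per kind of use lets the same XML name resolve to two different Python names without ambiguity.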

.. _attributeUse:

Attribute Uses
^^^^^^^^^^^^^^

The information associated with an `attribute use
<http://www.w3.org/TR/xmlschema-1/#cAttributeUse>`_ is recorded in an
:api:`pyxb.binding.content.AttributeUse` instance.  This class provides:

- The :api:`type <pyxb.binding.content.AttributeUse.dataType>` of the
  attribute, as a subclass of :api:`pyxb.binding.basis.simpleTypeDefinition`

- The :api:`default value <pyxb.binding.content.AttributeUse.defaultValue>` of
  the attribute

- Whether the `attribute use
  <http://www.w3.org/TR/xmlschema-1/#cAttributeUse>`_ is 
  :api:`required <pyxb.binding.content.AttributeUse.required>`
  or :api:`prohibited <pyxb.binding.content.AttributeUse.prohibited>`

- Whether the value of the attribute in a binding instance was :api:`provided
  <pyxb.binding.content.AttributeUse.provided>` by an external source or set
  to the default value

- Whether the attribute value is :api:`fixed <pyxb.binding.content.AttributeUse.fixed>`

- Methods to :api:`read <pyxb.binding.content.AttributeUse.value>`, :api:`set
  <pyxb.binding.content.AttributeUse.set>`, and :api:`reset
  <pyxb.binding.content.AttributeUse.reset>` the value of the attribute in a
  given binding instance.

A :api:`map <pyxb.binding.basis.complexTypeDefinition._AttributeMap>` is used
to map from expanded names to AttributeUse instances.  This map is defined
within the class definition itself.
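
A simplified sketch of the behaviour listed above follows.  ``AttributeUseSketch`` and ``missing_required`` are hypothetical names rather than PyXB's API; a plain callable such as ``int`` stands in for the simple type, the ``default`` argument doubles as the fixed value when ``fixed=True``, and a value counts as provided exactly when it is stored in the instance dictionary.

```python
class AttributeUseSketch:
    """Hypothetical, simplified analog of an attribute use (not PyXB's AttributeUse)."""

    def __init__(self, name, data_type, default=None, required=False,
                 prohibited=False, fixed=False):
        self.name = name                # expanded name: (namespace URI, local name)
        self.data_type = data_type      # callable standing in for a simple type
        self.default = default          # default value, or the fixed value when fixed=True
        self.required = required
        self.prohibited = prohibited
        self.fixed = fixed
        self._key = "_attr_" + name[1]  # private slot in the instance dictionary

    def provided(self, instance):
        """True if a value was set explicitly rather than taken from the default."""
        return self._key in instance.__dict__

    def value(self, instance):
        return instance.__dict__.get(self._key, self.default)

    def set(self, instance, new_value):
        if self.prohibited:
            raise ValueError("attribute %s is prohibited" % (self.name,))
        new_value = self.data_type(new_value)
        if self.fixed and new_value != self.default:
            raise ValueError("attribute %s is fixed to %r" % (self.name, self.default))
        instance.__dict__[self._key] = new_value

    def reset(self, instance):
        """Discard any provided value so the default applies again."""
        instance.__dict__.pop(self._key, None)


def missing_required(attribute_map, instance):
    """Expanded names of required attributes that have no provided value."""
    return [name for name, use in attribute_map.items()
            if use.required and not use.provided(instance)]


class Measure:
    """Stand-in for a binding instance."""


attribute_map = {
    (None, "unit"): AttributeUseSketch((None, "unit"), str, default="cm"),
    (None, "count"): AttributeUseSketch((None, "count"), int, required=True),
    (None, "version"): AttributeUseSketch((None, "version"), str, default="1.0", fixed=True),
}

m = Measure()
unit = attribute_map[(None, "unit")]
print(unit.value(m), unit.provided(m))     # cm False
attribute_map[(None, "count")].set(m, "3")
print(missing_required(attribute_map, m))  # []
```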

.. _elementUse:

Element Uses
^^^^^^^^^^^^

The element analog to an attribute use is an `element declaration
<http://www.w3.org/TR/xmlschema-1/#cElement_Declarations>`_, and the
corresponding information is stored in a
:api:`pyxb.binding.content.ElementUse` instance.  This class provides:

- The :api:`element binding <pyxb.binding.content.ElementUse.elementBinding>`
  that defines the properties of the referenced element, including its type

- Whether the use allows :api:`multiple occurrences
  <pyxb.binding.content.ElementUse.isPlural>`

- The :api:`default value <pyxb.binding.content.ElementUse.defaultValue>` of
  the element.  Currently this is either ``None`` or an empty list, depending
  on :api:`pyxb.binding.content.ElementUse.isPlural`

- Methods to :api:`read <pyxb.binding.content.ElementUse.value>`, :api:`set
  <pyxb.binding.content.ElementUse.set>`, :api:`append to
  <pyxb.binding.content.ElementUse.append>` (only for plural elements), and
  :api:`reset <pyxb.binding.content.ElementUse.reset>` the value of the
  element in a given binding instance

- The :api:`setOrAppend <pyxb.binding.content.ElementUse.setOrAppend>` method,
  which is most commonly used to provide new content to a value

A :api:`map <pyxb.binding.basis.complexTypeDefinition._ElementMap>` is used to
map from expanded names to ElementUse instances.  This map is defined within
the class definition itself.
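
In the same spirit, the sketch below mimics the plural and single-valued behaviour of an element use, including the set-or-append pattern.  ``ElementUseSketch`` and its snake_case methods are hypothetical stand-ins, not PyXB's ``ElementUse`` API.

```python
class ElementUseSketch:
    """Hypothetical, simplified analog of an element use (not PyXB's ElementUse)."""

    def __init__(self, name, is_plural):
        self.name = name                # expanded name: (namespace URI, local name)
        self.is_plural = is_plural      # True if the element may occur more than once
        self._key = "_elt_" + name[1]   # private slot in the instance dictionary

    def default_value(self):
        # None for single-valued uses; a fresh empty list for plural ones.
        return [] if self.is_plural else None

    def value(self, instance):
        return instance.__dict__.setdefault(self._key, self.default_value())

    def set(self, instance, new_value):
        instance.__dict__[self._key] = new_value

    def append(self, instance, item):
        if not self.is_plural:
            raise TypeError("element %s does not allow multiple occurrences" % (self.name,))
        self.value(instance).append(item)

    def reset(self, instance):
        instance.__dict__[self._key] = self.default_value()

    def set_or_append(self, instance, item):
        # Add new content: append for plural uses, replace the value otherwise.
        if self.is_plural:
            self.append(instance, item)
        else:
            self.set(instance, item)


class Order:
    """Stand-in for a binding instance."""


element_map = {
    (None, "note"): ElementUseSketch((None, "note"), is_plural=False),
    (None, "item"): ElementUseSketch((None, "item"), is_plural=True),
}

order = Order()
for local_name, content in [("item", "apple"), ("note", "urgent"), ("item", "pear")]:
    element_map[(None, local_name)].set_or_append(order, content)

print(element_map[(None, "item")].value(order))  # ['apple', 'pear']
print(element_map[(None, "note")].value(order))  # urgent
```

In this sketch, routing all incoming content through one method means a caller such as a parser need not know whether a given use is plural.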

Content Model Automata
----------------------

The XML `model group <http://www.w3.org/TR/xmlschema-1/#Model_Groups>`_
construct permits a nested specification of legal type instances through
ordered sequences (``sequence``), conjunctions or unordered sequences
(``all``), choices (``choice``), and wildcards (``any``).  The model group can
be considered a form of regular expression, and as such we use `Thompson's
algorithm <http://portal.acm.org/citation.cfm?doid=363387>`_ to construct a
non-deterministic finite automaton which recognizes the set of conforming
documents.  A `powerset construction
<http://en.wikipedia.org/wiki/Powerset_construction>`_ is then used to make
the automaton deterministic, and the resulting automaton is stored as a
:api:`pyxb.binding.content.ContentModel` instance, with a set of :api:`states
<pyxb.binding.content.ContentModelState>` each of which has :api:`transitions
<pyxb.binding.content.ContentModelTransition>` on elements, wildcards, and
model groups with an ``all`` compositor.
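
The following minimal sketch illustrates the two steps on a toy encoding of model groups as nested tuples; the ``("elt", name)``, ``("seq", ...)``, ``("choice", ...)`` and ``("star", term)`` forms are assumptions of this example, not PyXB's representation, and ``all`` groups, wildcards and general occurrence ranges are omitted.  Thompson's construction builds an NFA with epsilon transitions, and the powerset construction then turns each reachable set of NFA states into one DFA state.

```python
import itertools

EPS = None  # label used for epsilon (empty) transitions


class NFA:
    def __init__(self):
        self.edges = []                  # (source, label, target) triples
        self._ids = itertools.count()

    def new_state(self):
        return next(self._ids)

    def build(self, term):
        """Thompson's construction: return (start, accept) states for a term."""
        kind = term[0]
        if kind == "elt":
            start, accept = self.new_state(), self.new_state()
            self.edges.append((start, term[1], accept))
            return start, accept
        if kind == "seq":
            start = accept = self.new_state()
            for child in term[1:]:
                child_start, child_accept = self.build(child)
                self.edges.append((accept, EPS, child_start))
                accept = child_accept
            return start, accept
        if kind == "choice":
            start, accept = self.new_state(), self.new_state()
            for child in term[1:]:
                child_start, child_accept = self.build(child)
                self.edges.append((start, EPS, child_start))
                self.edges.append((child_accept, EPS, accept))
            return start, accept
        if kind == "star":
            start, accept = self.new_state(), self.new_state()
            child_start, child_accept = self.build(term[1])
            self.edges += [(start, EPS, child_start), (child_accept, EPS, child_start),
                           (child_accept, EPS, accept), (start, EPS, accept)]
            return start, accept
        raise ValueError("unsupported term %r" % (kind,))

    def closure(self, states):
        """All states reachable from `states` through epsilon transitions."""
        seen, stack = set(states), list(states)
        while stack:
            state = stack.pop()
            for source, label, target in self.edges:
                if source == state and label is EPS and target not in seen:
                    seen.add(target)
                    stack.append(target)
        return frozenset(seen)


def to_dfa(term):
    """Powerset construction: each DFA state is a frozenset of NFA states."""
    nfa = NFA()
    start, accept = nfa.build(term)
    symbols = {label for _, label, _ in nfa.edges if label is not EPS}
    initial = nfa.closure({start})
    transitions, seen, todo = {}, {initial}, [initial]
    while todo:
        current = todo.pop()
        for symbol in symbols:
            moved = {t for s, label, t in nfa.edges if s in current and label == symbol}
            if not moved:
                continue
            target = nfa.closure(moved)
            transitions[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    finals = {state for state in seen if accept in state}
    return initial, transitions, finals


def accepts(dfa, names):
    state, transitions, finals = dfa
    for name in names:
        state = transitions.get((state, name))
        if state is None:
            return False
    return state in finals


# sequence(name, choice(email, phone), note*)
dfa = to_dfa(("seq", ("elt", "name"),
              ("choice", ("elt", "email"), ("elt", "phone")),
              ("star", ("elt", "note"))))
print(accepts(dfa, ["name", "phone", "note", "note"]))  # True
print(accepts(dfa, ["name", "email", "phone"]))         # False
```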

The sole complication in the automaton construction is dealing with ``all``
model groups, which accept subsets of a set of nodes in any order.  Expanding
this construct would increase the size of the deterministic finite automaton
exponentially, so it is instead left as a single :api:`transition
<pyxb.binding.content.ModelGroupAll>` which iteratively matches against the
candidate value until an alternative is found.
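
To see why expansion is avoided: a DFA for an ``all`` group must remember which members have already been consumed, so it needs one state per subset of members, ``2**n`` states for ``n`` members.  The minimal sketch below instead keeps that subset as data while matching.  It assumes, for simplicity, that every member is a single element with ``minOccurs`` of 0 or 1; the function name is hypothetical, and members that are themselves content models are not handled.

```python
def match_all_group(members, names):
    """Check element names against a hypothetical ``all`` group.

    members maps each member element name to True if it is required
    (minOccurs=1) or False if it is optional (minOccurs=0).  Every member
    may appear at most once, in any order.
    """
    remaining = dict(members)         # members not yet consumed
    for name in names:
        if name not in remaining:     # unknown element, or a repeated member
            return False
        del remaining[name]
    # Whatever was not consumed must be optional.
    return not any(remaining.values())


address = {"street": True, "city": True, "zip": False}
print(match_all_group(address, ["city", "zip", "street"]))  # True
print(match_all_group(address, ["street"]))                 # False
```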

.. _arch_content_automata_parsing:

Parsing With Automata
^^^^^^^^^^^^^^^^^^^^^

Automata-based parsing is used for building up a binding instance from a
series of values.

To allow incremental construction of instances, as required by the :api:`SAX
interface <pyxb.binding.saxer.PyXBSAXHandler>` or by initialization from
constructor arguments, each complex type with a content model contains a
:api:`DFA stack <pyxb.binding.content.DFAStack>`.  Each level of the stack
holds an instance of :api:`DFA state <pyxb.binding.content._DFAState>`.
Normally the state identifies a content model and the automaton state within
that model that represents the instance's position along a path through the
automaton, where the path so far comprises the member elements already added
to the instance.

A stack of states is needed when automaton execution reaches a transition
that involves an :api:`"all" model group
<pyxb.binding.content.ModelGroupAll>`.  Evaluating such a transition requires
suspending execution of the parent automaton and continuing with the
evaluation of the automata that represent the alternatives in the model group.
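
The sketch below shows the push and pop behaviour under simplifying assumptions: the parent automaton is a plain dictionary whose transitions are labelled either with element names or with a hypothetical ``AllGroup`` object, an ``all`` group is entered only when one of its members appears, and each member is a single element.  ``DFAStackSketch`` is illustrative and is not PyXB's ``DFAStack``.

```python
class AllGroup:
    """Hypothetical ``all`` group: member element name -> True if required."""

    def __init__(self, members):
        self.members = dict(members)

    def satisfied(self, consumed):
        return all(name in consumed for name, required in self.members.items() if required)


class DFAStackSketch:
    """Hypothetical stack of automaton states (not PyXB's DFAStack).

    Frames are ["dfa", state] for the parent automaton, or
    ["all", group, consumed_names, resume_state] while an all group is active.
    """

    def __init__(self, transitions, start, finals):
        self.transitions = transitions   # state -> {symbol: next state}
        self.finals = finals
        self.stack = [["dfa", start]]

    def _leave_all_group(self):
        # Pop the finished all group and resume the suspended parent automaton.
        resume_state = self.stack.pop()[3]
        self.stack[-1][1] = resume_state

    def step(self, name):
        """Consume one element name, as a SAX handler or constructor would."""
        while True:
            frame = self.stack[-1]
            if frame[0] == "all":
                group, consumed = frame[1], frame[2]
                if name in group.members and name not in consumed:
                    consumed.add(name)
                    return
                if not group.satisfied(consumed):
                    raise ValueError("unexpected element %r in all group" % name)
                self._leave_all_group()
                continue                 # retry the same name in the parent
            edges = self.transitions.get(frame[1], {})
            if name in edges:
                frame[1] = edges[name]
                return
            for symbol, target in edges.items():
                if isinstance(symbol, AllGroup) and name in symbol.members:
                    # Suspend the parent automaton and evaluate the all group.
                    self.stack.append(["all", symbol, {name}, target])
                    return
            raise ValueError("unexpected element %r" % name)

    def finish(self):
        """True if the content seen so far is complete."""
        while self.stack[-1][0] == "all":
            if not self.stack[-1][1].satisfied(self.stack[-1][2]):
                return False
            self._leave_all_group()
        return self.stack[-1][1] in self.finals


# sequence(id, all(street, city, zip?), note?)
address = AllGroup({"street": True, "city": True, "zip": False})
transitions = {0: {"id": 1}, 1: {address: 2}, 2: {"note": 3}}
parser = DFAStackSketch(transitions, start=0, finals={2, 3})
for element_name in ["id", "city", "zip", "street", "note"]:
    parser.step(element_name)
print(parser.finish())  # True
```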

Generation With Automata
^^^^^^^^^^^^^^^^^^^^^^^^

Automaton evaluation is also used to validate that the content of a binding
instance is consistent with the type's content model, and to determine a
sequence of contained elements that defines a valid path through the
automaton.  This technique is used to create a valid DOM document from a
binding instance.

A memoization technique is used, where the state of the system is represented
by a set of element uses (which identify valid consuming transitions), with a
sequence of values for each such use.  The element uses are symbols in the
alphabet of the automaton; each value is a token that permits a transition on
that symbol.  The state of the system also incorporates a sequence of
symbol-value pairs that records the path up to the current position.

The automaton starts in the initial state, then each transition is examined
until one is found for which there is a value available.  The state resulting
from executing that transition is pushed onto a stack, and the remaining
transitions are examined as well.  If no transition can proceed, the state is
discarded and the top state from the stack is evaluated.

When no more symbols remain, validation succeeds if the current state is a
final state, and the corresponding sequence is returned as a valid path.  If
no final state can be reached, validation fails.
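
The search just described can be sketched as a depth-first backtracking loop.  This is a hypothetical helper, not the body of PyXB's ``ContentModel.validate``: the DFA is a plain dictionary, each element use is identified by its name, and its pending values are consumed in order.  The ``visited`` set, which skips any combination of automaton state and remaining-value counts that was already explored, is this sketch's reading of the memoization mentioned above.

```python
def find_valid_order(transitions, start, finals, values):
    """Search for an order of element values that is a valid path through a DFA.

    transitions: state -> {element name: next state}
    values: element name -> list of values still to be emitted for that use
    Returns a list of (name, value) pairs, or None if validation fails.
    """
    initial_remaining = tuple(sorted((name, tuple(vals)) for name, vals in values.items()))
    stack = [(start, initial_remaining, ())]
    visited = set()
    while stack:
        state, remaining, path = stack.pop()
        key = (state, tuple((name, len(vals)) for name, vals in remaining))
        if key in visited:
            continue                      # this configuration was already explored
        visited.add(key)
        if all(not vals for _, vals in remaining):
            if state in finals:
                return list(path)         # every value placed and a final state reached
            continue                      # values exhausted in a non-final state
        pending = dict(remaining)
        for name, target in transitions.get(state, {}).items():
            if pending.get(name):
                # Consume the next value for this element use and push the new state.
                rest = dict(pending)
                next_value = rest[name][0]
                rest[name] = rest[name][1:]
                stack.append((target, tuple(sorted(rest.items())),
                              path + ((name, next_value),)))
    return None                           # no path reaches a final state


# DFA for sequence(name, choice(email, phone), note*)
transitions = {0: {"name": 1}, 1: {"email": 2, "phone": 2}, 2: {"note": 2}}
values = {"note": ["n1", "n2"], "name": ["Ann"], "phone": ["555"]}
print(find_valid_order(transitions, 0, {2}, values))
# [('name', 'Ann'), ('phone', '555'), ('note', 'n1'), ('note', 'n2')]
```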

See the :api:`validation method <pyxb.binding.content.ContentModel.validate>`
for details on how all this really works.

.. ignored
   ## Local Variables:
   ## fill-column:78
   ## indent-tabs-mode:nil
   ## End: