



    <title>&xml; Support</title>

      There are three levels of support for &xml; with &est;.

      <title>Loading as an Utterance</title>
	A built in &xml; parser allows text marked up according to an
	&xml; DTD to be read into an
	<classname>EST_Utterance</classname> (see <xref
	linkend="xmltoutterance" endterm="xmltoutterancetitle">).

	A &cpp; class, <classname>XML_Parser_Class</classname> (see
	<link linkend="xmlparserclass">the documentation for that
	class</link>), makes it relatively simple to write
	specialised &xml; processing code.

	The <productname>RXP</productname> &xml; parser is included and can be
	used directly (see <xref linkend="rxpparser">). 

    <sect1 ID="xmltoutterance">
      <title ID="xmltoutterancetitle">Reading &xml; Text As An <classname>EST_Utterance</classname></title>
	In order to read &xml; marked up text, the &est; code must be
	told how the &xml; markup should relate to the utterance
	structure. This is done by annotating the DTD with which the
	text is processed. 
	There are two possible ways to annotate the DTD. Either a new
	DTD can be created with the annotations added, or the
	annotations can be included in the &xml; file.

	  <title>A new DTD</title>

	  To write a new DTD based on an existing one, you should include
	  the existing one as follows:
	  <programlisting arch='xml'>
	    &lt;!-- Extended FooBar DTD for speech tools --&gt;

	    &lt;!-- Include original FooBar DTD --&gt;
	    &lt;!ENTITY % OldFooBarDTD PUBLIC "//Foo//DTD Bar"

	    &lt;!-- Your extensions, for instance... --&gt;

	    &lt;!-- syn-node elements are nodes in the Syntax relation  --&gt;
	    &lt;!ATTLIST syn-node relationNode CDATA #FIXED "Syntax" &gt;</programlisting>

	  <title>In the &xml; file</title>

	  Extensions to the DTD can be included in the 
	  <sgmltag>!DOCTYPE</sgmltag> declaration in the marked up
	  text. For instance:
	  <programlisting arch='xml'>
	    &lt;?xml version='1.0'?&gt;
	    &lt;!DOCTYPE utterance PUBLIC "//Foo//DTD Bar"
		&lt;!-- Item elements are nodes in the Syntax relation  --&gt;
		&lt;!ATTLIST item relationNode CDATA #FIXED "Syntax" &gt;

	    &lt;!-- Actual markup starts here --&gt;</programlisting>

	<title>Summary of DTD Annotations</title>
	  The following attributes may be added to elements in your
	  DTD to describe its relation to <link
		The value should be a comma separated list of
		attributes which should be set as features on the
		utterance. Each attribute can be either a simple
		identifier, or two identifiers separated by
		A value <literal>foo:bar</literal> causes the value of
		the <literal>foo</literal> attribute of the element to be
		set as the value of the Utterance feature
		A simple identifier <literal>foo</literal> causes the
		<literal>foo</literal> attribute of the element to be
		set as the value of the Utterance feature
		<literal>X_foo</literal> where <token>X</token> is the
		name of the element.

		The value should be a comma separated list of
		attributes which should be set as features on the
		relation related to this element. Its format and
		meaning are the same as for

		Indicates that this element defines a relation. All
		elements inside this one will be made nodes in the
		relation, unless they are explicitly marked to be
		ignored by <sgmltag>estRelationIgnore</sgmltag>. The
		value of the <sgmltag>estRelationElementAttr</sgmltag>
		attribute is the name of an attribute which gives the
		name of the relation. 

		When an element has an
		<sgmltag>estRelationElementAttr</sgmltag> tag to indicate its
		content defines a relation, it may also have the
		<sgmltag>estRelationTypeAttr</sgmltag> tag. This gives
		the name of an attribute which gives the type of
		relation. Currently, a type of `list' or `linear'
		gives a linear relation; anything else gives a tree.

		If this is set to any value on an element which would
		otherwise be interpreted as an
		<classname>EST_Item</classname> in the current
		relation, the element is passed over. The contents
		will be processed as if they had been directly inside
		this element's parent.

		When placed on an element, indicates that this element
		is to be interpreted as an item in the relation named
		in the value of the attribute.

		The value of this attribute defines how ranges in
		<sgmltag>href</sgmltag> attributes are expanded for
		this element. If the value is <token>replace</token>
		the nodes created during expansion are placed at the
		same level in the hierarchy as the original element. If
		the value is <token>embed</token> they are created as
		children of a new node.

		The value of this attribute is the feature which is set
		to the contents of the current element.
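
	  As a rough illustration of the relation annotations described
	  above, the following DTD fragment is a sketch only. The element
	  names <sgmltag>relation</sgmltag> and <sgmltag>wrapper</sgmltag>,
	  and the attribute names <literal>name</literal> and
	  <literal>structure-type</literal>, are hypothetical and chosen
	  for this example; only the
	  <sgmltag>estRelationElementAttr</sgmltag>,
	  <sgmltag>estRelationTypeAttr</sgmltag> and
	  <sgmltag>estRelationIgnore</sgmltag> annotations are taken from
	  the descriptions above.
	  <programlisting arch='xml'>
	    &lt;!-- Hypothetical element whose content defines a relation.
	         The relation name is read from the "name" attribute and
	         the relation type from the "structure-type" attribute. --&gt;
	    &lt;!ATTLIST relation
	        name                   CDATA #REQUIRED
	        structure-type         CDATA "list"
	        estRelationElementAttr CDATA #FIXED "name"
	        estRelationTypeAttr    CDATA #FIXED "structure-type" &gt;

	    &lt;!-- Hypothetical grouping element: passed over, with its
	         contents processed as if directly inside its parent. --&gt;
	    &lt;!ATTLIST wrapper estRelationIgnore CDATA #FIXED "true" &gt;</programlisting>
	  Under these assumptions, a <sgmltag>relation</sgmltag> element
	  with <literal>name="Syntax"</literal> and the default
	  <literal>structure-type</literal> would be read as a linear
	  relation called Syntax, while any type other than `list' or
	  `linear' would give a tree.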



    <sect1 id='xmlparserclass'>
      <title>The <classname>XML_Parser_Class</classname> &cpp; Class </title>

	The &cpp; class  <classname>XML_Parser_Class</classname>
	(declared in <filename>rxp/XML_Parser.h</filename>) defines an
	abstract interface to the &xml; parsing process. By
	creating a subclass of
	<classname>XML_Parser_Class</classname> you can create code to
	read &xml; marked up text quite simply.

	<title>Some Definitions</title>

	  <listitem><para>An &xml; parser is an object which can
	      analyse a piece of text marked up according to an &xml;
	      doctype and perform actions based on the markup. One
	      &xml; parser deals with one text.
	      An &xml; parser is represented by an instance of the
	      class <classname>XML_Parser</classname>.
	  <listitem><para>An &xml; parser class is an object from which
	      &xml; parsers can be created. It defines the behaviour of
	      the parsers when they process their assigned text, and
	      also a mapping from &xml; entity IDs to places to look
	      for them.
	      An &xml; parser class is represented by an instance of
	      <classname>XML_Parser_Class</classname> or a subclass of

	<title>Creating An &xml; Processing Procedure</title>
	  In order to create a procedure which will process &xml;
	  marked up text in the manner of your choice, you need to do four
	  things. Simple examples can be found in
	  <filename>testsuite/</filename> and
	  <title>Create a Sub-Class of <classname>XML_Parser_Class</classname></title>

	  <title>Create a Structure Holding the State of the Parse</title>

	  <title>Decide How Entity IDs Should Be Converted To Filenames</title>

	  <title>Write A Procedure To Start The Parser</title>


	<title>The <classname>XML_Parser_Class</classname> in Detail</title>





    <sect1 ID='rxpparser' XRefLabel='The RXP Parser'>
      <title>The <productname>RXP</productname> &xml; Parser</title>

	Included in the &est; library is a version of the
	<productname>RXP</productname> &xml; parser. This version is
	limited to 8-bit characters for consistency with the rest of
	&est;. For more details, see the
	<productname>RXP</productname> documentation. 
		<footnote><para>Insert reference to
		      <productname>RXP</productname> documentation



Local Variables:
mode: sgml
sgml-parent-document:("speechtools.sgml" "chapter" "book")