<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<!-- tidied with tidy -cmiu -wrap 72 -->

<HTML>
  <HEAD>
    <TITLE>C++ Notes</TITLE>
<STYLE type="text/css">
 p.c7 {font-family: Arial; font-size: 80%}
 p.c6 {font-weight: bold}
 tt.c5 {font-style: italic}
 span.c4 {font-family: Courier New}
 small.c3 {font-family: Courier New}
 small.c2 {color: #000000; font-family: Arial}
 a.c1 {color: #000000; font-family: Arial}
</STYLE>
<STYLE type="text/css">
 td.c1 {font-weight: bold}
</STYLE>
  </HEAD>

  <BODY bgcolor="#FFFFFF" text="#000000">
          <H1><A name="_notes">C++ Notes</A></H1>
          <HR>

          <P>The C++ runtime and generated grammars look very much the
          same as the Java ones. There are some subtle differences,
          which are covered in the sections below.</P>

          <H2><A name="_buildruntime">Building the runtime</A></H2>

          <P>The following is a bit Unix-centric. For Windows, some
          contributed project files can be found in lib/cpp/contrib. These
          may be slightly outdated.</P>
          <P>The runtime files are located in the lib/cpp subdirectory
          of the ANTLR distribution. The runtime is generally built via the
          toplevel configure script and the Makefile it generates.
          Before configuring, please read INSTALL.txt in the toplevel
          directory. The file lib/cpp/README may contain some extra
          information on specific target machines.</P>
<PRE>
./configure --prefix=/usr/local
make
</PRE>

          <P>Installing ANTLR and the runtime is then done by typing</P>
<PRE>
make install
</PRE>
          This installs the runtime library libantlr.a in
          /usr/local/lib and the header files in
          /usr/local/include/antlr. Two convenience scripts, antlr and
          antlr-config, are also installed into /usr/local/bin. The first
          script takes care of invoking ANTLR; the other can be used to
          query the right compiler options for building files with
          ANTLR.

          <H2><A name="_usingruntime">Using the runtime</A></H2>
          Generally you will compile the ANTLR generated files with
          something similar to:
<PRE>
c++ -c MyParser.cpp -I/usr/local/include
</PRE>
          Linking is done with something similar to:
<PRE>
c++ -o MyExec &lt;your .o files&gt; -L/usr/local/lib -lantlr
</PRE>

          <H2><A name="_generatingcpp">Getting ANTLR to generate
          C++</A></H2>

          <P>To get ANTLR to generate C++ code you have to add</P>
<PRE>
language="Cpp";
</PRE>
          to the global options section. After that, things are pretty
          much the same as in Java mode, except that all token and AST
          classes are wrapped by a reference counting class (this makes
          life easier in some ways and much harder in others).
          The reference counting class uses
<PRE>
operator-&gt;
</PRE>
          to reference the object it is wrapping. As a result,
          you use -&gt; in C++ mode instead of the '.' of Java. See
          the examples in examples/cpp for some illustrations.
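          <P>To illustrate why member access goes through -&gt; on a
          reference counting wrapper, here is a minimal, self-contained
          sketch. It is not the runtime's actual implementation: the
          names CountedRef, SimpleToken, RefSimpleToken and textOf are
          hypothetical and exist only for this illustration.</P>

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a token class; NOT the ANTLR token class.
class SimpleToken {
public:
   explicit SimpleToken(const std::string& text) : text_(text) {}
   const std::string& getText() const { return text_; }
private:
   std::string text_;
};

// Hypothetical reference counting handle (illustration only).
template <class T>
class CountedRef {
public:
   explicit CountedRef(T* p = nullptr)
   : ptr_(p), count_(p ? new int(1) : nullptr) {}
   CountedRef(const CountedRef& other)
   : ptr_(other.ptr_), count_(other.count_)
   {
      if (count_) ++*count_;
   }
   CountedRef& operator=(const CountedRef& other)
   {
      if (this != &other) {
         release();
         ptr_ = other.ptr_;
         count_ = other.count_;
         if (count_) ++*count_;
      }
      return *this;
   }
   ~CountedRef() { release(); }

   // The handle forwards member access to the wrapped object.
   T* operator->() const
   {
      assert(ptr_ != nullptr);
      return ptr_;
   }
   // Number of handles currently sharing the object (0 if empty).
   int useCount() const { return count_ ? *count_ : 0; }
private:
   // Drop this handle's share; the last handle deletes the object.
   void release()
   {
      if (count_ && --*count_ == 0) {
         delete ptr_;
         delete count_;
      }
      ptr_ = nullptr;
      count_ = nullptr;
   }
   T* ptr_;
   int* count_;
};

typedef CountedRef<SimpleToken> RefSimpleToken;

// Usage: members of the wrapped object are reached with ->, not '.'.
inline std::string textOf(const RefSimpleToken& tok)
{
   return tok->getText();
}
```

          <P>Copying a handle only bumps the shared count, and the wrapped
          object is deleted when the last handle goes away. That is the
          convenience the wrapper buys, at the cost of the extra
          indirection.</P>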

          <H2><A name="_changingasttype">AST types</A></H2>

			<P>As of ANTLR 2.7.2, if you supply the
			<CODE>buildAST=true</CODE> option to a parser then you <b>have
			to</b> set and initialize an ASTFactory for the parser and
			for any tree walkers that use the resulting AST.</P>
<PRE>
ASTFactory my_factory;	// generates CommonAST by default
MyParser parser( some-lexer );
// Set up the AST factory; repeat this for all parsers using the AST
parser.initializeASTFactory( my_factory );
parser.setASTFactory( &my_factory );
</PRE>
			 <P>In C++ mode it is also possible to override the AST type used
          by the code generated by ANTLR. To do this, take the
          following steps:</P>

          <UL>
            <LI>
              Define a custom AST class like the following:
<PRE>
#ifndef __MY_AST_H__
#define __MY_AST_H__

#include &lt;antlr/CommonAST.hpp&gt;

class MyAST;

typedef ANTLR_USE_NAMESPACE(antlr)ASTRefCount&lt;MyAST&gt; RefMyAST;

/** Custom AST class that adds line numbers to the AST nodes.
 * easily extended with columns. Filenames will take more work since
 * you'll need a custom token class as well (one that contains the
 * filename)
 */
class MyAST : public ANTLR_USE_NAMESPACE(antlr)CommonAST {
public:
   // copy constructor
   MyAST( const MyAST& other )
   : CommonAST(other)
   , line(other.line)
   {
   }
   // Default constructor
   MyAST( void ) : CommonAST(), line(0) {}
   virtual ~MyAST( void ) {}
   // get the line number of the node (or try to derive it from the child node)
   virtual int getLine( void ) const
   {
      // most of the time the line number is not set if the node is an
      // imaginary one. Usually this means it has a child. Refer to the
      // child line number. Of course this could be extended a bit.
      // based on an example by Peter Morling.
      if ( line != 0 )
         return line;
      if( getFirstChild() )
         return ( RefMyAST(getFirstChild())-&gt;getLine() );
      return 0;
   }
   virtual void setLine( int l )
   {
      line = l;
   }
   /** The initialize methods are called by the tree building constructs.
    * Which overload is called determines whether the line number is
    * filled in: nodes built from a token get the token's line, while
    * imaginary nodes built from a type and text start with line 0.
    */
   virtual void initialize(int t, const ANTLR_USE_NAMESPACE(std)string& txt)
   {
      CommonAST::initialize(t,txt);
      line = 0;
   }
   virtual void initialize( ANTLR_USE_NAMESPACE(antlr)RefToken t )
   {
      CommonAST::initialize(t);
      line = t-&gt;getLine();
   }
   virtual void initialize( RefMyAST ast )
   {
      CommonAST::initialize(ANTLR_USE_NAMESPACE(antlr)RefAST(ast));
      line = ast-&gt;getLine();
   }
   // for convenience; things will also work without this overload
   void addChild( RefMyAST c )
   {
      BaseAST::addChild( ANTLR_USE_NAMESPACE(antlr)RefAST(c) );
   }
   // for convenience; things will also work without this overload
   void setNextSibling( RefMyAST c )
   {
      BaseAST::setNextSibling( ANTLR_USE_NAMESPACE(antlr)RefAST(c) );
   }
   // provide a clone of the node (no sibling/child pointers are copied)
   virtual ANTLR_USE_NAMESPACE(antlr)RefAST clone( void )
   {
      return ANTLR_USE_NAMESPACE(antlr)RefAST(new MyAST(*this));
   }
   static ANTLR_USE_NAMESPACE(antlr)RefAST factory( void )
   {
      return ANTLR_USE_NAMESPACE(antlr)RefAST(RefMyAST(new MyAST()));
   }
private:
   int line;
};
#endif
</PRE>
            </LI>

            <LI>
              Tell ANTLR's C++ code generator to use your RefMyAST by
              including the following in the options section of your grammars:
<PRE>
ASTLabelType = "RefMyAST";
</PRE>
              After that, each time you create a new parser instance you
              need to tell it, before invoking it, to use the AST
              factory defined in your class. This is done like this:
<PRE>
// make factory with default type of MyAST
ASTFactory my_factory( "MyAST", MyAST::factory );
My_Parser parser(lexer);
// make sure the factory knows about all AST types in the parser..
parser.initializeASTFactory(my_factory);
// and tell the parser about the factory..
parser.setASTFactory( &my_factory );
</PRE>
<P>After these steps you can access methods/attributes of (Ref)MyAST
directly (without typecasting) in parser/treewalker productions.</P>
<P>Forgetting to call setASTFactory results in a nice SIGSEGV or your OS's
equivalent. The default constructor of ASTFactory initializes it to
generate CommonAST objects.</P>
<P>If you use a 'chain' of parsers/treewalkers then you have to make sure
they all share the same AST factory. Also, if you add new definitions of
AST nodes/tokens in downstream parsers/treewalkers, you have to apply their
respective initializeASTFactory methods to this factory.</P>
<P>All of this is demonstrated in the examples/cpp/treewalk example.</P>
            </LI>
          </UL>

          <H2><A name="_heteroast">Using Heterogeneous AST
          types</A></H2>

			 <P>This should now (as of 2.7.2) work in C++ mode, probably with
			 some caveats.</P>
			 <P>The heteroAST example shows how to set things up. A short excerpt:</P>
<PRE>
ASTFactory ast_factory;

parser.initializeASTFactory(ast_factory);
parser.setASTFactory(&ast_factory);
</PRE>
			<P>A small excerpt from the generated initializeASTFactory method:
<PRE>
void CalcParser::initializeASTFactory( antlr::ASTFactory& factory )
{
   factory.registerFactory(4, "PLUSNode", PLUSNode::factory);
   factory.registerFactory(5, "MULTNode", MULTNode::factory);
   factory.registerFactory(6, "INTNode", INTNode::factory);
   factory.setMaxNodeType(11);
}
</PRE>
<P>After these steps ANTLR should be able to decide which factory to use
for each node it creates.</P>
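<P>The idea behind such a registration table can be sketched in a few
lines of self-contained C++. This is only an illustration under stated
assumptions, not the code of antlr::ASTFactory: the classes Node,
PlusNode, IntNode and NodeFactoryTable are hypothetical, and the
registerFactory signature is simplified (no node name argument).</P>

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node classes; stand-ins, NOT the ANTLR AST classes.
struct Node {
   virtual ~Node() {}
   virtual std::string typeName() const { return "CommonNode"; }
   static Node* factory() { return new Node(); }
};
struct PlusNode : Node {
   std::string typeName() const override { return "PLUSNode"; }
   static Node* factory() { return new PlusNode(); }
};
struct IntNode : Node {
   std::string typeName() const override { return "INTNode"; }
   static Node* factory() { return new IntNode(); }
};

typedef Node* (*FactoryFn)();

// Hypothetical table of per-token-type factories with a default fallback.
class NodeFactoryTable {
public:
   NodeFactoryTable() : defaultFactory_(&Node::factory) {}

   // Make room for token types 0..maxType (only ever grows the table).
   void setMaxNodeType(int maxType)
   {
      assert(maxType >= 0);
      std::size_t n = static_cast<std::size_t>(maxType) + 1;
      if (n > table_.size())
         table_.resize(n, nullptr);
   }
   // Associate a factory function with one token type.
   void registerFactory(int type, FactoryFn fn)
   {
      assert(type >= 0);
      setMaxNodeType(type);
      table_[static_cast<std::size_t>(type)] = fn;
   }
   // Use the registered factory for this type, else the default one.
   std::unique_ptr<Node> create(int type) const
   {
      std::size_t i = static_cast<std::size_t>(type);
      if (type >= 0 && i < table_.size() && table_[i])
         return std::unique_ptr<Node>(table_[i]());
      return std::unique_ptr<Node>(defaultFactory_());
   }
private:
   FactoryFn defaultFactory_;
   std::vector<FactoryFn> table_;
};
```

<P>In this sketch, token types without a registered factory fall back to
the default node class, which is the behaviour a homogeneous tree would
get for every node.</P>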

          <H2><A name="_extras">Extra functionality in C++
          mode.</A></H2>
          In C++ mode ANTLR supports some extra functionality to make
          life a little easier.

          <H3>Inserting Code</H3>
          In C++ mode some extra control is supplied over the places
          where code can be placed in the generated files. These are
          extensions of the <B>header</B> directive. The syntax is:
<PRE>
header "&lt;identifier&gt;" { &lt;code&gt; }
</PRE>

          <TABLE summary="Overview of header id's">
            <TR>
              <TD class="c1">identifier</TD>

              <TD class="c1">where the code is inserted</TD>
            </TR>

            <TR>
              <TD><CODE>pre_include_hpp</CODE></TD>

              <TD>Code is inserted before ANTLR generated includes in
              the header file.</TD>
            </TR>

            <TR>
              <TD><CODE>post_include_hpp</CODE></TD>

              <TD>Code is inserted after ANTLR generated includes in
              the header file, but outside any generated namespace
              specifications.</TD>
            </TR>

            <TR>
              <TD><CODE>pre_include_cpp</CODE></TD>

              <TD>Code is inserted before ANTLR generated includes in
              the cpp file.</TD>
            </TR>

            <TR>
              <TD><CODE>post_include_cpp</CODE></TD>

              <TD>Code is inserted after ANTLR generated includes in
              the cpp file, but outside any generated namespace
              specifications.</TD>
            </TR>
          </TABLE>

          <H3>Pacifying the preprocessor</H3>

          <P>Sometimes various tree building constructs with '#'
          in them clash with the C/C++ preprocessor. ANTLR's
          preprocessor for actions is slightly extended in C++ mode to
          alleviate these pains.</P>
			 <P><B>NOTE:</B> At some point I plan to replace the '#' with
          something different that gives less trouble in C++.</P>

          <P>The following preprocessor constructs are not
          touched. (As a result, you cannot use these as labels for
          AST nodes.)</P>

          <UL>
            <LI><CODE>if</CODE></LI>

            <LI><CODE>define</CODE></LI>

            <LI><CODE>ifdef</CODE></LI>

            <LI><CODE>ifndef</CODE></LI>

            <LI><CODE>else</CODE></LI>

            <LI><CODE>elif</CODE></LI>

            <LI><CODE>endif</CODE></LI>

            <LI><CODE>warning</CODE></LI>

            <LI><CODE>error</CODE></LI>

            <LI><CODE>ident</CODE></LI>

            <LI><CODE>pragma</CODE></LI>

            <LI><CODE>include</CODE></LI>
          </UL>

          <P>As another extra, it is possible to escape '#' signs
          with a backslash, e.g. "\#". When the action lexer sees these,
          it translates them to plain '#' characters.</P>

          <H2><A name="_template">A template grammar file for
          C++</A></H2>
<PRE>
<CODE>header "pre_include_hpp" {
    // gets inserted before antlr generated includes in the header file
}
header "post_include_hpp" {
    // gets inserted after antlr generated includes in the header file
    // outside any generated namespace specifications
}

header "pre_include_cpp" {
    // gets inserted before the antlr generated includes in the cpp file
}

header "post_include_cpp" {
    // gets inserted after the antlr generated includes in the cpp file
}

header {
    // gets inserted after generated namespace specifications in the header
    // file. But outside the generated class.
}

options {
   language="Cpp";
    namespace="something";      // encapsulate code in this namespace
//  namespaceStd="std";         // cosmetic option to get rid of long defines
                                // in generated code
//  namespaceAntlr="antlr";     // cosmetic option to get rid of long defines
                                // in generated code
    genHashLines = true;        // generate #line's (true) or turn them off (false)
}

{
   // global stuff in the cpp file
   ...
}
class MyParser extends Parser;
options {
   exportVocab=My;
}
{
   // additional methods and members
   ...
}
... rules ...

{
   // global stuff in the cpp file
   ...
}
class MyLexer extends Lexer;
options {
   exportVocab=My;
}
{
   // additional methods and members
   ...
}
... rules ...

{
   // global stuff in the cpp file
   ...
}
class MyTreeParser extends TreeParser;
options {
   exportVocab=My;
}
{
   // additional methods and members
   ...
}
... rules ...
</CODE>
</PRE>

</BODY>
</HTML>