<?xml version="1.0" encoding="windows-1251"?>
<!-- PUBLIC "-//OASIS//DTD DocBook V4.2//EN" -->
<!DOCTYPE article[
<!--<!ENTITY colorerloc "file:/d:/programs/devel/colorer">-->
<!ENTITY colorerloc "../..">
<!ENTITY fileref "&colorerloc;/doc/2003/hrc.xsd">
<!ENTITY hrdref "&colorerloc;/doc/2003/hrd.xsd">
<!ENTITY catalogref "&colorerloc;/doc/2003/catalog.xsd">
]>
<article lang="en" xmlns:x="uri:custom:schema-db">

<articleinfo>

<releaseinfo>Colorer-take5 beta4 library HRC Reference. April 2005</releaseinfo>
<title>HRC Language Reference</title>

<pubdate>28 April 2005</pubdate>

<revhistory>
<revision>
  <revnumber>take5.beta4</revnumber>
  <date>28 April 2005</date>
  <revremark>(Available as
    <ulink url='http://colorer.sf.net/hrc-ref/'>HTML</ulink>,
    <ulink url='http://colorer.sf.net/hrc-ref/hrc-ref.pdf'>PDF</ulink>,
    <ulink url='http://colorer.sf.net/hrc-ref/hrc-ref.zip'>DocBook</ulink>)
  </revremark>
</revision>
<revision>
  <revnumber>take5.beta4(draft)</revnumber>
  <date>19 February 2005</date>
</revision>
<revision>
  <revnumber>take5.beta3</revnumber>
  <date>30 January 2004</date>
</revision>
<revision>
  <revnumber>take5.beta2</revnumber>
  <date>12 September 2003</date>
</revision>
<revision>
  <revnumber>take5.beta1</revnumber>
  <date>30 March 2003</date>
</revision>
<revision>
  <revnumber>take5.alpha3</revnumber>
  <date>1 March 2003</date>
</revision>
<revision>
  <revnumber>take5.alpha2</revnumber>
  <date>30 January 2003</date>
</revision>
</revhistory>
<author>
  <firstname>Igor</firstname><surname>Russkih</surname>
  <affiliation>
    <address>
      <email>irusskih at gmail.com</email>
    </address>
  </affiliation>
</author>

<copyright><year>2003</year><year>2004</year><year>2005</year><holder>Igor Russkih (Cail Lomecb)</holder></copyright>

<abstract><title>Abstract</title>
<para>This reference defines the syntax and semantics of the
<acronym role='Highlighting Resource Codes'>HRC</acronym>
language, which the Colorer-take5 library uses to describe the
syntax and lexical structure of target programming languages.
The library uses these descriptions to
parse and colorize text in editors and other systems.
</para>
</abstract>
</articleinfo>


<section id='introduction'>
<title>Introduction</title>

<para>
HRC is a scripting language that describes how text files are parsed
to produce syntax highlighting. It is based on XML markup and defines its
own XML vocabulary and structure. The HRC language was developed to make
describing programming language structures as flexible and efficient as possible.
</para>
<para>
It started around 1999 as a simple XML-like structure describing
some common language constructions. It has since grown into
a much more complex and powerful language with rich relations between
different languages and syntax contexts.
</para>
<para>
HRC is based on Regular Expressions (RE), which allow flexible recognition
of text elements, lexemes and tokens. Regular Expressions alone, however,
can recognize only a rather limited class of syntax constructions, and more complex
languages often need to be described. For this reason, the HRC language uses a special
construction named &quot;scheme&quot;, which can describe a more powerful,
recursive class of languages (context-free) and, in combination with RE, makes
HRC a strong declarative language.
</para>
</section>



<section id='core'>
<title>Core Syntax</title>
<para>
The HRC language allows describing and storing syntax rules for numerous languages.
Every language description is divided into two parts:
an <emphasis role='strong'>informal</emphasis> part, which describes various
properties of the language, the rules for its initial selection, and service information
(the <literal>prototype</literal> element),
and a <emphasis role='strong'>formal</emphasis> part, which defines the
syntax and semantics of the target parsed language (the <literal>type</literal> element).
Prototypes are used to determine which type should be applied to the currently opened
file; they also define some internal application-dependent properties
and other useful information about the language. Because a prototype
definition can be separated from the actual language definition, a type is fully loaded only
when the user really requests it. In addition, collecting all the prototype definitions
in one initial source file lets the user see the list of languages supported
by the library and guarantees a fast library bootstrap.
</para>

<formalpara><title>Structure</title><para>
Each HRC file contains the declaration of one or more prototypes or
of one language type. The root XML content starts with the <literal>hrc</literal>
element, which contains all other HRC definitions.

<x:schemaref uri="&fileref;" role='hrc'/>

</para>


<para>
Each HRC language object is defined using XML elements and attributes.
The definition of the HRC XML syntax can be found in <xref linkend='hrcxsd'/>.
For instance, almost all HRC definitions start with the following syntax:
<example><title>Common HRC file</title>
<programlisting><![CDATA[
<?xml version="1.0"?>
<!DOCTYPE hrc PUBLIC "-//Cail Lomecb//DTD Colorer HRC take5//EN"
  "http://colorer.sf.net/2003/hrc.dtd">
<hrc version="take5" xmlns="http://colorer.sf.net/2003/hrc"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://colorer.sf.net/2003/hrc
                         http://colorer.sf.net/2003/hrc.xsd">
  <annotation>
   <documentation>
   your documentation...
   </documentation>
  </annotation>

  your definitions...

</hrc>
]]>
</programlisting></example>

</para>

<para>
Each element in HRC can be documented with XML Schema-like
elements:
<x:schemaref uri="&fileref;" role='annotation'/>

An annotation object can be used anywhere in the HRC context to document
and describe any of the HRC elements.

</para>
</formalpara>


<section id='core.filetypes'><title>File Types</title>
<para>Each language prototype requires a definition of
the language's name and description. These properties are used
to identify the language among the other language definitions
and during the inter-language linkage process.
</para>


<section id='core.filetypes.proto'><title>Prototypes</title>

<para>
Prototypes are declared with <literal>&lt;prototype&gt;</literal> elements.
For instance:

<example><title>Prototype definition</title>
<programlisting><![CDATA[
  <prototype name="cpp" group="main" description="C++">
    <location link="base/cpp.hrc"/>
    <filename>/\.(cpp|cxx|cc|hpp|h)$/i</filename>
    <firstline>/^\s*(\/\* | \/\/)/xi</firstline>
    <firstline>/\#include/</firstline>
    <firstline>/\#define|\#if/</firstline>
  </prototype>
]]>
</programlisting></example>

declares the "C++" language and its properties. These are the language's group and
description; the location of the HRC file with the language's syntax declaration;
an RE mask used to identify this type by file name extension;
and one or more masks used to identify the type by the file's first line.
</para>

<x:schemaref uri="&fileref;" role='prototype'/>

<para>

The library must choose a language before it starts the syntax highlighting
process. It does so with the help of the <literal>firstline</literal> and <literal>filename</literal>
parameters. Each matched instance of one of these parameters adds some
weight to the total language weight (the probability of being chosen). This value is taken by default,
or can be set explicitly with the <literal>weight</literal> attribute of these elements.
Once the total weights of all types are evaluated, the first language with the maximum
weight is selected and assigned to the file being opened.

</para>
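
<para>
As a minimal sketch of how weights could be set explicitly, the following prototype
gives a file name match more influence than a first line match. The
<literal>weight</literal> attribute is the one described above; the language name
<literal>somelang</literal>, the file location and the numeric values are hypothetical
and chosen only for illustration:

<example><title>Explicit weights (illustrative sketch)</title>
<programlisting><![CDATA[
  <prototype name="somelang" group="main" description="Some language">
    <location link="base/somelang.hrc"/>
    <!-- hypothetical values: an extension match counts more ... -->
    <filename weight="3">/\.some$/i</filename>
    <!-- ... than a matching first line -->
    <firstline weight="2">/^\#\!.*somelang/</firstline>
  </prototype>
]]>
</programlisting></example>

With these values, a file named <literal>test.some</literal> whose first line also
matches would collect a total weight of 5 for this type.
</para>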

<x:schemaref uri="&fileref;" role='filename'/>
<x:schemaref uri="&fileref;" role='firstline'/>

<para>
If either of these two elements is used more than once, each of its matched
instances adds the specified weight to the total language weight.
</para>

<para>
If the actual language definition is separated from the prototype and placed in
another resource, its location is given by the <literal>location</literal>
element. The type is actually loaded only when it is really
requested.
<x:schemaref uri="&fileref;" role='location'/>
</para>

<x:schemaref uri="&fileref;" role='parameters'/>
</section>


<section id='core.filetypes.package'><title>Packages</title>
<para>
You can define a file type with the special meaning of an internal
type, which is used by other types and is not visible to the user.
This role is handled by the <literal>package</literal> element:

<x:schemaref uri="&fileref;" role='package'/>

This element doesn't contain <literal>filename</literal> or
<literal>firstline</literal> properties because it doesn't map directly
to any file type or language.
In all other respects it behaves like a prototype.
For example:

<example><title>Package definition</title>
<programlisting><![CDATA[
  <package name="def" group="packages" description="core definitions">
    <location link="default.hrc"/>
  </package>
  <package name="regexp" group="packages" description="Regexp common library">
    <location link="lib/regexp.hrc"/>
  </package>
]]>
</programlisting></example>
</para>

</section>


<section id='core.filetypes.type'><title>Types</title>
<para>
Each prototype (or package) is linked to an actual file type, which describes the
information specific to the syntax parsing process. This information
is stored in basic units called <literal>type</literal>s.

<x:schemaref uri="&fileref;" role='type'/>

Normally, each type must be defined in a separate file, which contains
this type and optionally its prototype (if the prototype is not
defined in the global repository).
This allows each file to be loaded only once, when the required type is really requested
by the user.

</para>
</section>


</section>

<section id='core.namespaces'><title>Namespaces</title>
<para>
Each type defines its own namespace, in which its objects reside.
Each object must have a unique identifier (name) in this namespace,
which is used to reference it from other objects.
Uniqueness is required only among objects of the same kind, so you can create
objects of different kinds with the same name without any conflict.

If an object must be referenced from another type, its fully
qualified name is used, in the form <literal>typename:objectname</literal>.
Often, however, there are so many inter-type links that using
qualified names every time becomes tedious. To avoid this, the HRC language has
an <literal>import</literal> statement. When used, it 'imports' all the
object names from the imported type into the current one.
There can be as many import statements as needed. Unqualified names are resolved
in the order in which the imports are defined.

<x:schemaref uri="&fileref;" role='import'/>

For instance, you can write
<programlisting>
  &lt;import type='def'/>
</programlisting>
to import all definitions from the 'def' type into the current one.
Note that if multiple imported types define the same local name,
it is resolved in the order of the import declarations.
</para>
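
<para>
The two ways of referring to an object of another type can be sketched as follows.
The type name <literal>somelang</literal> and the RE are hypothetical;
<literal>def:StringQuote</literal> is the region named in the coding conventions of this reference.

<example><title>Qualified names and import (illustrative sketch)</title>
<programlisting><![CDATA[
  <!-- Variant 1: no import, the region is referenced by its qualified name -->
  <type name="somelang">
    <scheme name="somelang">
      <regexp match="/'/" region="def:StringQuote"/>
    </scheme>
  </type>

  <!-- Variant 2: the same type with an import, the local name is enough -->
  <type name="somelang">
    <import type="def"/>
    <scheme name="somelang">
      <regexp match="/'/" region="StringQuote"/>
    </scheme>
  </type>
]]>
</programlisting></example>
</para>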
</section>

</section>



<section id='schemesyntax'><title>Scheme syntax</title>

<para>

A scheme is a common construction in the HRC language, used to express and
describe the syntax of target languages. Each scheme represents a syntax
context, which contains different syntax elements, matched in the order
of text analysis. For example, a scheme for the "C++" language contains
keywords, string and number tokens, comments, and other elements.
<literal>region</literal> objects are used to describe all the information that needs to be highlighted.
Each region defines some syntactically meaningful element.
A region always has a name and sometimes a reference to its parent region
(if one exists). When parsed, the source text is described in terms of these regions.
This description contains groups of regions with specified positions and lengths.
</para>

<para>
In the next stage of text processing, each region is associated with some
handler. For example, a handler can assign color and font style information to
each of the regions, or can perform some operations on these structures.
</para>

<para>
Each region is defined using a <literal>region</literal> element:
<x:schemaref uri="&fileref;" role='region'/>
</para>
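
<para>
A minimal sketch of region declarations follows. The <literal>name</literal> and
<literal>description</literal> attributes appear in the examples of this reference;
the <literal>parent</literal> attribute used here to reference the parent region is
an assumption about the attribute name, and all region names are hypothetical.

<example><title>Region declarations (illustrative sketch)</title>
<programlisting><![CDATA[
  <!-- a top-level region of the type -->
  <region name="Keyword" description="Any keyword of the language"/>
  <!-- a more specific region that refers to Keyword as its parent
       (attribute name 'parent' is assumed) -->
  <region name="TypeKeyword" parent="Keyword" description="Built-in type name"/>
]]>
</programlisting></example>
</para>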

<para>
A scheme contains the
syntax definition of the described programming language. During parsing, each element in a
scheme creates one or more syntax regions, which are used to highlight the
parsed text. The resulting parse information contains not only a list of regions
but also a recursive scheme tree, which shows the overall text structure.
</para>

<para>
Each type can define as many schemes as needed, provided that all of them
have names that are unique within the type's scope. A scheme is defined using the
<literal>scheme</literal> element:

<x:schemaref uri="&fileref;" role='scheme'/>

Each type must declare one scheme called the "base scheme".
Only types declared as
<literal>package</literal> can ignore this requirement, because they are never used
as top-level types.
The local (unqualified) name of the base scheme must be equal to the name of its type.
The base scheme of each type is used as the entry point for the parse process.

<example><title>Sample type definition</title>
<programlisting><![CDATA[
  <type name="somelang">
    <region name="Keyword" description="This language's keyword"/>
    <scheme name="somelang">
      <keywords region="Keyword">
        <word name='word1'/><word name='word2'/>
        <word name='otherkeyword'/>
      </keywords>
      <regexp match="/other(keyword)?/i" region="Keyword"/>
    </scheme>
  </type>
]]></programlisting></example>

</para>
<para>
You can customize scheme loading and the overall HRC structure using the
<literal>if</literal>/<literal>unless</literal> attributes of the scheme element. If used, they have to
reference a common parameter declaration of this scheme's type.

The values of these parameters can be changed through Colorer's API, which
makes it possible to customize HRC loading and offer the user a choice of different language
profiles.
</para>
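
<para>
As a sketch of this mechanism, a type could provide two alternative definitions of
one scheme and let a parameter select between them. The parameter name
<literal>strict-mode</literal> and the scheme and region names are hypothetical, the
declaration of the parameter itself is not shown, and the exact way a parameter value
is evaluated is an assumption:

<example><title>Conditional scheme loading (illustrative sketch)</title>
<programlisting><![CDATA[
  <!-- used when the hypothetical 'strict-mode' parameter is enabled -->
  <scheme name="Identifiers" if="strict-mode">
    <regexp match="/\b[a-z]\w*\b/" region="Identifier"/>
  </scheme>
  <!-- used otherwise -->
  <scheme name="Identifiers" unless="strict-mode">
    <regexp match="/\b\w+\b/" region="Identifier"/>
  </scheme>
]]>
</programlisting></example>
</para>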

<para>
The next sections will describe different types of syntax elements,
available in the HRC language.
</para>

<section id='schemesyntax.keywords'><title>Keyword lists</title>
<para>
This common and often used syntax construction describes a list
of keywords with similar properties.
<x:schemaref uri="&fileref;" role='keywords'/>
<x:schemaref uri="&fileref;" role='word'/>
<x:schemaref uri="&fileref;" role='symb'/>
</para>
<para>
Each element in this list can define its own region assignment or
inherit the global one, defined in the <literal>keywords</literal> element.
Symbols never check the characters surrounding them, while words match only
if they are surrounded by non-word symbols. These word dividers can be redefined
using the <literal>worddiv</literal> attribute of the <literal>keywords</literal> element.
</para>
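
<para>
The difference between words and symbols can be sketched as follows. The region names
are hypothetical, and the <literal>name</literal> attribute of <literal>symb</literal>
is assumed to work like the one of <literal>word</literal>:

<example><title>Keyword list with words and symbols (illustrative sketch)</title>
<programlisting><![CDATA[
  <keywords region="Keyword">
    <word name="begin"/>
    <word name="end"/>
    <!-- this word overrides the region given on the keywords element -->
    <word name="nil" region="Constant"/>
    <!-- symbols match regardless of the surrounding characters -->
    <symb name=":="/>
    <symb name=";"/>
  </keywords>
]]>
</programlisting></example>

Here <literal>begin</literal> would not match inside <literal>beginning</literal>,
because a word must be surrounded by non-word symbols, while <literal>:=</literal>
matches even in <literal>x:=1</literal>.
</para>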
</section>

<section id='schemesyntax.re'><title>Regular Expressions</title>
<para>
Regular expression tokens are used to create custom syntax elements.
These elements are described with RE rules, which makes them very powerful
and flexible. Each RE token can be used to create a number of different
syntax regions (up to 16).
</para>

<x:schemaref uri="&fileref;" role='regexp'/>

<para>

For details of the colorer-take5 regular expression syntax, please see <xref linkend='hrcre'/>.
This syntax is used in the <literal>match</literal> attribute.
Each <literal>&lt;regexp&gt;</literal> element can have a number of optional attributes:
<literal>region0, region1, ... regionf</literal> (16 regions in total). The value of each attribute is the name
of the corresponding syntax region, which will be used to highlight text.
The number in the attribute's name refers to the corresponding bracket in the RE.
<literal>region0</literal> refers to the full matched range of the RE (this can be changed with the
<literal>\m</literal> and <literal>\M</literal> RE metasymbols).

A Regular Expression can also include named brackets, using the syntax <literal>(?{name} ... )</literal>.
In this case the name of the bracket is itself the name of the corresponding syntax region.
</para>
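
<para>
A short sketch of both forms of region assignment follows; all region names are
hypothetical:

<example><title>Numbered and named brackets (illustrative sketch)</title>
<programlisting><![CDATA[
  <!-- region0 covers the whole match, region1 and region2 cover the brackets -->
  <regexp match="/(\w+)\s*(=)/"
          region0="Assignment" region1="Variable" region2="Operator"/>
  <!-- named bracket: the text 'return' gets the region called Keyword -->
  <regexp match="/\b(?{Keyword}return)\b/"/>
]]>
</programlisting></example>
</para>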

<para>
Each RE definition can include references to predefined sequences of RE code.
Such references are called <quote>entities</quote>. Entities are defined at the <literal>type</literal>
level and have their own qualified namespace.
To include an entity's value in an RE, the special syntax <literal>%entityname;</literal> is used.
</para>
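
<para>
As an illustration, an entity could hold the RE for an identifier and be reused in
several rules. The <literal>name</literal> and <literal>value</literal> attributes of
the <literal>entity</literal> element are assumptions here, as are the entity,
type and region names:

<example><title>Entity definition and use (illustrative sketch)</title>
<programlisting><![CDATA[
  <type name="somelang">
    <!-- reusable fragment of RE code (attribute names assumed) -->
    <entity name="ident" value="[a-zA-Z_]\w*"/>
    <region name="FunctionName"/>
    <region name="VariableName"/>
    <scheme name="somelang">
      <regexp match="/\bfunction\s+(%ident;)/" region1="FunctionName"/>
      <regexp match="/\$(%ident;)/" region1="VariableName"/>
    </scheme>
  </type>
]]>
</programlisting></example>
</para>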

<x:schemaref uri="&fileref;" role='entity'/>

<para>

Each RE has a priority attribute (<literal>normal</literal> by default).
This means normal RE priority over enwrapped syntax elements.
An enwrapped element is an element that is waiting to be matched from a top-level scheme in
the parse sequence. <literal>low</literal> priority means that such a top-level element
(such as the <literal>end</literal> attribute of a <literal>block</literal> element) will
be matched first if there is a conflict between the two.
</para>

</section>

<section id='schemesyntax.contextswitch'><title>Blocked context switch</title>
<para>
Although regular expressions are a very powerful feature, their syntax cannot
express some complex language constructions. First, there is a general
limitation of colorer's RE parser: its scope is a single line of text. This means that
no regular expression can work across multiple lines of the parsed text.
Second, programming languages often have constructions that can be nested inside
each other an unlimited number of times. This is also an area where Regular Expressions
do not help much.
</para>
<para>
To express much more complex syntax and to allow context-free grammar
constructions to be declared, HRC defines a special token named "block".

<x:schemaref uri="&fileref;" role='block'/>

<x:schemaref uri="&fileref;" role='blockInner'/>

Each block defines its <literal>start</literal> and <literal>end</literal>
tags, each using the RE syntax already described.
Everything enclosed between these two marks is highlighted
according to the syntax of some other <literal>scheme</literal>, which is also specified in an
attribute of this element.
</para>
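
<para>
A minimal sketch of a block follows, describing a C-style comment that may span
several lines. The <literal>start</literal> and <literal>end</literal> attributes are
the ones described above; the attribute names <literal>scheme</literal> and
<literal>region</literal> are assumptions, and the scheme and region names are hypothetical:

<example><title>Block with a context switch (illustrative sketch)</title>
<programlisting><![CDATA[
  <!-- scheme applied to the text between the start and end marks -->
  <scheme name="Comment.content">
    <regexp match="/\bTODO\b/" region="Todo"/>
  </scheme>

  <scheme name="somelang">
    <!-- from '/*' to '*/', possibly across many lines -->
    <block start="/\/\*/" end="/\*\//"
           scheme="Comment.content" region="Comment"/>
  </scheme>
]]>
</programlisting></example>
</para>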
<para>
This means that with the <literal>block</literal> element you can switch context
between different highlighting schemes. This makes it possible to define a great number of syntax
variations and particulars.
</para>

</section>

<section id='schemesyntax.boundaries'><title>Scheme boundaries and priority</title>
<para>
Both regular expressions and block scheme switches work in the same scheme context
and are tested against the text in the order in which they were defined in HRC.
This means that any conflict between multiple possible matches is resolved according
to the order of definitions in the HRC file. After an RE matches, the parse position is advanced by
the width of that RE. By default the width extends from the first matched symbol to the last.
However, it is possible to redefine the boundaries of a regular expression and thereby change
how far the parse position advances. This is done with the special <literal>\m</literal>
(redefines RE start) and <literal>\M</literal> (redefines RE end) metasymbols.
This behaviour allows you to define overlapping tokens, where parsing of the next token starts
somewhere in the middle of the previous one. In such a case color definitions are applied in the order in which they are
parsed.
</para>
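
<para>
As a sketch of a redefined RE end, the following rule requires an opening bracket
after a name but leaves the bracket itself to be parsed by the next rule. It assumes
that <literal>\M</literal> is written at the position that should become the new end
of the match; the region names are hypothetical:

<example><title>Shortening a match with \M (illustrative sketch)</title>
<programlisting><![CDATA[
  <!-- matches only if '(' follows, but the token ends at \M, after the name -->
  <regexp match="/\b\w+\M\s*\(/" region0="FunctionCall"/>
  <!-- the bracket is then parsed as a separate token -->
  <regexp match="/\(/" region0="Bracket"/>
]]>
</programlisting></example>
</para>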
<section id="priority"><title>priority</title>
<para>
Additional selection rules apply to the case where a return occurs from
an inner scheme to its caller scheme (via the <literal>end</literal> tag of the
<literal>block</literal> element).
Information about the relative position of the <literal>block</literal> element
cannot help here to determine what to apply: the <literal>end</literal> RE of the outer
block, or the next regular expression/keyword/block defined in the inner (called) scheme.
To resolve such a conflict, HRC defines a special attribute for <literal>regexp</literal>
and <literal>block</literal> objects: <literal>priority</literal>.
The default value of this attribute is <literal>"normal"</literal>. When changed to
<literal>low</literal>, it tells Colorer not to take this object into account
when resolving conflicts upon exit from the inner scheme.
This means that the <literal>end</literal> tag of the outer <literal>block</literal> element
will be used instead of the object with lowered priority (when a conflict occurs).
For nested <literal>block</literal> tags, the <literal>priority</literal> attribute
relates to the inner object's <literal>start</literal> tag.
</para>
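
<para>
The conflict described above can be sketched as follows. Inside the brackets, the
closing bracket could be taken either by the inner rule for punctuation or by the
<literal>end</literal> RE of the outer block; the lowered priority gives it to the
block. The scheme and region names and the <literal>scheme</literal> and
<literal>region</literal> attribute names of the block are assumptions:

<example><title>Lowered priority of an inner rule (illustrative sketch)</title>
<programlisting><![CDATA[
  <scheme name="Args">
    <!-- any punctuation character, including ')';
         priority="low" lets the end of the calling block win on ')' -->
    <regexp match="/[^\w\s]/" region0="Symbol" priority="low"/>
  </scheme>

  <scheme name="somelang">
    <block start="/\(/" end="/\)/" scheme="Args" region="Arguments"/>
  </scheme>
]]>
</programlisting></example>
</para>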
</section>
<section id="content-priority"><title>content-priority</title>
<para>
Often the behavior of an element needs to be defined dynamically, depending on the
usage context. With the <literal>priority</literal> attribute it is impossible to change
an element's priority depending on the calling context: the element always has the same
priority. It is possible, however, to change the priority of a whole scheme's definition (i.e. the priority of all
its elements) using the <literal>content-priority</literal> attribute of a
<literal>block</literal> element.
</para>
<para>
When set to <literal>low</literal>, it causes all the elements of that scheme
to change their priority to <literal>low</literal>, regardless of the value of their
individual <literal>priority</literal> attribute.
</para>
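
<para>
As a sketch, the same scheme can be called from two blocks with different effective
priorities. The scheme and region names and the <literal>scheme</literal> attribute
name are assumptions:

<example><title>Context-dependent priority (illustrative sketch)</title>
<programlisting><![CDATA[
  <scheme name="Text">
    <regexp match="/[^\w\s]/" region0="Symbol"/>
  </scheme>

  <scheme name="somelang">
    <!-- here the elements of Text keep their own priority -->
    <block start="/\[/" end="/\]/" scheme="Text"/>
    <!-- here all elements of Text are treated as priority="low" -->
    <block start="/\(/" end="/\)/" scheme="Text" content-priority="low"/>
  </scheme>
]]>
</programlisting></example>
</para>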
</section>

<section id="inner-region"><title>inner-region</title>
<para>
When defining a scheme context switch, it is possible to set a default region
used for all of the called scheme's content. This region is used as a "background" for
all other regions defined in that scheme.
The boundaries that this region uses when it is instantiated can be controlled.
Normally all the scheme's content, including its <literal>start</literal>
and <literal>end</literal> tokens, is handled inside this default region.
The region therefore starts where the <literal>start</literal> token starts and
ends where the <literal>end</literal> token ends.
</para>
<para>
Sometimes it is necessary to change this behaviour and handle the <literal>start</literal>
and <literal>end</literal> tokens (and all the regions they instantiate) outside
the called scheme's default region. This can be achieved by setting the <literal>inner-region</literal>
attribute to <literal>"yes"</literal>.
When set, it tells the parser to place the start/end regions outside the called scheme
and to change the boundaries of the called scheme's default region. In this case the region starts at
the end of the <literal>start</literal> token and ends just before the <literal>end</literal>
token area.
</para>
<para>
The inner region feature can be used to implement special wrapped areas and, in general,
to control special background color treatment.
</para>
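
<para>
The two boundary modes can be compared in the following sketch, which shows two
alternative definitions of the same block. The scheme name
<literal>String.content</literal> (not defined here), the region name and the
<literal>scheme</literal> and <literal>region</literal> attribute names are assumptions:

<example><title>Default region boundaries (illustrative sketch)</title>
<programlisting><![CDATA[
  <!-- Alternative 1 (default): the String region spans from the opening
       quote to the closing quote, quotes included -->
  <block start="/'/" end="/'/" scheme="String.content" region="String"/>

  <!-- Alternative 2: the String region covers only the text
       between the quotes -->
  <block start="/'/" end="/'/" scheme="String.content" region="String"
         inner-region="yes"/>
]]>
</programlisting></example>
</para>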
</section>

</section>

</section>



<section id='interscheme'><title>Inter-scheme links</title>

<section id='interscheme.inheritance'><title>Inheritance</title>

<x:schemaref uri="&fileref;" role='inherit'/>

<x:schemaref uri="&fileref;" role='virtual'/>

<para>
</para>
</section>

<section id='interscheme.substitution'><title>Schemes substitutions</title>
<para></para>
</section>
</section>


<section id='coding.std'><title>HRC Language Coding Conventions</title>
<para>
Although HRC itself could be used in an arbitrary way, Colorer-take5
library has a number of coding, naming and other conventions, which
applies not to HRC syntax itself, but to the library which reads
and parses output of an analizer.
</para>

<section id='coding.std.naming'><title>Object naming</title>
<para>
All regions in the Colorer-take5 HRC database are named in the same
way: the name starts with a capital letter, and each part of the name also starts with a capital
letter. For instance: <literal>StringQuote</literal>. Each separate type or
package is named in lower-case letters, possibly abbreviated to make
the package name shorter. A region is then addressed as, for example,
<literal>def:StringQuote</literal>.
</para>

<para>
Scheme names are context dependent and can use any letter case.
A dash or dot delimiter is often used to make them more readable:
<literal>&lt;scheme name="Comment.content"&gt;</literal> for instance.
</para>

<para>
All files in file system storage should have lower-case names with
possible dash or dot delimiters. External XML entities should be used
to simplify the structure of complex HRC files and to make it easier to create and integrate autogenerated
HRC files. External entities should not have the
<literal>hrc</literal> extension. They should use the <literal>xml</literal> extension,
as more compliant.
</para>
</section>

<section id='coding.std.spec'><title>Default package</title>
<para>
Colorer-take5 defines a special package type to declare a general framework of syntax
regions, simplify HRC database support and
decouple parsed content from its presentation.
The name of this package is <literal>def</literal>. It is placed in
<literal>hrc/lib/default.hrc</literal>.
The general purpose of this file is to declare a basic set of syntax
regions; all other HRC regions should be inherited from this set.
This makes it possible to define HRD color highlighting rules flexibly and to
keep them uniform across all supported syntaxes and languages.
Any HRC package can explicitly import and use these regions, or it can
declare its own syntax regions derived from the defaults.
</para>

<section id='coding.std.spec.pairs'><title>Pair construction matching</title>
<para>
The Colorer-take5 library itself (independently of the HRC syntax) implements
a number of extensions to make the editing process more flexible. These are
paired construction matching and outlining of the file's structure and error list.
These features are implemented through specially declared regions,
which are mapped to more complex syntax region values.

To declare a matching paired construction, a package should instantiate
regions with the special <literal>def:PairStart</literal> and <literal>def:PairEnd</literal> regions.
In the parse layout these regions should be properly nested in a valid recursive sequence.
Using this information, the Colorer-take5 library can let the user
jump over text blocks in the target language and can highlight them during
the editing process.
</para>
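
<para>
A paired construction for curly brackets could be sketched as follows. The attribute
names <literal>region00</literal> and <literal>region10</literal> (region of the whole
<literal>start</literal> match and of the whole <literal>end</literal> match,
respectively) and the <literal>scheme</literal> attribute are assumptions, and the
scheme name <literal>Body</literal> is hypothetical:

<example><title>Paired brackets (illustrative sketch)</title>
<programlisting><![CDATA[
  <scheme name="Body">
    <!-- nested pairs: the block calls its own scheme recursively -->
    <block start="/\{/" end="/\}/" scheme="Body"
           region00="def:PairStart" region10="def:PairEnd"/>
  </scheme>
]]>
</programlisting></example>

Because the block is entered recursively, every opening bracket is enclosed together
with its own closing bracket, which gives the properly nested sequence required above.
</para>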
</section>

<section id='coding.std.spec.outline'><title>Outliner construction</title>
<para>
Another feature the Colorer-take5 library can provide is the ability to build
a tree of significant syntax tokens and to navigate quickly between
them in the editor. These tokens can represent a programming language's functions, procedures,
or any other logical structure of the text. During parsing, such constructions are collected
into a special outline container, which can present them to the user in real time or on request.
The Colorer-take5 editor implements two basic forms of outliner: the functions list and the errors list.
Any programming language can instantiate a token with a region equal to or derived from
<literal>def:Outlined</literal>. All tokens with this region are considered to be
outliner-targeted tokens and are collected while parsing. The outliner can also take into account
information about the parse tree structure to generate tree-like text outlines.
Moreover, any language can provide special algorithmic support or logic to implement
parsing of special outlined regions and building of a valid outline tree.
For instance, the EclipseColorer editor evaluates the name of each outlined region and searches for
an icon with that name. If one is found, it uses this icon to decorate the outliner window
items with graphics, not only text.
</para>
<para>
In general, an outliner can be set up against any region type. It works as a kind of filter,
gathering only the required information from the parser. This is how the errors list works: it
collects regions derived from <literal>def:Error</literal>. Each language uses this
region to mark problems in the text that it finds during parsing.
</para>
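
<para>
As a sketch, a language could mark function names for the outliner and unexpected
closing brackets for the errors list as follows. The <literal>parent</literal>
attribute name is an assumption, and the type, region names and REs are hypothetical:

<example><title>Outlined and error regions (illustrative sketch)</title>
<programlisting><![CDATA[
  <type name="somelang">
    <!-- regions derived from the special regions of the 'def' package -->
    <region name="FunctionOutline" parent="def:Outlined"/>
    <region name="StrayBracket" parent="def:Error"/>

    <scheme name="somelang">
      <!-- the function name is collected by the outliner -->
      <regexp match="/\bfunction\s+(\w+)/" region1="FunctionOutline"/>
      <!-- a closing bracket seen at top level is reported as an error -->
      <regexp match="/\}/" region0="StrayBracket"/>
    </scheme>
  </type>
]]>
</programlisting></example>
</para>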

</section>

</section>

<section id='coding.rec'><title>Coding Recommendations</title>
<para>
The HRC database has a long history; its format,
syntax and meaning have changed many times to reach a more logical and formal structure.

Because of this there are still some file type descriptions that do
not fully comply with the general HRC conventions.
These should not be supported in their current form, but should be reworked
to be more compliant with all HRC descriptions.
In general this includes invalid package naming and invalid region/scheme naming.
</para>
<para>
The <literal>import</literal> feature is a good point of HRC:
it allows the objects of another package to be used by their unqualified names.
In general, however, this feature should not be overused. It is much more convenient
to use fully qualified region and scheme names to show package
reuse and intersections explicitly.

</para>
</section>

</section>

<appendix id='hrcre'>
<title>Regular Expressions syntax</title>

<section id='hrcre.intro'><title>Introduction</title>
<para>
All the work of the Colorer library and the HRC language is based on the use of regular expressions (RE).
They allow you to create universal syntax highlighting rules in HRC.
</para>

<para>
Regular expressions consist of a set of characters.
Some of these are simple characters, and some are special metacharacters.
All metacharacters (escapes) are divided into three categories: the first is zero-length metacharacters (word boundaries and so on);
the second is class metacharacters (<literal>\w</literal>, <literal>\s</literal>, <literal>.</literal>);
and the third is operators.
RE operators can be applied to a single character,
to a block enclosed in brackets, or to other operators.
You can use brackets to group any sequence of characters.
Regular expressions in the HRC language are like Perl regexps in their base syntax.
There are some differences in the extended operators.
</para>
</section>

<section id='hrcre.syntax'><title>Syntax</title>
<para>
All regexps must be enclosed in slashes: <literal>/.../</literal>.
The closing slash can be followed by these parameters:

<itemizedlist>
<listitem><simpara><literal>i</literal> - ignore symbol case</simpara></listitem>
<listitem><simpara><literal>x</literal> - ignore literal spaces and crlf in the regexp (for readability)</simpara></listitem>
<listitem><simpara><literal>s</literal> - treat the regexp as single line, which means that the '.' class also includes the \r\n symbols</simpara></listitem>
</itemizedlist>
Each symbol in an RE is compared linearly with the target string.
Everything that doesn't look like a metacharacter is treated as a simple character.
</para>
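
<para>
The effect of the parameters can be sketched with two rules; the region names are
hypothetical:

<example><title>RE parameters (illustrative sketch)</title>
<programlisting><![CDATA[
  <!-- i: 'begin', 'Begin' and 'BEGIN' all match -->
  <regexp match="/\bbegin\b/i" region0="Keyword"/>
  <!-- x: spaces inside the RE are ignored, so it can be laid out readably;
       equivalent to /\d+(\.\d+)?/ -->
  <regexp match="/ \d+ (\. \d+)? /x" region0="Number"/>
]]>
</programlisting></example>
</para>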
</section>

<section id='hrcre.meta'><title>Metacharacters</title>

<table><title>Metacharacters</title><tgroup cols='2'><tbody>
<row><entry><literal>^</literal></entry><entry>Match the beginning of the line</entry></row>
<row><entry><literal>$</literal></entry><entry>Match the end of the line</entry></row>
<row><entry><literal>.</literal></entry><entry>Match any character (except \r\n)</entry></row>
<row><entry><literal>[...]</literal></entry><entry>Match characters in set</entry></row>
<row><entry><literal>[^...]</literal></entry><entry>Match characters not in set.
          All the operators are disabled inside a set, but you can
          use other metacharacters and the range operator:
          a-z means all chars from first to second (a - z),
          <literal>[{ASSIGNED}-[{Lu}]-[{Ll}]]</literal> - reference to Unicode classes.
          -[] - Class subtraction.
          &amp;&amp;[] - Class join (can be dropped).
          |[] - Class intersection.
          </entry></row>
<row><entry><literal>\#</literal></entry><entry>The symbol '#' following the backslash is matched literally (applies to any symbol except a-z and 1-9)</entry></row>
<row><entry><literal>\b</literal></entry><entry>Word break at this point</entry></row>
<row><entry><literal>\B</literal></entry><entry>No word break at this point</entry></row>
<row><entry><literal>\xHH</literal>, <literal>\x{HHHH}</literal></entry><entry><literal>HH, HHHH</literal> - character code (hex)</entry></row>
<row><entry><literal>\n</literal></entry><entry>0x0A (lf)</entry></row>
<row><entry><literal>\r</literal></entry><entry>0x0D (cr)</entry></row>
<row><entry><literal>\t</literal></entry><entry>0x09 (tab)</entry></row>
<row><entry><literal>\s</literal></entry><entry>Whitespace character (tab/space/cr/lf)</entry></row>
<row><entry><literal>\S</literal></entry><entry>Not whitespace</entry></row>
<row><entry><literal>\w</literal></entry><entry>Word symbol (chars, digits, _)</entry></row>
<row><entry><literal>\W</literal></entry><entry>Not word symbols</entry></row>
<row><entry><literal>\d</literal></entry><entry>Digit</entry></row>
<row><entry><literal>\D</literal></entry><entry>Not Digit</entry></row>
<row><entry><literal>\u</literal></entry><entry>Uppercase symbol</entry></row>
<row><entry><literal>\l</literal></entry><entry>Lowercase symbol</entry></row>
</tbody></tgroup></table>
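<para>
The metacharacters from the table above can be combined inside ordinary HRC rules.
The following is a minimal sketch; the region names are hypothetical, and the class
expression is the one quoted in the table.
</para>
<programlisting><![CDATA[
<!-- One or more assigned Unicode characters that are neither
     uppercase (Lu) nor lowercase (Ll) letters.
     'def:Symbol' is an assumed region name. -->
<regexp match="/[{ASSIGNED}-[{Lu}]-[{Ll}]]+/" region="def:Symbol"/>

<!-- A run of word symbols delimited by word breaks.
     'def:Identifier' is an assumed region name. -->
<regexp match="/\b\w+\b/" region="def:Identifier"/>
]]></programlisting>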
</section>

<section id='hrcre.exmeta'><title>Extended metacharacters</title>
<para>These metacharacters are incompatible with Perl.</para>
<table><title>Extended Metacharacters</title><tgroup cols='2'><tbody>
<row><entry><literal>\c</literal></entry><entry>Means that the preceding symbol is not a word symbol</entry></row>
<row><entry><literal>\N</literal></entry><entry>Reference inside the regexp to one of its own bracket pairs.
          <literal>N</literal> is the number of the required bracket pair. This operator works
          only when the bracket contains non-operator symbols.
</entry></row>
</tbody></tgroup></table>

<para>
The following operators are available only in the Colorer-take5 regexp parser,
when it is compiled for the Colorer library:
</para>
<table><title>Colorer-take5 Parsing Metacharacters</title><tgroup cols='2'><tbody>
<row><entry><literal>~</literal></entry><entry>Matches the start of the parent scheme (end of <literal>start</literal> tag).</entry></row>
<row><entry><literal>\m</literal></entry><entry>Changes start of regexp</entry></row>
<row><entry><literal>\M</literal></entry><entry>Changes end of regexp</entry></row>
<row><entry><literal>\yN \YN \y{name} \Y{name}</literal></entry><entry>Link to the external regexp (from the <literal>end</literal> token to a <literal>start</literal> token parameter). N is the required bracket pair, name is a named bracket.</entry></row>
</tbody></tgroup></table>

<para>
For more information about the meaning of <literal>\m</literal> and <literal>\M</literal>,
see <xref linkend="schemesyntax.boundaries"/>.
</para>
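<para>
A typical use of <literal>\y</literal> is an <literal>end</literal> pattern that must repeat
text captured by the <literal>start</literal> pattern. The sketch below assumes a hypothetical
language in which <literal>begin_NAME</literal> is closed by <literal>end_NAME</literal>;
the scheme name <literal>BlockBody</literal> is illustrative only.
</para>
<programlisting><![CDATA[
<!-- Bracket pair 1 of 'start' captures the block name;
     \y1 in 'end' refers back to that captured text.
     The scheme name is hypothetical. -->
<block start="/\bbegin_(\w+)\b/" end="/\bend_\y1\b/"
       scheme="BlockBody"/>
]]></programlisting>
<para>
Under these assumptions, a block opened by <literal>begin_foo</literal> is closed
by <literal>end_foo</literal> but not by <literal>end_bar</literal>.
</para>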

</section>


<section id='hrcre.ops'><title>Operators</title>
<para>
Operators can't be used without a preceding character sequence.
Each operator must be applied to an appropriate character,
metacharacter, or combination of them enclosed in brackets.
</para>

<table><title>Operators</title><tgroup cols='2'><tbody>
<row><entry><literal>( )</literal></entry><entry>Group and remember characters to form one pattern.</entry></row>
<row><entry><literal>(?{name} )</literal></entry><entry>Group and remember characters into the named group.</entry></row>
<row><entry><literal>(?{} ) or (?: )</literal></entry><entry>Group but don't remember characters into the group (unnamed group).</entry></row>
<row><entry><literal>(?{} )</literal></entry><entry>Group and remember characters into the unnamed uncounted group.</entry></row>
<row><entry><literal>|</literal></entry><entry>Match previous or next pattern.</entry></row>
<row><entry><literal>*</literal></entry><entry>Match previous pattern 0 or more times.</entry></row>
<row><entry><literal>+</literal></entry><entry>Match previous pattern 1 or more times.</entry></row>
<row><entry><literal>?</literal></entry><entry>Match previous pattern 0 or 1 times.</entry></row>
<row><entry><literal>{n}</literal></entry><entry>Repeat n times.</entry></row>
<row><entry><literal>{n,}</literal></entry><entry>Repeat n or more times.</entry></row>
<row><entry><literal>{n,m}</literal></entry><entry>Repeat from n to m times.</entry></row>
</tbody></tgroup></table>
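<para>
The grouping and repetition operators can be sketched in HRC rules as follows.
The region names, the <literal>region1</literal> attribute, and the use of region names
as group names are assumptions for illustration and should be checked against the HRC schema.
</para>
<programlisting><![CDATA[
<!-- Counted group: the word after 'goto' is remembered as bracket pair 1
     (region1 and def:Label are assumed names) -->
<regexp match="/\bgoto\s+(\w+)/" region1="def:Label"/>

<!-- Named groups: each group is remembered under the given name -->
<regexp match="/(?{def:Keyword}\bgoto\b)\s+(?{def:Label}\w+)/"/>

<!-- Uncounted group with a repeat count: exactly three 'ab' pairs -->
<regexp match="/(?:ab){3}/" region="def:String"/>

<!-- Alternation with a bounded repeat: two to four hex digit pairs -->
<regexp match="/([0-9a-f][0-9a-f]){2,4}/i" region="def:Number"/>
]]></programlisting>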

<para>
Adding <literal>?</literal> after an operator makes it nongreedy.
For example, the <literal>*</literal> operator becomes nongreedy when written as <literal>*?</literal>.
A greedy operator takes as much of the string as it can; a nongreedy operator takes as little as possible.
</para>
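<para>
The difference between greedy and nongreedy matching is easiest to see on quoted text.
A minimal sketch, with an assumed region name:
</para>
<programlisting><![CDATA[
<!-- Greedy: on the line   'a' + 'b'   the match runs from the first
     quote to the last one, covering the whole expression -->
<regexp match="/'.*'/" region="def:String"/>

<!-- Nongreedy: the match stops at the nearest closing quote,
     so 'a' and 'b' are matched separately -->
<regexp match="/'.*?'/" region="def:String"/>
]]></programlisting>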
</section>


<section id='hrcre.exops'><title>Extended operators</title>
<table><title>Extended Operators</title><tgroup cols='2'><tbody>
<row><entry><literal>?#N</literal></entry><entry>Look-behind. N is the number of symbols to look behind.</entry></row>
<row><entry><literal>?~N</literal></entry><entry>Negative look-behind.</entry></row>
<row><entry><literal>?=</literal></entry><entry>Look-ahead.</entry></row>
<row><entry><literal>?!</literal></entry><entry>Negative Look-ahead.</entry></row>
</tbody></tgroup></table>
<para>
Note that the last two operators also exist in Perl, in the form <literal>(?=foobar)</literal>.
Colorer, however, uses the syntax <literal>(foobar)?=</literal>.
</para>
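<para>
A sketch of these operators in HRC rules follows. The region names are assumptions,
and the look-behind rule assumes that <literal>?#N</literal> is written after its group
in the same postfix way as <literal>?=</literal>.
</para>
<programlisting><![CDATA[
<!-- Look-ahead: a word followed by '(' ;
     the bracket itself is not part of the match -->
<regexp match="/\b\w+\b(\s*\()?=/" region="def:Function"/>

<!-- Negative look-ahead: a word that is not followed by ':' -->
<regexp match="/\b\w+\b(:)?!/" region="def:Identifier"/>

<!-- Look-behind (assumed postfix form): a word preceded by '$',
     looking 1 symbol behind -->
<regexp match="/(\$)?#1\w+/" region="def:Var"/>
]]></programlisting>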
</section>

<section id='hrcre.examples'><title>Examples</title>
<example><title>RE examples</title>
<para>

<variablelist>
<varlistentry>
<term><literal>/foobar/</literal></term>
<listitem><para>will match "foobar", "foobar barfoo"</para></listitem>
</varlistentry>

<varlistentry>
<term><literal>/ FOO bar /ix</literal></term>
<listitem><para>will match "foobar", "FOOBAR", "foobar and two other foos"</para></listitem>
</varlistentry>

<varlistentry>
<term><literal>/(foo)?bar/</literal></term>
<listitem><para>will match "foobar", "bar"</para></listitem>
</varlistentry>

<varlistentry>
<term><literal>/^foobar$/</literal></term>
<listitem><para>will match <emphasis>only</emphasis> "foobar"</para></listitem>
</varlistentry>

<varlistentry>
<term><literal>/([\d\.])+/</literal></term>
<listitem><para>will match any sequence of digits and dots, such as "42" or "3.14"</para></listitem>
</varlistentry>

<varlistentry>
<term><literal>/(foo|bar)+/</literal></term>
<listitem><para>will match "foofoofoobarfoobar", "bar"</para></listitem>
</varlistentry>

<varlistentry>
<term><literal>/f[obar]+r/</literal></term>
<listitem><para>will match "foobar", "for", "far"</para></listitem>
</varlistentry>
</variablelist>

</para>
</example>
</section>

</appendix>

<appendix id='catalog.xml'>
<title>Format of <literal>catalog.xml</literal> file</title>
<para>

The catalog of all Colorer Library resources is a convenient way to
unify the creation and management of all Colorer features.

The catalog is stored in the <literal>catalog.xml</literal> file
and is mapped onto the ParserFactory class.

It stores the list of all installed HRC modules, the error logging
settings, and the list of available HRD sets.

</para>
<x:schemaref uri="&catalogref;" role='catalog'/>

<x:schemaref uri="&catalogref;" role='hrc-sets'/>
<x:schemaref uri="&catalogref;" role='hrd-sets'/>
<x:schemaref uri="&catalogref;" role='hrd-entry'/>
<x:schemaref uri="&catalogref;" role='location'/>

<x:schemaref uri="&catalogref;"/>
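<para>
A minimal <literal>catalog.xml</literal> might look like the sketch below.
The namespace URI, the attribute names, and the file paths are assumptions made
for illustration; the schema above remains the authoritative definition.
</para>
<programlisting><![CDATA[
<catalog xmlns="http://colorer.sf.net/2003/catalog">
  <!-- Installed HRC modules; log-location is assumed
       to name the error log file -->
  <hrc-sets log-location="colorer.log">
    <location link="hrc/proto.hrc"/>
  </hrc-sets>
  <!-- Available HRD sets, one entry per color scheme -->
  <hrd-sets>
    <hrd class="rgb" name="default" description="Default colors">
      <location link="hrd/rgb/default.hrd"/>
    </hrd>
  </hrd-sets>
</catalog>
]]></programlisting>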

<para>
</para>
</appendix>

<appendix id='hrd'>
<title>Format of <literal>HRD</literal> color schemes</title>
<para>
<acronym role='Highlighting Resource Descriptions'>HRD</acronym>
storage assigns editor-specific properties to each HRC region.
Commonly, these include color and style information.
An HRD file consists of a list of entries, each describing one
HRC region.

</para>
<x:schemaref uri="&hrdref;" role='hrd'/>

<x:schemaref uri="&hrdref;" role='assign'/>
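<para>
As an illustration, an HRD file for an RGB-capable editor might be sketched as follows.
The attribute names <literal>fore</literal> and <literal>back</literal>, the color values,
and the region names are assumptions to be checked against the schema.
</para>
<programlisting><![CDATA[
<hrd>
  <!-- Each entry assigns properties to one HRC region
       (attribute and region names are assumed) -->
  <assign name="def:Text"    fore="#000000" back="#FFFFFF"/>
  <assign name="def:Comment" fore="#808080"/>
  <assign name="def:Keyword" fore="#0000FF"/>
</hrd>
]]></programlisting>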

<para>
It is possible to maintain different HRD settings for different languages,
or to compile them into one single HRD file.
The former lets you distribute recommended settings with each language;
the latter unifies the modification and storage of changed HRD settings through a provided UI.

</para>
<x:schemaref uri="&hrdref;"/>

<para>
</para>
</appendix>

<appendix id='hrcxsd'>
<title>XML Schema for HRC Language</title>
<para>
This XML Schema instance was automatically generated
from the original <literal>hrc.xsd</literal> source, available at
<ulink url="http://colorer.sf.net/2003/hrc.xsd">http://colorer.sf.net/2003/hrc.xsd</ulink>.
All comments and documentation tags were stripped to make the format more compact.
For purposes other than informational ones, use the up-to-date
version available at the link above.
</para>

<x:schemaref role='' uri="&fileref;"/>

</appendix>

<appendix id='history'><title>History of the changes</title>

<itemizedlist><title>take5.beta4, 28 April 2005</title>
<listitem><para>
New <xref linkend="inner-region"/> attribute description.
</para></listitem>
<listitem><para>
Minor HRD schema clarifications.
</para></listitem>
</itemizedlist>

<itemizedlist><title>take5.beta4(draft), 19 February 2005</title>
<listitem><para>
Clarification of <literal>regexp</literal> and <literal>block</literal> regions usage.
</para></listitem>
<listitem><para>
"Scheme boundaries and priority" explained.
</para></listitem>
<listitem><para>
"HRC Language Coding Conventions" section was added.
</para></listitem>
</itemizedlist>

</appendix>


<bibliography id='bibliography'><title>References</title>
 
 <bibliomixed id="xml-rec">
 <abbrev>XML 1.0</abbrev>Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, and Eve Maler, editors.
 <citetitle><ulink url="http://www.w3.org/TR/REC-xml">Extensible Markup Language (XML) 1.0 Second Edition</ulink></citetitle>.
 W3C (World Wide Web Consortium), 2000.
 </bibliomixed>
 
 <bibliomixed id="xslt-rec">
 <abbrev>XSLT 1.0</abbrev>James Clark, editor.
 <citetitle><ulink url="http://www.w3.org/TR/xslt">XSL Transformations (XSLT) 1.0</ulink></citetitle>.
 W3C (World Wide Web Consortium), 1999.
 </bibliomixed>

 <bibliomixed id="xmlschema-1">
 <abbrev>W3C XML Schema Structures</abbrev>
 Henry S. Thompson, David Beech, Murray Maloney, Noah Mendelsohn, editors.
 <citetitle><ulink url="http://www.w3.org/TR/xmlschema-1/">XML Schema Part 1: Structures</ulink></citetitle>.
  W3C (World Wide Web Consortium), 2001.
 </bibliomixed>

 <bibliomixed id="xmlschema-2">
 <abbrev>W3C XML Schema Datatypes</abbrev>Paul V. Biron, Ashok Malhotra, editors.
 <citetitle><ulink url="http://www.w3.org/TR/xmlschema-2/">XML Schema Part 2: Datatypes</ulink></citetitle>.
  W3C (World Wide Web Consortium), 2001.
 </bibliomixed>

</bibliography>

</article>
<!-- ***** BEGIN LICENSE BLOCK *****
   - Version: MPL 1.1/GPL 2.0/LGPL 2.1
   -
   - The contents of this file are subject to the Mozilla Public License Version
   - 1.1 (the "License"); you may not use this file except in compliance with
   - the License. You may obtain a copy of the License at
   - http://www.mozilla.org/MPL/
   -
   - Software distributed under the License is distributed on an "AS IS" basis,
   - WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
   - for the specific language governing rights and limitations under the
   - License.
   -
   - The Original Code is the Colorer Library.
   -
   - The Initial Developer of the Original Code is
   - Cail Lomecb <cail@nm.ru>.
   - Portions created by the Initial Developer are Copyright (C) 1999-2005
   - the Initial Developer. All Rights Reserved.
   -
   - Contributor(s):
   -
   - Alternatively, the contents of this file may be used under the terms of
   - either the GNU General Public License Version 2 or later (the "GPL"), or
   - the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
   - in which case the provisions of the GPL or the LGPL are applicable instead
   - of those above. If you wish to allow use of your version of this file only
   - under the terms of either the GPL or the LGPL, and not to allow others to
   - use your version of this file under the terms of the MPL, indicate your
   - decision by deleting the provisions above and replace them with the notice
   - and other provisions required by the LGPL or the GPL. If you do not delete
   - the provisions above, a recipient may use your version of this file under
   - the terms of any one of the MPL, the GPL or the LGPL.
   -
   - ***** END LICENSE BLOCK ***** -->