.. role:: code(strong)
.. role:: file(literal)

================================================
Handling of boxed comments in various box styles
================================================

.. contents::
.. sectnum::

This page documents the :file:`contrib/rebox/` subdirectory of the
Pymacs distribution.  First install Pymacs from the top-level of the
distribution; this has the side-effect of adjusting a few files in this
directory.  Once this is done, return to this directory, then run ``python
setup.py install``.  Also read `Emacs usage`_ below.

Introduction
============

For comments held within boxes, it is painful to fill paragraphs, while
stretching or shrinking the surrounding box "by hand", as needed.  This piece
of Python code eases my life on this.  It may be used interactively from
within Emacs through the Pymacs interface, or in batch as a script which
filters a single region to be reformatted.  I find it only fair, while giving
all sources for a package using such boxed comments, to also give the
means I use for nicely modifying comments.  So here they are!

As a user tool
==============

Box styles
----------

First, a quick reminder:

  ======   ===============================================
  Number   Meaning
  ======   ===============================================
  100      Language: unknown
  200      Language: /* and \*\ /
  300      Language: //
  400      Language: #
  500      Language: ;
  600      Language: %
  010      Quality: straight, or 1-wide
  020      Quality: rounded, or 2-wide
  030      Quality: starred, or 3-wide
  040      Quality: starred, or 4-wide
  001      Type: left \|-shaped border
  002      Type: U-shaped border, simple lines
  003      Type: O-shaped border, simple lines
  004      Type: U-shaped border, doubled lines
  005      Type: O-shaped border, doubled lines
  006      Type: [-shaped border, simple lines
  007      Type: [-shaped border, doubled lines
  111      No box at all
  221      Usual simple C comments
  ======   ===============================================

Each supported box style has a number associated with it.  This number is
arbitrary, yet by *convention*, it holds three non-zero digits such that the
hundreds digit roughly represents the programming language, the tens digit
roughly represents a box quality (or weight) and the units digit roughly
a box type (or figure).  An unboxed comment is merely one of the box styles.
Language, quality and type are collectively referred to as style attributes.
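
To make the digit convention concrete, here is a small Python sketch that
splits a style number into its three attributes.  The name ``split_style``
is made up for this illustration; it is not taken from the rebox sources.

```python
def split_style(style):
    """Split a three-digit box style into (language, quality, type) digits."""
    language, rest = divmod(style, 100)
    quality, box_type = divmod(rest, 10)
    return language, quality, box_type


# Style 221, the usual simple C comment: language 2, quality 2, type 1.
print(split_style(221))
```

A zero digit simply means that the attribute is left unspecified.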

When rebuilding a boxed comment, attributes are selected independently
of each other.  They may be specified by the digits of the value given
as the prefix argument to the Emacs commands, or as the ``-s`` argument to
the :code:`rebox` script when called from the shell.  If there is no such
prefix, or if the corresponding digit is zero, the attribute is taken
from the value of the default style instead.  If the corresponding digit
of the default style is also zero, then the attribute is recognised and
taken from the actual boxed comment, as it existed prior to the
command.  The value 1, which is the simplest attribute, is ultimately
taken if the parsing fails.
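
The fallback chain described above may be sketched in Python as follows.
This is only an illustration of the rule, not the actual rebox code; the
function ``resolve_style`` and its argument names are assumptions, and a
recognised style of 0 stands for a comment which could not be parsed.

```python
def resolve_style(requested, default, recognised):
    """Merge three style numbers, one attribute (digit) at a time."""
    result = 0
    for weight in (100, 10, 1):
        # Try the requested style, then the default, then the recognised one.
        for candidate in (requested, default, recognised):
            digit = candidate // weight % 10
            if digit:
                break
        else:
            # Every source had a zero digit: use the simplest attribute.
            digit = 1
        result += digit * weight
    return result


# Quality forced to 2, language from the default 400, type from the comment.
print(resolve_style(20, 400, 223))
```

With these arguments, the language comes from the default style, the
quality from the request, and the type from the existing comment.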

A programming language is associated with comment delimiters.  Values are
100 for none or unknown, 200 for ``/*`` and ``*/`` as in plain C, 300 for ``//``
as in C++, 400 for ``#`` as in most scripting languages, 500 for ``;`` as in
Lisp, Scheme or assembler, and 600 for ``%`` as in TeX, PostScript or Erlang.

Box quality differs according to language.  For unknown languages (100) or
for the C language (200), values are 10 for simple, 20 for rounded, and
30 or 40 for starred.  Simple quality boxes (10) use comment delimiters
to the left and right of each comment line, and also for the top or bottom
line when applicable.  Rounded quality boxes (20) try to suggest rounded
corners in boxes.  Starred quality boxes (40) mostly use a left margin of
asterisks or X'es, and use them also in box surroundings.  For all other
languages, box quality indicates the thickness in characters of the left
and right sides of the box: values are 10, 20, 30 or 40 for 1, 2, 3 or 4
characters wide.  With C++, quality 10 is not useful, so it is not allowed.

Box type values are 1 for fully opened boxes for which boxing is done
only for the left and right but not for top or bottom, 2 for half
single lined boxes for which boxing is done on all sides except top,
3 for fully single lined boxes for which boxing is done on all sides,
4 for half double lined boxes which is like type 2 but more bold,
or 5 for fully double lined boxes which is like type 3 but more bold.

The special style 221 is for C comments between a single opening ``/*``
and a single closing ``*/``.  The special style 111 deletes a box.

Batch usage
-----------

Usage is ``rebox [OPTION]... [FILE]``.  By default, FILE is reformatted
to standard output by refilling the comment up to column 79, while
preserving existing boxed comment style.  If FILE is not given, standard
input is read.  Options may be:

  -n         Do not refill the comment inside its box, and ignore -w.
  -s STYLE   Replace box style according to STYLE, as explained above.
  -t         Replace initial sequence of spaces by TABs on each line.
  -v         Echo both the old and the new box styles on standard error.
  -w WIDTH   Try to avoid going over WIDTH columns per line.

So, a single boxed comment is reformatted per invocation.  :code:`vi`
users, for example, would need to delimit the boxed comment first,
before executing the ``!}rebox`` command (is this correct? my :code:`vi`
recollection is far away).

Batch usage is also slow, as internal structures have to be reinitialised
at every call.  Producing a box in a single style is fast, but recognising
the previous style requires setting up for all possible styles.
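
When driving the script from another Python program, the command line can
be assembled from the options listed above.  The helper below is a
hypothetical convenience, not something shipped with Pymacs; it only
builds the argument list and does not run anything.

```python
def rebox_command(path=None, style=None, width=None,
                  refill=True, tabs=False, verbose=False):
    """Build an argument list for the rebox script from its documented options."""
    command = ['rebox']
    if not refill:
        command.append('-n')            # do not refill; -w is then ignored
    if style is not None:
        command += ['-s', str(style)]   # replace the box style
    if tabs:
        command.append('-t')            # leading spaces become TABs
    if verbose:
        command.append('-v')            # echo old and new styles on stderr
    if width is not None:
        command += ['-w', str(width)]   # avoid going over this many columns
    if path is not None:
        command.append(path)            # otherwise standard input is read
    return command


print(rebox_command('demo.c', style=223, width=72))
```

The resulting list may be handed to ``subprocess.run``, once per boxed
comment, since each call reformats a single comment.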

Emacs usage
-----------

For most Emacs language editing modes, refilling does not make sense
outside comments, so one may redefine the ``M-q`` command and link it to
this Pymacs module.  For example, I use this in my :file:`.emacs` file::

     (add-hook 'c-mode-hook 'fp-c-mode-routine)
     (defun fp-c-mode-routine ()
       (local-set-key "\M-q" 'rebox-comment))
     (autoload 'rebox-comment "rebox" nil t)
     (autoload 'rebox-region "rebox" nil t)

with a "rebox.el" file having this single line::

     (pymacs-load "Pymacs.rebox")

Install Pymacs from https://github.com/pinard/Pymacs .

The Emacs function :code:`rebox-comment` automatically discovers the extent of
the boxed comment near the cursor, possibly refills the text, then adjusts
the box style.  When this command is executed, the cursor should be within
a comment, or else it should be between two comments, in which case the
command applies to the next comment.  The function :code:`rebox-region` does
the same, except that it takes the current region as a boxed comment.
Both commands obey numeric prefixes to add or remove a box, force a
particular box style, or to prevent refilling of text.  Without such
prefixes, the commands may deduce the current box style from the comment
itself so the style is preserved.

The default style initial value is nil or 0.  It may be preset to
another value through calling :code:`rebox-set-default-style` from Emacs
Lisp, or changed to anything else by using a negative value for a
prefix, in which case the default style is set to the absolute value of
the prefix.

A ``C-u`` prefix avoids refilling the text, but forces using the default
box style.  ``C-u -`` lets the user interact to select one attribute at
a time.

Adding new styles
-----------------

Let's suppose you want to add your own boxed comment style, say::

    //--------------------------------------------+
    // This is the style mandated in our company.
    //--------------------------------------------+

You might modify :file:`rebox.py`, but then you will have to edit
it again whenever you get a new release.  Emacs users
might modify their :file:`.emacs` file or their :file:`rebox.el`
bootstrap, if they use one.  In either case, after the ``(pymacs-load
"Pymacs.rebox")`` line, merely add::

    (rebox-Template NNN MMM ["//-----+"
                             "// box  "
                             "//-----+"])

If you use the :code:`rebox` script rather than Emacs, the simplest is
to make your own.  This is easy, as it is very small.  For example,
the above style could be implemented by using this script instead of
:code:`rebox`::

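    #!/usr/bin/env python
    import sys
    # Assumes the module is installed as Pymacs/rebox.py.
    from Pymacs import rebox
    # Register style 226 with recognition priority 325.
    rebox.Template(226, 325, ('//-----+',
                              '// box  ',
                              '//-----+'))
    rebox.main(*sys.argv[1:])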

In all cases, NNN is the three-digit style number, with no zero digit.
Pick any free style number; you are safe with 911 and up.  MMM is the
recognition priority, only used to disambiguate the style of a given boxed
comment when it matches many styles at once.  Try something like 400.
Raise or lower that number as needed if you observe false matches.

Normally, the template uses three lines of equal length.  Do not worry if
this implies a few trailing spaces; they will be cleaned up automatically
at box generation time.  The first line or the third line may be omitted
to create vertically opened boxes.  But the middle line may not be omitted,
and it ought to include the word ``box``, which will get replaced by your
actual comment.  If the first line is shorter than the middle one, it gets
merged at the start of the comment.  If the last line is shorter than the
middle one, it gets merged at the end of the comment and is refilled with it.
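
To give a feeling for how a three-line template may drive box drawing,
here is a simplified Python sketch.  It is not how :file:`rebox.py` does
it: the function ``render_box`` is hypothetical, it requires all three
lines to be present and of equal length, and it assumes that the character
sitting above or below the start of the word ``box`` is the one to repeat.

```python
def render_box(template, lines):
    """Draw a box around lines, using a three-line template of equal length."""
    top, middle, bottom = template
    start = middle.index('box')
    stop = start + len('box')
    left, right = middle[:start], middle[stop:]
    width = max(len(line) for line in lines)

    def stretch(border):
        # Keep both ends of the border, repeat the character facing 'box'.
        return (border[:start] + border[start] * width + border[stop:]).rstrip()

    boxed = [stretch(top)]
    # Trailing spaces coming from the template are cleaned up here.
    boxed += [(left + line.ljust(width) + right).rstrip() for line in lines]
    boxed.append(stretch(bottom))
    return boxed


template = ('//-----+',
            '// box  ',
            '//-----+')
for line in render_box(template, ['This is the style mandated in our company.']):
    print(line)
```

The word ``box`` serves only as a placeholder marking where the comment
text goes and how wide the left and right margins are.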

As a Pymacs example
===================

This example tool comes in two parts: a batch script :file:`rebox` and a
:code:`Pymacs.rebox` module.  Go to the :file:`contrib/rebox/` directory
of the distribution and use ``python setup.py install`` there.  To check
that both are properly installed, type ``rebox </dev/null`` in a shell;
you should not receive any output nor see any error.

The problem
------------

For comments held within boxes, it is painful to fill paragraphs, while
stretching or shrinking the surrounding box *by hand*, as needed.
This piece of Python code eases my life on this.  It may be used
interactively from within Emacs through the Pymacs interface, or in
batch as a script which filters a single region to be reformatted.

In batch, the reconstruction of boxes is driven by command options and
arguments and expects a complete, self-contained boxed comment from
a file.  Emacs function :code:`rebox-region` also presumes that the
region encloses a single boxed comment.  Emacs :code:`rebox-comment` is
different, as it has to work out by itself the extent of the surrounding boxed
comment.

Python side
-----------

The Python code is too big to be inserted in this documentation:
see file :file:`Pymacs/rebox.py` in the Pymacs distribution.  You
will observe in the code that Pymacs specific features are used
exclusively from within the :code:`pymacs_load_hook` function and the
:code:`Emacs_Rebox` class.  In batch mode, :code:`Pymacs` is not even
imported.  Here, we mean to discuss some of the design choices in the
context of Pymacs.

In batch mode, as well as with :code:`rebox-region`, the text to
handle is turned over to Python, and fully processed in Python, with
practically no Pymacs interaction while the work gets done.  On the
other hand, :code:`rebox-comment` is rather Pymacs intensive: the
comment boundaries are chased right from the Emacs buffer, as directed
by the function :code:`Emacs_Rebox.find_comment`.  Once the boundaries
are found, the remainder of the work is essentially done on the Python
side.
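
The kind of search done for :code:`rebox-comment` can be pictured with the
following Python sketch.  It works on a plain list of lines rather than on
an Emacs buffer, it only knows about comments introduced by a single
prefix such as ``#``, and the name ``find_comment_extent`` is an assumption
made for this example; the real :code:`Emacs_Rebox.find_comment` is more
involved.

```python
def find_comment_extent(lines, cursor, prefix='#'):
    """Return (first, last) line indices of the comment at or after cursor."""

    def is_comment(index):
        return lines[index].lstrip().startswith(prefix)

    # If the cursor is not within a comment, move on to the next comment.
    start = cursor
    while start < len(lines) and not is_comment(start):
        start += 1
    if start == len(lines):
        return None
    # Extend upwards, then downwards, while lines are still comments.
    while start > 0 and is_comment(start - 1):
        start -= 1
    end = start
    while end + 1 < len(lines) and is_comment(end + 1):
        end += 1
    return start, end


source = ['x = 1', '', '# first line', '# second line', 'y = 2']
print(find_comment_extent(source, 0))
```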

Once the boxed comment has been reformatted in Python, the old comment
is removed in a single delete operation, and the new comment is
inserted in a second operation; this occurs in
:code:`Emacs_Rebox.process_emacs_region`.  But by doing so, if point
was within the boxed comment before the reformatting, its precise
position is lost.  To preserve point accurately, Python might have driven
all reformatting details directly in the Emacs buffer.  We really
preferred doing it all on the Python side: we gain legibility by
expressing the algorithms in pure Python, the same Python code may be
used in batch or interactively, and we avoid the slowdown that would
result from heavy use of Emacs services.

To avoid completely losing point, I kludged a :code:`Marker` class,
whose goal is to estimate the new value of point from the old.
Reformatting may change the amount of white space, and either delete or
insert an arbitrary number of characters meant to draw the box.  The idea
is to initially count the number of characters between the beginning
of the region and point, while ignoring any problematic character.
Once the comment has been put back in a box, point is advanced from
the beginning of the region until we get the same count of characters,
skipping all problematic characters.  This :code:`Marker` class works
fully on the Python side and does not involve Pymacs at all, but it does
solve a problem that resulted from my choice of keeping the data on the
Python side instead of handling it directly in the Emacs buffer.
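
The counting idea can be sketched in a few lines of Python.  This is not
the :code:`Marker` class itself: the two functions and the ``IGNORED`` set
of "problematic" characters are assumptions chosen for the illustration.

```python
# Hypothetical set of characters which reformatting may add or remove.
IGNORED = set(' \t\n/*#;%-+|')


def count_significant(text, point):
    """Count the characters before point, ignoring problematic ones."""
    return sum(1 for char in text[:point] if char not in IGNORED)


def relocate_point(new_text, count):
    """Advance in new_text until count significant characters are passed."""
    seen = 0
    for index, char in enumerate(new_text):
        if char not in IGNORED:
            if seen == count:
                return index
            seen += 1
    return len(new_text)


old = '/* hello world */'
new = '# hello\n# world\n'
count = count_significant(old, old.index('w'))
print(relocate_point(new, count) == new.index('w'))
```

The estimate may be off when the comment text itself contains ignored
characters near point, which is why this is only an approximation.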

We want a comment reformatting to appear as a single operation, in the
context of Emacs Undo.  The method :code:`Emacs_Rebox.clean_undo_after`
handles the general case for this.  Not that we do so much in
practice: a reformatting implies one :code:`delete-region` and
one :code:`insert`, and maybe some other little adjustments at
:code:`Emacs_Rebox.find_comment` time.  Even if this method scans and
modifies an Emacs Lisp list directly in the Emacs memory, the code doing
this stays neat and legible.  However, I found out that the undo list
may grow quickly when the Emacs buffer uses markers, with the consequence
of making this routine so Pymacs intensive that most of the CPU is spent
there.  I rewrote that routine in Emacs Lisp so it executes in a single
Pymacs interaction.

Function :code:`Emacs_Rebox.remainder_of_line` could have been
written in Python, but it was probably not worth going away from this
one-liner in Emacs Lisp.  Also, given this routine is often called by
:code:`find_comment`, a few Pymacs protocol interactions are spared this
way.  This function is useful when there is a need to apply a regular
expression already compiled on the Python side: it is probably better to
fetch the line from Emacs and do the pattern match on the Python
side than to transmit the source of the regular expression to Emacs
for it to compile and apply it.

For refilling, I could have either used the refill algorithm built
into Emacs, programmed a new one in Python, or relied on Ross
Paterson's :code:`fmt`, distributed by GNU and available on most
Linuxes.  In fact, :code:`refill_lines` prefers the latter.  My own
Emacs setup is such that the built-in refill algorithm is *already*
overridden by GNU :code:`fmt`, and it really does a much better job.
Experience taught me that calling an external program is fast enough
to be very bearable, even interactively.  If Python called Emacs to
do the refilling, Emacs would itself call GNU :code:`fmt` in my case,
so I preferred that Python call GNU :code:`fmt` directly.  I could have
reprogrammed GNU :code:`fmt` in Python.  Although interesting, this is
not an easy project: :code:`fmt` implements the Knuth refilling algorithm,
which depends on dynamic programming techniques; Ross did carefully fine
tune them, and took care of many details.  If GNU :code:`fmt` fails,
for not being available, say, :code:`refill_lines` falls back on a dumb
refilling algorithm, which is better than none.
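
The "external program first, dumb algorithm otherwise" strategy might look
like the sketch below.  It is written for a modern Python 3 and is not the
code of :code:`refill_lines`; the name ``refill_sketch`` is made up, and
the standard ``textwrap`` module stands in for the fallback algorithm.

```python
import subprocess
import textwrap


def refill_sketch(lines, width=79):
    """Refill lines with the external fmt program, or textwrap as a fallback."""
    text = '\n'.join(lines) + '\n'
    try:
        completed = subprocess.run(['fmt', '-w', str(width)], input=text,
                                   capture_output=True, text=True, check=True)
        return completed.stdout.splitlines()
    except (OSError, subprocess.CalledProcessError):
        # fmt is missing or failed: use a simple greedy refill instead.
        return textwrap.wrap(' '.join(lines), width)


for line in refill_sketch(['For comments held within boxes, it is painful',
                           'to fill paragraphs by hand.'], width=30):
    print(line)
```

The exact line breaks depend on which of the two paths is taken, since
:code:`fmt` balances line lengths while the fallback fills greedily.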

Emacs side
----------

The Emacs recipe appears under the `Emacs usage`_ section, above.

History
=======

I first observed rounded corners, as in style 223 boxes, in code from
Warren Tucker, a previous maintainer of the :code:`shar` package, circa
1980.

Except for very special files, I carefully avoided boxed comments for
real work, as I found them much too hard to maintain.  My friend Paul
Provost was working at Taarna, a computer graphics place, which had
boxes as part of their coding standards.  He asked that we try something
to get him out of his misery, and this is how :file:`rebox.el` was
originally written.  I did not plan to use it for myself, but Paul was
so enthusiastic that I timidly started to use boxes in my things, very
little at first, but more and more as time passed, still in doubt that
it was a good move.  Later, many friends spontaneously started to use
this tool for real, some being very serious workers.  This convinced me
that boxes are acceptable, after all.

I do not use boxes much with Python code.  It is so legible that boxing
is not that useful.  Vertical white space is less necessary, too.
I even often avoid white lines within functions.  Comments appear
prominent enough when using highlighting editors like Emacs or nice
printer tools like :code:`enscript`.

After Emacs could be extended with Python, in 2001, I translated
:file:`rebox.el` into :file:`rebox.py`, and added the facility to use it
as a batch script.  The most recent copy I could find of :file:`rebox.el`
is also provided here, to ease pondering and comparisons with the Python
translation and adaptation.