.. role:: code(strong) .. role:: file(literal) ================================================ Handling of boxed comments in various box styles ================================================ .. contents:: .. sectnum:: This page documents the :file:`contrib/rebox/` subdirectory of the Pymacs distribution. First install Pymacs from the top-level of the distribution, this has the side-effect of adjusting a few files in this directory. Once this done, return to this directory, then run ``python setup.py install``. Also read `Emacs usage`_ below. Introduction ============ For comments held within boxes, it is painful to fill paragraphs, while stretching or shrinking the surrounding box "by hand", as needed. This piece of Python code eases my life on this. It may be used interactively from within Emacs through the Pymacs interface, or in batch as a script which filters a single region to be reformatted. I find only fair, while giving all sources for a package using such boxed comments, to also give the means I use for nicely modifying comments. So here they are! As a user tool ============== Box styles ---------- First, a quick reminder: ====== =============================================== Number Meaning ====== =============================================== 100 Language: unknown 200 Language: /* and \*\ / 300 Language: // 400 Language: # 500 Language: ; 600 Language: % 010 Quality: straight, or 1-wide 020 Quality: rounded, or 2-wide 030 Quality: starred, or 3-wide 040 Quality: starred, or 4-wide 001 Type: left \|-shaped border 002 Type: U-shaped border, simple lines 003 Type: O-shaped border, simple lines 004 Type: U-shaped border, doubled lines 005 Type: O-shaped border, doubled lines 006 Type: [-shaped border, simple lines 007 Type: [-shaped border, doubled lines 111 No box at all 221 Usual simple C comments ====== =============================================== Each supported box style has a number associated with it. This number is arbitrary, yet by *convention*, it holds three non-zero digits such the the hundreds digit roughly represents the programming language, the tens digit roughly represents a box quality (or weight) and the units digit roughly a box type (or figure). An unboxed comment is merely one of box styles. Language, quality and type are collectively referred to as style attributes. When rebuilding a boxed comment, attributes are selected independently of each other. They may be specified by the digits of the value given as Emacs commands argument prefix, or as the ``-s`` argument to the :code:`rebox` script when called from the shell. If there is no such prefix, or if the corresponding digit is zero, the attribute is taken from the value of the default style instead. If the corresponding digit of the default style is also zero, than the attribute is recognised and taken from the actual boxed comment, as it existed before prior to the command. The value 1, which is the simplest attribute, is ultimately taken if the parsing fails. A programming language is associated with comment delimiters. Values are 100 for none or unknown, 200 for ``/*`` and ``*/`` as in plain C, 300 for ``//`` as in C++, 400 for ``#`` as in most scripting languages, 500 for ``;`` as in Lisp, Scheme, assembler and 600 for ``%`` as in TeX, PostScript, Erlang. Box quality differs according to language. For unknown languages (100) or for the C language (200), values are 10 for simple, 20 for rounded, and 30 or 40 for starred. Simple quality boxes (10) use comment delimiters to left and right of each comment line, and also for the top or bottom line when applicable. Rounded quality boxes (20) try to suggest rounded corners in boxes. Starred quality boxes (40) mostly use a left margin of asterisks or X'es, and use them also in box surroundings. For all others languages, box quality indicates the thickness in characters of the left and right sides of the box: values are 10, 20, 30 or 40 for 1, 2, 3 or 4 characters wide. With C++, quality 10 is not useful, it is not allowed. Box type values are 1 for fully opened boxes for which boxing is done only for the left and right but not for top or bottom, 2 for half single lined boxes for which boxing is done on all sides except top, 3 for fully single lined boxes for which boxing is done on all sides, 4 for half double lined boxes which is like type 2 but more bold, or 5 for fully double lined boxes which is like type 3 but more bold. The special style 221 is for C comments between a single opening ``/*`` and a single closing ``*/``. The special style 111 deletes a box. Batch usage ----------- Usage is ``rebox [OPTION]... [FILE]``. By default, FILE is reformatted to standard output by refilling the comment up to column 79, while preserving existing boxed comment style. If FILE is not given, standard input is read. Options may be: -n Do not refill the comment inside its box, and ignore -w. -s STYLE Replace box style according to STYLE, as explained above. -t Replace initial sequence of spaces by TABs on each line. -v Echo both the old and the new box styles on standard error. -w WIDTH Try to avoid going over WIDTH columns per line. So, a single boxed comment is reformatted by invocation. :code:`vi` users, for example, would need to delimit the boxed comment first, before executing the ``!}rebox`` command (is this correct? my :code:`vi` recollection is far away). Batch usage is also slow, as internal structures have to be reinitialised at every call. Producing a box in a single style is fast, but recognising the previous style requires setting up for all possible styles. Emacs usage ----------- For most Emacs language editing modes, refilling does not make sense outside comments, one may redefine the ``M-q`` command and link it to this Pymacs module. For example, I use this in my :file:`.emacs` file:: (add-hook 'c-mode-hook 'fp-c-mode-routine) (defun fp-c-mode-routine () (local-set-key "\M-q" 'rebox-comment)) (autoload 'rebox-comment "rebox" nil t) (autoload 'rebox-region "rebox" nil t) with a "rebox.el" file having this single line:: (pymacs-load "Pymacs.rebox") Install Pymacs from https://github.com/pinard/Pymacs . The Emacs function :code:`rebox-comment` automatically discovers the extent of the boxed comment near the cursor, possibly refills the text, then adjusts the box style. When this command is executed, the cursor should be within a comment, or else it should be between two comments, in which case the command applies to the next comment. The function :code:`rebox-region` does the same, except that it takes the current region as a boxed comment. Both commands obey numeric prefixes to add or remove a box, force a particular box style, or to prevent refilling of text. Without such prefixes, the commands may deduce the current box style from the comment itself so the style is preserved. The default style initial value is nil or 0. It may be preset to another value through calling :code:`rebox-set-default-style` from Emacs Lisp, or changed to anything else though using a negative value for a prefix, in which case the default style is set to the absolute value of the prefix. A ``C-u`` prefix avoids refilling the text, but forces using the default box style. ``C-u -`` lets the user interact to select one attribute at a time. Adding new styles ----------------- Let's suppose you want to add your own boxed comment style, say:: //--------------------------------------------+ // This is the style mandated in our company. //--------------------------------------------+ You might modify :file:`rebox.py` but then, you will have to edit it whenever you get a new release of :file:`pybox.py`. Emacs users might modify their :file:`.emacs` file or their :file:`rebox.el` bootstrap, if they use one. In either cases, after the ``(pymacs-load "Pymacs.rebox")`` line, merely add:: (rebox-Template NNN MMM ["//-----+" "// box " "//-----+"]) If you use the :code:`rebox` script rather than Emacs, the simplest is to make your own. This is easy, as it is very small. For example, the above style could be implemented by using this script instead of :code:`rebox`:: #!/usr/bin/env python import sys from Pymacs.Rebox import rebox rebox.Template(226, 325, ('//-----+', '// box ', '//-----+')) rebox.main(*sys.argv[1:]) In all cases, NNN is the style three-digit number, with no zero digit. Pick any free style number, you are safe with 911 and up. MMM is the recognition priority, only used to disambiguate the style of a given boxed comments, when it matches many styles at once. Try something like 400. Raise or lower that number as needed if you observe false matches. On average, the template uses three lines of equal length. Do not worry if this implies a few trailing spaces, they will be cleaned up automatically at box generation time. The first line or the third line may be omitted to create vertically opened boxes. But the middle line may not be omitted, it ought to include the word ``box``, which will get replaced by your actual comment. If the first line is shorter than the middle one, it gets merged at the start of the comment. If the last line is shorter than the middle one, it gets merged at the end of the comment and is refilled with it. As a Pymacs example =================== This example tool comes in two parts: a batch script :file:`rebox` and a :code:`Pymacs.rebox` module. Go to the :file:`contrib/rebox/` directory of the distribution and use ``python setup.py install`` there. To check that both are properly installed, type ``rebox </dev/null`` in a shell; you should not receive any output nor see any error. The problem ------------ For comments held within boxes, it is painful to fill paragraphs, while stretching or shrinking the surrounding box *by hand*, as needed. This piece of Python code eases my life on this. It may be used interactively from within Emacs through the Pymacs interface, or in batch as a script which filters a single region to be reformatted. In batch, the reconstruction of boxes is driven by command options and arguments and expects a complete, self-contained boxed comment from a file. Emacs function :code:`rebox-region` also presumes that the region encloses a single boxed comment. Emacs :code:`rebox-comment` is different, as it has to chase itself the extent of the surrounding boxed comment. Python side ----------- The Python code is too big to be inserted in this documentation: see file :file:`Pymacs/rebox.py` in the Pymacs distribution. You will observe in the code that Pymacs specific features are used exclusively from within the :code:`pymacs_load_hook` function and the :code:`Emacs_Rebox` class. In batch mode, :code:`Pymacs` is not even imported. Here, we mean to discuss some of the design choices in the context of Pymacs. In batch mode, as well as with :code:`rebox-region`, the text to handle is turned over to Python, and fully processed in Python, with practically no Pymacs interaction while the work gets done. On the other hand, :code:`rebox-comment` is rather Pymacs intensive: the comment boundaries are chased right from the Emacs buffer, as directed by the function :code:`Emacs_Rebox.find_comment`. Once the boundaries are found, the remainder of the work is essentially done on the Python side. Once the boxed comment has been reformatted in Python, the old comment is removed in a single delete operation, the new comment is inserted in a second operation, this occurs in :code:`Emacs_Rebox.process_emacs_region`. But by doing so, if point was within the boxed comment before the reformatting, its precise position is lost. To well preserve point, Python might have driven all reformatting details directly in the Emacs buffer. We really preferred doing it all on the Python side: as we gain legibility by expressing the algorithms in pure Python, the same Python code may be used in batch or interactively, and we avoid the slowdown that would result from heavy use of Emacs services. To avoid completely loosing point, I kludged a :code:`Marker` class, which goal is to estimate the new value of point from the old. Reformatting may change the amount of white space, and either delete or insert an arbitrary number characters meant to draw the box. The idea is to initially count the number of characters between the beginning of the region and point, while ignoring any problematic character. Once the comment has been put back in a box, point is advanced from the beginning of the region until we get the same count of characters, skipping all problematic characters. This :code:`Marker` class works fully on the Python side, it does not involve Pymacs at all, but it does solve a problem that resulted from my choice of keeping the data on the Python side instead of handling it directly in the Emacs buffer. We want a comment reformatting to appear as a single operation, in the context of Emacs Undo. The method :code:`Emacs_Rebox.clean_undo_after` handles the general case for this. Not that we do so much in practice: a reformatting implies one :code:`delete-region` and one :code:`insert`, and maybe some other little adjustments at :code:`Emacs_Rebox.find_comment` time. Even if this method scans and modifies an Emacs Lisp list directly in the Emacs memory, the code doing this stays neat and legible. However, I found out that the undo list may grow quickly when the Emacs buffer use markers, with the consequence of making this routine so Pymacs intensive that most of the CPU is spent there. I rewrote that routine in Emacs Lisp so it executes in a single Pymacs interaction. Function :code:`Emacs_Rebox.remainder_of_line` could have been written in Python, but it was probably not worth going away from this one-liner in Emacs Lisp. Also, given this routine is often called by :code:`find_comment`, a few Pymacs protocol interactions are spared this way. This function is useful when there is a need to apply a regular expression already compiled on the Python side, it is probably better fetching the line from Emacs and do the pattern match on the Python side, than transmitting the source of the regular expression to Emacs for it to compile and apply it. For refilling, I could have either used the refill algorithm built within in Emacs, programmed a new one in Python, or relied on Ross Paterson's :code:`fmt`, distributed by GNU and available on most Linuxes. In fact, :code:`refill_lines` prefers the latter. My own Emacs setup is such that the built-in refill algorithm is *already* overridden by GNU :code:`fmt`, and it really does a much better job. Experience taught me that calling an external program is fast enough to be very bearable, even interactively. If Python called Emacs to do the refilling, Emacs would itself call GNU :code:`fmt` in my case, I preferred that Python calls GNU :code:`fmt` directly. I could have reprogrammed GNU :code:`fmt` in Python. Despite interesting, this is an uneasy project: :code:`fmt` implements the Knuth refilling algorithm, which depends on dynamic programming techniques; Ross did carefully fine tune them, and took care of many details. If GNU :code:`fmt` fails, for not being available, say, :code:`refill_lines` falls back on a dumb refilling algorithm, which is better than none. Emacs side ---------- The Emacs recipe appears under the `Emacs usage`_ section, above. History ======= I first observed rounded corners, as in style 223 boxes, in code from Warren Tucker, a previous maintainer of the :code:`shar` package, circa 1980. Except for very special files, I carefully avoided boxed comments for real work, as I found them much too hard to maintain. My friend Paul Provost was working at Taarna, a computer graphics place, which had boxes as part of their coding standards. He asked that we try something to get him out of his misery, and this is how :file:`rebox.el` was originally written. I did not plan to use it for myself, but Paul was so enthusiastic that I timidly started to use boxes in my things, very little at first, but more and more as time passed, still in doubt that it was a good move. Later, many friends spontaneously started to use this tool for real, some being very serious workers. This convinced me that boxes are acceptable, after all. I do not use boxes much with Python code. It is so legible that boxing is not that useful. Vertical white space is less necessary, too. I even often avoid white lines within functions. Comments appear prominent enough when using highlighting editors like Emacs or nice printer tools like :code:`enscript`. After Emacs could be extended with Python, in 2001, I translated :file:`rebox.el` into :file:`rebox.py`, and added the facility to use it as a batch script. The least old copy I could find of :file:`rebox.el` is also provided here, to ease pondering and comparisons with the Python translation and adaptation.