MALAGA VERSION CHANGES
Copyright (C) 1996 Bjoern Beutel

= Version 7.12, released 2008-02-28 ===========================================

malshow now uses cairo for drawing; line thickness grows with font size.

The Malaga executables now link the dynamic malaga library, thanks to Ville-Pekka Vainio.

When using the command "help", the commands are now sorted columns-first, instead of lines-first.

The documentation has changed to reflect the fact that Win32 is no longer "officially" supported.

= Version 7.11, released 2007-07-09 ===========================================

Malaga may now be licensed under GPL version 2 or any later version.

Repaired project file name dialog in Malaga's Emacs mode.

The Makefile now uses LDFLAGS for linking, thanks to Michael Piotrowski.

= Version 7.10, released 2007-05-27 ===========================================

Entering Ctrl-D to exit no longer causes a segmentation violation error.

= Version 7.9, released 2006-10-28 ============================================

Added missing UTF-8 validity checks for "libreadline" input and "-input" command line option.

Added manpages written by Antti-Juhani Kaijanaho (from Malaga 4.3) and updated them.

Changed options "display-line" and "transmit-line" to "display-cmd" and "transmit-cmd", respectively. The old option names can still be used for compatibility.

= Version 7.8, released 2006-09-22 ============================================

The options "mor-pruning" and "syn-pruning" are now integer values. A value of 0 means that pruning rules are not called. A value > 0 indicates the minimum number of states needed to call the pruning rule.

= Version 7.7, released 2006-09-20 ============================================

Compiling and linking creates more readable and more compact logs.

"commands_interactive.c" now includes "stdio.h" and "input.h".

A "pruning_rule" can now also be defined for a morphology grammar.
The switch for using the pruning rule is called "mor-pruning" for morphology and "syn-pruning" for syntax.

= Version 7.6, released 2006-08-31 ============================================

When using "libmalaga", the readline library is no longer needed, thanks to Harri Pitkänen.

Updated to autoconf 2.60 and libtool 1.5.22, thanks to Harri Pitkänen.

Support for specifying DESTDIR during "make install" and "make uninstall", thanks to Flammie Pirinen.

= Version 7.5, released 2006-06-18 ============================================

The function "get_value_string()" now returns a "char_t *", so the result can be modified and freed.

When the windows of a malshow process have different font sizes, commas could be displayed in the wrong size. This has been corrected.

Option "use-display" will be set to "no" if the "DISPLAY" environment variable is undefined.

The configure option "--with-readline" enables fancy command line editing, provided that the GNU readline library is installed.

= Version 7.4 =================================================================

Some errors in the Win32 code part have been corrected, thanks to Kai Solehmainen.

The "string_t" type has been changed to "const char_t *", which improves source code documentation, optimisation and interaction via the "malaga.h" API.

= Version 7.3 =================================================================

An error in the function "strncmp_no_case" has been fixed, thanks to Harri Pitkänen. The error led to incorrect results in the allomorph lexicon search.

= Version 7.2 =================================================================

Path selection in malshow's Tree window has been changed. The left mouse button is now used to select a single state. The right mouse button activates a pop-up menu.
The Malaga profile may now contain lines "show_indexes: {yes,no}", "hanging_style: {yes,no}", "inline_path: {yes,no}", and "show_tree: {full,no_dead_ends,result_paths}" which set the respective defaults for the malshow windows.

Malaga now contains an if-expression, which is equivalent to an if-statement, but part of an expression.

The "parallel" statement has been replaced by the "select" statement, which replaces the keyword "parallel" by "select" and the keyword "and" by "or". The new keyword fits Malaga's semantics better and nicely complements the "choose" statement. For compatibility reasons, the "parallel" statement still exists.

In malaga, the commands "ma-line", "sa-line", and their debug counterparts now take the line number as their first argument. If the file name is omitted, the file name of the previous "{ma,sa}-line" or "{ma,sa}-file" command will be used.

In mallex, the commands "ga-line" and "debug-ga-line" now expect the line number as their first argument. For these commands, and the commands "ga-file" and "read-constants", the lexicon file name is now optional. If it is omitted, the previous lexicon file name will be used.

= Version 7.1 =================================================================

When exporting in Postscript format, the program "malshow" will now only include the Hangul font if there are actually Hangul characters to be exported.

The dynamic library "libmalaga.so" now contains a reference to libglib-2.0, so glib-2.0.so is automatically loaded.

Pressing Ctrl-D in interactive mode makes malaga and mallex quit now. Previously, the programs sometimes got stuck in an endless loop.

= Version 7.0 =================================================================

Malaga now supports Unicode via the UTF-8 format. Support for 8-bit character sets (like ISO8859-x) and for KSC5601 has been abandoned.

The project option "charset: xxx" has been removed, since the character set is now always UTF-8.
Hangul support can now be switched on with the project option "split-hangul-syllables: yes".

In malshow, you can now use the mouse wheel to scroll a canvas up and down.

In Malaga Emacs mode, a brace that spans multiple lines is now indented correctly even if whitespace is following the opening brace.

The program "mallex" doesn't crash anymore when a lexicon file contains named constants.

= Version 6.14 ================================================================

From now on, the state numbers in the tree display won't count break nodes, since they are not states.

You can use the tree display's menu option "Tree/Show State Indexes" to toggle state indexes in the tree window.

You can use the path display's menu option "Path/Show State Indexes" to toggle state indexes in the path window.

When compiling with GTK+ 2.0 (or later), we do not use our own flicker-free drawing mechanism any more, since GTK+ 2.0 has a general one.

The configure script now prefers to compile "malshow" with GTK+ 2.0 (or later).

The GTK+ scrollbars are automatic now: they only appear if the window is not wide and/or tall enough.

The new "continue" statement is similar to the break statement, but it terminates the current pass of a foreach loop and starts the next pass.

= Version 6.13 ================================================================

The alternative syntax for rule definitions and rule calls with braces has been abandoned again.

The options "mor-incomplete" and "syn-incomplete" may be used to get results for incompletely parsed input.

The command line option "-quoted" makes malaga require quotes around each input line. The quotes are removed prior to analysis.

The option "result-list" may be used to print all analysis results as a single list when malaga is used in filter mode or a file is analysed. Even results of different lengths are combined, which is useful in combination with "mor-incomplete" and "syn-incomplete".
= Version 6.12 ================================================================

"Categories" have been renamed to "feature structures", since that's what they are.

In combi rules, "start" has been renamed to "state" and "next" has been renamed to "link".

= Version 6.11 ================================================================

Malaga's internal function error() has been renamed to complain() because the GNU C library "glibc" defines the function error(). Let's hope "complain" will never be used by "glibc".

= Version 6.10 ================================================================

When configured, Malaga now prefers to be compiled for GTK+ 1.2, since it has been written for this version.

Postscript output will now use Helvetica instead of Times-Roman.

= Version 6.9 =================================================================

The debugger commands "step" and "next" will now also stop when a path has terminated and another path is going to be executed.

The formal example grammars have been renamed. Their new names reflect the languages which they recognize.

= Version 6.8 =================================================================

The expression LIST * NUMBER now returns the first NUMBER elements of LIST, if NUMBER > 0, or the last abs(NUMBER) elements of LIST, if NUMBER < 0.

Libmalaga now uses the standard versioning scheme supported by libtool. The interface file "malaga.h" also contains versioning information.

A new, precompiled lexicon format has been introduced, "prelex", to support distribution of binary lexicons that may be extended by the recipient.

= Version 6.7 =================================================================

PostScript may now also be exported for Hangul. Furthermore, the PostScript code now uses the metrics from the PostScript fonts, not the screen fonts.
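The list expression noted under Version 6.8 (LIST * NUMBER), together with its counterpart LIST / NUMBER from Version 4.3, can be modelled in a few lines. The following is only an illustrative Python sketch, not Malaga code: the function names "take" and "drop" are made up here, and the behaviour for NUMBER = 0 is an assumption, since the entries do not cover that case.

```python
def take(items, number):
    """Model of LIST * NUMBER: keep elements from one end of the list."""
    if number > 0:
        return items[:number]   # the first NUMBER elements
    if number < 0:
        return items[number:]   # the last abs(NUMBER) elements
    return []                   # assumption: NUMBER = 0 is not described


def drop(items, number):
    """Model of LIST / NUMBER: remove elements from one end of the list."""
    if number > 0:
        return items[number:]   # without the leftmost NUMBER elements
    if number < 0:
        return items[:number]   # without the rightmost abs(NUMBER) elements
    return list(items)          # assumption: NUMBER = 0 is not described


letters = ["a", "b", "c", "d", "e"]
print(take(letters, 2))    # ['a', 'b']
print(take(letters, -2))   # ['d', 'e']
print(drop(letters, 2))    # ['c', 'd', 'e']
print(drop(letters, -2))   # ['a', 'b', 'c']
```

Read this way, the two operators complement each other: for any non-zero NUMBER, "take" and "drop" split the list into the part that is kept and the part that is removed.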
= Version 6.6 =================================================================

An alternate pattern for rule definitions and rule calls has been introduced: Instead of "NAME( $ARG1, $ARG2, ... )" you may write "{NAME $ARG1, $ARG2}".

= Version 6.5 =================================================================

When reporting an error, malrul now prints where a rule or subrule has been defined or used for the first time.

The "print" command may now print not only variables, but also named constants. If "use-display = yes", the output will be displayed in a window of its own.

The "finish" debug-command resumes rule execution until a "return" is met or the current rule path is terminated.

= Version 6.4 =================================================================

The command "get switches" will print the switches in alphabetical order.

The "fail" statement has been renamed to "stop". "fail" is still valid, but it will generate a warning.

Rule execution may be interrupted.

= Version 6.3 =================================================================

malshow now uses the character set that is selected in the project file line that begins with "char-set:" or "char_set:".

malshow's font may now be changed by "font:" in the configuration file. The default font size may be changed by "font_size:" in the configuration file.

= Version 6.2 =================================================================

The "debug-state" command now behaves more intuitively. It doesn't switch back to "walk mode" if debugging a new successor rule of a state; instead, it uses the current debug mode.

Emacs should now always understand when Malaga wants it to jump to a certain point in a source file.

= Version 6.1 =================================================================

Malaga is now also compilable under the Microsoft Win32 API, using the MinGW GCC compiler.

The malmake option "-new" causes the whole project to be recompiled.
= Version 6.0 =================================================================

In regular expressions, a "?" behind one of the postfix operators "?", "+", "*" makes the operator behave in a "non-greedy" fashion: It will try to match as few characters as possible. Without the "?", the operators will try to match as many characters as possible.

The escape char in regular expressions has changed from "!" to "\". Since this is also the escape char in strings, it has to be inserted twice in a regular expression. And if you want to use a "\" in a pattern as a simple character, you even have to insert it four times.

= Version 5.10 ================================================================

Depending on the processor endianness, the binary files get the suffix "_l" (for little endian), "_b" (for big endian) or "_c" (for complex schemes).

The calling syntax for "malrul", "mallex", "malsym" and "maldump" has changed because the binary file names are no longer allowed as arguments.

= Version 5.9 =================================================================

The function length() now also works on string values.

The function "substring(STRING, FIRST_INDEX, LAST_INDEX)" returns the substring in STRING that begins at FIRST_INDEX and ends at LAST_INDEX (both inclusive).

= Version 5.8 =================================================================

Libtool is now included in the Malaga Package.

= Version 5.7 =================================================================

Malaga now uses GNU Libtool to create its libraries.

= Version 5.6 =================================================================

"configure.in" now checks for "gtk-config".

A 64-bit pointer incompatibility in "tries.c" has been removed.

= Version 5.5 =================================================================

The documentation format has been changed from LaTeX to Texinfo.
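The two regular-expression changes noted under Version 6.0 (non-greedy operators and the doubled escape character) can be illustrated by analogy. The sketch below uses Python's standard "re" module, not Malaga's own pattern matcher, so it only demonstrates the general idea; Malaga's engine may differ in other respects. Non-raw Python strings are used on purpose, because they show the same doubling effect: the string literal consumes one level of backslashes before the regular expression sees the rest.

```python
import re

text = "<a><b>"

# Greedy: ".*" grabs as many characters as possible.
greedy = re.match("<.*>", text).group()
# Non-greedy: a "?" behind the "*" makes it grab as few as possible.
lazy = re.match("<.*?>", text).group()
print(greedy)  # <a><b>
print(lazy)    # <a>

# Escaping a regex metacharacter: the string literal "a\\.b" holds the
# regex a\.b, which matches a literal dot only.
dot_pattern = "a\\.b"
print(re.fullmatch(dot_pattern, "a.b") is not None)   # True
print(re.fullmatch(dot_pattern, "axb") is not None)   # False

# Matching a literal backslash takes four backslashes in the literal:
# the string holds a\\b, which the regex reads as "a", "\", "b".
backslash_pattern = "a\\\\b"
print(re.fullmatch(backslash_pattern, "a\\b") is not None)  # True
```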
= Version 5.4 =================================================================

In rule files and the lexicon file, default values for constants can be defined via "default @constant := VALUE;". After such a definition, the constant may be redefined via "define @constant := VALUE;", but only if the constant has not been used until then.

The commands "frame", "up" and "down" have been introduced to select the current frame when debugging.

In the project file, any line may end with a comment "#...".

All Malaga programs now have a "-help" command line option.

The TCL/TK Display program has been removed. Instead, the GTK+ Display program "malshow" has been included. The display-command-line "malshow" is set by default.

The configuration option for debug versions has been removed.

The system-wide startup file "malagarc" has been removed.

= Version 5.3 =================================================================

The "continue" command with a comparison expression has been reworked. It now must be applied at a point in the source where the tested variable is currently defined. Only this variable will be tested, all other variables with the same name will be ignored. This speeds up execution.

malaga and mallex read a system wide startup file "${MALAGA}/malagarc" as well as the personal one "${HOME}/.malagarc".

The parsing of the lexicon file is no longer confused by execution of the commands "print" or "continue $VAR = VALUE" in debug mode.

= Version 5.2 =================================================================

The types "int_t" and "u_int_t" are now set to "int" and "unsigned int", resp. This makes Malaga work with 64-bit architectures where "int" is still 32 bits long.

The Hangul KSC encoding has been slightly modified.

To generate debug versions of the executables, use the configure option "--enable-debug".
= Version 5.1 =================================================================

The type "char_t" is now equal to "char" instead of "unsigned char", to be compliant with Standard C.

The option "auto-result" can be used to switch automatic results after analyses on or off.

Errors while setting options from the project file or ".malagarc" will be reported.

Malaga will now use the cache also in "ma-file" and in morphology analysis from libmalaga.

A relative path name as argument for "init_libmalaga" doesn't crash any longer.

= Version 5.0 =================================================================

The Tree display now contains "unfinal" nodes: these are nodes of end states that have been removed since they didn't consume all the input.

The first parameter of the robust rule now contains the remaining input up to, but not including, the next space. The robust rule now has an optional second parameter that contains all the remaining input. The result statement in a robust_rule may now have two arguments: if it has two arguments, the first must be a prefix of the remaining input, and the second must be the feature structure for this prefix. This enables you to set word-boundaries in a more flexible way.

If your end-rule only has one parameter, it is only called at a word boundary, that means, if there is a leading space in the remaining input or all input is consumed. If your end-rule has two parameters, then the second is the input that has not been consumed yet. In this case, your end-rule will be called regardless of what the remaining input looks like. You can use the input to decide if you really want to have an analysis result here. This enables you to set word-boundaries in a more flexible way.

The error statement may now take any value that evaluates to a string as argument, not just a constant string literal.

A new Emacs major mode has been introduced, namely "malaga-project-mode", which is intended to edit Malaga project files.
It is an extension of "text-mode"; the commands "malaga", "mallex" and "malmake" are bound to the keys "C-c C-p", "C-c C-l" and "C-c C-r", respectively.

The debug command "continue" now allows local breakpoints and local watchpoints:
- "c [FILE_NAME] LINE_NUMBER" or "c RULE_NAME" continues until the specified source text position (which is called "local breakpoint") or a global breakpoint is reached.
- "c VAR_PATH = VALUE" continues until VAR_PATH (a variable name which may be followed by a path of attribute names and indexes) equals VALUE.

The Tcl/Tk variables display now has an easier way to make a variable value (dis)appear: click on the variable name.

If a variable should be defined in a pattern match, the new preferred way to do it is to write the variable name BEHIND the pattern segment instead of in front of it:
  ? $x matches ".*": $var1, "en";
The old way is still allowed for backwards compatibility.

Malaga now supports the Hangul character set. If the symbol file has been compiled with the option "-hangul", e.g. "malsym xxx.sym -hangul", the symbol file and all files that use it will be encoded with the Hangul character set. If the project file contains a line "char-set: hangul", then "malmake" will execute "malsym" with that option. In malaga and mallex, the option "use-ksc" can be used to use either KSC-5601 encoding or romanized Hangul for output.

"mallex" has been reworked and now needs much less process space, which is important because it creates the whole lexicon in main memory before it is dumped to the lexicon file.

Malaga values are now pretty-printed if printed in interactive mode.

Results and variables may now be printed by the external "display" program or by malaga itself, using the "result" resp. "variable" command. This can be chosen using the "display-output" switch.

The "output" command and the "output" option have been deleted, because their work is now done by "result" if "display-output = no".
The "print" command does not print all variables of a rule any longer, this will be done by "variables" if "display-output = no". The "print" command can still be used to print the values of single variables or paths.

The "result" option has been deleted, so the result output will be printed always after an analysis.

In mallex, the command "read-constants" may be used to initialize lexicon constants by reading them from a file.

The commands ma-line/sa-line and their "debug" counterparts, "debug-ma-line" and "debug-sa-line" have been introduced. They allow analysing a single word or a single sentence from a file by giving its line number.

The ga/sa/ma commands have been renamed (once again), so they form a more regular pattern:
  ga   ga-line   ga-file   debug-ga   debug-ga-line
  ma   ma-line   ma-file   debug-ma   debug-ma-line
  sa   sa-line   sa-file   debug-sa   debug-sa-line

The function "transmit" is now available in allomorph rules as well as in combination rules.

Positive numbers may be written alternatively like "5L" (= "5"), negative numbers may be written alternatively like "12R" (= "-12").

Output files for morphology/syntax analysis and lexicon generation get default suffix ".out" instead of ".cat".

Assignment of list elements to multiple variables has been introduced: "<$a, $b, $c> := <sym1, sym2, sym3>", or, if you want to define $a, $b, $c: "define <$a, $b, $c> := <sym1, sym2, sym3>".

= Version 4.3 =================================================================

The escape char in patterns has changed from "\" to "!".

The foreach statement may now be preceded by a label, and the break statement may leave a foreach loop.

In a rule set, there may be more than one rule after an "else" keyword, like "rules (A1, A2, A3 else B1, B2, B3 else C1, C2, C3)".

The "matches" condition has been reworked. Now it looks like:
  MatchCond ::= Expr "matches" "(" Segment "," ... "," Segment ")" ";" .
  Segment ::= [Variable ":" ] Pattern-String .
A Pattern string may be any constant String (consisting of literals, constant values and the operator "+"). The value of the String must be a pattern, which may now contain parentheses for grouping.

The "input_rule" and "filter_rule" have been renamed to "input_filter" and "output_filter", respectively.

The pruning rule now works differently; it has only one parameter, namely a list of feature structures, and must execute a "return" statement with a list of "yes"/"no"-symbols, one for each feature structure in the parameter.

The value heap now grows automatically if needed; it can grow indefinitely, so the option "heap-size" is no longer needed and abolished.

In the Emacs Malaga support file, Malaga mode will also be invoked for files with suffix ".mal", but no longer for files with suffix ".nav" or ".sub".

In the Emacs Malaga support file, the commands "C-c m", "C-c r", "C-c l" and "C-c d" have changed to "C-c C-p", "C-c C-r", "C-c C-l" and "C-c C-d" resp.

In an assert statement, you can now use "!" as a shorthand for "assert".

The function "floor(N)", which returns the greatest integer number not greater than N, has been introduced.

The repeat statement has been introduced.

The expression "LIST / NUMBER" yields LIST without its leftmost NUMBER elements, if NUMBER > 0, or LIST without its rightmost abs(NUMBER) elements, if NUMBER < 0.

The function "symbol_name" has been changed to the function "value_string" which can convert every value to a string.

The rule set in the initial state or in a result statement may be enclosed in parentheses.

Subrules may now have zero parameters.

The environment variables "MALAGA_TRANSMIT" and "MALAGA_DISPLAY" have been abandoned. The command lines are now defined by setting the options "transmit" and "display", respectively.

The user can now set preferred options for malaga and mallex in the startup file "~/.malagarc".

The operator "RECORD1 * RECORD2" works like "RECORD2 + RECORD1".
So ":=*" is useful to add a default attribute to a record if that attribute doesn't exist in the original record.

Malaga switches can now have any values, even records and lists.

= Version 4.2 =================================================================

The function "transmit" has been introduced, which allows communication with an external process via pipe.

Indexes and floats are merged into the Malaga value type "number", which has the same properties as "float". If you want to access the sixth element of $list, write "$list.6" instead of "$list.6L". Negative numbers count from the right end of a list, e.g. "$list.-2". Attention: a dot that is immediately following a number is part of the number, so "$list.2.4" is different from "$list.(2).(4)".

The "match" condition, now called "matches" condition, is now written as "VALUE matches PATTERN" instead of "match VALUE = PATTERN".

In mallex, the command "debug" has been renamed to "debug-entry". The command "debug-file" has been introduced which generates allomorphs for a file in debug mode.

In mallex, the commands "ga-file" and "debug-file" leave their results permanent, so the results can be displayed with "output" or "result".

The commands "ma-file" and "sa-file" don't interrupt if an error occurs during file analysis. Instead, the error message is written into the output file and the next item is analysed.

The new function "symbol_name" returns the name of a symbol as a string.

A condition can be used everywhere a (non-constant) value can be used. The value of a condition used in such a place is "yes" or "no".

Conditions are now grouped by ordinary parentheses "()" instead of "{}".

A match condition can now be used in every place where an ordinary condition can be used. Exception: If a match condition defines variables, it may not be part of a disjunction or a negation.
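The list indexing introduced in this version ("$list.6" for the sixth element, "$list.-2" counting from the right end) can be modelled as follows. This is an illustrative Python sketch only, not Malaga code: the helper name "element_at" is made up, and rejecting index 0 is an assumption, since the entry above does not say what Malaga does in that case.

```python
def element_at(items, index):
    """Model of "$list.N": 1-based from the left, negative from the right."""
    if index == 0:
        # Assumption: the changelog does not describe index 0.
        raise ValueError("index 0 is not described")
    if index > 0:
        return items[index - 1]   # "$list.6" is the sixth element
    return items[index]           # "$list.-2" is the second from the right


symbols = ["a", "b", "c", "d", "e", "f"]
print(element_at(symbols, 6))    # f
print(element_at(symbols, -2))   # e
```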
The pattern in a match condition may contain constants and literal strings; it may contain parentheses and the operators & (for concatenation of subpatterns) and | (for alternatives). They may only be mixed if precedence is indicated by parentheses. A variable definition which subsumes only part of a pattern must be in parentheses: '$x: "A" & "B"' will assign "AB" to $x if it does match, whereas '($x: "A") & "B"' will assign "A" to $x. A variable definition may not be part of an alternative.

A lexicon file may now contain constant definitions "define @Name := Value;".

The lexicon entries in a lexicon file may now be arbitrary Malaga expressions, i.e. they may contain constants and the operators ".", "+", "-", "*" and "/".

The command "ga-line" has been introduced in mallex, which generates allomorphs for a single entry in a lexicon file.

Allomorphs can be displayed graphically in mallex using "result".

The commands "output" and "result" have been incorporated into mallex as well as the options with the very names.

There may now be only one allo_rule in the allo rule file, which may only call subrules and create allomorphs. The allomorphs are now created by the command "result", not "allo".

An allo rule file may also contain a filter_rule which is called once for each set of generated allomorph lexicon entries that share the same surface. This rule can be used to join entries with a common surface.

In a morphology file, aside from the combination rules, there may be a filter_rule (formerly located in the mfil-file) and a robust_rule (formerly called "unknown_rule").

In a syntax file, aside from the combination rules, there may be an input_rule (formerly known as "filter_rule" in the ifil-file), a pruning_rule, and a filter_rule (formerly located in the sfil-file).

In Emacs Malaga mode, comments that start at the first column will not be indented.

The command line options of malaga, mallex etc. now have one-letter-abbreviations, for example "-v" for "-version".
The option "alias" has been introduced. It is used to define command line abbreviations.

The "paradigm" command has been deleted, so the "generate" statement has been deleted, too.

Identifiers may also include the character "|".

"include" is now forbidden within rules.

The operators "+=" and "-=" have changed to ":=+" and ":=-", resp. They are complemented by the new operators ":=*" and ":=/".

The unary prefix operator "-" has been introduced which inverts numbers.

Constant values can now also contain parentheses "()", and the operators "+", "-", "*", "/" and ".".

The end of a rule may now include the rule name: "end RULE_NAME;"

The command "value" has been renamed to "print". It now also accepts indexes in a variable path.

The option "sort-records" now has three possible settings: "internal", "alphabetic", and "definition" (as in the symbol-table).

Float values may now be preceded by a "-" sign.

The operator "value_type()" returns the type of a Malaga value coded as one of the symbols "symbol", "string", "float", "index", "list", and "record".

Indexes like 1L or 4R may now be part of Malaga values. They can also be part of a path in an assignment.

The "." operator may now be followed by a list <e1, e2, e3> of symbols and/or indexes. This will be interpreted as ".e1.e2.e3".

The operator "length()" returns the number of elements in a list as an index, e.g. length(<A, B, C>) = 3L.

The statement "choose" may now choose indexes: "choose $index in 6L;" generates paths where $index has values 1L, 2L, ..., 6L. "choose $index in 6R;" generates paths where $index has values 1R, 2R, ..., 6R.

The statement "foreach" may now iterate over indexes: "foreach $index in 6L: STATEMENTS end;" executes STATEMENTS where $index is assigned the values 1L, 2L, ..., 6L sequentially.

The operator "LIST - INDEX" now removes ONLY the element at position INDEX in LIST.

The "remain" part of a "choose" statement has been removed.
Where it has been needed, it can be replaced by iterating over indexes and removing elements by position: "choose $Element in $List remain $List" would be replaced by "choose $Index in length($List); define $Element := $List.$Index; $List :=- $Index;".

The argument to the function "switch" must now be a symbol instead of a string.

The commands "ma" and "sa" without arguments don't enter ma-mode or sa-mode any longer; they re-analyse the last input. Use "ma-mode" and "sa-mode" to enter ma-mode or sa-mode, respectively.

= Version 4.1 =================================================================

The functions in libmalaga now also return when an error has occurred. In this case, the error message is in "malaga_error". Otherwise, "malaga_error" is NULL.

The command "clear-cache", which deletes all wordforms in the cache, has been introduced.

You can set switches in malaga and mallex with the option "set switch", and you can query them in rules using the operator "switch".

The option "variables" has been introduced, to show variables automatically in debug mode.

Output is now sent to a single graphical display program via pipes. The program command line must be in the environment variable MALAGA_DISPLAY.

A subrule can now be called before it is defined.

A command "trace" has been added, to show the current call stack.

The option "graphics" has been deleted. For textual result output, use the command "output". For graphical results, use the command "result". The commands can be automatically executed after "sa" or "ma". Use the options "result" and "output" for this purpose.

The option "cache" has been deleted. Use "set cache-size 0" to deactivate the cache.

"result-format" and "unknown-format" are also used for textual output with "ma", "sa" and "result".

In "result-format" and "unknown-format", "%n" means the number of states for this analysis. In "result-format", "%r" is the ambiguity-index.

"get cache-size" now also shows how many cache entries are used.
sa-file, ma-file and ga-file take an additional optional parameter, namely the output file name.

malaga and mallex print statistical information when they work in batch mode.

analyse_item() in libmalaga now takes an additional argument, which says whether malaga should create an analysis tree.

mallex now also reads the project file if it is called via malmake.

Renamed option "heapsize" to "heap-size".

Implemented a word form cache and option "cache" to switch it on or off. The cache size can be set using the option "cache-size".

Replaced option "format" by "allo-format", "result-format" and "unknown-format".

libmalaga now reads the "malaga:" option lines from the project file, not the "libmalaga:" lines. It ignores the options that only make sense for malaga.

The option "heapsize" has been introduced to set the heapsize to a new value.

Lines in the project file that start with "morinfo:" or "syninfo:" will be stored as mor-info or syn-info. In malaga, use the command "info mor" or "info syn" to get this information. In libmalaga, use the function "get_info(grammar_t grammar)".

Option lines in included project files ("include:" lines) are now also executed.

In the left hand of an assignment, paths can now also include any expressions, like "$var.$var1.($var2.attr) := value;"

"sa" now supports sa-mode. "ma" now supports ma-mode.

The "output" option has been replaced by the "graphics" option and the "tree" option.

The "set" keyword must now be used when setting options that appear in the project file.

The "define" keyword must now also be used for constant definitions.

The "hidden" option syntax now needs a "+" in front of each symbol to hide, a "-" in front of each symbol to hide no more and a "none" to hide no symbols.

TAB in Malaga mode only jumps to first non-blank if the cursor previously was in front of the first non-blank.

= Version 4.0 =================================================================

The symbols "yes" and "no" are now defined by the system.
A condition that consists only of a value (without a condition operator) is tested as to whether it contains the value "yes" or "no".

The former condition "capital" is now a standard function that returns a "yes" or "no" value.

A definition of a new variable (formerly an assignment) now needs the keyword "define" in front. It is now called the define-statement.

An assignment (formerly a "set"-statement) no longer needs a "set" in front.

A test-statement may be introduced by "?" as well as by the new keyword "require".

The "next" command has been introduced. It works like "step", but it executes subrules without interruption.

The "set" command has been introduced to set options; there are no longer separate commands for the individual options. The "get" command is used to get the current settings.

The initial state is now described in the format "initial FEAT, rules RULES;" (for combi rules), or "initial rules RULES;" (for other rules).

The result statement now displays the result in a TCL/Tk window. This can be changed by the "output" option.

The "set()" function has been implemented. It takes one parameter and converts a list (multi-set) to a set in which every element is contained at most once.

The debug commands have been renamed to form a more regular pattern: "debug" (for allomorph rules), "debug-line" (for lexicon lines), "debug-mor" (for morphology combination rules), "debug-mfil" (for morphology output filter rules), "debug-ifil" (for syntax input filter rules), "debug-syn" (for syntax combination rules), "debug-sfil" (for syntax output filter rules), "debug-node" (for analysis states).

The keywords "base" and "cat" have been deleted. The "generate" statement now takes an "allo" keyword instead of "base" if the rule can generate a base allomorph. The "allo" statement now looks like: "allo ALLO, FEAT, BASE;".

Syntax input filter rules have been introduced.
A syntax input filter rule file has the ending ".ifil" and is executed after the morphology output filter rules have been executed and before the syntax combination rules do their work. As a consequence, the morphology output filter rules may only use symbols of the symbol file, not of the extended symbol file (since the morphology output filter rules now belong entirely to the morphology system). The "filter" command now takes the keywords "mfil", "sfil" and "ifil" instead of "mor" and "syn".

The "error" statement now needs a string, namely the error message that it should print.

The keyword "final_state_check" has been changed to "end_rule" again, since it IS a rule (although not a combi-rule).

The "foreach" statement can now only include one list over which to iterate. This reduces the complexity of Malaga statements.

The "choose" statement can now assign the remainder to an existing variable: use the form "choose $var1 in $list remain set $rem_var", which will assign the remainder to "$rem_var".

The "start" statement has been deleted. Instead, the rule parameters have to be specified after the rule name, in parentheses, like: "rule ABC( $state, $link, $link_surf ):". Combi rules have 2-4 parameters, namely: state feature structure, link feature structure, link surface (optional) and link index (optional). Pruning rules have 3 parameters: the list of feature structures of the states that have already been tested, the feature structure of the state currently tested, and the list of feature structures of states to be tested later. Filter rules, allo rules and end rules have one parameter.

There is no longer any difference between test statements and result statements. Therefore, the "case" statement has become superfluous and has been removed.

The "parallel" statement has been changed in syntax: instead of the "subrule" keyword IN FRONT of each parallel part, an "and" keyword BETWEEN two parallel parts is now used.

Subrules (i.e. Malaga functions) have been introduced.
They start with the keyword "subrule" and their parameter list can have any number of parameters. A subrule must return a value via the "result" statement: "result $xyz;". It is called in an expression like "$new := subrule_name( $Param1, $Param2 );". Subrules may nest, but they must not be called recursively. Every subrule must be defined before it is called (no forward declarations are possible).

Values can be much bigger now (up to 1 Gigabyte, which is perhaps academic).

= Version 3.0 =================================================================

The command "hide" has been renamed to "hidden"; it now takes an additional first argument: "add", "delete" or "clear". For "add", all subsequent arguments are added to the list of hidden arguments. For "delete", all subsequent arguments are removed from the list of hidden arguments. For "clear", all symbols are removed from the list of hidden arguments.

The command "attributes" has been renamed to "sort_records".

The command "hangul" now takes a parameter "on" or "off", so the command "roman" has been deleted.

The command "show" has been renamed to "output".

The command "unknown" has been removed; its functionality has been included in the command "format".

The command "debug-node" takes a state number, which you get from the title of a TCL/Tk state window, and executes all rules that process this state in debug mode.

The "Disam" package has been removed: the commands "disam" and "prune" are no longer available.

The command "value" now supports paths: you can write a series of attributes after a variable name, e.g. "malaga> value $state.Form.Syn".

Filter rules have been introduced. You can have a filter rule system for your morphology and one for your syntax. The morphology filter rule file has to end in ".mfil", the syntax filter rule file has to end in ".sfil". The filter rules are called after the combination rules have been executed.
They are similar to the allomorph rules, except that they begin with "filter_rule" instead of "allo_rule", and they get the list of results of the combination rules as their "start" parameter. In the filter rules, you can compare the results, change them and create the new actual analysis results by using the "result ... accept;" statement. Filter rules can use the symbols in the symbol file and the symbols in the extended symbol file. If you include filter rules in your project file or in your command line arguments when calling malaga, the execution of filter rules is switched on by default. You can switch it on or off using the "filter" command. The filter command needs two arguments: the filter rule system type ("mor" or "syn") and one of "on" or "off".

The file ending ".sys" (for syntax symbol file) has been changed to ".esym" (for extended symbol file) because the symbols in this file can now also be used by the filter rules.

Pruning rules have been introduced. You can have a single pruning rule in your syntax rule file. Before a set of states (which have consumed the same amount of analysis input) is combined with a new link, the pruning rule is called for each state of this set. The rule decides whether the state should be deleted or not. The pruning rule starts with "pruning_rule", and in the "start" statement, it gets a list of three elements as a parameter: the first element is the list of feature structures of the states that have already been examined by the pruning rule, the second element is the feature structure of the state that is currently being examined, and the third element is the list of feature structures of the states that haven't been examined yet. When the pruning rule executes an "accept" statement, the state will be preserved; otherwise it will be deleted.

In morphology rule files, after the initial state, you can now include an "unknown_cat FEAT;" statement. When robust analysis is activated, an unknown wordform is assigned the feature structure FEAT.
Robust analysis is switched on and off with the command "robust".

Commands that are included in the project file after "malaga:" or "mallex:" are now also executed in batch mode, so only commands that change settings are allowed here.

Command line options for malaga and mallex have been reduced to "-version", "-readable" and "-interactive" for mallex and "-version", "-syntax" and "-morphology" for malaga.

The "-" operator for lists is now the MULTI-SET difference, whereas the "/" operator for lists is the SET difference.

= Version 2.1 =================================================================

The keyword "end-rule" has been renamed to "final_state_check".

The character "-" may no longer be part of symbol names, keywords, variable names, etc.

The new operator "-" can subtract floats and compute the difference of multi-sets.

The symbol "nil" can be compared to any value, even records, lists and strings.

A rule set may contain multiple default rules (rules aa, bb, cc else dd else ee else...).

The "set" statement can now set the value of a specified attribute, like "set $state.Form.Mor := $New_Value;".

The new assignment operators "+=" and "-=" have been introduced (in set statements only). The statement "set a += b;" is an abbreviation for "set a := a + b;"; the same holds analogously for "-=".

The atomizing operator "* a" now has to be written as "atoms(a)". The inverse operator has been introduced: "multi(a)" returns the multi-symbol whose atomic symbols are equal to a, which must be a list of atomic symbols. An error is reported if that multi-symbol doesn't exist.

In the symbol table, the "*" is no longer needed to mark multi-symbols; it is now forbidden. Furthermore, every multi-symbol's symbol list must contain at least two symbols, and all multi-symbols need to have different definitions.

The output format of the commands "ma-file", "sa-file" and "ga..." can be configured. The commands "format" and "unknown" have been introduced for this purpose.
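To make the distinction between multi-set difference ("-") and set difference ("/") on lists concrete, the following Python sketch models both on plain Python lists. It is not Malaga code. The function names are hypothetical, and the assumptions are that "-" removes one occurrence per element of the right operand (here the first matching one) while "/" removes every occurrence.

```python
def multiset_difference(left, right):
    """Model of "-": each element of `right` cancels at most one
    occurrence in `left` (assumed: the first matching occurrence)."""
    result = list(left)
    for item in right:
        if item in result:
            result.remove(item)  # list.remove drops only the first match
    return result


def set_difference(left, right):
    """Model of "/": every occurrence in `left` of any element that
    appears in `right` is dropped."""
    return [item for item in left if item not in right]


sample = ["a", "b", "a", "c", "a"]
print(multiset_difference(sample, ["a", "a"]))  # one "a" survives
print(set_difference(sample, ["a"]))            # no "a" survives
```

The two operations agree when no element occurs more than once in the left operand; they differ only in how duplicates are treated.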
The condition "capital(string)" now tests whether "string" starts with a capital letter.

In rule files, global constants can be defined by definitions of the form "@const := CONSTANT". Constants can only be defined outside of rules; they are valid throughout the rule file.

The comparison operators "greater", "less_equal" and "greater_equal" compare floating point numbers (like "less").

In Malaga Emacs mode, the mode-line now also includes the name of the project file that is being used.

mallex can now also generate readable lexicon files in batch mode; use the command line option "-readable". The lexicon file will then be printed to the standard output stream.

Analysis statistics are now printed to the standard output stream.

= End of file. ================================================================