Specifications of SigScheme =========================== General ------- 64-bit data models ~~~~~~~~~~~~~~~~~~ Supports LL64, LLP64, LP64 and ILP64 on both storage-fatty and storage-compact. Addressable memory space ~~~~~~~~~~~~~~~~~~~~~~~~ Ordinary storage implementation can address any Scheme object scattered on whole memory space. Both storage-fatty and storage-compact have no limitation on any 32 and 64-bit data models. But it may be limited if a storage implementation is designed to do so for some specific advantages, as like GNU Emacs' 28-bit tagged pointer does. Integer range ~~~~~~~~~~~~~ Current implementation only supports fixnum, and its range varies by the user-selected underlying storage implementation. The range can be known via R6RS (R5.91RS) compatible `(fixnum-width)`, `(least-fixnum)` and `(greatest-fixnum)`. R5RS conformance ---------------- Proper tail recursion ~~~~~~~~~~~~~~~~~~~~~ Supported. But the conformance of `eval` procedure is uncertain. See the comments of `scm_p_eval()` and `rec-by-eval` of `test-tail-rec.scm` for further information about `eval`. Continuations ~~~~~~~~~~~~~ Limited to nested use due to its setjmp/longjmp implementation. If a continuation that is not an ancestor of current continuation called, all continuation objects lying between the curent and the common ancestor of the destination are invalidated. Calling an invalidated continuation object causes an error. Hygienic macros ~~~~~~~~~~~~~~~ The hygienic macros are fully supported. But although the macro expansion engine itself works well and can be expected as R5RS-conformant, its integration into SigScheme is not fully validated yet. It is likely having a problem on identifier references. So the feature is disabled by default on most configurations. In addition to the validity problem, there is an architectural inefficiency. Since SigScheme development had been started as macro-less naive system, it lacks the concepts 'compilation', 'macro-expansion phase' and 'lexical environment'. So current SigScheme implementation adopted runtime macro expansion. A macro form is kept untransformed in program, and expanded immediately before each evaluation. For example, following code causes macro expansion on each `map-caddr` call. .Macro expansion inefficiency ---------------------------------------------------------------- (define-syntax compose-internal (syntax-rules () ((compose-internal f x) (f x)) ((compose-internal f g ...) (f (compose-internal g ...))))) (define-syntax compose (syntax-rules () ((compose) values) ((compose f) f) ((compose f g ...) (lambda (x) (compose-internal f g ... x))))) (define map-caddr (lambda (lst) (map (compose car cdr cdr) lst))) ---------------------------------------------------------------- Since it is considerably inefficient in many cases, keep in mind the expansion cost. The inefficiency problem is expected to be resolved in SigScheme 0.9.0. Numbers ~~~~~~~ SigScheme supports only the integer part of the numerical tower. Literals ^^^^^^^^ SigScheme recognizes only these limited part of numerical forms of "7.1.1 Lexical structure" section of R5RS. Other valid R5RS forms for numbers produce errors. ---------------------------------------------------------------- <number> --> <num 2>| <num 8> | <num 10>| <num 16> <num R> --> <prefix R> <complex R> <complex R> --> <real R> <real R> --> <sign> <ureal R> <ureal R> --> <uinteger R> <uinteger R> --> <digit R>+ #* ;; '#' must not occur <prefix R> --> <radix R> <sign> --> <empty> | + | - <radix 2> --> #b <radix 8> --> #o <radix 10> --> <empty> | #d <radix 16> --> #x <digit 2> --> 0 | 1 <digit 8> --> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 <digit 10> --> <digit> <digit 16> --> <digit 10> | a | b | c | d | e | f <digit> --> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 ---------------------------------------------------------------- SigScheme accepts only lower case alphabets as radices as follows. But hexadecimal digits can be written as either lower or upper. ---------------------------------------------------------------- #b11 ==> 3 #B11 ==> error #xa1 ==> 161 #Xa1 ==> error #xAb ==> 171 ---------------------------------------------------------------- SigScheme uses a fixed-size buffer for number literals parsing. Due to the implementation, it can accept only one optional '0' prefix for maximum-length binary number literals. Two or more '0' prefixes causes an error as follows. ---------------------------------------------------------------- ;; storage-compact on ILP32 env (greatest-fixnum) ==> 2147483647 #b11111111000000001111111100000000 ==> 4278255360 #b011111111000000001111111100000000 ==> 4278255360 #b0011111111000000001111111100000000 ==> error #b00011111111000000001111111100000000 ==> error ---------------------------------------------------------------- Optional procedures ^^^^^^^^^^^^^^^^^^^ The procedures '-' and '/' support following optional form. ---------------------------------------------------------------- 6.2.5 Numerical operations optional procedure: - z1 z2 ... optional procedure: / z1 z2 ... ---------------------------------------------------------------- Characters ~~~~~~~~~~ All character category-sensitive procedures and predicates (such as char-upcase) work correctly only in ASCII range. i.e. Neither Unicode processing specified in R6RS nor other non-Unicode multibyte character processing are supported in such procedures/predicates. Non-ASCII charcter acceptance ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ `integer->char` only accepts valid characters of current character codec. If no multibyte character codec is enabled on configuration, it accepts 0-255. If no multibyte character codec is enabled on configuration, character literal only covers ASCII. If UTF-8 codec is enabled and is the current codec, character literal covers all valid Unicode charcters. Whitespace charcters ~~~~~~~~~~~~~~~~~~~~ SigScheme treats vertical tab (0x0b) as a white space charcter although R5RS `char-whitespace?` does not cover it. ---------------------------------------------------------------- R5RS: 6.3.4 Characters The whitespace characters are space, tab, line feed, form feed, and carriage return. ---------------------------------------------------------------- ---------------------------------------------------------------- R6RS Standard Libraries: 1.1 Characters A character is whitespace if it is in one of the space, line, or paragraph separator categories (Zs, Zl or Zp), or if is U+0009 (Horizontal tabulation), U+000A (Line feed), U+000B (Vertical tabulation), U+000C (Form feed), or U+000D (Carriage return). ---------------------------------------------------------------- Case-insensitive character comparison ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SigScheme's case-insensitive comparison conforms to the foldcase'ed comparison described in R6RS and SRFI-13, although R5RS does not specify comparison between alphabetic and non-alphabetic char. See the description in sigschemeinternal.h for further details. Case-sensitive identifiers ~~~~~~~~~~~~~~~~~~~~~~~~~~ SigScheme does distinguish letter case in indentifiers. Although case insensitivity is required in R5RS as follows, it is hard to accept for the our application. ---------------------------------------------------------------- 2. Lexical conventions Upper and lower case forms of a letter are never distinguished except within character and string constants. For example, `Foo' is the same identifier as `FOO', and #x1AB is the same number as #X1ab. ---------------------------------------------------------------- Constant string ~~~~~~~~~~~~~~~ SigScheme treats string literals as constant as specified in R5RS. .constant string ================================================================ sscm> (string-set! "foo" 0 #\F) Error: in string-set!: attempted to modify immutable string: "foo" sscm> (string-set! (string-copy "foo") 0 #\F) "Foo" ================================================================ Constant list ~~~~~~~~~~~~~ SigScheme inhibits modification of constant list object by default as specified in R5RS, if the storage implementation suports it. storage-fatty supports it, but storage-compact does not due to no bit space for pair object. The behavior can be changed by `SCM_CONST_LIST_LITERAL`. ---------------------------------------------------------------- 4.1.2 Literal expressions `(quote <datum>)' may be abbreviated as '<datum>. The two notations are equivalent in all respects. 'a ==> a '#(a b c) ==> #(a b c) '() ==> () '(+ 1 2) ==> (+ 1 2) '(quote a) ==> (quote a) ''a ==> (quote a) As noted in section 3.4 Storage model, it is an error to alter a constant (i.e. the value of a literal expression) using a mutation procedure like `set-car!' or `string-set!'. 6.3.2 Pairs and lists procedure: set-car! pair obj Stores obj in the car field of pair. The value returned by `set-car!' is unspecified. (define (g) '(constant-list)) (set-car! (g) 3) ==> error ---------------------------------------------------------------- Constant vector ~~~~~~~~~~~~~~~ SigScheme inhibits modification of constant vector object by default as specified in R5RS, if the storage implementation suports it. storage-fatty supports it, but storage-compact is not yet. The behavior can be changed by `SCM_CONST_VECTOR_LITERAL`. ---------------------------------------------------------------- 6.3.6 Vectors procedure: vector-set! vector k obj (vector-set! '#(0 1 2) 1 "doe") ==> error ; constant vector ---------------------------------------------------------------- Quote-less null list ~~~~~~~~~~~~~~~~~~~~ SigScheme allows quote-less null list by default for convenience and performance. But it can be error as specified in R5RS, when `SCM_STRICT_R5RS` is enabled. .SCM_STRICT_R5RS disabled ================================================================ sscm> (null? ()) #t ================================================================ .SCM_STRICT_R5RS enabled ================================================================ sscm> (null? ()) Error: eval: () is not a valid R5RS form. use '() instead ================================================================ Quote-less vector literal ~~~~~~~~~~~~~~~~~~~~~~~~~ Sigscheme inhibits quote-less vector literal by default, as specified in R5RS. The behavior can be changed by `SCM_STRICT_VECTOR_FORM`. ---------------------------------------------------------------- 6.3.6 Vectors Vectors are written using the notation #(obj ...). For example, a vector of length 3 containing the number zero in element 0, the list `(2 2 2 2)' in element 1, and the string `"Anna"' in element 2 can be written as following: #(0 (2 2 2 2) "Anna") Note that this is the external representation of a vector, not an expression evaluating to a vector. Like list constants, vector constants must be quoted: '#(0 (2 2 2 2) "Anna") ==> #(0 (2 2 2 2) "Anna") ---------------------------------------------------------------- .vector literals ================================================================ sscm> #(1 2 3) Error: eval: #() is not a valid R5RS form. use '#() instead sscm> '#(1 2 3) #(1 2 3) ================================================================ Environment specifiers ~~~~~~~~~~~~~~~~~~~~~~ `(null-environment)` and `(scheme-report-environment)` does not return correct environemnt specified in R5RS. Current implementation returns same object of `(interaction-environment)`. Internal definitions ~~~~~~~~~~~~~~~~~~~~ SigScheme strictly conforms to the 'internal definitions' defined in R5RS (cited below) if `SCM_STRICT_DEFINE_PLACEMENT` is enabled (default). It can be disabled to get the syntax loosen, shrink the footprint and reduce runtime cost. ---------------------------------------------------------------- 5.2.2 Internal definitions Definitions may occur at the beginning of a <body> (that is, the body of a lambda, let, let*, letrec, let-syntax, or letrec-syntax expression or that of a definition of an appropriate form). Such definitions are known as internal definitions as opposed to the top level definitions described above. ---------------------------------------------------------------- Superfluous arguments ~~~~~~~~~~~~~~~~~~~~~ Superfluous or dotted arguments are strictly rejected as an error if `SCM_STRICT_ARGCHECK` is enabled. Otherwise ignored. Resource-sensitive apprication could disable it. .SCM_STRICT_ARGCHECK enabled ================================================================ sscm> (car '(1 2) 3 4) Error: in (function call): superfluous argument(s): (3 4) sscm> (symbol? 'foo . #t) Error: in (function call): improper argument list terminator: #t sscm> (+ 3 4 . 5) Error: in (reduction): improper argument list terminator: 5 ================================================================ .SCM_STRICT_ARGCHECK disabled ================================================================ sscm> (car '(1 2) 3 4) 1 sscm> (symbol? 'foo . #t) #t sscm> (+ 3 4 . 5) 7 ================================================================ Promises ~~~~~~~~ SigScheme only supports explicit forcing. And passing non-promise objects to force is an error. ---------------------------------------------------------------- (+ (delay (* 3 7)) 13) ==> error ---------------------------------------------------------------- Syntaxes/procedures not implemented ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Following R5RS syntaxes and procedures are not implemented (yet). Numbers ^^^^^^^ - *procedure:* complex? obj - *procedure:* real? obj - *procedure:* rational? obj - *procedure:* exact? z - *procedure:* inexact? z - *library procedure:* gcd n1 ... - *library procedure:* lcm n1 ... - *procedure:* numerator q - *procedure:* denominator q - *procedure:* floor x - *procedure:* ceiling x - *procedure:* truncate x - *procedure:* round x - *library procedure:* rationalize x y - *procedure:* exp z - *procedure:* log z - *procedure:* sin z - *procedure:* cos z - *procedure:* tan z - *procedure:* asin z - *procedure:* acos z - *procedure:* atan z - *procedure:* atan y x - *procedure:* sqrt z - *procedure:* expt z1 z2 - *procedure:* make-rectangular x1 x2 - *procedure:* make-polar x3 x4 - *procedure:* real-part z - *procedure:* imag-part z - *procedure:* magnitude z - *procedure:* angle z - *procedure:* exact->inexact z - *procedure:* inexact->exact z System interface ^^^^^^^^^^^^^^^^ - *optional procedure:* transcript-on filename - *optional procedure:* transcript-off SRFI conformance ---------------- SRFI-0 Feature-based conditional expansion construct ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Supported. But since the macro expansion is performed at evaluation-time and the expanded form is not stored as the code, the form may be expanded differently at next time. SRFI-1 List Library ~~~~~~~~~~~~~~~~~~~~ Fully supported. It is based on the reference implementation of SRFI-1. Some procedures are replaced with efficient C implementation. And bugs in `delete-duplicates!`, `lset-xor`, `lset-xor!` and `list=` of the reference implementation are fixed. SRFI-2 AND-LET* ~~~~~~~~~~~~~~~ Fully supported. SRFI-6 Basic String Ports ~~~~~~~~~~~~~~~~~~~~~~~~~ Fully supported. SRFI-8 receive ~~~~~~~~~~~~~~ Fully supported. SRFI-9 Defining Record Types ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fully supported. It is based on the reference implementation of SRFI-9. But different to the original implementation, `eval` procedure of the SigScheme port accepts `(interaction-environment)` as environment argument. SRFI-22 Running Scheme Scripts on Unix ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SigScheme only supports the prelude line interpretation. All options written in the line are applied as same as commandline invocation of sscm. But the `main` procedure invocation is not supported (yet). .Prelude line is interpreted as follows ================================================================ #! /usr/bin/env sscm -C UTF-8 ... ==> Character encoding for the file is changed to UTF-8 temporarily. ================================================================ SRFI-23 Error Reporting Mechanism ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fully supported. If srfi-34 is provided, the `error` procedure throws a SigScheme-specific error object in cooperate with "SRFI-34 Exception Handling for Programs". Otherwise it simply calls scm_fatal_error(). Since the error objects are represented as a list, be careful on catching an exception based on its type. If you want to distinguish the error objects from ordinary lists, use SigScheme-specific `%%error-object?` predicate. .Error objects are also caught as a list ================================================================ sscm> (guard (obj ((pair? obj) obj)) (error "reason" 1 2 3)) #<error "reason" 1 2 3> ================================================================ .Error object internal ================================================================ sscm> (define err (guard (err (#t err)) (error "reason" 1 2 3))) err sscm> err #<error "reason" 1 2 3> sscm> (pair? err) #t sscm> (car err) (#<undef> . #<undef>) sscm> (%%error-object? err) #t ================================================================ SRFI-28 Basic Format Strings ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fully supported. A directive-less tilde at end of a format string causes an error as same as the reference implementation of SRFI-28. .SigScheme ================================================================ (format "~") ==> error (format "a~") ==> error ================================================================ SRFI-34 Exception Handling for Programs ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fully supported. SRFI-38 External Representation for Data with Shared Structure ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Only `write-with-shared-structure` is implemented and `read-with-shared-structure` is not. The optional alias `write/ss` described in SRFI-38 is also defined. The optional `optarg` argument is simply ignored. The shared index starts with #1 (not #0). .Shared index starts with #1 ================================================================ sscm> (define lst (list 'a 'b)) lst sscm> (set-cdr! lst lst) #1=(a . #1#) sscm> lst #1=(a . #1#) ================================================================ SRFI-43 Vector library ~~~~~~~~~~~~~~~~~~~~~~ Fully supported. It is based on the reference implementation of SRFI-43. SRFI-48 Intermediate Format Strings ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fully supported. The 'd' part of '~w,dF' directive is acceptable, but completely ignored on output format. Since SigScheme only supports integer currently, number is always formatted as integer even if the 'd' part is specified. .Proper behavior ================================================================ (format "~3F" 3) ==> " 3" (format "~3,2F" 3) ==> "3.00" ================================================================ .SigScheme ================================================================ (format "~3F" 3) ==> " 3" (format "~3,2F" 3) ==> " 3" ================================================================ Although the reference implementation of SRFI-48 allows directive-less tilde at end of a format string, SigScheme rejects it as an error since it decreases user-code portability, and is confusable due to that the behavior is different to the reference implementation of SRFI-28. .Reference implementation of SRFI-48 ================================================================ (format "~") ==> "~" (format "a~") ==> "a~" ================================================================ .SigScheme ================================================================ (format "~") ==> error (format "a~") ==> error ================================================================ SRFI-55 require-extension ~~~~~~~~~~~~~~~~~~~~~~~~~ Fully supported. SRFI-60 Integer as Bits ~~~~~~~~~~~~~~~~~~~~~~~ Only following procedures are implemented. - Bitwise Operations * *procedure:* logand n1 ... * *procedure:* bitwise-and n1 ... * *procedure:* logior n1 ... * *procedure:* bitwise-ior n1 ... * *procedure:* logxor n1 ... * *procedure:* bitwise-xor n1 ... * *procedure:* lognot n * *procedure:* bitwise-not n * *procedure:* bitwise-if mask n0 n1 * *procedure:* bitwise-merge mask n0 n1 * *procedure:* logtest j k * *procedure:* any-bits-set? j k And the others listed below are not. - Integer Properties * *procedure:* logcount n * *procedure:* bit-count n * *procedure:* integer-length n * *procedure:* log2-binary-factors n * *procedure:* first-set-bit n - Bit Within Word * *procedure:* logbit? index n * *procedure:* bit-set? index n * *procedure:* copy-bit index from bit - Field of Bits * *procedure:* bit-field n start end * *procedure:* copy-bit-field to from start end * *procedure:* ash n count * *procedure:* arithmetic-shift n count * *procedure:* rotate-bit-field n count start end * *procedure:* reverse-bit-field n start end - Bits as Booleans * *procedure:* integer->list k len * *procedure:* integer->list k * *procedure:* list->integer list * *procedure:* booleans->integer bool1 ... SRFI-69 Basic hash tables ~~~~~~~~~~~~~~~~~~~~~~~~~ Fully supported. It is just the reference implementation of SRFI-69. The hash functions are not yet optimized for SigScheme. SRFI-95 Sorting and Merging ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fully supported. It is just the reference implementation of SRFI-95 (sort.scm of SLIB). R6RS conformance ---------------- R6RS characters ~~~~~~~~~~~~~~~ R6RS characters are partially implemented based on R5.92RS. But since R6RS specification is not finalized yet, future SigScheme may change the specification around R6RS characters. Current R6RS characters status ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Supports Unicode character literals such as #\λ - Supports #x<digit16>+ literals - Supports \x<digit16>+; literals in string - Supports Unicode identifiers (but lacks character category validation) - Supports all named chars such as #\backspace, #\esc, and #\nul - Quoted-symbol by vertical bar (such as '|-symbol|) is not supported yet - Invalidly accepts non-all-lower-case character names such as #\Tab and #TAB TODOs ^^^^^ - Support new escapes in string (\<linefeed> and \<space>) - Confirm symbol escape syntax (not defined in R6RS yet) - Support Unicode character category validation for identifiers - Disable #\newline on R6RS-compatible mode SigScheme extensions -------------------- If `--enable-sscm-extensions` is specified for configure script explicitly or implicitly, these features can be used. Legacy macro ~~~~~~~~~~~~ Legacy and common `define-macro` is provided to define syntactic closures. But SigScheme's implementation is having some limitations. - The closure can only be defined with top-level environment due to an internal implementation trick - Macro forms are kept untransformed in program, and expanded immediately before each evaluation, as if they are ordinary syntaxes - `define-macro` destructively modifies closure object To avoid problems related to these limitations, side-effect in macros are strongly discouraged. See following examples for each limitation. .The closure can only be defined with top-level environment ---------------------------------------------------------------- (define-macro m (let ((?var 3)) (lambda () ?var))) ==> error ---------------------------------------------------------------- .Macro forms are kept untransformed in program ---------------------------------------------------------------- (define cnt 0) (define-macro m (lambda () (set! cnt (+ cnt 1)) cnt)) (define proc-m (lambda () (m))) (proc-m) ==> 1 (proc-m) ==> 2 ;; ordinary Scheme implementation returns 1 (proc-m) ==> 3 ;; ordinary Scheme implementation returns 1 ---------------------------------------------------------------- .`define-macro` destructively modifies closure object ---------------------------------------------------------------- (define f (lambda (x) (* x x))) (f (+ 1 2)) ==> 9 (procedure? f) ==> #t (define-macro mf f) (mf (+ 1 2)) ==> error (mf 3) ==> 9 (f (+ 1 2)) ==> error (procedure? f) ==> error ---------------------------------------------------------------- See `test-legacy-macro.scm` to know more detailed specification. let-optionals* ~~~~~~~~~~~~~~ `let-optionals*` is provided for optional argument processing. The specification is exactly same as Gauche 0.8.8. See `test-sscm-ext.scm` for further information. SIOD compatibility ------------------ SIOD specific features ~~~~~~~~~~~~~~~~~~~~~~ If `--enable-compat-siod` is specified for configure script explicitly or implicitly, some SIOD compatible features are provided. - symbol-value - set-symbol-value! - verbose - undefine - eof-val - %%closure-code - bit-and - bit-or - bit-xor - bit-not SIOD specific behaviors ~~~~~~~~~~~~~~~~~~~~~~~ If `--enable-compat-siod-bugs` is specified for configure script explicitly or implicitly, some procedures and syntaxes emulate SIOD's behavior. We call them 'bugs' for convention although they are not actually bugs on non-R5RS-compliant SIOD implementation. - `#f` is identical to null list - `(car \'())` evaluates to `()` - `(cdr \'())` evaluates to `()` - `(if #f #t)` evaluates to `#f` - `let` and `let*` accepts value-less binding forms - `=` predicate can be applied to non-number objects .#f is identical to null list ---------------------------------------------------------------- (null? #f) ==> #t (null? '()) ==> #t (boolean? #f) ==> #t (boolean? '()) ==> #t (if '() 'true 'false) ==> false ---------------------------------------------------------------- .`let` and `let*` accepts value-less binding forms ---------------------------------------------------------------- (let ((var)) var) ==> #f (let* ((var)) var) ==> #f (letrec ((var)) var) ==> error ---------------------------------------------------------------- .`=` predicate can be applied to non-number objects ---------------------------------------------------------------- (require-extension (siod)) (= 3 (+ 1 2)) ==> #t (= 3 (+ 1 2) 3) ==> error ;; only accepts 2 args (define lst '(0 1 2)) (= lst lst) ==> #t (= lst (list 0 1 2)) ==> #f ;; '=' is defined as follows if --enable-compat-siod-bugs (define = (let ((%= =)) (lambda (x y) (or (eq? x y) (%= x y))))) ----------------------------------------------------------------