\section{The MLRISC Machine Description Language}

\subsection{ Overview }

\newdef{MDGen} is a machine description language designed to automate
various mundane and error-prone tasks in developing a back-end for 
MLRISC.  Currently, to target a new
architecture, the programmer must provide the following set of modules
written in Standard ML:

\begin{itemize}
  \item \codehref{instructions/cells.sig}{CELLS} -- 
   the properties of the register set and (part of) the memory hierarchy. 
  \item \codehref{instructions/instructions.sig}{INSTRUCTIONS} -- 
   the concrete instruction set representation.
  \item \codehref{instructions/insnProps.sig}{INSNS_PROPERTIES}  --
   properties of the instructions.
  \item \codehref{instructions/shuffle.sig}{SHUFFLE} --
   methods to emit linearized code from parallel copies.
  \item \codehref{emit/instruction-emitter.sig}{ASSEMBLER} --
   the assembler.
  \item \codehref{emit/instruction-emitter.sig}{MC} --
   the machine code emitter.
  \item \codehref{../backpatch/sdi-jumps.sig}{ SDI_JUMPS } --
   methods for resolving span-dependent instructions. 
  \item \codehref{../backpatch/delaySlotProps.sig}{ DELAY_SLOTS_PROPERTIES } --
    machine properties for delay slot filling, needed if the
    architecture has branch delay slots or load delay slots.
  \item \codehref{../SSA/ssaProps.sig}{ SSA_PROPERTIES } --
    semantic properties for performing optimizations in Static Single
  Assignment form.
\end{itemize}
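The \sml{SHUFFLE} entry above concerns turning a parallel copy into straight-line code. In a parallel copy, such as the one a \sml{COPY} instruction denotes, every destination receives the old value of its source at the same time. The sketch below shows one standard way to sequentialize such a copy. It is not taken from MLRISC. The name \sml{shuffle}, the use of plain integers for registers, the list of \sml{(dst, src)} pairs with distinct destinations, and the assumption that \sml{tmp} is a free scratch register are all illustrative.

\begin{SML}
(* Hypothetical sketch, not MLRISC's Shuffle: sequentialize the parallel
 * copy given as (dst, src) pairs with distinct dsts.  The result is a
 * list of (dst, src) moves to execute in order; tmp must be a free
 * scratch register not mentioned in moves. *)
fun shuffle {tmp : int, moves : (int * int) list} : (int * int) list =
  let
    (* is register r still read by some pending move? *)
    fun isSrc r ms = List.exists (fn (_, s) => s = r) ms

    fun loop ([], acc) = List.rev acc
      | loop (ms, acc) =
          (case List.partition (fn (d, _) => not (isSrc d ms)) ms of
             ([], (d, _) :: _) =>
               (* only cycles remain: copy d into tmp, then make the
                * move that read d read tmp instead, which frees d *)
               let
                 val ms' =
                   List.map (fn (d', s) => if s = d then (d', tmp) else (d', s)) ms
               in
                 loop (ms', (tmp, d) :: acc)
               end
           | (ready, blocked) =>
               (* no pending move reads these destinations: emit them now *)
               loop (blocked, List.revAppend (ready, acc)))
  in
    (* drop no-op copies r := r, then sequentialize the rest *)
    loop (List.filter (fn (d, s) => d <> s) moves, [])
  end

(* swapping registers 1 and 2 through scratch register 28 gives
 * [(28, 1), (1, 2), (2, 28)] *)
val swap = shuffle {tmp = 28, moves = [(1, 2), (2, 1)]}
\end{SML}

Moves whose destination is no longer read can be issued right away. Only when nothing but cycles remain is a value parked in the scratch register. This is presumably the role of the \sml{tmp} field carried by the \sml{COPY} instruction later in this section.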

In general, writing a backend is tedious even with  
SML's abstraction capabilities. 
Furthermore, the machine description is procedural in nature 
and must be checked by hand.  

\subsection{ What is in MDGen? }
The MDGen tool simplifies the process of developing a new MLRISC backend.  
MDGen provides the following:
\begin{itemize}
   \item A representation description language for specifying the
     machine encoding of the instruction set,
     using an extension of ML's algebraic datatype facility.
   \item A semantics description language for specifying the abstract semantics
      of the instructions.
\end{itemize}

Both sub-languages are based on ML's syntax and semantics, so
they should be readily familiar to all MLRISC users.

A backend developer can specify a new machine architecture using the MDGen 
language, and in turn, the MDGen tool generates ML modules that are
required by the MLRISC system.

The basic concepts of MDGen are largely inspired by 
Norman Ramsey's \href{www.cs.virginia.edu/~nr/toolkit}{
New Jersey Machine Code Tool Kit} and 
Ramsey and Davidson's
\href{http://www.cs.virginia.edu/zephyr/csdl/lrtlindex.html}{
Lambda RTL}.

\subsection{A Sample Description}

Here we present a sample MDGen description, using the Alpha as an example.
We highlight the keywords of the MDGen language 
in red.  A typical machine description
is structured as follows:

\begin{SML}
architecture Alpha =
   struct

   name "Alpha"

   superscalar

   little endian

   <font color=#FF0000>lowercase assembly</font>

   \href{#cells}{Storage cells and locations}
   \href{#encoding}{Instruction encoding formats specification}
   \href{#instruction}{Instruction definition}
<font color=#FF0000>end</font>
\end{SML}

Here, we declare that the Alpha is a superscalar machine using
little-endian encoding.  Furthermore, assembly output should be displayed
in lowercase; this is purely an aesthetic choice, since most assemblers
are case-insensitive.
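The \sml{little endian} declaration determines the order in which the bytes of each 32-bit instruction word are written out. As a minimal illustration, the hypothetical helper \sml{littleEndianBytes} below lists a word's bytes, least significant first. It relies only on the SML Basis \sml{Word32} and \sml{Word8} structures and is not code generated by MDGen.

\begin{SML}
(* Hypothetical helper, not generated by MDGen: the bytes of a 32-bit
 * instruction word in little-endian order (lowest-addressed byte first). *)
fun littleEndianBytes (w : Word32.word) : Word8.word list =
  List.tabulate (4, fn i =>
    (* shift byte i down to the bottom, then keep only the low 8 bits *)
    Word8.fromLarge (Word32.toLarge (Word32.>> (w, Word.fromInt (8 * i)))))

(* 0wx40220403 is the Alpha encoding of addq $1, $2, $3 *)
val bytes = littleEndianBytes 0wx40220403   (* [0wx3, 0wx4, 0wx22, 0wx40] *)
\end{SML}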



\subsubsection{ <a name="cells">Specifying Storage Cells and Locations </a>}

A <font color="#ff0000">cell</font> is an abstract resource location 
for holding data values.  On typical machines, the types of
cells include general purpose registers, floating point registers,
and condition code registers.

The \sml{storage} declaration defines different 
<font color="#ff0000">cellkinds</font>.  MLRISC requires the
cellkinds \sml{GP}, \sml{FP}, \sml{CC} to be defined.
These are the cellkinds for general purpose registers, floating point
registers and condition code registers.

In the following sequence of declarations, a few things are defined:
\begin{itemize}
  \item The cellkinds \sml{GP, FP, CC} are defined.
        Furthermore, the cellkinds \sml{MEM, CTRL}, which stand
        for memory and control (dependence), are also implicitly defined.
  \item The \sml{assembly as} clauses specify how a specific cell type is
       to be displayed.    Here, we specify that register 30, the
       stack pointer, should be displayed specially as \sml{$sp}.
  \item The \sml{in cellset} clause, when attached, tells MDGen that
       the associated cellkind should be part of the 
       \href{cellset.html}{ cellset }.  The clause \sml{in cellset GP}
       tells MDGen that a cell of kind \sml{CC} should be treated
       the same as a \sml{GP} cell.
  \item The \sml{locations} declarations define a few abbreviations:
        \sml{stackptrR} is the stack pointer, \sml{asmTmpR} is
       the assembly temporary, \sml{fasmTmp} is the floating point
       assembly temporary, and \sml{GPReg} and \sml{FPReg} map a
       register number to the corresponding cell.
\end{itemize}

\begin{SML}
   <font color=#FF0000>storage</font>
     GP = 32 <font color=#FF0000>cells <font color=#FF0000>of</font> 64 bits in cellset called</font> "register" 
       	<font color=#FF0000>assembly as</font> (fn 30 => "$sp"
                      | r => "$"^Int.toString r)
   | FP = 32 <font color=#FF0000>cells <font color=#FF0000>of</font> 64 bits in cellset called</font> "floating point register" 
       	<font color=#FF0000>assembly as</font> (fn f => "f"^Int.toString f)
   | CC = <font color=#FF0000>cells <font color=#FF0000>of</font> 64 bits in cellset GP called</font> "condition code register"
                <font color=#FF0000>assembly as</font> "cc"
   <font color=#FF0000>locations</font>
       stackptrR = <font color=#008800>$</font>GP[30]
   <font color=#FF0000>and</font> asmTmpR   = <font color=#008800>$</font>GP[28]
   <font color=#FF0000>and</font> fasmTmp   = <font color=#008800>$</font>FP[30]
   <font color=#FF0000>and</font> GPReg r   = <font color=#008800>$</font>GP[r]
   <font color=#FF0000>and</font> FPReg f   = <font color=#008800>$</font>FP[f]
\end{SML}
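To make the three kinds of clauses above concrete, here is a small hand-written model of them in Standard ML. It is only a sketch. The names \sml{cellkind}, \sml{cellToString}, and \sml{cellsetKind} are hypothetical, and treating \sml{MEM} and \sml{CTRL} as outside any cellset is this model's reading of the declaration. The code MDGen actually generates for \sml{CELLS} is different and more complete.

\begin{SML}
(* Hypothetical model of the storage declaration above; not MDGen output. *)
datatype cellkind = GP | FP | CC | MEM | CTRL

(* "assembly as": how a cell of each kind is printed *)
fun cellToString (GP, 30) = "$sp"
  | cellToString (GP, r)  = "$" ^ Int.toString r
  | cellToString (FP, f)  = "f" ^ Int.toString f
  | cellToString (CC, _)  = "cc"
  | cellToString (_, _)   = raise Fail "no assembly form declared"

(* "in cellset": the kind under which a cell is recorded in a cellset.
 * CC is declared "in cellset GP"; MEM and CTRL have no such clause. *)
fun cellsetKind GP = SOME GP
  | cellsetKind FP = SOME FP
  | cellsetKind CC = SOME GP
  | cellsetKind _  = NONE

(* "locations": symbolic names for particular cells *)
val stackptrR = (GP, 30)
val asmTmpR   = (GP, 28)
val fasmTmp   = (FP, 30)
fun GPReg r = (GP, r)
fun FPReg f = (FP, f)
\end{SML}

With these definitions, \sml{cellToString stackptrR} yields the special name for register 30, while \sml{cellToString asmTmpR} falls through to the generic numbered form.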

\subsubsection{ <a name="instruction">Specifying the Representation of Instructions</a>}
\begin{SML}
   <font color=#FF0000>structure</font> Instruction = 
   <font color=#FF0000>struct</font>
   <font color=#FF0000>datatype</font> ea = 
       Direct <font color=#FF0000>of</font> <font color=#008800>$</font>GP 
     | FDirect <font color=#FF0000>of</font> <font color=#008800>$</font>FP        
     | Displace <font color=#FF0000>of</font> {base: <font color=#008800>$</font>GP, disp:int}
 
   <font color=#FF0000>datatype</font> operand = 
       REGop <font color=#FF0000>of</font> <font color=#008800>$</font>GP       		``<GP>'' (GP)
     | IMMop <font color=#FF0000>of</font> int       		``<int>''
     | HILABop <font color=#FF0000>of</font> LabelExp.labexp       ``hi(<emit_labexp labexp>)''
     | LOLABop <font color=#FF0000>of</font> LabelExp.labexp       ``lo(<emit_labexp labexp>)''
     | LABop <font color=#FF0000>of</font> LabelExp.labexp       	``<emit_labexp labexp>''
     | CONSTop <font color=#FF0000>of</font> Constant.const       ``<emit_const const>''

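   (* 
    * A ! after the datatype name XXX tells MDGen to generate a
    * function emit_XXX that converts the constructors into the corresponding
    * assembly text.  By default, it uses the same name as the constructor,
    * but the text may be modified by the lowercase/uppercase assembly directive.
    * The number(s) after each constructor give its encoding, which the
    * machine code emitter obtains through the same emit_XXX function
    * (see emit_load and emit_operate below).
    *)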
   <font color=#FF0000>datatype</font> branch! = 
      BR  0x30  
                | BSR 0x34  
                           | BLBC 0x38
    | BEQ  0x39 | BLT 0x3a | BLE  0x3b
    | BLBS 0x3c | BNE 0x3d | BGE  0x3e 
    | BGT  0x3f

   <font color=#FF0000>datatype</font> fbranch! =
                  FBEQ 0x31 | FBLT 0x32
    | FBLE 0x33             | FBNE 0x35
    | FBGE 0x36 | FBGT 0x37 
 
   <font color=#FF0000>datatype</font> load! = LDL 0x28 | LDL_L 0x2A | LDQ 0x29 | LDQ_L 0x2B | LDQ_U 0x0B
   <font color=#FF0000>datatype</font> store! = STL 0x2C | STQ 0x2D | STQ_U 0x0F
   <font color=#FF0000>datatype</font> fload[0x20..0x23]! = LDF | LDG | LDS | LDT 
   <font color=#FF0000>datatype</font> fstore[0x24..0x27]! = STF | STG | STS | STT 

   (* non-trapping opcodes *) 
   <font color=#FF0000>datatype</font> operate! = (* table C-5 *)
       ADDL  (0wx10,0wx00)                       | ADDQ (0wx10,0wx20) 
                           | CMPBGE(0wx10,0wx0f) | CMPEQ (0wx10,0wx2d) 
     | CMPLE (0wx10,0wx6d) | CMPLT (0wx10,0wx4d) | CMPULE (0wx10,0wx3d) 
     | CMPULT(0wx10,0wx1d) | SUBL  (0wx10,0wx09) 
     | SUBQ  (0wx10,0wx29) 
     | S4ADDL(0wx10,0wx02) | S4ADDQ (0wx10,0wx22) | S4SUBL (0wx10,0wx0b)
     | S4SUBQ(0wx10,0wx2b) | S8ADDL (0wx10,0wx12) | S8ADDQ (0wx10,0wx32)
     | S8SUBL(0wx10,0wx1b) | S8SUBQ (0wx10,0wx3b) 

     | AND   (0wx11,0wx00) | BIC    (0wx11,0wx08) | BIS    (0wx11,0wx20)
     | CMOVEQ(0wx11,0wx24) | CMOVLBC(0wx11,0wx16) | CMOVLBS(0wx11,0wx14)
     | CMOVGE(0wx11,0wx46) | CMOVGT (0wx11,0wx66) | CMOVLE (0wx11,0wx64)
     | CMOVLT(0wx11,0wx44) | CMOVNE (0wx11,0wx26) | EQV (0wx11,0wx48)
     | ORNOT (0wx11,0wx28) | XOR    (0wx11,0wx40)

     | EXTBL (0wx12,0wx06) | EXTLH  (0wx12,0wx6a) | EXTLL(0wx12,0wx26)
     | EXTQH (0wx12,0wx7a) | EXTQL  (0wx12,0wx36) | EXTWH(0wx12,0wx5a)
     | EXTWL (0wx12,0wx16) | INSBL  (0wx12,0wx0b) | INSLH(0wx12,0wx67)
     | INSLL (0wx12,0wx2b) | INSQH  (0wx12,0wx77) | INSQL(0wx12,0wx3b)
     | INSWH (0wx12,0wx57) | INSWL  (0wx12,0wx1b) | MSKBL(0wx12,0wx02)
     | MSKLH (0wx12,0wx62) | MSKLL  (0wx12,0wx22) | MSKQH(0wx12,0wx72)
     | MSKQL (0wx12,0wx32) | MSKWH  (0wx12,0wx52) | MSKWL(0wx12,0wx12)
     | SLL   (0wx12,0wx39) | SRA    (0wx12,0wx3c) | SRL  (0wx12,0wx34)
     | ZAP   (0wx12,0wx30) | ZAPNOT (0wx12,0wx31)
     | MULL  (0wx13,0wx00)                        | MULQ (0wx13,0wx20)
                           | UMULH  (0wx13,0wx30) 
     | SGNXL "addl" (0wx10,0wx00) (* same as ADDL *)

   (* conditional moves *) 
 
   <font color=#FF0000>datatype</font> pseudo_op! = DIVL | DIVLU
 
   <font color=#FF0000>datatype</font> operateV! = (* table C-5 opc/func *)
        ADDLV (0wx10,0wx40) | ADDQV (0wx10,0wx60)
      | SUBLV (0wx10,0wx49) | SUBQV (0wx10,0wx69) 
      | MULLV (0wx13,0wx40) | MULQV (0wx13,0wx60)
 
   <font color=#FF0000>datatype</font> foperate! =   (* table C-6 *)
      CPYS    (0wx17,0wx20)  | CPYSE (0wx17,0wx022)    | CPYSN   (0wx17,0wx021)
    | CVTLQ   (0wx17,0wx010) | CVTQL (0wx17,0wx030)    | CVTQLSV (0wx17,0wx530)
    | CVTQLV  (0wx17,0wx130)
    | FCMOVEQ (0wx17,0wx02a) | FCMOVEGE (0wx17,0wx02d) | FCMOVEGT (0wx17,0wx02f)
    | FCMOVLE (0wx17,0wx02e) | FCMOVELT (0wx17,0wx02c) | FCMOVENE (0wx17,0wx02b)
    | MF_FPCR (0wx17,0wx025) | MT_FPCR  (0wx17,0wx024)

                         (* table C-7 *)
    | CMPTEQ  (0wx16,0wx0a5) | CMPTLT (0wx16,0wx0a6)   | CMPTLE  (0wx16,0wx0a7)
    | CMPTUN  (0wx16,0wx0a4)

   <font color=#FF0000>datatype</font> foperateV! = 
          ADDSSUD  0wx5c0
        | ADDTSUD  0wx5e0
        | CVTQSC   0wx3c
        | CVTQTC   0wx3e
        | CVTTSC   0wx2c
        | CVTTQC   0wx2f
        | DIVSSUD  0wx5c3
        | DIVTSUD  0wx5e3
        | MULSSUD  0wx5c2
        | MULTSUD  0wx5e2
        | SUBSSUD  0wx5c1
        | SUBTSUD  0wx5e1
 
   <font color=#FF0000>datatype</font> osf_user_palcode! = 
      BPT 0x80 | BUGCHK 0x81 | CALLSYS 0x83 
    | GENTRAP 0xaa | IMB 0x86 | RDUNIQUE 0x9e | WRUNIQUE 0x9f

   end (* Instruction *)
\end{SML}
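A \sml{!} datatype asks for a mapping from each constructor to its assembly text, and the numbers written after the constructors are their encodings. A hand-written approximation for a subset of \sml{branch} might look like the following. The names \sml{branchName}, \sml{asmBranch}, and \sml{opcBranch} are hypothetical. The lowercasing stands in for the \sml{lowercase assembly} directive.

\begin{SML}
(* Hypothetical counterpart of "datatype branch!" for a few constructors;
 * not the code MDGen generates. *)
datatype branch = BR | BSR | BEQ | BNE | BLT | BGT

(* default assembly text: the constructor's own name *)
fun branchName BR  = "BR"
  | branchName BSR = "BSR"
  | branchName BEQ = "BEQ"
  | branchName BNE = "BNE"
  | branchName BLT = "BLT"
  | branchName BGT = "BGT"

(* assembler view, lowercased as "lowercase assembly" requests *)
fun asmBranch b = String.map Char.toLower (branchName b)

(* machine code view: the 6-bit opcode listed after each constructor *)
fun opcBranch BR  = 0wx30 : Word32.word
  | opcBranch BSR = 0wx34
  | opcBranch BEQ = 0wx39
  | opcBranch BNE = 0wx3d
  | opcBranch BLT = 0wx3a
  | opcBranch BGT = 0wx3f
\end{SML}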

\subsubsection{ <a name="encoding">Specifying the Instruction Encoding Formats </a>}

    The Alpha has very simple instruction encoding formats.

<tt>
\begin{SML}
   <font color=#FF0000>instruction formats 32 bits</font>
     Memory{opc:6, ra:5, rb:GP 5, disp: signed 16} (* p3-9 *)
      (* derived from Memory *) 
   | LoadStore{opc,ra,rb,disp} =
       <font color=#FF0000>let val</font> disp = 
           <font color=#FF0000>case</font> disp <font color=#FF0000>of</font>
             I.REGop rb => emit_GP rb
           | I.IMMop i  => itow i
           | I.HILABop le => itow(LabelExp.valueOf le)
           | I.LOLABop le => itow(LabelExp.valueOf le)
           | I.LABop le => itow(LabelExp.valueOf le)
           | I.CONSTop c => itow(Constant.valueOf c)
       in  Memory{opc,ra,rb,disp}
       end
   | ILoadStore{opc,r:GP,b,d} = LoadStore{opc,ra=r,rb=b,disp=d}
   | FLoadStore{opc,r:FP,b,d} = LoadStore{opc,ra=r,rb=b,disp=d}

   | Jump{opc:6,ra:GP 5,rb:GP 5,h:2,disp:int signed 14}   (* table C-3 *)
   | Memory_fun{opc:6, ra:GP 5, rb:GP 5, func:16}     (* p3-9 *)
   | Branch{opc:branch 6, ra:GP 5, disp:signed 21}           (* p3-10 *)
   | Fbranch{opc:fbranch 6, ra:FP 5, disp:signed 21}          (* p3-10 *)
        (* p3-11 *)
   | Operate0{opc:6,ra:GP 5,rb:GP 5,sbz:13..15,_:1=0,func:5..11,rc:GP 5} 
        (* p3-11 *)
   | Operate1{opc:6,ra:GP 5,lit:signed 13..20,_:1=1,func:5..11,rc:GP 5} 
   | Operate{opc,ra,rb,func,rc} =
        (<font color=#FF0000>case</font> rb <font color=#FF0000>of</font>
          I.REGop rb => Operate0{opc,ra,rb,func,rc,sbz=0w0}
        | I.IMMop i  => Operate1{opc,ra,lit=itow i,func,rc}
        | I.HILABop le => Operate1{opc,ra,lit=itow(LabelExp.valueOf le),func,rc}
        | I.LOLABop le => Operate1{opc,ra,lit=itow(LabelExp.valueOf le),func,rc}
        | I.LABop le => Operate1{opc,ra,lit=itow(LabelExp.valueOf le),func,rc}
        | I.CONSTop c => Operate1{opc,ra,lit=itow(Constant.valueOf c),func,rc}
        )
   | Foperate{opc:6,fa:FP 5,fb:FP 5,func:5..15,fc:FP 5}
   | Pal{opc:6=0,func:26}
\end{SML}
</tt>
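Each format above appears to list its fields from the most significant bit downwards, and the widths add up to the declared 32 bits. For example, Memory is 6 + 5 + 5 + 16. The hand-written packers below show how such declarations could map onto shifts and masks. They are a sketch, not MDGen output. The names \sml{field}, \sml{pack}, \sml{memory}, \sml{operate0}, \sml{operate1}, and \sml{foperate} are hypothetical.

\begin{SML}
(* Hypothetical packers for some of the formats above.
 *   Memory:   opc 31..26 | ra 25..21 | rb 20..16 | disp 15..0
 *   Operate0: opc | ra | rb 20..16 | sbz 15..13 | 0 at bit 12 | func 11..5 | rc 4..0
 *   Operate1: opc | ra | lit 20..13             | 1 at bit 12 | func 11..5 | rc 4..0
 *   Foperate: opc | fa 25..21 | fb 20..16 | func 15..5 | fc 4..0 *)

(* keep the low width bits of v and move them to bit position pos *)
fun field (v : Word32.word, width : int, pos : int) : Word32.word =
  let val mask = Word32.- (Word32.<< (0w1, Word.fromInt width), 0w1)
  in Word32.<< (Word32.andb (v, mask), Word.fromInt pos) end

(* OR together a list of (value, width, position) fields *)
fun pack fields =
  List.foldl (fn (f, acc) => Word32.orb (field f, acc)) 0w0 fields

fun memory {opc, ra, rb, disp} =
  pack [(opc, 6, 26), (ra, 5, 21), (rb, 5, 16), (disp, 16, 0)]

(* the sbz bits are left as zero *)
fun operate0 {opc, ra, rb, func, rc} =
  pack [(opc, 6, 26), (ra, 5, 21), (rb, 5, 16), (0w0, 1, 12),
        (func, 7, 5), (rc, 5, 0)]

fun operate1 {opc, ra, lit, func, rc} =
  pack [(opc, 6, 26), (ra, 5, 21), (lit, 8, 13), (0w1, 1, 12),
        (func, 7, 5), (rc, 5, 0)]

fun foperate {opc, fa, fb, func, fc} =
  pack [(opc, 6, 26), (fa, 5, 21), (fb, 5, 16), (func, 11, 5), (fc, 5, 0)]

(* ldq $1, 16($30) packs to 0wxA43E0010 *)
val ldq  = memory {opc = 0wx29, ra = 0w1, rb = 0w30, disp = 0w16}
(* addq $1, $2, $3 packs to 0wx40220403 *)
val addq = operate0 {opc = 0wx10, ra = 0w1, rb = 0w2, func = 0wx20, rc = 0w3}
\end{SML}

Because \sml{field} masks every value to its width, a negative displacement such as \sml{Word32.fromInt ~8} lands in the 16-bit \sml{disp} field as its two's-complement low bits.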

\subsubsection{ Specifying the Instruction Set }
<tt>
\begin{SML}
   <font color=#FF0000>structure</font> MC =
   <font color=#FF0000>struct</font>
      (* compute displacement address *)
      <font color=#FF0000>fun</font> disp lab = itow(Label.addrOf lab - !loc - 4) ~>> 0w2
   <font color=#FF0000>end</font>

   (*
    * The main instruction set definition consists of the following:
    *  1) a constructor-like declaration that defines the view of the instruction,
    *  2) an assembly directive in funny quotes `` '',
    *  3) a machine encoding expression,
    *  4) a semantics expression in [[ ]],
    *  5) delay slot directives, etc. (not needed for this architecture)
    *) 
   <font color=#FF0000>instruction</font>
     DEFFREG <font color=#FF0000>of</font> <font color=#008800>$</font>FP       (* define a floating point register *)
       ``deffreg <FP>''
        (* Pseudo instruction for the register allocator *)
 
   (* Load/Store *)
   | LDA <font color=#FF0000>of</font> {r: <font color=#008800>$</font>GP, b: <font color=#008800>$</font>GP, d:operand}       (* use of REGop is illegal *)
     ``lda\t<r>, <d>()''
     ILoadStore{opc=0w08,r,b,d}

   | LDAH <font color=#FF0000>of</font> {r: <font color=#008800>$</font>GP, b: <font color=#008800>$</font>GP, d:operand} (* use of REGop is illegal *)
     ``ldah\t<r>, <d>()''
     ILoadStore{opc=0w09,r,b,d}

   | LOAD <font color=#FF0000>of</font> {ldOp:load, r: <font color=#008800>$</font>GP, b: <font color=#008800>$</font>GP, d:operand, mem:Region.region}
     ``<ldOp>\t<r>, <d>()''
     ILoadStore{opc=emit_load ldOp,r,b,d}

   | STORE <font color=#FF0000>of</font> {stOp:store, r: <font color=#008800>$</font>GP, b: <font color=#008800>$</font>GP, d:operand, mem:Region.region}
     ``<stOp>\t<r>, <d>()''
     ILoadStore{opc=emit_store stOp,r,b,d}

   | FLOAD <font color=#FF0000>of</font> {ldOp:fload, r: <font color=#008800>$</font>FP, b: <font color=#008800>$</font>GP, d:operand, mem:Region.region}
     ``<ldOp>\t<r>, <d>()''
     FLoadStore{opc=emit_fload ldOp,r,b,d}

   | FSTORE <font color=#FF0000>of</font> {stOp:fstore, r: <font color=#008800>$</font>FP, b: <font color=#008800>$</font>GP, d:operand, mem:Region.region}
     ``<stOp>\t<r>, <d>()''
     FLoadStore{opc=emit_fstore stOp,r,b,d}
 
   (* Control Instructions *)
   | JMPL <font color=#FF0000>of</font> {r: <font color=#008800>$</font>GP, b: <font color=#008800>$</font>GP, d:int} * Label.label list
     ``jmpl\t<r>, <d>()''
     Jump{opc=0wx1a,h=0w0,ra=r,rb=b,disp=d}   (* table C-3 *)

   | JSR <font color=#FF0000>of</font> {r: <font color=#008800>$</font>GP, b: <font color=#008800>$</font>GP, d:int} * C.cellset * C.cellset
     ``jsr\t<r>, <d>()''
     Jump{opc=0wx1a,h=0w1,ra=r,rb=b,disp=d}

   | RET <font color=#FF0000>of</font> {r: <font color=#008800>$</font>GP, b: <font color=#008800>$</font>GP, d:int} 
     ``ret\t<r>, <d>()''
     Jump{opc=0wx1a,h=0w2,ra=r,rb=b,disp=d}

   | BRANCH <font color=#FF0000>of</font> branch * <font color=#008800>$</font>GP * Label.label   
     ``<branch> <GP>, <label>''
     Branch{opc=branch,ra=GP,disp=disp label}

   | FBRANCH <font color=#FF0000>of</font> fbranch * <font color=#008800>$</font>FP * Label.label  
     ``<fbranch> <FP>, <label>''
     Fbranch{opc=fbranch,ra=FP,disp=disp label}
 
   (* Integer Operate *)
   | OPERATE <font color=#FF0000>of</font> {oper:operate, ra: <font color=#008800>$</font>GP, rb:operand, rc: <font color=#008800>$</font>GP}
       ``<oper>\t<ra>, <rb>, <rc>''
        (let val (opc,func) = emit_operate oper
         in  Operate{opc,func,ra,rb,rc} 
         end)

   | OPERATEV <font color=#FF0000>of</font> {oper:operateV, ra: <font color=#008800>$</font>GP, rb:operand, rc: <font color=#008800>$</font>GP}
       ``<oper>\t<ra>, <rb>, <rc>''
        (let val (opc,func) = emit_operateV oper
         in  Operate{opc,func,ra,rb,rc} 
         end)

   | PSEUDOARITH <font color=#FF0000>of</font> {oper: pseudo_op, ra: <font color=#008800>$</font>GP, rb:operand, rc: <font color=#008800>$</font>GP, 
       	     tmps: C.cellset}
       ``<oper>\t<ra>, <rb>, <rc>''
 
   (* Copy instructions *)
   | COPY <font color=#FF0000>of</font> {dst: <font color=#008800>$</font>GP list, src: <font color=#008800>$</font>GP list, 
              impl:instruction list option ref, tmp: ea option}
       ``<app emitInstr (Shuffle.shuffle{regmap,tmp,dst,src})>''
   | FCOPY <font color=#FF0000>of</font> {dst: <font color=#008800>$</font>FP list, src: <font color=#008800>$</font>FP list, 
               impl:instruction list option ref, tmp: ea option}
       ``<app emitInstr (Shuffle.shufflefp{regmap,tmp,dst,src})>''
 
   (* Floating Point Operate *)
   | FOPERATE <font color=#FF0000>of</font> {oper:foperate, fa: <font color=#008800>$</font>FP, fb: <font color=#008800>$</font>FP, fc: <font color=#008800>$</font>FP}
       ``<oper>\t<fa>, <fb>, <fc>''
       (let val (opc,func) = emit_foperate oper
        in  Foperate{opc,func,fa,fb,fc}
        end)

   (* Trapping versions <font color=#FF0000>of</font> the above *)
   | FOPERATEV <font color=#FF0000>of</font> {oper:foperateV, fa: <font color=#008800>$</font>FP, fb: <font color=#008800>$</font>FP, fc: <font color=#008800>$</font>FP}
       ``<oper>\t<fa>, <fb>, <fc>''
        Foperate{opc=0wx16,func=emit_foperateV oper,fa,fb,fc}
 
   (* Misc *)
   | TRAPB       			(* Trap barrier *)
       ``trapb''
        Memory_fun{opc=0wx18,ra=31,rb=31,func=0wx0}
 
   | CALL_PAL <font color=#FF0000>of</font> {code:osf_user_palcode, def: <font color=#008800>$</font>GP list, use: <font color=#008800>$</font>GP list}
       ``call_pal <code>''
        Pal{func=emit_osf_user_palcode code}
 end
\end{SML}
</tt>
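The \sml{MC.disp} helper above computes a branch displacement. It measures the distance from the updated PC (the branch address plus 4) to the target, counts it in 4-byte instructions, and uses an arithmetic shift so that backward branches stay negative. The sketch below restates it with the branch's own address passed in explicitly, rather than read from \sml{!loc}, which presumably holds the address of the instruction being emitted. It then packs the result into the 21-bit field of the \sml{Branch} format. \sml{branchDisp} and \sml{branch} are hypothetical names.

\begin{SML}
(* Hypothetical stand-alone version of MC.disp: displacement, in
 * instructions, from loc + 4 to target; ~>> preserves the sign. *)
fun branchDisp {target : int, loc : int} : Word32.word =
  Word32.~>> (Word32.fromInt (target - loc - 4), 0w2)

(* Branch{opc:6, ra:5, disp:signed 21}: opc 31..26, ra 25..21, disp 20..0 *)
fun branch {opc : Word32.word, ra : Word32.word, disp : Word32.word} : Word32.word =
  Word32.orb (Word32.<< (Word32.andb (opc, 0wx3f), 0w26),
    Word32.orb (Word32.<< (Word32.andb (ra, 0wx1f), 0w21),
                Word32.andb (disp, 0wx1fffff)))

(* a BR at address 0x100 jumping forward to 0x120: displacement 7 *)
val fwd = branch {opc = 0wx30, ra = 0w31,
                  disp = branchDisp {target = 0x120, loc = 0x100}}
\end{SML}

A backward branch from 0x120 to 0x100 gives a displacement of -9. Masking it to 21 bits keeps its two's-complement form, so the hardware can sign-extend it again.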


\subsection{ Machine Descriptions }
Here are some machine descriptions in varying degrees of completion.

\begin{itemize}
 \item \codehref{../sparc/sparc.mdl}{ Sparc } 
 \item \codehref{../hppa/hppa.mdl}{ Hppa } 
 \item \codehref{../alpha/alpha.mdl}{ Alpha }
 \item \codehref{../ppc/ppc.mdl}{ PowerPC } 
 \item \codehref{../x86/x86.mdl}{ X86 } 
\end{itemize}

\subsection{ Syntax Highlighting Macros }

\begin{itemize}
  \item \href{md.vim}{ For vim 5.3 }
\end{itemize}

</body>
</html>