Published on 11 September 2023.
This is a work in progress that will change. Like to see it finished? Let me know by sending me an email.
I’ve been on a quest to learn “real agile”. Before I researched CI. Now its time for user stories.
https://ronjeffries.com/articles/-y023/python/-w030/038/
What people say a story is:
…
“Do you put backlog refinement on the backlog?”
XP: “Write stories about improving your software development process. What is your budget for improvement?”
…
What value? To whom?
…
Ron: “What kinds of value does your project deliver to its users? To your organization? To the team? What value does it deliver to you?”
Information has value. (Spike solution.)
…
Acceptance test: when is a story done?
…
Adding features/capabilities to product needs story? Refacotring does not add any ability? Testing neither. Both increase quality, but not capability.
…
Example: SSL certs
Example: Testing
Example: Moving hosting provider
…
AI chronicels: stories show progress towards a larger goal.
Published on 2020-05-20.
This is a work in progress that will change. Like to see it finished? Let me know by sending me an email.In Parsing left associative operators using RLMeta we looked at how to parse a subset of mathematical expressions. In this article we will take it further and compile such expressions to x86 machine code. Thus creating a real compiler for expressions.
The compiler is implemented by transforming an expression in different stages.
Parser -> Abstract codegen -> X86 codegen -> GAS -> Assembler
The parser turns an expression into an AST, which the abstract code generator turns it into instructions for an imaginary stack machine, which the X86 code generator turns into assembly instructions for the x86 architecture. The final stage either turns those instructions into textual assembly instructions suitable for compilation with the GNU assembler (GAS) or assembles them straight into machine code.
The style of this article is inspired by Chains of meaning in the STEPS system (mirror) where a compiler is explained by explaining all stages in it. Examples are shown throughout what the input and output of the different stages are.
The parser is the same as in the previous article. It turns an expression into an AST:
Parser { expr = digit:x (op:y digit:z -> [y z])*:xs -> parseOps(x xs) digit = '0'-'9':x -> int(x) op = | '+' -> Op(makeAstNode("ADD") 1 "left") | '-' -> Op(makeAstNode("SUB") 1 "left") | '*' -> Op(makeAstNode("MUL") 2 "left") | '/' -> Op(makeAstNode("DIV") 2 "left") | '^' -> Op(makeAstNode("POW") 3 "right") }
The support functions are also defined similarly:
class Op(object): def __init__(self, fn, prec, assoc): self.fn = fn self.prec = prec self.assoc = assoc
def makeAstNode(name): def op(left, right): return [name, left, right] return op
def parseOps(expr, items, min_level=0): while items and items[0][0].prec >= min_level: op, rhs = items.pop(0) if op.assoc == "left": next_level = op.prec + 1 else: next_level = op.prec expr = op.fn(expr, parseOps(rhs, items, next_level)) return expr
Parsing the simple expression
3
looks like this:
3 =V========== Parser ============ 3
Parsing the slightly more complicated expression
1+2*3
looks like this:
1+2*3 =V========== Parser ============ ['ADD', 1, ['MUL', 2, 3]]
The abstract code generator turns an AST into instructions for an imaginary stack machine:
AbstractCodeGen { expr = | [.:name expr:left expr:right] -> [~right ~left name] | .:leaf -> ["PUSH" leaf] }
The right hand side of an expression is generated before the left hand side. This order does not matter, but it turns out that it's a bit easier to generate x86 instructions if the right hand side comes first.
Generating stack machine instructions for the simple expression
3
looks like this:
3 =V========== Parser ============ 3 =V===== AbstractCodeGen ======== ['PUSH', 3]
Generating stack machine instructions for the slightly more complicated expression
1+2*3
looks like this:
1+2*3 =V========== Parser ============ ['ADD', 1, ['MUL', 2, 3]] =V===== AbstractCodeGen ======== ['PUSH', 3, 'PUSH', 2, 'MUL', 'PUSH', 1, 'ADD']
The X86 code generator turns instructions for the imaginary stack machine into assembly instructions for the x86 architecture.
An assembly instruction is represented as a list with its mnemonic and operands:
["mnemonic" op1 op2 ..]
For an instruction performing a binary operation, op1
is the destination and op2
is the source. The binary operation performed is therefore the following:
op1 = op1 `operation` op2
An operand can be either a constant, a register, or a memory location.
rdi
is the stack pointer where the stack grows downwards (64-bit integer)eax
is a TOS (top of stack) register (32-bit signed integer)TODO: explain what registers and so on that we use
X86CodeGen { expr = [%*:xs] -> [~~xs ["ret"]] <<instructions>> }
ADD = -> [ ["add" ["reg" "eax"] ["addr" "rdi"]] ["sub" ["reg" "rdi"] ["const" 4 ]] ]
SUB = -> [ ["sub" ["reg" "eax"] ["addr" "rdi"]] ["sub" ["reg" "rdi"] ["const" 4 ]] ]
MUL = -> [ ["imul" ["reg" "eax"] ["addr" "rdi"]] ["sub" ["reg" "rdi"] ["const" 4 ]] ]
DIV = -> [ ["cdq" ] ["idiv" "long" ["addr" "rdi"] ] ["sub" ["reg" "rdi"] ["const" 4]] ]
POW = -> [ ]
PUSH = .:x -> [ ["add" ["reg" "rdi"] ["const" 4 ]] ["mov" ["addr" "rdi"] ["reg" "eax"]] ["mov" ["reg" "eax"] ["const" x ]] ]
Generating assembly instructions for the simple expression
3
looks like this:
3 =V========== Parser ============ 3 =V===== AbstractCodeGen ======== ['PUSH', 3] =V======== X86CodeGen ========== [['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 3]], ['ret']]
Generating assembly instructions for the slightly more complicated expression
1+2*3
looks like this:
1+2*3 =V========== Parser ============ ['ADD', 1, ['MUL', 2, 3]] =V===== AbstractCodeGen ======== ['PUSH', 3, 'PUSH', 2, 'MUL', 'PUSH', 1, 'ADD'] =V======== X86CodeGen ========== [['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 3]], ['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 2]], ['imul', ['reg', 'eax'], ['addr', 'rdi']], ['sub', ['reg', 'rdi'], ['const', 4]], ['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 1]], ['add', ['reg', 'eax'], ['addr', 'rdi']], ['sub', ['reg', 'rdi'], ['const', 4]], ['ret']]
The GAS generator turns assembly instructions into textual assembly instructions suitable for compilation with the GNU assembler (GAS): gcc -nostdlib file.s
:
GasGen { expr = [instr*:xs] -> { ".global expr\n" "expr:\n" xs } instr = | [mnemonic:x op:target op:source] -> { x " " source "," target "\n" } | [mnemonic:x op:arg] -> { x " " arg "\n" } | [mnemonic:x] -> { x "\n" } op = | ["reg" .:name] -> { "%" name } | ["addr" .:name] -> { "(%" name ")" } | ["const" .:value] -> { "$" value } mnemonic = .:x size:y -> pad(x y) size = "long" -> "l" | -> "" }
def pad(text, suffix): return (text+suffix).ljust(7)
#include <stdlib.h> #include <stdio.h> int expr(void* mem); int main() { void* mem = (void*)malloc(32*sizeof(int)); int result = expr(mem); printf("%d\n", result); }
Generating GAS instructions (and compiling and running them) for the simple expression
3
looks like this:
3 =V========== Parser ============ 3 =V===== AbstractCodeGen ======== ['PUSH', 3] =V======== X86CodeGen ========== [['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 3]], ['ret']] =V========== GasGen ============ .global expr expr: add $4,%rdi mov %eax,(%rdi) mov $3,%eax ret =V===== GccCompileAndRun ======= 3
Generating GAS instructions (and compiling and running them) for the slightly more complicated expression
1+2*3
looks like this:
1+2*3 =V========== Parser ============ ['ADD', 1, ['MUL', 2, 3]] =V===== AbstractCodeGen ======== ['PUSH', 3, 'PUSH', 2, 'MUL', 'PUSH', 1, 'ADD'] =V======== X86CodeGen ========== [['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 3]], ['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 2]], ['imul', ['reg', 'eax'], ['addr', 'rdi']], ['sub', ['reg', 'rdi'], ['const', 4]], ['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 1]], ['add', ['reg', 'eax'], ['addr', 'rdi']], ['sub', ['reg', 'rdi'], ['const', 4]], ['ret']] =V========== GasGen ============ .global expr expr: add $4,%rdi mov %eax,(%rdi) mov $3,%eax add $4,%rdi mov %eax,(%rdi) mov $2,%eax imul (%rdi),%eax sub $4,%rdi add $4,%rdi mov %eax,(%rdi) mov $1,%eax add (%rdi),%eax sub $4,%rdi ret =V===== GccCompileAndRun ======= 7
Generating GAS instructions (and compiling and running them) for the expression that uses all operators
2^4/2*3-4+5
looks like this:
2^4/2*3-4+5 =V========== Parser ============ ['ADD', ['SUB', ['MUL', ['DIV', ['POW', 2, 4], 2], 3], 4], 5] =V===== AbstractCodeGen ======== ['PUSH', 5, 'PUSH', 4, 'PUSH', 3, 'PUSH', 2, 'PUSH', 4, 'PUSH', 2, 'POW', 'DIV', 'MUL', 'SUB', 'ADD'] =V======== X86CodeGen ========== [['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 5]], ['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 4]], ['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 3]], ['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 2]], ['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 4]], ['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 2]], ['cdq'], ['idiv', 'long', ['addr', 'rdi']], ['sub', ['reg', 'rdi'], ['const', 4]], ['imul', ['reg', 'eax'], ['addr', 'rdi']], ['sub', ['reg', 'rdi'], ['const', 4]], ['sub', ['reg', 'eax'], ['addr', 'rdi']], ['sub', ['reg', 'rdi'], ['const', 4]], ['add', ['reg', 'eax'], ['addr', 'rdi']], ['sub', ['reg', 'rdi'], ['const', 4]], ['ret']] =V========== GasGen ============ .global expr expr: add $4,%rdi mov %eax,(%rdi) mov $5,%eax add $4,%rdi mov %eax,(%rdi) mov $4,%eax add $4,%rdi mov %eax,(%rdi) mov $3,%eax add $4,%rdi mov %eax,(%rdi) mov $2,%eax add $4,%rdi mov %eax,(%rdi) mov $4,%eax add $4,%rdi mov %eax,(%rdi) mov $2,%eax cdq idivl (%rdi) sub $4,%rdi imul (%rdi),%eax sub $4,%rdi sub (%rdi),%eax sub $4,%rdi add (%rdi),%eax sub $4,%rdi ret =V===== GccCompileAndRun ======= 1
The assembler turns assembly instructions into machine code:
Assembler { expr = [instr*:xs] -> [~~xs] instr = [%:x] -> x add = | reg64:r const:i -> [0x48 0x83 modRmDirect(r) ~littleEndian(i 1)] | reg32:r addr:m -> [] sub = | reg64:r const:i -> [] | reg32:r addr:m -> [] imul = | reg32:r addr:m -> [] cdq = -> [] idiv = | "long" addr:m -> [] mov = | addr:m reg32:r -> [0x89 modRmAddr(m r)] | reg32:r const:i -> [add(0xb8 r) ~littleEndian(i 4)] ret = -> [0xc3] addr = ["addr" reg64n:n] -> n reg64 = ["reg" reg64n:n] -> n reg64n = | "rdi" -> 7 reg32 = ["reg" reg32n:n] -> n reg32n = | "eax" -> 0 const = ["const" .:i] -> i }
def modRmDirect(register): return 0xc0 | register
def modRmAddr(addrRegister, desinationRegister): return 0x00 | (desinationRegister << 3) | addrRegister
def add(x, y): return x + y
def littleEndian(number, numBytes): if number < 0 or number >= 2**(8*numBytes): raise ValueError("{} is not in range [0, max]".format(number)) values = [] for i in range(numBytes): values.append(number & 0xff) number >>= 8 return values
import ctypes import mmap
libc = ctypes.cdll.LoadLibrary(None)
fn_mmap = libc.mmap fn_mmap.restype = ctypes.c_void_p fn_mmap.argtypes = ( ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int, ctypes.c_int, ctypes.c_int, ctypes.c_size_t, )
code = "".join([chr(x) for x in machine_code]) code_address = fn_mmap( None, len(code), mmap.PROT_READ|mmap.PROT_WRITE|mmap.PROT_EXEC, mmap.MAP_PRIVATE|mmap.MAP_ANONYMOUS, -1, 0 ) ctypes.memmove(code_address, code, len(code))
fn_malloc = libc.malloc fn_malloc.restype = ctypes.c_void_p fn_malloc.argtypes = (ctypes.c_size_t,)
expr_fn_type = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_void_p) expr_fn = ctypes.cast(code_address, expr_fn_type)
result = expr_fn(fn_malloc(1024))
Generating machine code (and running it) for the simple expression
3
looks like this:
3 =V========== Parser ============ 3 =V===== AbstractCodeGen ======== ['PUSH', 3] =V======== X86CodeGen ========== [['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 3]], ['ret']] =V======== Assembler =========== [72, 131, 199, 4, 137, 7, 184, 3, 0, 0, 0, 195] =V======== X86Runner =========== 3
Generating machine code (and running it) for the slightly more complicated expression
1+2*3
looks like this:
1+2*3 =V========== Parser ============ ['ADD', 1, ['MUL', 2, 3]] =V===== AbstractCodeGen ======== ['PUSH', 3, 'PUSH', 2, 'MUL', 'PUSH', 1, 'ADD'] =V======== X86CodeGen ========== [['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 3]], ['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 2]], ['imul', ['reg', 'eax'], ['addr', 'rdi']], ['sub', ['reg', 'rdi'], ['const', 4]], ['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 1]], ['add', ['reg', 'eax'], ['addr', 'rdi']], ['sub', ['reg', 'rdi'], ['const', 4]], ['ret']] =V======== Assembler =========== [72, 131, 199, 4, 137, 7, 184, 3, 0, 0, 0, 72, 131, 199, 4, 137, 7, 184, 2, 0, 0, 0, 72, 131, 199, 4, 137, 7, 184, 1, 0, 0, 0, 195] =V======== X86Runner =========== 1
Generating machine code (and running it) for the expression that uses all operators
2^4/2*3-4+5
looks like this:
2^4/2*3-4+5 =V========== Parser ============ ['ADD', ['SUB', ['MUL', ['DIV', ['POW', 2, 4], 2], 3], 4], 5] =V===== AbstractCodeGen ======== ['PUSH', 5, 'PUSH', 4, 'PUSH', 3, 'PUSH', 2, 'PUSH', 4, 'PUSH', 2, 'POW', 'DIV', 'MUL', 'SUB', 'ADD'] =V======== X86CodeGen ========== [['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 5]], ['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 4]], ['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 3]], ['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 2]], ['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 4]], ['add', ['reg', 'rdi'], ['const', 4]], ['mov', ['addr', 'rdi'], ['reg', 'eax']], ['mov', ['reg', 'eax'], ['const', 2]], ['cdq'], ['idiv', 'long', ['addr', 'rdi']], ['sub', ['reg', 'rdi'], ['const', 4]], ['imul', ['reg', 'eax'], ['addr', 'rdi']], ['sub', ['reg', 'rdi'], ['const', 4]], ['sub', ['reg', 'eax'], ['addr', 'rdi']], ['sub', ['reg', 'rdi'], ['const', 4]], ['add', ['reg', 'eax'], ['addr', 'rdi']], ['sub', ['reg', 'rdi'], ['const', 4]], ['ret']] =V======== Assembler =========== [72, 131, 199, 4, 137, 7, 184, 5, 0, 0, 0, 72, 131, 199, 4, 137, 7, 184, 4, 0, 0, 0, 72, 131, 199, 4, 137, 7, 184, 3, 0, 0, 0, 72, 131, 199, 4, 137, 7, 184, 2, 0, 0, 0, 72, 131, 199, 4, 137, 7, 184, 4, 0, 0, 0, 72, 131, 199, 4, 137, 7, 184, 2, 0, 0, 0, 195] =V======== X86Runner =========== 2
Here is all source code needed to turn expressions into x86 machine code. Code to run machine code is dependent on situation, so it is not included here.
Parser:
Parser { expr = digit:x (op:y digit:z -> [y z])*:xs -> parseOps(x xs) digit = '0'-'9':x -> int(x) op = | '+' -> Op(makeAstNode("ADD") 1 "left") | '-' -> Op(makeAstNode("SUB") 1 "left") | '*' -> Op(makeAstNode("MUL") 2 "left") | '/' -> Op(makeAstNode("DIV") 2 "left") | '^' -> Op(makeAstNode("POW") 3 "right") }
Abstract code generator:
AbstractCodeGen { expr = | [.:name expr:left expr:right] -> [~right ~left name] | .:leaf -> ["PUSH" leaf] }
X86 code generator:
X86CodeGen { expr = [%*:xs] -> [~~xs ["ret"]] ADD = -> [ ["add" ["reg" "eax"] ["addr" "rdi"]] ["sub" ["reg" "rdi"] ["const" 4 ]] ] SUB = -> [ ["sub" ["reg" "eax"] ["addr" "rdi"]] ["sub" ["reg" "rdi"] ["const" 4 ]] ] MUL = -> [ ["imul" ["reg" "eax"] ["addr" "rdi"]] ["sub" ["reg" "rdi"] ["const" 4 ]] ] DIV = -> [ ["cdq" ] ["idiv" "long" ["addr" "rdi"] ] ["sub" ["reg" "rdi"] ["const" 4]] ] POW = -> [ ] PUSH = .:x -> [ ["add" ["reg" "rdi"] ["const" 4 ]] ["mov" ["addr" "rdi"] ["reg" "eax"]] ["mov" ["reg" "eax"] ["const" x ]] ] }
Assembler:
Assembler { expr = [instr*:xs] -> [~~xs] instr = [%:x] -> x add = | reg64:r const:i -> [0x48 0x83 modRmDirect(r) ~littleEndian(i 1)] | reg32:r addr:m -> [] sub = | reg64:r const:i -> [] | reg32:r addr:m -> [] imul = | reg32:r addr:m -> [] cdq = -> [] idiv = | "long" addr:m -> [] mov = | addr:m reg32:r -> [0x89 modRmAddr(m r)] | reg32:r const:i -> [add(0xb8 r) ~littleEndian(i 4)] ret = -> [0xc3] addr = ["addr" reg64n:n] -> n reg64 = ["reg" reg64n:n] -> n reg64n = | "rdi" -> 7 reg32 = ["reg" reg32n:n] -> n reg32n = | "eax" -> 0 const = ["const" .:i] -> i }
Support functions:
class Op(object): def __init__(self, fn, prec, assoc): self.fn = fn self.prec = prec self.assoc = assoc def makeAstNode(name): def op(left, right): return [name, left, right] return op def parseOps(expr, items, min_level=0): while items and items[0][0].prec >= min_level: op, rhs = items.pop(0) if op.assoc == "left": next_level = op.prec + 1 else: next_level = op.prec expr = op.fn(expr, parseOps(rhs, items, next_level)) return expr def modRmDirect(register): return 0xc0 | register def modRmAddr(addrRegister, desinationRegister): return 0x00 | (desinationRegister << 3) | addrRegister def add(x, y): return x + y def littleEndian(number, numBytes): if number < 0 or number >= 2**(8*numBytes): raise ValueError("{} is not in range [0, max]".format(number)) values = [] for i in range(numBytes): values.append(number & 0xff) number >>= 8 return values
In this article I use a version of RLMeta that builds upon the VM based version from memoizing failures, but has the following changes:
~
operators.%
operator.I will not explain how I made those changes here. The full source code is available on GitHub.
I used the following script to run the examples:
set -e compile() { echo "import sys" echo "import pprint" python rlmeta/rlmeta.py --support cat "support.py" python rlmeta/rlmeta.py < parser.rlmeta python rlmeta/rlmeta.py < abstractcodegen.rlmeta python rlmeta/rlmeta.py < x86codegen.rlmeta python rlmeta/rlmeta.py < gas.rlmeta python rlmeta/rlmeta.py < assembler.rlmeta echo "main()" } python <(compile) "$@"
def main(): grammars = { "parser": Parser(), "acodegen": AbstractCodeGen(), "xcodegen": X86CodeGen(), "gas": GasGen(), "assembler": Assembler(), "xrunner": X86Runner(), } try: expr = sys.stdin.read().strip() first = True for grammar_name in sys.argv[1:]: if grammar_name.startswith("@"): with open(grammar_name[1:], "w") as f: f.write(str(expr)) continue grammar = grammars[grammar_name] print_expr(expr) print_box(grammar.__class__.__name__) expr = grammar.run("expr", expr) print_expr(expr) except _MatchError as e: sys.stderr.write(e.describe())
def print_box(name): print("") print("=V{}==".format(" {} ".format(name).center(28, "="))) print("")
def print_expr(expr): if isinstance(expr, str): print(expr.strip()) else: pprint.pprint(expr, width=20)
set -e cat | bash compile.sh parser acodegen xcodegen gas @expr.s gcc driver.c expr.s -o expr echo "" echo "=V===== GccCompileAndRun =======" echo "" ./expr
set -e bash compile.sh parser acodegen xcodegen assembler xrunner
class X86Runner(object): def run(self, name, machine_code): <<run machine code>> return result
<<compiler>>
Published on 2020-04-10.
This is a work in progress that will change. Like to see it finished? Let me know by sending me an email.In this article I will port RLMeta to C++. The main reason for doing it is to learn C++. I have the impression that C++ is a large and complex language, but I think I should be able to grasp the basics pretty quickly because I know C quite well and I am also familiar with object oriented programming. This article will describe my learning journey as well as the port. Another reason for doing the port is to see how RLMeta looks like when C++ is the target language. Will it fit well? Will performance increase?
The first thing I think about is how to represent the data that RLMeta needs. In particular I think about how to store the mapping between rule names and their location in the program. The map in the C++ standard library looks like a dict in Python. But there is also something about an allocation strategy. I need to learn more about how memory is handled in C++.
I read about classes, constructors, destructors and initialisation. I create a small example to clarify my thinking. Here I also learn about basic C++ syntax.
#include <iostream> class Foo { private: int id; public: Foo(int id) : id(id) { std::cout << "Creating foo " << id << "\n"; } ~Foo() { std::cout << "Destructing foo " << id << "\n"; } }; int main() { std::cout << "Entering main\n"; Foo foo1(1); Foo *foo2 = new Foo(2); std::cout << "Exiting main\n"; return 0; }
Output:
Entering main Creating foo 1 Creating foo 2 Exiting main Destructing foo 1
This shows me that objects are destructed automatically if allocated on the stack. But not when newed.
I go back to the map and make a small example to learn how to use it:
#include <iostream> #include <map> int main() { std::map<std::string, int> rules; rules.insert(std::pair<std::string, int>("foo", 10)); rules.insert(std::pair<std::string, int>("bar", 15)); std::cout << "foo = " << rules["foo"] << "\n"; std::cout << "bar = " << rules["bar"] << "\n"; return 0; }
Output:
foo = 10 bar = 15
I think the type declarations make the code hard to read. But the map seems to do what I want. So I'm happy. I also conclude that the map allocates memory to make space for all pairs. But at the end of main
it will all be freed automatically. So no memory will leak.
Next I'm thinking about how to represent instructions. In the Python version an instruction is represented as a tuple (name, arg1, arg2)
. Where the arguments can be of arbitrary type (string, integer, lambda, etc.).
Getting stuck. I think about how to represent the values that RLMeta works with instead. Lists, strings, integers, characters.
#include <iostream> #include <list> #include <memory> class RLMetaObject { public: virtual ~RLMetaObject() { } }; class RLMetaInteger : public RLMetaObject { private: int value; public: RLMetaInteger(int value) : value(value) { } ~RLMetaInteger() { std::cout << "Destructing integer " << value << "\n"; } }; class RLMetaString : public RLMetaObject { private: std::string value; public: RLMetaString(std::string value) : value(value) { } ~RLMetaString() { std::cout << "Destructing string " << value << "\n"; } }; int main() { std::list<RLMetaObject> objects; objects.push_back(RLMetaInteger(5)); objects.push_back(RLMetaString("hello")); std::list<std::unique_ptr<RLMetaObject>> objectPointers; objectPointers.push_back( std::make_unique<RLMetaInteger>(6) ); objectPointers.push_back( std::make_unique<RLMetaString>("world") ); return 0; }
Destructing integer 5 Destructing string hello Destructing integer 6 Destructing string world
Since C++ containers can not store objects of arbitrary type, I introduce a base class. I also experiment with smart pointers. I find out that a virtual destructor is needed to make proper destruction. That really confused me.
Dynamic cast can work as isinstance
in Python.
#include <iostream> #include <list> #include <memory> class RLMetaObject { virtual bool matches(const RLMetaObject &other) { return false; } }; class RLMetaInteger : public RLMetaObject { private: int value; public: RLMetaInteger(int value) : value(value) { } bool matches(const RLMetaObject &other) { try { const RLMetaInteger &otherInt = dynamic_cast<const RLMetaInteger&>(other); return otherInt.value == value; } catch (std::bad_cast) {} return false; } }; int main() { RLMetaObject obj; RLMetaInteger int5(5); RLMetaInteger int6(6); std::cout << "Integer(5) matches Object? " << int5.matches(obj) << "\n"; std::cout << "Integer(5) matches Integer(6)? " << int5.matches(int6) << "\n"; std::cout << "Integer(6) matches Integer(6)? " << int6.matches(int6) << "\n"; return 0; }
Integer(5) matches Object? 0 Integer(5) matches Integer(6)? 0 Integer(6) matches Integer(6)? 1
For this to work, there has to be some virtual member in the base class.
cat > tmp.cpp && g++ tmp.cpp -o tmp && ./tmp && rm tmp.cpp tmp
Published on 2019-06-27.
This is a work in progress that will change. Like to see it finished? Let me know by sending me an email.In this article I describe how to parse off-side rule languages using RLMeta.
Got the idea from A PEG Markdown parser.
Failure messages will have wrong location.
]]>