BASIC interpreter
The application described in this section is a program interpreter for
Basic. Thus, it is a program that can run other programs written in
Basic. Of course, we will only deal with a restricted language,
which contains the following commands:
-
PRINT expression
| |
Prints the result of the evaluation of the expression. |
- INPUT variable
| |
Prints a prompt (?), reads an integer typed in by the
user, and assigns its value to the variable. |
- LET variable = expression
| |
Assigns the result of the evaluation of expression to the variable. |
- GOTO line number
| |
Continues execution at the given line. |
- IF condition THEN line number
| |
Continues execution at the given line if the condition is true. |
- REM any string
Every line of a Basic program is labelled with a line number, and
contains only one command. For instance, a program that computes and
then prints the factorial of an integer given by the user is written:
5 REM inputting the argument
10 PRINT " factorial of:"
20 INPUT A
30 LET B = 1
35 REM beginning of the loop
40 IF A <= 1 THEN 80
50 LET B = B * A
60 LET A = A - 1
70 GOTO 40
75 REM prints the result
80 PRINT B
We also wish to write a small text editor, working as a toplevel interactive
loop. It should be able to add new lines, display a program, execute
it, and display the result.
Execution of the program is started with the
RUN command. Here is an example of the evaluation of this
program:
> RUN
factorial of: ? 5
120
The interpreter is implemented in several distinct parts:
- Description of the abstract syntax
- : describes the definition of data
types to represent Basic programs, as well as their components
(lines, commands, expressions, etc.).
- Program pretty printing
- : consists of transforming the
internal representation of Basic programs to strings, in order
to display them.
- Lexing and parsing
- : accomplish the inverse
transformation, that is, transform a string into the internal
representation of a Basic program (the abstract syntax).
- Evaluation
- : is the heart of the interpreter. It controls
and runs the program. As we will see, functional languages, such as
Objective CAML, are particularly well adapted for this kind of problem.
- Toplevel interactive loop
- : glues together all the previous parts.
Abstract syntax
Figure 6.2 introduces the concrete syntax, as a BNF
grammar, of the Basic we will implement. This kind of description
for language syntaxes is described in chapter 11,
page ??.
| Unary_Op |
::= |
- | ! |
| Binary_Op |
::= |
+ | - | * | / |
% |
| |
| |
= | < | > | <= |
>= | <> |
| |
| |
& | ' | ' |
| Expression |
::= |
integer |
| |
| |
variable |
| |
| |
"string" |
| |
| |
Unary_Op Expression |
| |
| |
Expression Binary_Op Expression |
| |
| |
( Expression ) |
| Command |
::= |
REM string |
| |
| |
GOTO integer |
| |
| |
LET variable = Expression |
| |
| |
PRINT Expression |
| |
| |
INPUT variable |
| |
| |
IF Expression THEN integer |
| |
| Line |
::= |
integer Command |
| |
| Program |
::= |
Line |
| |
| |
Line Program |
| |
| Phrase |
::= |
Line | RUN | LIST | END |
Figure 6.2: BASIC Grammar.
We can see that the way expressions are defined does not ensure that a
well formed expression can be evaluated. For instance, 1+"hello" is an expression, and yet it is not possible to evaluate
it. This deliberate choice lets us simplify both the abstract syntax
and the parsing of the Basic language. The price to pay for
this choice is that a syntactically correct Basic program may generate
a runtime error because of a type mismatch.
Defining Objective CAML data types for this abstract syntax is easy,
we simply translate the concrete syntax into a sum type:
# type unr_op = UMINUS | NOT ;;
# type bin_op = PLUS | MINUS | MULT | DIV | MOD
| EQUAL | LESS | LESSEQ | GREAT | GREATEQ | DIFF
| AND | OR ;;
# type expression =
ExpInt of int
| ExpVar of string
| ExpStr of string
| ExpUnr of unr_op * expression
| ExpBin of expression * bin_op * expression ;;
# type command =
Rem of string
| Goto of int
| Print of expression
| Input of string
| If of expression * int
| Let of string * expression ;;
# type line = { num : int ; cmd : command } ;;
# type program = line list ;;
We also define the abstract syntax for the commands for the small
program editor:
# type phrase = Line of line | List | Run | PEnd ;;
It is convenient to allow the programmer to skip some parentheses in
arithmetic expressions. For instance, the expression 1+3*4 is
usually interpreted as 1+(3*4). To this end, we associate an integer
with each operator of the language:
# let priority_uop = function NOT -> 1 | UMINUS -> 7
let priority_binop = function
MULT | DIV -> 6
| PLUS | MINUS -> 5
| MOD -> 4
| EQUAL | LESS | LESSEQ | GREAT | GREATEQ | DIFF -> 3
| AND | OR -> 2 ;;
val priority_uop : unr_op -> int = <fun>
val priority_binop : bin_op -> int = <fun>
These integers indicate the priority of the operators. They
will be used to print and parse programs.
Program pretty printing
To print a program, one needs to be able to convert abstract syntax
program lines into strings.
Converting operators is easy:
# let pp_binop = function
PLUS -> "+" | MULT -> "*" | MOD -> "%" | MINUS -> "-"
| DIV -> "/" | EQUAL -> " = " | LESS -> " < "
| LESSEQ -> " <= " | GREAT -> " > "
| GREATEQ -> " >= " | DIFF -> " <> " | AND -> " & " | OR -> " | "
let pp_unrop = function UMINUS -> "-" | NOT -> "!" ;;
val pp_binop : bin_op -> string = <fun>
val pp_unrop : unr_op -> string = <fun>
Expression printing needs to take into account operator priority to
print as few parentheses as possible. For instance, parentheses are
put around a subexpression at the right of an operator only if the
subexpression's main operator has a lower priority that the main
operator of the whole expression. Also, arithmetic operators are
left-associative, thus the expression 1-2-3 is interpreted as
(1-2)-3.
To deal with this, we use two auxiliary functions ppl and
ppr to print left and right subtrees, respectively. These
functions take two arguments: the tree to print and the priority of
the enclosing operator, which is used to decide if parentheses are
necessary. Left
and right subtrees are distinguished to deal with associativity. If
the current operator priority is the same than the enclosing operator
priority, left trees do not need parentheses whereas right ones may
require them, as in 1-(2-3) or 1-(2+3).
The initial tree is taken as a left subtree with minimal priority
(0).
The expression pretty printing function pp_expression is:
# let parenthesis x = "(" ^ x ^ ")";;
val parenthesis : string -> string = <fun>
# let pp_expression =
let rec ppl pr = function
ExpInt n -> (string_of_int n)
| ExpVar v -> v
| ExpStr s -> "\"" ^ s ^ "\""
| ExpUnr (op,e) ->
let res = (pp_unrop op)^(ppl (priority_uop op) e)
in if pr=0 then res else parenthesis res
| ExpBin (e1,op,e2) ->
let pr2 = priority_binop op
in let res = (ppl pr2 e1)^(pp_binop op)^(ppr pr2 e2)
(* parenthesis if priority is not greater *)
in if pr2 >= pr then res else parenthesis res
and ppr pr exp = match exp with
(* right subtrees only differ for binary operators *)
ExpBin (e1,op,e2) ->
let pr2 = priority_binop op
in let res = (ppl pr2 e1)^(pp_binop op)^(ppr pr2 e2)
in if pr2 > pr then res else parenthesis res
| _ -> ppl pr exp
in ppl 0 ;;
val pp_expression : expression -> string = <fun>
Command pretty printing uses the expression pretty printing
function. Printing a line consists of printing the line number before
the command.
# let pp_command = function
Rem s -> "REM " ^ s
| Goto n -> "GOTO " ^ (string_of_int n)
| Print e -> "PRINT " ^ (pp_expression e)
| Input v -> "INPUT " ^ v
| If (e,n) -> "IF "^(pp_expression e)^" THEN "^(string_of_int n)
| Let (v,e) -> "LET " ^ v ^ " = " ^ (pp_expression e) ;;
val pp_command : command -> string = <fun>
# let pp_line l = (string_of_int l.num) ^ " " ^ (pp_command l.cmd) ;;
val pp_line : line -> string = <fun>
Lexing
Lexing and parsing do the inverse transformation of
printing, going from a string to a syntax tree. Lexing
splits the text of a command line into independent lexical units
called lexemes, with Objective CAML type:
# type lexeme = Lint of int
| Lident of string
| Lsymbol of string
| Lstring of string
| Lend ;;
A particular lexeme denotes the end of an expression: Lend.
It is not present in the text of the expression, but is created by the
lexing function (see the lexer function, page
??).
The string being lexed is kept in a record that contains a mutable
field indicating the position after which lexing has not been
done yet. Since the size of the string is used several times and does
not change, it is also stored in the record:
# type string_lexer = {string:string; mutable current:int; size:int } ;;
This representation lets us define the lexing of a string as the
application of a function to a value of type string_lexer
returning a value of type lexeme. Modifying the current
position in the string is done as a side effect.
# let init_lex s = { string=s; current=0 ; size=String.length s } ;;
val init_lex : string -> string_lexer = <fun>
# let forward cl = cl.current <- cl.current+1 ;;
val forward : string_lexer -> unit = <fun>
# let forward_n cl n = cl.current <- cl.current+n ;;
val forward_n : string_lexer -> int -> unit = <fun>
# let extract pred cl =
let st = cl.string and pos = cl.current in
let rec ext n = if n<cl.size && (pred st.[n]) then ext (n+1) else n in
let res = ext pos
in cl.current <- res ; String.sub cl.string pos (res-pos) ;;
val extract : (char -> bool) -> string_lexer -> string = <fun>
The following functions extract a lexeme from the string and modify
the current position. The two functions extract_int and
extract_ident extract an integer and an identifier,
respectively.
# let extract_int =
let is_int = function '0'..'9' -> true | _ -> false
in function cl -> int_of_string (extract is_int cl)
let extract_ident =
let is_alpha_num = function
'a'..'z' | 'A'..'Z' | '0' .. '9' | '_' -> true
| _ -> false
in extract is_alpha_num ;;
val extract_int : string_lexer -> int = <fun>
val extract_ident : string_lexer -> string = <fun>
The lexer function uses the two previous functions to extract
a lexeme.
# exception LexerError ;;
exception LexerError
# let rec lexer cl =
let lexer_char c = match c with
' '
| '\t' -> forward cl ; lexer cl
| 'a'..'z'
| 'A'..'Z' -> Lident (extract_ident cl)
| '0'..'9' -> Lint (extract_int cl)
| '"' -> forward cl ;
let res = Lstring (extract ((<>) '"') cl)
in forward cl ; res
| '+' | '-' | '*' | '/' | '%' | '&' | '|' | '!' | '=' | '(' | ')' ->
forward cl; Lsymbol (String.make 1 c)
| '<'
| '>' -> forward cl;
if cl.current >= cl.size then Lsymbol (String.make 1 c)
else let cs = cl.string.[cl.current]
in ( match (c,cs) with
('<','=') -> forward cl; Lsymbol "<="
| ('>','=') -> forward cl; Lsymbol ">="
| ('<','>') -> forward cl; Lsymbol "<>"
| _ -> Lsymbol (String.make 1 c) )
| _ -> raise LexerError
in
if cl.current >= cl.size then Lend
else lexer_char cl.string.[cl.current] ;;
val lexer : string_lexer -> lexeme = <fun>
The lexer function is very simple: it matches the current
character of a string and, based on its value, extracts the
corresponding lexeme and modifies the current position to the start of
the next lexeme. The code is simple because, for all characters except
two, the current character defines which lexeme to extract. In the
more complicated cases of '<', we need to look at the next
character, which might be a '=' or a '>', producing two different
lexemes. The same problem arises with '>'.
Parsing
The only difficulty in parsing our language comes from expressions.
Indeed, knowing the beginning of an expression is not enough to know
its structure. For instance, having parsed the beginning of an
expression as being 1+2+3, the resulting syntax tree for this part
depends on the rest of the expression: its structure is different when
it is followed by +4 or *4 (see figure 6.3).
Figure 6.3: Basic: abstract syntax tree examples
.
However, since the tree structure for 1+2 is the same in both cases,
it can be built. As the position of +3 in the structure is not fully
known, it is temporarily stored.
To build the abstract syntax tree, we use a pushdown automaton
similar to the one built by yacc (see page
??). Lexemes are read one by one and put
on a stack until there is enough information to build the
expression. They are then removed from the stack and replaced by the
expression. This latter operation is called reduction.
The stack elements have type:
# type exp_elem =
Texp of expression (* expression *)
| Tbin of bin_op (* binary operator *)
| Tunr of unr_op (* unary operator *)
| Tlp (* left parenthesis *) ;;
Right parentheses are not stored on the stack as only left parentheses
matter for reduction.
Figure 6.4 illustrates the way the stack is used to
parse the expression (1+2*3)+4. The character above the arrow is the
current character of the string.
Figure 6.4: Basic: abstract syntax tree construction
example
.
We define an exception for syntax errors.
# exception ParseError ;;
The first step consists of transforming symbols into operators:
# let unr_symb = function
"!" -> NOT | "-" -> UMINUS | _ -> raise ParseError
let bin_symb = function
"+" -> PLUS | "-" -> MINUS | "*" -> MULT | "/" -> DIV | "%" -> MOD
| "=" -> EQUAL | "<" -> LESS | "<=" -> LESSEQ | ">" -> GREAT
| ">=" -> GREATEQ | "<>" -> DIFF | "&" -> AND | "|" -> OR
| _ -> raise ParseError
let tsymb s = try Tbin (bin_symb s) with ParseError -> Tunr (unr_symb s) ;;
val unr_symb : string -> unr_op = <fun>
val bin_symb : string -> bin_op = <fun>
val tsymb : string -> exp_elem = <fun>
The reduce function implements stack reduction. There are two
cases to consider, whether the stack starts with:
-
an expression followed by a unary operator,
-
an expression followed by a binary operator and an expression.
Moreover, reduce takes an argument indicating the minimal
priority that an operator should have to trigger reduction. To avoid
this reduction condition, it suffices to give the minimal value, zero,
as the priority.
# let reduce pr = function
(Texp e)::(Tunr op)::st when (priority_uop op) >= pr
-> (Texp (ExpUnr (op,e)))::st
| (Texp e1)::(Tbin op)::(Texp e2)::st when (priority_binop op) >= pr
-> (Texp (ExpBin (e2,op,e1)))::st
| _ -> raise ParseError ;;
val reduce : int -> exp_elem list -> exp_elem list = <fun>
Notice that expression elements are stacked as they are read. Thus it
is necessary to swap them when they are arguments of a binary
operator.
The main function of our parser is stack_or_reduce that,
according to the lexeme given in argument, puts it on the stack or
triggers a reduction.
# let rec stack_or_reduce lex stack = match lex , stack with
Lint n , _ -> (Texp (ExpInt n))::stack
| Lident v , _ -> (Texp (ExpVar v))::stack
| Lstring s , _ -> (Texp (ExpStr s))::stack
| Lsymbol "(" , _ -> Tlp::stack
| Lsymbol ")" , (Texp e)::Tlp::st -> (Texp e)::st
| Lsymbol ")" , _ -> stack_or_reduce lex (reduce 0 stack)
| Lsymbol s , _
-> let symbol =
if s<>"-" then tsymb s
(* remove the ambiguity of the ``-'' symbol *)
(* according to the last exp element put on the stack *)
else match stack
with (Texp _)::_ -> Tbin MINUS
| _ -> Tunr UMINUS
in ( match symbol with
Tunr op -> (Tunr op)::stack
| Tbin op ->
( try stack_or_reduce lex (reduce (priority_binop op)
stack )
with ParseError -> (Tbin op)::stack )
| _ -> raise ParseError )
| _ , _ -> raise ParseError ;;
val stack_or_reduce : lexeme -> exp_elem list -> exp_elem list = <fun>
Once all lexemes are defined and stacked, the function
reduce_all builds the abstract syntax tree with the elements
remaining in the stack. If the expression being parsed is well formed,
only one element should remain in the stack, containing the tree for
this expression.
# let rec reduce_all = function
| [] -> raise ParseError
| [Texp x] -> x
| st -> reduce_all (reduce 0 st) ;;
val reduce_all : exp_elem list -> expression = <fun>
The parse_exp function is the main expression parsing
function. It reads a string, extracts its lexemes and passes them to the
stack_or_reduce function. Parsing stops when the current
lexeme satisfies a predicate that is given as an argument.
# let parse_exp stop cl =
let p = ref 0 in
let rec parse_one stack =
let l = ( p:=cl.current ; lexer cl)
in if not (stop l) then parse_one (stack_or_reduce l stack)
else ( cl.current <- !p ; reduce_all stack )
in parse_one [] ;;
val parse_exp : (lexeme -> bool) -> string_lexer -> expression = <fun>
Notice that the lexeme that made the parsing stop is not used to build
the expression. It is thus necessary to modify the current position to
its beginning (variable p) to parse it later.
We can now parse a command line:
# let parse_cmd cl = match lexer cl with
Lident s -> ( match s with
"REM" -> Rem (extract (fun _ -> true) cl)
| "GOTO" -> Goto (match lexer cl with
Lint p -> p
| _ -> raise ParseError)
| "INPUT" -> Input (match lexer cl with
Lident v -> v
| _ -> raise ParseError)
| "PRINT" -> Print (parse_exp ((=) Lend) cl)
| "LET" ->
let l2 = lexer cl and l3 = lexer cl
in ( match l2 ,l3 with
(Lident v,Lsymbol "=") -> Let (v,parse_exp ((=) Lend) cl)
| _ -> raise ParseError )
| "IF" ->
let test = parse_exp ((=) (Lident "THEN")) cl
in ( match ignore (lexer cl) ; lexer cl with
Lint n -> If (test,n)
| _ -> raise ParseError )
| _ -> raise ParseError )
| _ -> raise ParseError ;;
val parse_cmd : string_lexer -> command = <fun>
Finally, we implement the function to parse commands typed by the
user:
# let parse str =
let cl = init_lex str
in match lexer cl with
Lint n -> Line { num=n ; cmd=parse_cmd cl }
| Lident "LIST" -> List
| Lident "RUN" -> Run
| Lident "END" -> PEnd
| _ -> raise ParseError ;;
val parse : string -> phrase = <fun>
Evaluation
A Basic program is a list of lines. Execution starts at the first
line. Interpreting a program line consists of executing the task
corresponding to its command. There are three different kinds of
commands: input-output (PRINT and INPUT), variable
declaration or modification (LET), and flow control
(GOTO and IF...THEN). Input-output commands interact
with the user and use the corresponding Objective CAML functions.
Variable declaration and modification commands need to know how to
compute the value of an arithmetic expression and the memory
location to store the result. Expression evaluation returns an
integer, a boolean, or a string. Their type is value.
# type value = Vint of int | Vstr of string | Vbool of bool ;;
Variable declaration should allocate some memory to store the
associated value. Similarly, variable modification requires the
modification of the associated value. Thus, evaluation of a Basic
program uses an environment that stores the association
between a variable name and its value. It is represented by an
association list of tuples (name,value):
# type environment = (string * value) list ;;
The variable name is used to access its value. Variable modification
modifies the association.
Flow control commands, conditional or unconditional, specify the
number of the next line to execute. By default, it is the next
line. To do this, it is necessary to remember
the number of the current line.
The list of commands representing the program being edited under
the toplevel is not an efficient data structure for running the
program. Indeed, it is then necessary to look at the whole list of
lines to find the line indicated by a flow control command
(If and goto). Replacing the list of lines with an array
of commands allows direct access to the command following a
flow control command, using the array index instead of the line
number in the flow control command. This solution requires some
preprocessing called assembly before executing a RUN
command. For reasons that will be detailed shortly, a program
after assembly is not represented as an array of commands but as an
array of lines:
# type code = line array ;;
As in the calculator example of previous chapters, the interpreter
uses a state that is modified for each command evaluation. At each
step, we need to remember the whole program, the next line to
interpret and the values of the variables. The program being
interpreted is not exactly the one that was entered in the toplevel:
instead of a list of commands, it is an array of commands. Thus
the state of a program during execution is:
# type state_exec = { line:int ; xprog:code ; xenv:environment } ;;
Two different reasons may lead to an error during the evaluation of a
line: an error while computing an expression, or branching to an
absent line. They must be dealt with so that the interpreter exits
nicely, printing an error message. We define an exception as well as a
function to raise it, indicating the line where the error occurred.
# exception RunError of int
let runerr n = raise (RunError n) ;;
exception RunError of int
val runerr : int -> 'a = <fun>
Assembly
Assembling a program that is a list of numbered lines (type
program) consists of transforming this list into an array and
modifying the flow control commands. This last modification only
needs an association table between line numbers and array indexes.
This is easily provided by storing lines (with their line numbers),
instead of commands, in the array: to find the association between
a line number and the index in the array, we look the line number up
in the array and return the corresponding index. If no line is found
with this number, the index returned is -1.
# exception Result_lookup_index of int ;;
exception Result_lookup_index of int
# let lookup_index tprog num_line =
try
for i=0 to (Array.length tprog)-1 do
let num_i = tprog.(i).num
in if num_i=num_line then raise (Result_lookup_index i)
else if num_i>num_line then raise (Result_lookup_index (-1))
done ;
(-1 )
with Result_lookup_index i -> i ;;
val lookup_index : line array -> int -> int = <fun>
# let assemble prog =
let tprog = Array.of_list prog in
for i=0 to (Array.length tprog)-1 do
match tprog.(i).cmd with
Goto n -> let index = lookup_index tprog n
in tprog.(i) <- { tprog.(i) with cmd = Goto index }
| If(c,n) -> let index = lookup_index tprog n
in tprog.(i) <- { tprog.(i) with cmd = If (c,index) }
| _ -> ()
done ;
tprog ;;
val assemble : line list -> line array = <fun>
Expression evaluation
The evaluation function does a depth-first traversal on the abstract syntax
tree, and executes the operations indicated at each node.
The RunError exception is raised in case of type
inconsistency, division by zero, or an undeclared variable.
# let rec eval_exp n envt expr = match expr with
ExpInt p -> Vint p
| ExpVar v -> ( try List.assoc v envt with Not_found -> runerr n )
| ExpUnr (UMINUS,e) ->
( match eval_exp n envt e with
Vint p -> Vint (-p)
| _ -> runerr n )
| ExpUnr (NOT,e) ->
( match eval_exp n envt e with
Vbool p -> Vbool (not p)
| _ -> runerr n )
| ExpStr s -> Vstr s
| ExpBin (e1,op,e2)
-> match eval_exp n envt e1 , op , eval_exp n envt e2 with
Vint v1 , PLUS , Vint v2 -> Vint (v1 + v2)
| Vint v1 , MINUS , Vint v2 -> Vint (v1 - v2)
| Vint v1 , MULT , Vint v2 -> Vint (v1 * v2)
| Vint v1 , DIV , Vint v2 when v2<>0 -> Vint (v1 / v2)
| Vint v1 , MOD , Vint v2 when v2<>0 -> Vint (v1 mod v2)
| Vint v1 , EQUAL , Vint v2 -> Vbool (v1 = v2)
| Vint v1 , DIFF , Vint v2 -> Vbool (v1 <> v2)
| Vint v1 , LESS , Vint v2 -> Vbool (v1 < v2)
| Vint v1 , GREAT , Vint v2 -> Vbool (v1 > v2)
| Vint v1 , LESSEQ , Vint v2 -> Vbool (v1 <= v2)
| Vint v1 , GREATEQ , Vint v2 -> Vbool (v1 >= v2)
| Vbool v1 , AND , Vbool v2 -> Vbool (v1 && v2)
| Vbool v1 , OR , Vbool v2 -> Vbool (v1 || v2)
| Vstr v1 , PLUS , Vstr v2 -> Vstr (v1 ^ v2)
| _ , _ , _ -> runerr n ;;
val eval_exp : int -> (string * value) list -> expression -> value = <fun>
Command evaluation
To evaluate a command, we need a few additional functions.
We add an association to an environment by removing a previous
association for the same variable name if there is one:
# let rec add v e env = match env with
[] -> [v,e]
| (w,f)::l -> if w=v then (v,e)::l else (w,f)::(add v e l) ;;
val add : 'a -> 'b -> ('a * 'b) list -> ('a * 'b) list = <fun>
A function that prints the value of an integer or string is useful
for evaluation of the PRINT command.
# let print_value v = match v with
Vint n -> print_int n
| Vbool true -> print_string "true"
| Vbool false -> print_string "false"
| Vstr s -> print_string s ;;
val print_value : value -> unit = <fun>
The execution of a command corresponds to a transition from one
state to another. More precisely, the environment is modified if the
command is an assignment. Furthermore, the next line to execute is always
modified. As a convention, if the next line to execute does not exist,
we set its value to -1
# let next_line state =
let n = state.line+1 in
if n < Array.length state.xprog then n else -1 ;;
val next_line : state_exec -> int = <fun>
# let eval_cmd state =
match state.xprog.(state.line).cmd with
Rem _ -> { state with line = next_line state }
| Print e -> print_value (eval_exp state.line state.xenv e) ;
print_newline () ;
{ state with line = next_line state }
| Let(v,e) -> let ev = eval_exp state.line state.xenv e
in { state with line = next_line state ;
xenv = add v ev state.xenv }
| Goto n -> { state with line = n }
| Input v -> let x = try read_int ()
with Failure "int_of_string" -> 0
in { state with line = next_line state;
xenv = add v (Vint x) state.xenv }
| If (t,n) -> match eval_exp state.line state.xenv t with
Vbool true -> { state with line = n }
| Vbool false -> { state with line = next_line state }
| _ -> runerr state.line ;;
val eval_cmd : state_exec -> state_exec = <fun>
On each call of the transition function eval_cmd, we
look up the current line, run it, then set the number of the next
line to run as the current line. If the last line of the program is
reached, the current line is given the value -1. This
will tell us when to stop.
Program evaluation
We recursively apply the transition function until we reach a state
where the current line number is -1.
# let rec run state =
if state.line = -1 then state else run (eval_cmd state) ;;
val run : state_exec -> state_exec = <fun>
Finishing touches
The only thing left to do is to write a small editor and to plug
together all the functions we wrote in the previous sections.
The insert function adds a new line in the program at the
requested place.
# let rec insert line p = match p with
[] -> [line]
| l::prog ->
if l.num < line.num then l::(insert line prog)
else if l.num=line.num then line::prog
else line::l::prog ;;
val insert : line -> line list -> line list = <fun>
The print_prog function prints the source code of a
program.
# let print_prog prog =
let print_line x = print_string (pp_line x) ; print_newline () in
print_newline () ;
List.iter print_line prog ;
print_newline () ;;
val print_prog : line list -> unit = <fun>
The one_command function processes the insertion of a line
or the execution of a command. It modifies the state of the toplevel
loop, which consists of a program and an environment. This
state, represented by the loop_state type, is different from
the evaluation state.
# type loop_state = { prog:program; env:environment } ;;
# exception End ;;
# let one_command state =
print_string "> " ; flush stdout ;
try
match parse (input_line stdin) with
Line l -> { state with prog = insert l state.prog }
| List -> (print_prog state.prog ; state )
| Run
-> let tprog = assemble state.prog in
let xstate = run { line = 0; xprog = tprog; xenv = state.env } in
{state with env = xstate.xenv }
| PEnd -> raise End
with
LexerError -> print_string "Illegal character\n"; state
| ParseError -> print_string "syntax error\n"; state
| RunError n ->
print_string "runtime error at line ";
print_int n ;
print_string "\n";
state ;;
val one_command : loop_state -> loop_state = <fun>
The main function is the go function, which starts the
toplevel loop of our Basic.
# let go () =
try
print_string "Mini-BASIC version 0.1\n\n";
let rec loop state = loop (one_command state) in
loop { prog = []; env = [] }
with End -> print_string "See you later...\n";;
val go : unit -> unit = <fun>
The loop is implemented by the local function loop. It
stops when the End exception is raised by the
one_command function.
Example: C+/C-
We return to the example of the C+/C- game described in chapter 3,
page ??. Here is the Basic program corresponding to that
Objective CAML program:
10 PRINT "Give the hidden number: "
20 INPUT N
30 PRINT "Give a number: "
40 INPUT R
50 IF R = N THEN 110
60 IF R < N THEN 90
70 PRINT "C-"
80 GOTO 30
90 PRINT "C+"
100 GOTO 30
110 PRINT "CONGRATULATIONS"
And here is a sample run of this program.
> RUN
Give the hidden number:
64
Give a number:
88
C-
Give a number:
44
C+
Give a number:
64
CONGRATULATIONS
Further work
The Basic we implemented is minimalist. If you want to go further,
the following exercises hint at some possible extensions.
- Floating-point numbers: as is, our language only deals with
integers, strings and booleans. Add floats, as well as the
corresponding arithmetic operations in the language grammar. We need
to modify not only parsing, but also evaluation, taking into
account the implicit conversions between integers and floats.
- Arrays: Add to the syntax the command DIM
var[x] that declares an array var of size x, and the
expression var[i] that references the ith element of the
array var.
- Toplevel directives: Add the toplevel directives SAVE "file_name" and LOAD "file_name" that save a Basic
program to the hard disk, and load a Basic program from the hard
disk, respectively.
- Sub-program: Add sub-programs. The GOSUB line
number command calls a sub-program by branching to the given
line number while storing the line from where the call is made. The
RETURN command resumes execution at the line following the
last GOSUB call executed, if there is one, or exits the
program otherwise. Adding sub-programs requires evaluation to
manage not only the environement but also a stack containing the
return addresses of the current GOSUB calls. The GOSUB
command adds the possibility of defining recursive sub-programs.