Download: http://www.cs.ust.hk/~dekai/151/assignments/a1.tar.gz
Assignment page: http://www.cs.ust.hk/~dekai/151/assignments/a1/html/
Course page: http://www.cs.ust.hk/~dekai/151/
You've been assigned to write an expression evaluator. The input expressions to your evaluator will be strings in a traditional computer science notation known as s-expressions, which is a prefix notation that is fully parenthesized to avoid ambiguity, for example:
(+ 2 3 4 5)
(+ 2 3.5)
(* (+ 2 3) (* 3.14 2.17) (+ 6 4.5))
(+ "hello world" "bye-" "bye")
(floor 2.6)
Notice that you're dealing with four distinct types of atomic elements in these expressions. There are three different types of literals used as operands: ints, doubles, and strings. In addition, the operator symbols are functions whose symbol names are +
, *
, and floor
.
Also notice that operands do not have to be literals. They also can recursively be sub-expressions.
Given such an input expression, your evaluator should return the result of evaluating the expression, for example:
(+ 2 3 4 5)
should return the int 14
(+ 2 3.5)
should return the double 5.5
(* (+ 2 3) (* 3.14 2.17) (+ 6 4.5))
should return the double 357.7245
(+ "hello world" "bye-" "bye")
should return the string "hello worldbye-bye"
(floor 2.6)
should return the int 2
Your expression evaluator should proceed in two stages:
Thankfully, you don't have to write a parser. You've already found a parser written by someone else, capable of reading and understanding strings in s-expression format. Download the package at the top of this page, then:
$ tar xzf a1.tar.gz
// unpack the archive
$ cd a1
$ make
g++ -c main.cpp
g++ -c parse.cpp
g++ -c eval.cpp
g++ -c Cell.cpp
g++ -DOP_ASSIGN -o main main.o parse.o eval.o Cell.o -lm
$ main testinput.txt
You will see the parser's test program run. You can also run the program in interactive mode by typing main
without any command-line arguments. In interactive mode, the program will prompt you to type in an expression, and then it'll print the expression back to you, and repeat until you type control-D to generate an EOF (end-of-file).
You wish to re-use this code for parsing s-expressions. The only problem is that this parser, as originally written, does not actually build a tree data structure. So you will need to modify the code to suit the purposes of this assignment.
But you are lucky that the author of the parser encapsulated his code quite neatly. You will not even have to touch his parser code (which is in parse.hpp
and parse.cpp
). As the parser analyzes the input string, it needs to call various operations to record the nested tree structure that it is finding in the s-expression. The author wisely encapsulated all these operations in a separate header file cons.hpp
that keeps all the concrete tree implementation details hidden from the parser:
Cell* make_int(const int i)
returns a newly constructed int cell, i.e., a leaf node initialized to hold the int value i
Cell* make_double(const double d)
returns a newly constructed double cell, i.e., a leaf node initialized to hold the double value d
Cell* make_string(const char* const s)
returns a newly constructed string cell, i.e., a leaf node initialized to hold the string value s
Cell* make_symbol(const char* const s)
returns a newly constructed symbol cell, i.e., a leaf node initialized to hold the symbol name s
Cell* cons(Cell* const my_car, Cell* const my_cdr)
returns a newly constructed cons cell, i.e., an internal node initialized to hold the two pointers my_car
and my_cdr
bool intp(Cell* const c)
returns true iff the cell pointed to by c
is an int cell bool doublep(Cell* const c)
returns true iff the cell pointed to by c
is a double cell bool stringp(Cell* const c)
returns true iff the cell pointed to by c
is a string cell bool symbolp(Cell* const c)
returns true iff the cell pointed to by c
is a symbol cell bool listp(Cell* const c)
returns true iff the cell pointed to by c
is a cons cell int get_int(Cell* const c)
returns the value in the int cell pointed to by c
(error if c
is not an int cell) double get_double(Cell* const c)
returns the value in the double cell pointed to by c
(error if c
is not a double cell) string get_string(Cell* const c)
returns the value in the string cell pointed to by c
(error if c
is not a string cell) string symbol_to_string(Cell* const c)
returns the symbol name in the symbol cell pointed to by c
(error if c
is not a symbol cell) Cell* car(Cell* const c)
returns the car pointer of the cons cell pointed to by c
(error if c
is not a cons cell) Cell* cdr(Cell* const c)
returns the cdr pointer of the cons cell pointed to by c
(error if c
is not a cons cell) ostream& operator<<(ostream& os, Cell* const c)
prints the cons cell pointed to by c
to the output stream os
, using s-expression notation
Notice that even though these are just ordinary functions (not member functions of a class), nevertheless this set of functions constitutes an interface for an abstract data type; we'll discuss more background on this particular ADT below. The way that cons.hpp
specifies the ADT interface is nice because it makes very few assumptions about how the concrete implementation might be done. (Good object-oriented programming style does not necessarily require using classes! For example, see Scott Myer's article "<a href="http://www.ddj.com/dept/cpp/184401197">How Non-Member Functions Improve Encapsulation".)
The file cons.hpp
that came with the parser just connects all the functions to a dummy example implementation of the ADT that makes use of a super-simple, incomplete Cell
class probably written by someone else (found in Cell.hpp
and Cell.cpp
). As mentioned earlier, this dummy example implementation isn't actually capable of building a tree data structure with pointers and nodes. Instead, it just records the sub-trees using a string encoding, rather than a tree data structure. Most of the functions needed by cons.hpp
aren't even working yet in the dummy Cell
class, since the basic parser can be run without the accessor functions intp
, doublep
, stringp
, symbolp
, listp
, int
, double
, string
, symbol_to_string
, car
and cdr
. You can build trees without using these accessor functions, although it's not terribly useful without the accessor functions since nobody can access the tree you've built. (And in fact, the string encoding that's used is actually just s-expression notation again! Of course this circular set-up is not terribly useful, either; it's just a partial example to show you how to work with the cons.hpp
ADT interface by connecting it to a concrete implementation in Cell.hpp
.)
You realize that the cleanest and easiest way to proceed is to leave cons.hpp
exactly the way it is. That way, you only need to replace the Cell
with your own implementation.
You will stick exactly to the existing interface of the Cell
class when implementing your own version of Cell
; you won't delete or rename any functions or member functions, to avoid breaking the parser. (However, you might wish to add some new functions and member functions, to support manipulating and accessing the tree data structure you build.)
Looking at the public interface specified by the Cell
class in Cell.hpp
, you see there are a few constructors that the parser's cons.hpp
will expect your own implementation to provide:
Cell(const int i)
constructs an int cell, i.e., a leaf node that holds the int value i
Cell(const double d)
constructs a double cell, i.e., a leaf node that holds a double value d
Cell(const char* const s)
constructs a string cell, i.e., a leaf node that holds the string value s
Cell(bool is_symbol, const char* const s)
constructs a symbol cell, i.e., a leaf node that holds the symbol name s
Cell(Cell* const my_car, Cell* const my_cdr)
constructs a cons cell, i.e., an internal node that holds two pointers my_car
and my_cdr
Other than these constructors, you'll need to implement these public member functions:
bool is_int() const
returns true iff this
is an int cell bool is_double() const
returns true iff this
is a double cell bool is_string() const
returns true iff this
is a string cell bool is_symbol() const
returns true iff this
is a symbol cell bool is_cons() const
returns true iff this
is a cons cell int get_int() const
returns the value in this int cell (error if this
is not an int cell) double get_double() const
returns the value in this double cell (error if this
is not a double cell) string get_string() const
returns the value in this string cell (error if this
is not a string cell) string get_symbol() const
returns the symbol name in this symbol cell (error if this
is not a symbol cell) Cell* get_car() const
returns the car pointer of this cons cell (error if this
is not a cons cell) Cell* get_cdr() const
returns the cdr pointer of this cons cell (error if this
is not a cons cell) void print(ostream& os) const
prints the subtree rooted at this cell, in s-expression notation
The abstract data type (ADT) that the parser author chose to use is a traditional cons list ADT dating back to around 1958. This ADT only requires a nil or NULL constant representing an empty list (which can be written ()
using s-expression notation), plus only three essential core operations cons, car and cdr:
cons(1, nil)
returns (1)
cons(1, cons(2, nil))
returns (1 2)
cons(1, cons(2, cons(3, nil)))
returns (1 2 3)
cons(cons(1, cons(2, nil)), cons(3, nil))
returns ((1 2) 3)
car(cons(1, nil))
returns 1
car(cons(1, cons(2, nil)))
returns (1)
car(cons(cons(1, cons(2, nil)), cons(3, nil)))
returns (1 2)
cdr(cons(1, cons(2, cons(3, nil))))
returns (2 3)
cdr(cons(cons(1, cons(2, nil)), cons(3, nil)))
returns (3)
Building up large trees this way can get a bit tedious, so in practice typically people also define equivalent shorthand functions. Mathematically speaking, such functions are not really necessary, and exist only for the sake of convenience. For example,
list(1)
is shorthand for cons(1, nil)
and returns (1)
list(1, 2, 3)
is shorthand for cons(1, cons(2, cons(3, nil)))
and returns (1 2 3)
list(list(1, 2), 3)
is shorthand for cons(cons(1, cons(2, nil)), cons(3, nil))
and returns ((1 2) 3)
To keep your life simple, you are not required to implement list
even though we'll use it in written examples later.
How do you implement the cons list ADT? In fact there exist many different implement approaches.
You've decided to use the most traditional approach, which is to use a singly linked list data structure. This is the most natural and intuitively obvious way to implement the cons list ADT. As explained in Wikipedia:
Specifically, each node in the linked list is a cons cell, also called a pair. As the name pair implies, a cons cell consists of two values: the first one is the car
, and the second is the cdr
. For list(1, 2, 3)
, there are three cons cells, or pairs. The first cons cell has the number 1 in the first slot, and a pointer to the second cons cell in the second. The second cons cell has the number 2 in the first slot, and a pointer to the third cons cell in the second slot. The third cons cell has the number 3 in the first slot and a NULL constant in the second slot. The NULL constant is usually represented by ()
. The cons function constructs these cons cells, which is why cons(1, list(2, 3))
gives the list (1 2 3)
.
Here's a picture of the linked list to hold ((1 2) 3)
, using what's known as box-and-pointer notation. Notice how it naturally matches the exact structure of cons(cons(1, cons(2, nil)), cons(3, nil))
. There's one box for each cons
, and two nil
pointers.
In this assignment, cells can hold five kinds of values: int, double, string, symbol, and cons. You need a way to store any of these types of values in your cells. How do you do this efficiently?
The obvious naive way is just to have five data members:
class Cell { ... int int_m; double double_m; char* string_m; char* symbol_m; ConsPair conspair_m; };
But this is a terribly inefficient waste of memory. Any instance of a Cell
will only be using one of these data members. So a better way is to use a union:
class Cell { ... union { int int_m; double double_m; char* string_m; char* symbol_m; ConsPair conspair_m; }; };
Unions exist in both C and C++. A union is sort of like a struct, except that a struct provides memory space to simultaneously store all its members, whereas a union provides memory space to store only one of its members at a time. So this way, we won't waste memory space.
The only problem is: when you're using a union, how do you know which of the data members is the valid one? C++ won't tell you; it is your program's own responsibility to keep track of this. If you don't, you can easily create horrible bugs. For example:
Cell* c = new Cell;
c->double_m = 2.718;
cout << c->int_m;
// whoops! will print out a garbage int
cout << c->string_m;
// whoops! will either seg fault or do wildly unpredictable things
To keep track of which member in the union is currently valid, you should use what's known as a tagged union or discriminated union approach:
class Cell { ... enum TypeTag {type_int, type_double, type_string, type_symbol, type_conspair}; TypeTag tag_m; union { int int_m; double double_m; char* string_m; char* symbol_m; ConsPair conspair_m; }; };
Now you can write your code so that you make sure that the tag_m
always holds an up-to-date value indicating which of the union members is valid at the current time.
After you've gotten the parser to store the expression into your real tree data structure (your concrete implementation of the cons list ADT), now you need to write a function that evaluates it.
To do this, you merely need to perform a depth-first traversal starting from the root. As you complete each subtree, the evaluator should return the value resulting from evaluating the sub-expression stored in that subtree. By the time you finish the entire traversal, you will be back at the root---so the evaluator should return the value resulting from evaluating the entire expression.
The problem is: what is the type that should be returned by your evaluator function?
It would seem that this depends on the type of the operands---remember, from the introduction:
(+ 2 3 4 5)
should return the int 14
(+ 2 3.5)
should return the double 5.5
(* (+ 2 3) (* 3.14 2.17) (+ 6 4.5))
should return the double 357.7245
(+ "hello world" "bye-" "bye")
should return the string "hello worldbye-bye"
(floor 2.6)
should return the int 2
But C++ won't let us write a function that returns different types depending on the result!
To solve this problem, we'll re-use your Cell
implementation. Remember, Cell
already gives us a way to store different types of values in the same class type, by using a tagged union. So your function evaluator should return a Cell*
that points to a Cell
holding the value resulting from evaluating the expression:
Cell* eval(Cell* const c)
returns the value resulting from evaluating the expression tree whose root is pointed to by c
(error if c
is not a well-formed expression)
You'll find this declaration in the header file eval.hpp
, which you should not alter. Place your implementation for the evaluator in your own eval.cpp
file, replacing the dummy eval.cpp
that came with the parser.
When you're implementing the eval
function, think carefully about all the different cases (what type are the operands?) The +
and *
functions can be applied to any combination of the numeric int and double types, but what is the return type? Only the +
function can be applied to the string type. No function can be applied to the symbol type. The floor
function can be applied to double types (and perhaps int types, but meaninglessly).
Notice that the dummy example implementation of eval.cpp
doesn't do anything. It just returns the same Cell*
that was passed to it. That's why running the main
program prints out the same s-expression that you give it as input. After you've replaced eval.cpp
with your real evaluator implementation, running the same main
program will instead print out the result of evaluation---which is exactly what we want!
So you don't need to touch main.cpp
at all.
You must follow the design approach outlined in this document. Do not just implement the required functionality using a different design.
Do not use virtual functions or templates in this assignment. This assignment is about static OO support in C++. (You'll get a chance to use dynamic OO polymorphism and templates in the following assignments...)
Do not edit the files parse.hpp
, parse.cpp
, cons.hpp
, eval.hpp
, or main.cpp
. The programming assignments are mini-exercises in how multiple programmers are supposed to interact and communicate in the real world; these files are owned and maintained by the other author(s).
You will need to add things to Cell.hpp
, but do not delete anything from it. Replace the files Cell.cpp
and eval.cpp
with your own implementations. Depending on your approach, you may or may not also wish to add more files.
Depending on your approach, you may or may not need to change the Makefile
. Whether you changed it or not, always make sure you include whatever Makefile
is needed to build your program, when you submit assignment. Otherwise, the graders cannot build your program.
You must write the final version of the program on your own. Sophisticated plagiarism detection systems are in operation, and they are pretty good at catching copying! If you worked in study groups, you must also acknowledge your collaborators in the write-up for each problem, whether or not they are classmates. Other cases will be dealt with as plagiarism. Re-read the policy on the course home page, and note the University's tougher policy this year regarding cheating.
Your programming style (how clearly and how well you speak C++) is what will be graded. Correct functioning of your program is necessary but not sufficient!