COMP151 Programming Assignment 1, Spring 2006

Author:: Dekai Wu

Date:: Due 2006.04.28 at 23:59 by CASS

Download: http://www.cs.ust.hk/~dekai/151/assignments/a1.tar.gz

Assignment page: http://www.cs.ust.hk/~dekai/151/assignments/a1/html/

Course page: http://www.cs.ust.hk/~dekai/151/

Your assignment

You've been assigned to write an expression evaluator. The input expressions to your evaluator will be strings in a traditional computer science notation known as s-expressions, which is a prefix notation that is fully parenthesized to avoid ambiguity, for example:

(+ 2 3 4 5)
(+ 2 3.5)
(* (+ 2 3) (* 3.14 2.17) (+ 6 4.5))
(+ "hello world" "bye-" "bye")
(floor 2.6)

Notice that you're dealing with four distinct types of atomic elements in these expressions. There are three different types of literals used as operands: ints, doubles, and strings. In addition, the operator symbols are functions whose symbol names are +, *, and floor.

Also notice that operands do not have to be literals. They also can recursively be sub-expressions.

Given such an input expression, your evaluator should return the result of evaluating the expression, for example:

(+ 2 3 4 5) should return the int 14
(+ 2 3.5) should return the double 5.5
(* (+ 2 3) (* 3.14 2.17) (+ 6 4.5)) should return the double 357.7245
(+ "hello world" "bye-" "bye") should return the string "hello worldbye-bye"
(floor 2.6) should return the int 2

Your expression evaluator should proceed in two stages:

Parse the input string to analyze and understand its structure. The parser should return a tree data structure in which it records (a) the nested structure of the parenthesized expressions and sub-expressions, and (b) the types of all the elements in the expression.
Evaluate the expression stored in the tree data structure, by doing a depth-first traversal. The evaluator should return the value resulting from evaluating the entire expression.

Borrowing someone else's expression parser

Thankfully, you don't have to write a parser. You've already found a parser written by someone else, capable of reading and understanding strings in s-expression format. Download the package at the top of this page, then:

$ tar xzf a1.tar.gz // unpack the archive
$ cd a1
$ make
g++ -c main.cpp
g++ -c parse.cpp
g++ -c eval.cpp
g++ -c Cell.cpp
g++ -DOP_ASSIGN -o main main.o parse.o eval.o Cell.o -lm
$ main testinput.txt

You will see the parser's test program run. You can also run the program in interactive mode by typing main without any command-line arguments. In interactive mode, the program will prompt you to type in an expression, and then it'll print the expression back to you, and repeat until you type control-D to generate an EOF (end-of-file).

You wish to re-use this code for parsing s-expressions. The only problem is that this parser, as originally written, does not actually build a tree data structure. So you will need to modify the code to suit the purposes of this assignment.

But you are lucky that the author of the parser encapsulated his code quite neatly. You will not even have to touch his parser code (which is in parse.hpp and parse.cpp). As the parser analyzes the input string, it needs to call various operations to record the nested tree structure that it is finding in the s-expression. The author wisely encapsulated all these operations in a separate header file cons.hpp that keeps all the concrete tree implementation details hidden from the parser:

Cell* make_int(const int i) returns a newly constructed int cell, i.e., a leaf node initialized to hold the int value i
Cell* make_double(const double d) returns a newly constructed double cell, i.e., a leaf node initialized to hold the double value d
Cell* make_string(const char* const s) returns a newly constructed string cell, i.e., a leaf node initialized to hold the string value s
Cell* make_symbol(const char* const s) returns a newly constructed symbol cell, i.e., a leaf node initialized to hold the symbol name s
Cell* cons(Cell* const my_car, Cell* const my_cdr) returns a newly constructed cons cell, i.e., an internal node initialized to hold the two pointers my_car and my_cdr

bool intp(Cell* const c) returns true iff the cell pointed to by c is an int cell
bool doublep(Cell* const c) returns true iff the cell pointed to by c is a double cell
bool stringp(Cell* const c) returns true iff the cell pointed to by c is a string cell
bool symbolp(Cell* const c) returns true iff the cell pointed to by c is a symbol cell
bool listp(Cell* const c) returns true iff the cell pointed to by c is a cons cell

int get_int(Cell* const c) returns the value in the int cell pointed to by c (error if c is not an int cell)
double get_double(Cell* const c) returns the value in the double cell pointed to by c (error if c is not a double cell)
string get_string(Cell* const c) returns the value in the string cell pointed to by c (error if c is not a string cell)
string symbol_to_string(Cell* const c) returns the symbol name in the symbol cell pointed to by c (error if c is not a symbol cell)
Cell* car(Cell* const c) returns the car pointer of the cons cell pointed to by c (error if c is not a cons cell)
Cell* cdr(Cell* const c) returns the cdr pointer of the cons cell pointed to by c (error if c is not a cons cell)

ostream& operator<<(ostream& os, Cell* const c) prints the cons cell pointed to by c to the output stream os, using s-expression notation

Notice that even though these are just ordinary functions (not member functions of a class), nevertheless this set of functions constitutes an interface for an abstract data type; we'll discuss more background on this particular ADT below. The way that cons.hpp specifies the ADT interface is nice because it makes very few assumptions about how the concrete implementation might be done. (Good object-oriented programming style does not necessarily require using classes! For example, see Scott Myer's article "<a href="http://www.ddj.com/dept/cpp/184401197">How Non-Member Functions Improve Encapsulation".)

The file cons.hpp that came with the parser just connects all the functions to a dummy example implementation of the ADT that makes use of a super-simple, incomplete Cell class probably written by someone else (found in Cell.hpp and Cell.cpp). As mentioned earlier, this dummy example implementation isn't actually capable of building a tree data structure with pointers and nodes. Instead, it just records the sub-trees using a string encoding, rather than a tree data structure. Most of the functions needed by cons.hpp aren't even working yet in the dummy Cell class, since the basic parser can be run without the accessor functions intp, doublep, stringp, symbolp, listp, int, double, string, symbol_to_string, car and cdr. You can build trees without using these accessor functions, although it's not terribly useful without the accessor functions since nobody can access the tree you've built. (And in fact, the string encoding that's used is actually just s-expression notation again! Of course this circular set-up is not terribly useful, either; it's just a partial example to show you how to work with the cons.hpp ADT interface by connecting it to a concrete implementation in Cell.hpp.)

Adapting the parser to this assignment

You realize that the cleanest and easiest way to proceed is to leave cons.hpp exactly the way it is. That way, you only need to replace the Cell with your own implementation.

You will stick exactly to the existing interface of the Cell class when implementing your own version of Cell; you won't delete or rename any functions or member functions, to avoid breaking the parser. (However, you might wish to add some new functions and member functions, to support manipulating and accessing the tree data structure you build.)

Looking at the public interface specified by the Cell class in Cell.hpp, you see there are a few constructors that the parser's cons.hpp will expect your own implementation to provide:

Cell(const int i) constructs an int cell, i.e., a leaf node that holds the int value i
Cell(const double d) constructs a double cell, i.e., a leaf node that holds a double value d
Cell(const char* const s) constructs a string cell, i.e., a leaf node that holds the string value s
Cell(bool is_symbol, const char* const s) constructs a symbol cell, i.e., a leaf node that holds the symbol name s
Cell(Cell* const my_car, Cell* const my_cdr) constructs a cons cell, i.e., an internal node that holds two pointers my_car and my_cdr

Other than these constructors, you'll need to implement these public member functions:

bool is_int() const returns true iff this is an int cell
bool is_double() const returns true iff this is a double cell
bool is_string() const returns true iff this is a string cell
bool is_symbol() const returns true iff this is a symbol cell
bool is_cons() const returns true iff this is a cons cell

int get_int() const returns the value in this int cell (error if this is not an int cell)
double get_double() const returns the value in this double cell (error if this is not a double cell)
string get_string() const returns the value in this string cell (error if this is not a string cell)
string get_symbol() const returns the symbol name in this symbol cell (error if this is not a symbol cell)
Cell* get_car() const returns the car pointer of this cons cell (error if this is not a cons cell)
Cell* get_cdr() const returns the cdr pointer of this cons cell (error if this is not a cons cell)

void print(ostream& os) const prints the subtree rooted at this cell, in s-expression notation

Some background on the cons list ADT

The abstract data type (ADT) that the parser author chose to use is a traditional cons list ADT dating back to around 1958. This ADT only requires a nil or NULL constant representing an empty list (which can be written () using s-expression notation), plus only three essential core operations cons, car and cdr:

cons constructs a new list with a given car value and cdr list. For example:
- cons(1, nil) returns (1)
- cons(1, cons(2, nil)) returns (1 2)
- cons(1, cons(2, cons(3, nil))) returns (1 2 3)
- cons(cons(1, cons(2, nil)), cons(3, nil)) returns ((1 2) 3)
car gives the value of the head node of the list. For example:
- car(cons(1, nil)) returns 1
- car(cons(1, cons(2, nil))) returns (1)
- car(cons(cons(1, cons(2, nil)), cons(3, nil))) returns (1 2)
cdr (pronounced "could-er" or "cudder") gives the list after the head node. For example:
- cdr(cons(1, cons(2, cons(3, nil)))) returns (2 3)
- cdr(cons(cons(1, cons(2, nil)), cons(3, nil))) returns (3)

Building up large trees this way can get a bit tedious, so in practice typically people also define equivalent shorthand functions. Mathematically speaking, such functions are not really necessary, and exist only for the sake of convenience. For example,

list builds a new list containing the operands. For example:
- list(1) is shorthand for cons(1, nil) and returns (1)
- list(1, 2, 3) is shorthand for cons(1, cons(2, cons(3, nil))) and returns (1 2 3)
- list(list(1, 2), 3) is shorthand for cons(cons(1, cons(2, nil)), cons(3, nil)) and returns ((1 2) 3)

To keep your life simple, you are not required to implement list even though we'll use it in written examples later.

Implementing the cons list ADT

How do you implement the cons list ADT? In fact there exist many different implement approaches.

You've decided to use the most traditional approach, which is to use a singly linked list data structure. This is the most natural and intuitively obvious way to implement the cons list ADT. As explained in Wikipedia:

Specifically, each node in the linked list is a cons cell, also called a pair. As the name pair implies, a cons cell consists of two values: the first one is the car, and the second is the cdr. For list(1, 2, 3), there are three cons cells, or pairs. The first cons cell has the number 1 in the first slot, and a pointer to the second cons cell in the second. The second cons cell has the number 2 in the first slot, and a pointer to the third cons cell in the second slot. The third cons cell has the number 3 in the first slot and a NULL constant in the second slot. The NULL constant is usually represented by (). The cons function constructs these cons cells, which is why cons(1, list(2, 3)) gives the list (1 2 3).

Here's a picture of the linked list to hold ((1 2) 3), using what's known as box-and-pointer notation. Notice how it naturally matches the exact structure of cons(cons(1, cons(2, nil)), cons(3, nil)). There's one box for each cons, and two nil pointers.

In this assignment, cells can hold five kinds of values: int, double, string, symbol, and cons. You need a way to store any of these types of values in your cells. How do you do this efficiently?

The obvious naive way is just to have five data members:

class Cell {
...
  int int_m;
  double double_m; 
  char* string_m;
  char* symbol_m;
  ConsPair conspair_m;
};

But this is a terribly inefficient waste of memory. Any instance of a Cell will only be using one of these data members. So a better way is to use a union:

class Cell {
...
  union {
    int int_m;
    double double_m; 
    char* string_m;
    char* symbol_m;
    ConsPair conspair_m;
  };
};

Unions exist in both C and C++. A union is sort of like a struct, except that a struct provides memory space to simultaneously store all its members, whereas a union provides memory space to store only one of its members at a time. So this way, we won't waste memory space.

The only problem is: when you're using a union, how do you know which of the data members is the valid one? C++ won't tell you; it is your program's own responsibility to keep track of this. If you don't, you can easily create horrible bugs. For example:

Cell* c = new Cell;
c->double_m = 2.718;
cout << c->int_m; // whoops! will print out a garbage int
cout << c->string_m; // whoops! will either seg fault or do wildly unpredictable things

To keep track of which member in the union is currently valid, you should use what's known as a tagged union or discriminated union approach:

class Cell {
...
  enum TypeTag {type_int, type_double, type_string, type_symbol,
                type_conspair};
  TypeTag tag_m;
  union {
    int int_m;
    double double_m; 
    char* string_m;
    char* symbol_m;
    ConsPair conspair_m;
  };
};

Now you can write your code so that you make sure that the tag_m always holds an up-to-date value indicating which of the union members is valid at the current time.

Evaluating the expression tree

After you've gotten the parser to store the expression into your real tree data structure (your concrete implementation of the cons list ADT), now you need to write a function that evaluates it.

To do this, you merely need to perform a depth-first traversal starting from the root. As you complete each subtree, the evaluator should return the value resulting from evaluating the sub-expression stored in that subtree. By the time you finish the entire traversal, you will be back at the root---so the evaluator should return the value resulting from evaluating the entire expression.

The problem is: what is the type that should be returned by your evaluator function?

It would seem that this depends on the type of the operands---remember, from the introduction:

But C++ won't let us write a function that returns different types depending on the result!

To solve this problem, we'll re-use your Cell implementation. Remember, Cell already gives us a way to store different types of values in the same class type, by using a tagged union. So your function evaluator should return a Cell* that points to a Cell holding the value resulting from evaluating the expression:

Cell* eval(Cell* const c) returns the value resulting from evaluating the expression tree whose root is pointed to by c (error if c is not a well-formed expression)

You'll find this declaration in the header file eval.hpp, which you should not alter. Place your implementation for the evaluator in your own eval.cpp file, replacing the dummy eval.cpp that came with the parser.

When you're implementing the eval function, think carefully about all the different cases (what type are the operands?) The + and * functions can be applied to any combination of the numeric int and double types, but what is the return type? Only the + function can be applied to the string type. No function can be applied to the symbol type. The floor function can be applied to double types (and perhaps int types, but meaninglessly).

Putting it all together

Notice that the dummy example implementation of eval.cpp doesn't do anything. It just returns the same Cell* that was passed to it. That's why running the main program prints out the same s-expression that you give it as input. After you've replaced eval.cpp with your real evaluator implementation, running the same main program will instead print out the result of evaluation---which is exactly what we want!

So you don't need to touch main.cpp at all.

Important reminders

You must follow the design approach outlined in this document. Do not just implement the required functionality using a different design.

Do not use virtual functions or templates in this assignment. This assignment is about static OO support in C++. (You'll get a chance to use dynamic OO polymorphism and templates in the following assignments...)

Do not edit the files parse.hpp, parse.cpp, cons.hpp, eval.hpp, or main.cpp. The programming assignments are mini-exercises in how multiple programmers are supposed to interact and communicate in the real world; these files are owned and maintained by the other author(s).

You will need to add things to Cell.hpp, but do not delete anything from it. Replace the files Cell.cpp and eval.cpp with your own implementations. Depending on your approach, you may or may not also wish to add more files.

Depending on your approach, you may or may not need to change the Makefile. Whether you changed it or not, always make sure you include whatever Makefile is needed to build your program, when you submit assignment. Otherwise, the graders cannot build your program.

You must write the final version of the program on your own. Sophisticated plagiarism detection systems are in operation, and they are pretty good at catching copying! If you worked in study groups, you must also acknowledge your collaborators in the write-up for each problem, whether or not they are classmates. Other cases will be dealt with as plagiarism. Re-read the policy on the course home page, and note the University's tougher policy this year regarding cheating.

Your programming style (how clearly and how well you speak C++) is what will be graded. Correct functioning of your program is necessary but not sufficient!

Generated on Fri Apr 21 18:12:51 2006 for a1 by

1.4.6