[CS Dept., U Chicago] Courses


CMSC 16100: Honors Introduction to Computer Programming I (Autumn 2003)

Homework

Assignment #4: Abstraction barriers in the PAC GUI



Assignment #4 is due at 11:59 PM on Wednesday 29 October, 2003.

This assignment is an exercise in the definition and use of data abstraction.

Section 2.1.2 of S&IoCP discusses data abstraction and its key component, the abstraction barrier. Data abstraction, does not provide any new power for computing particular functions. But it provides a crucial organizational principle, without which moderate to large pieces of software can fall into chaos. The basic idea of data abstraction is to divide the conceptual portions of a piece of software into independent domains, separated by abstraction barriers, so that we may work on any one of the domains without knowing the internal details of the others. This allows a team to divide the work on a large project among the different team members. It also allows an individual programmer to concentrate on one portion of a project at a time. For practical purposes, when dealing with complex piles of information such as programs, a single person on subsequent days of the week is a lot like a bunch of different people.

Last year's assignment #4 exercised abstraction barriers in a slightly more sophisticated fashion than this year's assignment.

0. Organization of the PAC

Look at the slightly censored code of my pretty bad prototype of the Perfectly Accurate Calculator. Make sure you are looking at the new version 1,mod1.2. I changed quite a bit from the version 1,mod1.1 that I posted with Assignment #3. As well as censoring some portions that I want you to figure out, I commented out some of my partial code, and replaced it with trivial function definitions to get off the ground.

Don't try to understand all of the code. But notice that the code has two key sections: Model, and View, with an abstraction barrier between them. The barrier is given by the list of functions starting with M-, and the list of functions starting with V-:

Model functions
Formula datatype
Operators
'+, '-, '*, '/, 'expt, 'fib
Formula constructors
  • M-empty_formula
  • M-number->formula
  • M-apply_operator_to_formulas
Formula tests and selectors
  • M-number_formula?
  • M-formula_number
  • M-formula_operator
  • M-formula_argument_list
Other Formula operations
  • M-evaluate_formula
Value datatype
Value constructor
  • M-number->value
Value tests and selectors
  • M-number_value?
  • M-value_number
Model state
  • M-set_input_formula
  • M-input_formula
  • M-set_output_formula
  • M-output_formula
View Functions

The Model component of the PAC is responsible for all of the manipulations of the basic concepts that the PAC works with---numerical values, numerical formulae, and states of operation with those. The View component is responsible for all of the interaction with a user. Definitions outside of the Model component (particularly, those in the View component) may only interact with a formula or model state through those functions from the Model component beginning with M-. Similarly, definitions outside of the View component may interact with the user interface through those functions from the View component beginning with V-. These rules establish a sort of abstraction barrier between the definitions inside each main component, and the rest of the PAC definitions.

Data abstraction is one of the key ideas in object-oriented programming. Some programming languages have specific constructs to indicate and enforce abstraction barriers. Abstraction barriers in Scheme are essentially voluntary. There are extra Scheme libraries that allow us to define and enforce abstraction barriers, but we will stick to the voluntary approach for now.

Within the Model component, there are subdivisions into the formula datatype, the value datatype, and the Model state. The formula and value datatypes have no need to use functions defined in the Model state subcomponent. The Model state subcomponent may only use the listed interface functions from the formula and value datatypes. It seemed excessive to add another mark for those functions, but the abstraction barriers around the two subcomponents are also important to keep the PAC code well organized.

The value datatype is so trivial that it's a bit annoying. But it will become much more sophisticated in a few weeks when we implement exact real values. By carrying along the abstraction barrier for the trivial value datatype, we'll be ready to plug in more sophisticated implementations of values as we cook them up.

I find that the proliferation of names in code like this makes it impossible to be either perfectly systematic, or enjoyably readable, much less both. I try to pick a middle course that's not too horrific, and to mostly go toward systematic. So I marked all the interface functions for the two major components, Model and View with M- and V-, respectively. But I didn't mark the interface components for the datatypes within the model. In the abstract datatypes, I gave constructors and other operations names beginning with the name of the datatype, I gave tests names ending in a question mark, and I gave selectors names ending in the name of the datatype. That's one of several conventions for naming functions in an abstract datatype. None of them seems to be really enjoyable. This one makes your eyes fuzz over distinguishing number->formula from formula_number, but we'll just squint and bear it.

Every abstraction barrier is an organizational aid, but also a constraint on further development of software (the two qualities are inextricably connected). As long as further development fits the constraints imposed by our abstraction barriers, the barriers simplify our thinking and help avoid lots of mistakes. Occasionally, we must change the barriers. That is a much more costly step than working within a single component. It may require changes to all software components that use the functions in the barrier that we change.

The key to testing an abstract datatype is to test the correct relationship between constructors and selectors and the other functions if any. You may look directly at the list representations to get an idea what's up. But as the datatypes become more sophisticated, that may be difficult or impossible, and you need test suites based on the relationships between functions.

Run the PAC, and just notice what the buttons do.

  1. Type a concrete syntax of an expression in the top window.

  2. Push the "Display Input" button to parse the concrete syntax into an abstract syntax, and display that abstract syntax in the middle window.

  3. Push the "Evaluate" button to evaluate the abstract syntax, and display the result in the bottom window.

A final product probably won't have a button to incorporate the input. That will more likely be done automatically on entry. Probably either the concrete syntax as entered by the user will be the only input display, or the display of abstract syntax will be substantially different, perhaps a graphical display instead of textual. The current layout makes it easier to observe what's going on. The current version shows the abstract syntax in a more compact form that you see in the DrScheme interaction window or test suite window. Instead of (list '+ 2 3) you just see (+ 2 3).

1. Incorporate your parser in the PAC code

From now on, you should routinely save every meaningful intermediate step in your work. I will stop mentioning that after one more time.

Replace my incredibly stupid parser, which just assumes that the only possible operation is addition, with your much better recursive descent parser. If you did sufficient testing in Assignment #3, there's not much point in running a test suite for this step. Observe the behavior of the PAC GUI with your new parser, as a sanity check that you put your parser in correctly. But the GUI is not good for much serious testing. Most serious testing works much better with direct calls to the defined functions. Since you haven't changed those, there's nothing in particular to test.

Some of the structure of the solution is already there, in commented sections. You can remove the comment marks, and then substitute in your code, or just use the commented parts as guideposts.

With your improved parser, the PAC GUI can evaluate any formula with addition, subtraction, multiplication, division, and exponentiation. But it can't deal with the Fibonacci function. That's not the fault of your parser. It's the fault of the evaluator, which has no interpretation of 'fib. This sort of skew between components is part of the reason that you should test through internal function calls, rather than through the GUI.

2. Adjust your parser to the abstraction barrier

Your work in step 1 violates the abstraction barrier around the formula datatype. When you construct an abstract syntax in the form (list <operator> <operand> <operand>) you use internal knowledge of the representation of a formula, which will change in later steps.

Fix your parser code to respect the abstraction barrier, by having it use the functions M-apply_operator_to_formulas, etc., instead of list. Rerun your test suite from Assignment #3 on the new code. It should produce exactly the same results. You should catch any little typos you made replacing the direct construction of lists with the abstract formula constructors.

Even though this step is conceptually trivial, I did it one parse_<nt> function at a time, testing in between.

3. Try a variant representation of a formula

Make sure that you saved your work from step 2. We will back out of step 3, and pick up the step 2 version in step 4.

As a quick exercise to demonstrate the independence of code within the formula datatype and code outside, we'll try an easy variation on the formula datatype. Instead of representing the application of a binary operator to its arguments in the form


(list <operator> <operand1> <operand2>)

use the alternate form


(cons (cons <operator> <operand1>) <operand2>)

Similarly, the three-argument operator fib applies in the form:


(cons (cons (cons <operator> <operand1>) <operand2>) <operand3>)

These forms are not well-formed Scheme lists, but they use the bottom level implementation a tiny bit more efficiently. They actually resemble the structure of the notation for function application in ML and Haskell.

Some of the constructors and selectors become recursive in this representation.

I corrected the following paragraph on 27 October.

You should be able to switch back and forth between the two representations of the formula datatype with no observable change in the behavior of the parser. The evaluator will fail, because it does not respect the abstraction barrier. If you have the energy, you can try the switch again with your evaluator in play.

4. Write your own evaluator

Back up to your version from step 2, which you saved. Replace my definition of M-evaluate_formula with your own recursive definition of the evaluation of a formula (in abstract syntax form) involving, addition, multiplication, subtraction, division, exponentiation, and the Fibonacci function. Use your best code for fib from Assignment #1.

My old definition of M-evaluate_formula is tiny, but it's still a tiny bit interesting. (eval fmla_to_eval) actually calls Scheme's very own evaluator, the same one that is evaluating all of our work, including the PAC GUI as a whole.

The availability of the whole programming language within one of its own function calls is a very interesting property of Scheme. Back in the old days, a lot of people found this notion of a program invoking the interpreter for its own programming language almost mystical. They called the LISP and Scheme implementations metacircular interpreters. I think that's a tad pretentious, but the feature is rather nice in a number of ways, both practical and conceptual.

The real reason that self-invocation is hard, and not usually available in other programming languages, has more to do with their complicated syntax than it does with the sophistication of their evaluation mechanisms. It's mainly the very simple syntax of LISP/Scheme that makes it relatively easy to incorporate the eval function.

The availability of eval let me create a prototype calculator with a fairly powerful evaluator in a few keystrokes. It's a very big help at the beginning of a project to have a lot of the work already done for you. So why are we replacing my M-evaluate_formula based on Scheme's eval? The only extra power that we are adding is the power to evaluate applications of the Fibonacci function. Although fib isn't predefined in Scheme, I'm sure I could have found a way to invoke Scheme's eval with some additional definitions. There are two remaining reasons to replace the easy (because somebody else did the work) eval implementation:

  1. If we eventually want to market our PAC as a stand-alone application, the incorporation of the whole Scheme implementation blows up the size of the code immensely.

  2. When we get to the exact real calculations, our form of evaluation will be substantially different from Scheme's, and we won't gain much by invoking eval---it might even complicate the structure of our solution.

Even when we replace them later, it is a lot easier to construct software when we have powerful starting points, such as eval, written by other people that allow us to get the structure of our code right before we have to fill in all of the details.

5. Use a higher-order generic evaluator

Rewrite the evaluator using my higher-order generic evaluator, which you'll find hidden in commented sections of the code. I also sketched the structure of the solution. Uncomment my stuff, and fill in where I left ellipses (...).

Although the whole computational structure of the solution is already there in my generic_evaluate_formula, I expect that many of you will find it harder to fill in the right arguments to generic_evaluate_formula for this step than to write your own recursive code in the previous step. As you get the hang of higher-order programming, the relative difficulty will shift in favor of using generic functions.

The reorganization of evaluation to use the generic evaluator also condenses the information about the formula and value datatypes and the association of constants and operators with their meanings, so that it will be easier to update things later on.

One little aspect of the definition of M-evaluate_formula didn't come out the way I wanted it to. In principle, we should have written a definition of the form


(define M-evaluate_formula

  ...

  ((generic_evaluate_formula ... ) ...)
  )

instead of


(define (M-evaluate_formula fmla)

  ...

  (((generic_evaluate_formula ... ) ...) fmla)
  )

The mention of the argument fmla is completely redundant, and disguises the structure of the higher-order definition a bit. I got stuck with this form because of a syntactic glitch in Scheme that I consider to be a design error. Scheme allows local defines at the beginning of a definition with function arguments, but not at the beginning of a definition with no function arguments. So, to use the first form of definition, I would have to change the style of presenting the local definitions following (define (M-evaluate_formula fmla) ..., which I judged to be too confusing at this stage.

6. Correctness, what to hand in

In assignments 1-3, if you followed the given structure pretty closely, and your function definitions computed the right results, you were probably doing very well. In this assignment, respecting the abstraction barriers may be more of a problem than getting the right results. Since Scheme does not enforce abstraction barriers formally, it is easy to accidentally refer to a specific implementation where you shouldn't, and still have a working program. The error bites you later when you change the implementation. For full credit, you must respect the abstraction barriers. Read your code carefully to ensure that you do. The trivial functions, such as M-number->value, are the easiest to forget. They might not be trivial in a later version.

You may be able to catch some breaches of the abstraction barrier by repeating the substitution of a different representation of formulas from step 3.

Hand in the following:

  1. A file named pac.scm containing the final versions of all the function definitions after step 5 above.

  2. A file named pac-tst.scm containing a test suite demonstrating the behavior of the functions you defined.

  3. A file named pac_varfmla.scm containing a PAC using your variant version of the formula datatype from step 3. If you repeated the variation at a later step, you may hand in that version.

  4. A file named pac_varfmla-tst.scm containing a test suite demonstrating your variant of the formula datatype. If you test the formula functions thoroughly, you don't need to test anything on the other side of the abstraction barrier.

  5. Optionally, if you have anything to report (including acknowledgments of where you got ideas), a file named comments.txt with your comments.

If you do not finish, you will get the best partial credit by handing in a complete working result for step 1, 2, or 4 with appropriate test suite. There will be much less partial credit for a dysfunctional and/or untested attempt at a step.


Valid HTML 4.0!


Last modified: Mon Oct 27 22:28:44 CST 2003