Due: Tuesday, Oct 19, end of your lab session

Lab4: Lists of Numbers, File I/O

The primary goal of this week's exercise is for you to gain experience writing functions that consume lists of numbers. This afternoon and in this week's homework, you will write functions to compute some common statistical measures of a dataset, such as the mean, variance and standard deviation. A secondary goal is to show you how to read text files from within the DrRacket environment. Loading and saving of files is known as I/O, for Input/Output. Using I/O, you will use your statistics functions to process (relatively) large datasets: namely, historical data about Major League Baseball players.


Preliminaries

Set the language level to Beginning Student with List Abbreviations (How?)

In order to get started with I/O operations, we will need to understand some basics about working with files in general. Since it would be inconvenient to have all files located in a single place, the file system has a hierarchical organization in which files can be grouped together in folders and folders can be nested to contain other folders.

Open the Finder application located on your system's Dock. Navigate to your Desktop folder and use the File -> New Folder menu item to create a new folder for this lab (say, lab4). You should now also see the lab4 folder appear on your desktop. In the Finder application, or on the Desktop, right-click the lab4 folder and select the Get Info menu item. This will show you the full path of this folder. You may also enable full path viewing in the Finder application by selecting the Show Path Bar item from the View menu. Notice that the folder containing lab4 is the Desktop folder. For this lab, you will need to determine a full path when reading an existing file or when specifying the location for a file to be written.

Download the following three files into your lab folder.

Add the cs105_io.ss teachpack (How?). This teachpack provides two functions:

;; read-from-file: string -> list-of-num
;; write-to-file: list-of-num string -> boolean

The function read-from-file takes one argument: a string which is the name of a data file. That file is read into a list of numbers, one number at a time. The function write-to-file takes two arguments: the first is a list of numbers to be written to a file and the second is the name of the file to be written. The numbers in the list are written to the file one by one, and a result indicating success or failure is produced. If the file to be written already exists, an error is raised. For example, evaluating

(define mywritelist (list (sqr 9) (sqrt 9) 9))
(write-to-file mywritelist
   "/nfs/harper/hc1/your-username/Desktop/lab4/mylist.txt")

will write this list to the lab4 folder. Navigate to your lab4 folder and click the mylist.txt file to view its contents. After right-clicking the file, use the Get Info menu item to view the full path to mylist.txt and notice that it is identical to the path we specified when writing the file.

To read back in the contents of this file:

(define myreadlist   
  (read-from-file "/nfs/harper/hc1/your-username/Desktop/lab4/mylist.txt"))

Type myreadlist in the interactions window to see that the list we read is identical to mywritelist, the list we wrote.


Part 1

At the top of the definitions window, provide a data definition for lists of numbers (list-of-num).

Write Racket definitions for count (which should give the number of elements in a list), sum and mean (which should be written in terms of functions count and sum). For empty lists, the counting function should produce 0, while the sum and mean functions should raise errors stating we cannot define these operations on empty lists. Provide contracts, purposes and test cases for each function.
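
Functions that consume a list-of-num usually follow the shape of the data definition: one cond clause for the empty list and one for a non-empty list, with a recursive call on the rest of the list. As a minimal sketch of that pattern, here is a hypothetical function all-positive?, which is not one of the functions required by this lab and is shown only for illustration:

```racket
;; all-positive?: list-of-num -> boolean
;; to determine whether every number in the list is greater than 0
;; (hypothetical example; not one of the functions required by this lab)
(define (all-positive? lon)
  (cond
    [(empty? lon) true]                        ; no numbers, so none can fail
    [else (and (> (first lon) 0)               ; check the first number ...
               (all-positive? (rest lon)))]))  ; ... then the rest of the list

(check-expect (all-positive? empty) true)
(check-expect (all-positive? (list 3 1 4)) true)
(check-expect (all-positive? (list 3 -1 4)) false)
```

Your own functions will differ in what they produce for each clause, but the two-clause structure should carry over.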

To raise an error we can use DrRacket's error function:

(error 'myfunction "distance cannot be negative")

This will produce a message of the form:

myfunction: distance cannot be negative
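
As a sketch of how error might be used inside a function, consider the hypothetical function travel-time below. Its name and behavior are assumptions made for this example and are not part of the lab. The check-error test, assuming check-error is available at your language level, confirms that the error is raised with the expected message:

```racket
;; travel-time: number number -> number
;; to compute the time needed to cover distance at the given (nonzero) speed
;; (hypothetical function, used only to illustrate error and check-error)
(define (travel-time distance speed)
  (cond
    [(< distance 0) (error 'travel-time "distance cannot be negative")]
    [else (/ distance speed)]))

;; a normal call produces a number
(check-expect (travel-time 100 50) 2)
;; a negative distance raises the error; the symbol becomes the message prefix
(check-error (travel-time -1 50) "travel-time: distance cannot be negative")
```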


Part 2

Once all of your functions are working properly, use them, together with the read-from-file teachpack function, to analyze the given baseball data. Your results should correspond to the numbers in the Benchmarks section below. To test your results you can rely on the check-within function provided by Racket. This function is similar to check-expect, except it adds a third argument which allows a test to pass if the actual result is within that amount of the expected result. For example, the first test succeeds while the second fails:

(check-within (+ 2 2.5) 4 0.5)
(check-within (+ 2 2.5) 4 0.3)

Test that your results are within 0.5 of the benchmark results. The tests on the W50s and W90s data should accompany the check-expect tests originally written for the count, sum and mean functions.
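
A tolerance is particularly helpful when a result is an inexact number, such as a square root, which is unlikely to match a hand-written decimal exactly. The short sketch below uses only built-in arithmetic; the expected values are rounded by hand, and the tolerances are chosen just for this illustration:

```racket
;; (sqrt 2) is inexact, roughly 1.41421356, so we compare within a tolerance
(check-within (sqrt 2) 1.414 0.001)

;; exact results can be tested the same way: 10/4 is 2.5,
;; which lies within 0.5 of 2.4
(check-within (/ 10 4) 2.4 0.5)
```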

You will need to pass full pathnames to read-from-file when you call it. First, determine the full pathnames of your files using the method described in the Preliminaries section. Each pathname should look like:

/nfs/harper/hc1/your-username/Desktop/lab4/W50s.txt

Next, we will write these results to the files W50sresult.txt and W90sresult.txt. To achieve this, we could write:

(write-to-file
  (list
    (count (read-from-file "/nfs/harper/hc1/your-username/Desktop/lab4/W50s.txt"))
    (sum (read-from-file "/nfs/harper/hc1/your-username/Desktop/lab4/W50s.txt"))
    (mean (read-from-file "/nfs/harper/hc1/your-username/Desktop/lab4/W50s.txt")))
  "/nfs/harper/hc1/your-username/Desktop/lab4/W50sresult.txt")

(write-to-file
  (list
    (count (read-from-file "/nfs/harper/hc1/your-username/Desktop/lab4/W90s.txt"))
    (sum (read-from-file "/nfs/harper/hc1/your-username/Desktop/lab4/W90s.txt"))
    (mean (read-from-file "/nfs/harper/hc1/your-username/Desktop/lab4/W90s.txt")))
  "/nfs/harper/hc1/your-username/Desktop/lab4/W90sresult.txt")

This is a bad idea. Whenever we copy code to be used in different contexts, we are being inefficient, we increase the chance of introducing errors, and we make it very difficult for this code to evolve in the future. Imagine needing to make a small change to this code! We would have to locate all the copies (which are logically the same, but slightly different nonetheless) and change each one independently. Instead, we should abstract this functionality and parametrize it so that we can easily reuse and evolve this code. In our case, we can define the function compute-data, which consumes two arguments: the first is the name of an input file and the second is the name of the output file (both specified as full paths).

This function can then be conveniently called:

(compute-data
  "/nfs/harper/hc1/your-username/Desktop/lab4/W50s.txt"
  "/nfs/harper/hc1/your-username/Desktop/lab4/W50sresult.txt")
(compute-data
  "/nfs/harper/hc1/your-username/Desktop/lab4/W90s.txt"
  "/nfs/harper/hc1/your-username/Desktop/lab4/W90sresult.txt")

Write the function compute-data. Include a contract and purpose. Test it by evaluating it as above and checking that the results are as expected. Do not worry about reading in the file multiple times. Later in this class we will learn how to do this more efficiently.

[Note: Parts 1 and 2 should be finished by the end of your lab session. You should also get started on Part 3. Turn in all your work at the end of the lab session according to these submission instructions.]


Part 3

[Note: For full credit, you must start working on Part 3 of the lab. You are not required to finish by the end of the lab session. You will have a chance to finish Part 3 in this week's homework.]

There are different ways of defining variance and standard deviation in the realm of statistics. We will adhere to the following definitions.

Write Racket definitions for functions variance and stdev (standard deviation). Provide contracts, purposes and test cases for each. Provide additional tests using check-within to test that the results are within 0.5 of the benchmarks below.


Benchmarks

To make sure your numbers are not way off base, here are a few ballpark figures to check against your results:


Hand in your work

To receive full credit, you should complete Part 1 and Part 2 and have made a good start on Part 3 of the lab. As usual, the last part of this lab (Part 3) will be included in this week's homework.
Submit all your work (on all 3 parts of the lab) according to the submission instructions.

Save all your files. You will need some of them for your next lab.


Material designed by Adam Shaw and Gabriela Turcu.