You will need three data definitions in order to do this exercise: datum, dataset and lineq. A datum will consist simply of an x and a y. A dataset will consist of a nonempty list of datums. A lineq will consist of a slope and a y-intercept (m and b from the familiar linear equation form y = mx + b).
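As a rough illustration of what such data definitions can look like, here is a minimal sketch in the Beginning Student language. The constructor make-datum with two fields matches the sample data used later in this lab; the field names x, y, m and b are assumptions chosen for illustration, and your own definitions may differ.

```racket
#lang htdp/bsl
;; A datum is a (make-datum x y), where x and y are nums.
;; ASSUMPTION: the field names x and y are illustrative.
(define-struct datum (x y))

;; A dataset is a nonempty list of datums, that is, either
;; - (cons d empty), where d is a datum, or
;; - (cons d ds), where d is a datum and ds is a dataset.

;; A lineq is a (make-lineq m b), where m (the slope) and
;; b (the y-intercept) are nums.
;; ASSUMPTION: the field names m and b are illustrative.
(define-struct lineq (m b))

;; Examples of each kind of data
(define d1 (make-datum 1 3))
(define ds1 (list d1 (make-datum 2 5)))
(define line1 (make-lineq 2 1)) ; represents y = 2x + 1
```

Note that the base case of a dataset is a one-element list, not the empty list, which affects how the recursive template is written.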
A linear regression analysis is applied to a set of data; its result is the line that mathematically best fits the data as it would appear on a two-dimensional plot. Operationally, with linear regression analysis, you put a dataset in and get a linear equation out. Your goal is to provide a function linreg whose contract is
;; linreg : dataset -> lineq
In the following discussion, assume that n is the number of datums in the given dataset.
The best-fit slope for the linear model of a set of data is given by
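In the standard least-squares form, with every sum taken over all n datums (x_i, y_i) in the dataset, the slope can be written as follows. The subscripted names x_i and y_i are notation introduced here for illustration.

```latex
m = \frac{n \sum_{i=1}^{n} x_i y_i \;-\; \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}
         {n \sum_{i=1}^{n} x_i^2 \;-\; \left(\sum_{i=1}^{n} x_i\right)^2}
```

Each sum in this expression is a natural candidate for its own helper function.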
The y-intercept for the linear model of a set of data is given by
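In the standard least-squares form, where m is the best-fit slope and the sums again run over all n datums (x_i, y_i), the intercept can be written as:

```latex
b = \frac{\sum_{i=1}^{n} y_i \;-\; m \sum_{i=1}^{n} x_i}{n}
```

Equivalently, b is the mean of the y values minus m times the mean of the x values, so the best-fit line always passes through the point of means.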
The two formulas above can each be abstracted as functions, with the following contracts:
;; slope : dataset -> num
;; intercept : dataset -> num
Write data definitions for datum, dataset and lineq. Recall that:
Review the mathematical formulas for the best-fit slope and y-intercept. Decide what helper functions you would need to implement the final slope and intercept functions. The helper functions you will need to write in order to get linreg to work will mostly be structurally recursive functions that follow the How to Design Programs template closely. Be sure to accompany every function with a contract, a purpose statement and tests.
[Hint: All your helper functions should take a dataset as input, not a list of x or y values. For example, you will need a function to sum all x values:
;; sumx : dataset -> num
]
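To show how the template for a nonempty list applies to a helper like this, here is one possible sketch of sumx in the Beginning Student language. It assumes a datum struct with fields named x and y (the field names are an assumption) and is not the only reasonable way to write the function.

```racket
#lang htdp/bsl
;; ASSUMPTION: a datum is a (make-datum x y); field names are illustrative.
(define-struct datum (x y))

;; sumx : dataset -> num
;; to compute the sum of the x values of all datums in the dataset
(define (sumx ds)
  (cond
    ;; base case: a dataset is nonempty, so the smallest one has one datum
    [(empty? (rest ds)) (datum-x (first ds))]
    ;; recursive case: add the first x to the sum of the remaining x values
    [else (+ (datum-x (first ds)) (sumx (rest ds)))]))

(check-expect (sumx (list (make-datum 1 10))) 1)
(check-expect (sumx (list (make-datum 1 10) (make-datum 2 20) (make-datum 3 30))) 6)
```

The other sums you need can follow the same shape, changing only what is extracted from each datum.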
[Note: Part 1 (data definitions and definitions for all your helper functions) should be finished by the end of your lab session. You should also get started on Part 2. Turn in all your work at the end of the lab session according to these submission instructions.]
[Note: For full credit, you must start working on Part 2 of the lab. You are not required to finish by the end of the lab session, but should submit all your work. You will have a chance to finish Part 2 on this week's homework.]
Write functions slope, intercept and linreg.
For testing, we will use a small data set of 17 observations of the boiling temperature of water in degrees Fahrenheit (the y values) at different barometric pressures in inches of mercury (the x values).
(list
(make-datum 20.79 194.5)
(make-datum 20.79 194.3)
(make-datum 22.4 197.9)
(make-datum 22.67 198.4)
(make-datum 23.15 199.4)
(make-datum 23.35 199.9)
(make-datum 23.89 200.9)
(make-datum 23.99 201.1)
(make-datum 24.02 201.4)
(make-datum 24.01 201.3)
(make-datum 25.14 203.6)
(make-datum 26.57 204.6)
(make-datum 28.49 209.5)
(make-datum 27.76 208.6)
(make-datum 29.04 210.7)
(make-datum 29.88 211.9)
(make-datum 30.06 212.2))
Test your functions on our sample data set and compare the results to results obtained using Excel. Download the dataset in Excel format from here. You will not need to submit the Excel file. Make sure your functions include a contract, a purpose statement and tests (using check-within).
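As a reminder of how check-within is written, here is a small sketch that is unrelated to the regression functions themselves. The form takes an expression, an expected value and a tolerance, and passes when the two values differ by no more than the tolerance. The helper half is hypothetical and exists only for illustration.

```racket
#lang htdp/bsl
;; (check-within expression expected delta) passes when the value of
;; expression is within delta of expected.

;; sqrt of 2 is an inexact number, so it is compared within a tolerance
(check-within (sqrt 2) 1.4142 0.0001)

;; half : num -> num
;; to compute one half of x (hypothetical helper, for illustration only)
(define (half x) (/ x 2))

(check-within (half 3) 1.5 0.001)
```

When comparing against values copied from Excel, pick a tolerance that matches the number of digits Excel displays.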
[Note: If you have gotten this far, great work! Part 3 will not count towards your grade for Lab 5, but will be part of this week's homework.]
A full linear regression analysis includes the computation of a linear correlation coefficient. This coefficient is usually referred to as r and is a measure, roughly speaking, of "how linear the data is." Typically a linear regression analysis provides the value r², which lies in the interval [0,1]. The closer r² is to 1, the stronger the correlation between the data and the line modeling the data. An r² close to 0 means that the analysis has calculated a linear equation that bears no meaningful relationship to the data at hand.
An analysis will be a struct consisting of two parts: a lineq and a num, the latter being the value of r².
Your goal is to write a
full linear regression analysis with the contract
The linear correlation coefficient r is given by the following:
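One standard way to write r uses the standard deviations of the x and y values. The version below assumes sample standard deviations s_x and s_y (computed by dividing by n - 1); the symbols x_i, y_i, the means and s_x, s_y are notation introduced here for illustration.

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left(\frac{x_i - \bar{x}}{s_x}\right)
    \left(\frac{y_i - \bar{y}}{s_y}\right)
```

If your standard deviation function divides by n instead (the population form), the leading factor becomes 1/n so that the result is the same value of r.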
Please feel free to include your code from last week to compute the standard deviation as part of the computation of r. Use Excel to verify your results for computing r². As always, make sure you include contracts, purpose statements and tests for all the functions you write.
[Note: the formula above gives r; the full analysis returns r².]
To receive full credit, you should complete Part 1 and have made a good start on Part 2. This week's homework will include Part 2 and Part 3 of this lab.
Save all your files and submit all your work (on all 3 parts of the lab) according to the submission instructions.
Material designed by faculty at the Dwight-Englewood School in Englewood, NJ, including Adam Shaw. Modified by Gabriela Turcu.