Dataset: California House Pricing

This dataset was obtained from the sklearn library: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html

You can learn more details about the dataset from the user guide: https://scikit-learn.org/stable/datasets/real_world.html#california-housing-dataset

This dataset was originally derived from the 1990 U.S. census, using one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).

In this data file, each row describes one block group with 8 numeric features and one target variable. The 8 numeric features are:

  • MedInc: median income in block group

  • HouseAge: median house age in block group

  • AveRooms: average number of rooms per household

  • AveBedrms: average number of bedrooms per household

  • Population: block group population

  • AveOccup: average number of household members

  • Latitude: block group latitude

  • Longitude: block group longitude

The target variable is the median house value of the block group, expressed in units of $100,000 (so a value of 2.5 means $250,000).

File paths

data/houseprice

Parameters

{"name": "City",
 "feature_vars": [0, 1, 2, 3, 4, 5, 6],
 "target_var": 7,
 "training_fraction": 0.55,
 "seed": 22992}

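The parameter block above fixes a training fraction and a random seed. How the original run used them is not stated here, so the following is only a minimal sketch of one common approach: shuffle the row indices with a seeded generator and cut them at the training fraction. The function name split_indices, the rounding rule, and the 100-row example are assumptions made for illustration.

```python
import json
import random

# Parameter block copied from the section above.
PARAMS_JSON = '''
{"name": "City",
 "feature_vars": [0, 1, 2, 3, 4, 5, 6],
 "target_var": 7,
 "training_fraction": 0.55,
 "seed": 22992}
'''


def split_indices(n_rows, training_fraction, seed):
    """Shuffle row indices reproducibly and cut them into train/test parts."""
    rng = random.Random(seed)  # local generator, global random state is untouched
    indices = list(range(n_rows))
    rng.shuffle(indices)
    # Assumption: the training size is rounded to the nearest integer.
    n_train = round(training_fraction * n_rows)
    return indices[:n_train], indices[n_train:]


params = json.loads(PARAMS_JSON)
# 100 rows is a hypothetical size used only to keep the example small.
train_idx, test_idx = split_indices(100, params["training_fraction"], params["seed"])
print(len(train_idx), len(test_idx))  # 55 45
```

Re-running with the same seed returns the same split, which is what makes the training and testing R2 values below reproducible in principle.
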
Results without feature standardization

Task 2a

House_price ~ 0.256815 + 0.468333 * MedInc
R2: 0.5245453271610767
House_price ~ 1.796538 + 0.007218 * HouseAge
R2: 0.0082768255966853
House_price ~ 2.000760 + -0.000247 * AveRooms
R2: 7.596089473538292e-06
House_price ~ 2.016189 + -0.012535 * AveBedrms
R2: 0.0013149529543335925
House_price ~ 2.242869 + -0.000196 * Population
R2: 0.015960450171492835
House_price ~ 3.665072 + -0.591596 * AveOccup
R2: 0.08139631112797319
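
Each Task 2a line pairs a one-feature least squares fit with its R2. As a rough sketch of how such a line can be produced, the code below fits y ~ intercept + slope * x in closed form and computes R2 = 1 - SS_res / SS_tot. The helper names and the toy numbers are assumptions for illustration and are not taken from the dataset.

```python
def fit_simple(x, y):
    """Ordinary least squares for y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(y, predictions):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Toy numbers, NOT rows from the housing data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.0]

intercept, slope = fit_simple(x, y)
predictions = [intercept + slope * xi for xi in x]
r2 = r_squared(y, predictions)

# Same layout as the result lines above.
print("House_price ~ %f + %f * MedInc" % (intercept, slope))
print("R2:", r2)
```
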

Task 2b

House_price ~ 1.312890 + 0.563430 * MedInc + 0.010258 * HouseAge + -0.191912 * AveRooms + 0.703607 * AveBedrms + -0.000035 * Population + -0.483610 * AveOccup
R2: 0.6442660594018964
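
The Task 2b model uses all six features at once. One way to obtain such coefficients, shown here only as a sketch, is to solve the normal equations (X^T X) b = X^T y, where X has a leading column of ones for the intercept. The helper names and the toy data are assumptions; in practice a library routine such as numpy.linalg.lstsq is usually preferred for numerical stability.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple(rows, y):
    """Return [intercept, coef_1, ..., coef_k] minimizing squared error."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Toy data generated exactly from y = 1 + 2*a - 3*b (not housing data).
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
y = [1 + 2 * a - 3 * b for a, b in rows]

coefficients = fit_multiple(rows, y)
print(coefficients)  # approximately [1, 2, -3]
```
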

Task 3

House_price ~ 1.810577 + 0.462964 * MedInc + -0.544676 * AveOccup
R2: 0.5934733350264498

Task 4

House_price ~ 0.256815 + 0.468333 * MedInc
R2: 0.5245453271610767
House_price ~ 1.810577 + 0.462964 * MedInc + -0.544676 * AveOccup
R2: 0.5934733350264498
House_price ~ 1.362221 + 0.476506 * MedInc + -0.544324 * AveOccup + 0.014141 * HouseAge
R2: 0.6248083107735434
House_price ~ 1.431483 + 0.483634 * MedInc + -0.558525 * AveOccup + 0.014286 * HouseAge + -0.009193 * AveRooms
R2: 0.6351946171488188
House_price ~ 1.280118 + 0.560112 * MedInc + -0.496483 * AveOccup + 0.011160 * HouseAge + -0.182609 * AveRooms + 0.668786 * AveBedrms
R2: 0.6439166154906262
House_price ~ 1.312890 + 0.563430 * MedInc + -0.483610 * AveOccup + 0.010258 * HouseAge + -0.191912 * AveRooms + 0.703607 * AveBedrms + -0.000035 * Population
R2: 0.6442660594018964
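
The Task 4 models are nested: each adds one feature to the previous one, so the training R2 of a least squares fit cannot decrease from one line to the next. The short sketch below tabulates how much each added feature contributes, using only the R2 values reported above. The variable names are illustrative.

```python
# (feature added at this step, R2 after adding it), copied from Task 4 above.
steps = [
    ("MedInc", 0.5245453271610767),
    ("AveOccup", 0.5934733350264498),
    ("HouseAge", 0.6248083107735434),
    ("AveRooms", 0.6351946171488188),
    ("AveBedrms", 0.6439166154906262),
    ("Population", 0.6442660594018964),
]

gains = []
previous_r2 = 0.0  # R2 of the intercept-only model
for name, r2 in steps:
    gains.append((name, r2 - previous_r2))
    previous_r2 = r2

for name, gain in gains:
    print("%-10s +%.6f" % (name, gain))
```

Read this way, MedInc alone accounts for most of the explained variance, and the final step (Population) adds less than 0.001 to R2.
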

Task 5

House_price ~ 0.256815 + 0.468333 * MedInc
Training R2: 0.5245453271610767
Testing R2: 0.5751997747555246
House_price ~ 1.810577 + 0.462964 * MedInc + -0.544676 * AveOccup
Training R2: 0.5934733350264498
Testing R2: 0.5114716911354342
House_price ~ 1.362221 + 0.476506 * MedInc + -0.544324 * AveOccup + 0.014141 * HouseAge
Training R2: 0.6248083107735434
Testing R2: 0.5341716557982221
House_price ~ 1.431483 + 0.483634 * MedInc + -0.558525 * AveOccup + 0.014286 * HouseAge + -0.009193 * AveRooms
Training R2: 0.6351946171488188
Testing R2: 0.5287404918548989
House_price ~ 1.280118 + 0.560112 * MedInc + -0.496483 * AveOccup + 0.011160 * HouseAge + -0.182609 * AveRooms + 0.668786 * AveBedrms
Training R2: 0.6439166154906262
Testing R2: 0.5600979280274867
House_price ~ 1.312890 + 0.563430 * MedInc + -0.483610 * AveOccup + 0.010258 * HouseAge + -0.191912 * AveRooms + 0.703607 * AveBedrms + -0.000035 * Population
Training R2: 0.6442660594018964
Testing R2: 0.5631186114298431
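
Task 5 reports a training and a testing R2 for each model, which makes it possible to compare fit against generalization. The sketch below uses only the values listed above to compute the gap (training minus testing) and to find the model with the highest testing R2. The variable names are illustrative.

```python
# (number of features, training R2, testing R2), copied from Task 5 above.
results = [
    (1, 0.5245453271610767, 0.5751997747555246),
    (2, 0.5934733350264498, 0.5114716911354342),
    (3, 0.6248083107735434, 0.5341716557982221),
    (4, 0.6351946171488188, 0.5287404918548989),
    (5, 0.6439166154906262, 0.5600979280274867),
    (6, 0.6442660594018964, 0.5631186114298431),
]

# Positive gap: the model fits the training rows better than the test rows.
gaps = [(k, train - test) for k, train, test in results]
best_k, _, best_test = max(results, key=lambda item: item[2])

for k, gap in gaps:
    print("%d feature(s): gap %+.4f" % (k, gap))
print("highest testing R2:", best_k, "feature(s),", round(best_test, 4))
```

On these numbers the training R2 rises with every added feature, while the highest testing R2 belongs to the one-feature MedInc model, the only one whose testing R2 exceeds its training R2.
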

Results with feature standardization

Task 2a

House_price ~ -0.010302 + 0.738645 * MedInc
R2: 0.5245453271610765
House_price ~ -0.019287 + 0.085336 * HouseAge
R2: 0.008276825596684634
House_price ~ -0.024204 + -0.001938 * AveRooms
R2: 7.596089472983181e-06
House_price ~ -0.022961 + -0.025391 * AveBedrms
R2: 0.0013149529543334815
House_price ~ -0.030261 + -0.138358 * Population
R2: 0.01596045017149228
House_price ~ -0.059632 + -0.433471 * AveOccup
R2: 0.08139631112797252
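
The R2 values in this section agree with the unstandardized Task 2a values to roughly 15 digits, while the coefficients differ. That is expected: rescaling a feature by a shift and a positive factor changes the least squares coefficients but not the fitted values, so R2 is unchanged. The sketch below checks this on toy numbers by z-scoring a single feature with the population standard deviation (the convention scikit-learn's StandardScaler uses). It scales only the feature. The near-zero intercepts above suggest the original run may also have scaled the target, but that is not stated here, so treat the details as assumptions.

```python
import statistics


def fit_and_score(x, y):
    """Return (intercept, slope, R2) for the least squares line y ~ a + b * x."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Toy numbers, NOT rows from the housing data.
x = [2.0, 4.0, 5.0, 7.0, 9.0, 12.0]
y = [1.0, 1.8, 2.1, 3.5, 3.9, 5.2]

raw = fit_and_score(x, y)
scaled = fit_and_score(standardize(x), y)

print("raw:    intercept %.6f slope %.6f R2 %.6f" % raw)
print("scaled: intercept %.6f slope %.6f R2 %.6f" % scaled)
```

With only the feature standardized, the new slope equals the raw slope times the feature's standard deviation, and the new intercept equals the mean of the target.
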

Task 2b

House_price ~ -0.016991 + 0.888631 * MedInc + 0.121290 * HouseAge + -1.508016 * AveRooms + 1.425207 * AveBedrms + -0.024307 * Population + -0.354347 * AveOccup
R2: 0.6442660594018963

Task 3

House_price ~ -0.042974 + 0.730178 * MedInc + -0.399091 * AveOccup
R2: 0.5934733350264496

Task 4

House_price ~ -0.010302 + 0.738645 * MedInc
R2: 0.5245453271610765
House_price ~ -0.042974 + 0.730178 * MedInc + -0.399091 * AveOccup
R2: 0.5934733350264496
House_price ~ -0.032689 + 0.751535 * MedInc + -0.398833 * AveOccup + 0.167199 * HouseAge
R2: 0.6248083107735432
House_price ~ -0.028939 + 0.762777 * MedInc + -0.409239 * AveOccup + 0.168906 * HouseAge + -0.072234 * AveRooms
R2: 0.6351946171488186
House_price ~ -0.016750 + 0.883397 * MedInc + -0.363780 * AveOccup + 0.131949 * HouseAge + -1.434917 * AveRooms + 1.354675 * AveBedrms
R2: 0.643916615490626
House_price ~ -0.016991 + 0.888631 * MedInc + -0.354347 * AveOccup + 0.121290 * HouseAge + -1.508016 * AveRooms + 1.425207 * AveBedrms + -0.024307 * Population
R2: 0.6442660594018962

Task 5

House_price ~ -0.010302 + 0.738645 * MedInc
Training R2: 0.5245453271610765
Testing R2: 0.5751997747555249
House_price ~ -0.042974 + 0.730178 * MedInc + -0.399091 * AveOccup
Training R2: 0.5934733350264496
Testing R2: 0.5114716911354338
House_price ~ -0.032689 + 0.751535 * MedInc + -0.398833 * AveOccup + 0.167199 * HouseAge
Training R2: 0.6248083107735432
Testing R2: 0.5341716557982211
House_price ~ -0.028939 + 0.762777 * MedInc + -0.409239 * AveOccup + 0.168906 * HouseAge + -0.072234 * AveRooms
Training R2: 0.6351946171488186
Testing R2: 0.5287404918548977
House_price ~ -0.016750 + 0.883397 * MedInc + -0.363780 * AveOccup + 0.131949 * HouseAge + -1.434917 * AveRooms + 1.354675 * AveBedrms
Training R2: 0.643916615490626
Testing R2: 0.5600979280274874
House_price ~ -0.016991 + 0.888631 * MedInc + -0.354347 * AveOccup + 0.121290 * HouseAge + -1.508016 * AveRooms + 1.425207 * AveBedrms + -0.024307 * Population
Training R2: 0.6442660594018962
Testing R2: 0.5631186114298325