This document discusses modeling housing prices with decision trees and linear regression, using several housing data variables as predictors. It finds that:
1) A linear regression model has a low R-squared of 0.24, and the relationships it estimates with latitude and longitude are statistically insignificant or implausible.
2) A regression tree model's predictions overlap more closely with the actual above-median price points, but it still misses some of them.
3) Adding median income and population improves the regression tree, with median income being the most important split variable.
4) Decision trees may overfit the data when their growth is not limited, so the author proposes using PCA and cross-validation in future work.
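
The ideas in points 3 and 4 can be illustrated with a minimal sketch: a small hand-written regression tree that picks each split by the largest reduction in squared error, with a depth limit chosen by k-fold cross-validation. This is not the author's code or data. The data below is synthetic, the three columns merely stand in for latitude, longitude and median income, and every function name (sse, best_split, build, predict, r_squared, cv_r2) is hypothetical. PCA is not shown.

```python
import random

def sse(ys):
    """Sum of squared errors of ys around their own mean."""
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)

def best_split(X, y, min_leaf):
    """Return (gain, feature, threshold) for the split that most reduces SSE, or None."""
    best, parent = None, sse(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between two observed values
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best

def build(X, y, max_depth, min_leaf=5):
    """Grow a tree; max_depth and min_leaf are the limits that curb overfitting."""
    split = best_split(X, y, min_leaf) if max_depth > 0 else None
    if split is None or split[0] <= 0:
        return sum(y) / len(y)  # leaf: predict the mean target
    _, j, t = split
    lx = [r for r in X if r[j] <= t]
    ly = [v for r, v in zip(X, y) if r[j] <= t]
    rx = [r for r in X if r[j] > t]
    ry = [v for r, v in zip(X, y) if r[j] > t]
    return (j, t, build(lx, ly, max_depth - 1, min_leaf),
            build(rx, ry, max_depth - 1, min_leaf))

def predict(node, row):
    """Walk from the root to a leaf and return the leaf's mean."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node

def r_squared(tree, X, y):
    """1 - (residual SSE / total SSE) on the given rows."""
    resid = sum((v - predict(tree, r)) ** 2 for r, v in zip(X, y))
    return 1 - resid / sse(y)

def cv_r2(X, y, max_depth, k=4):
    """Mean held-out R-squared over k folds (fold i holds every k-th row from i)."""
    scores = []
    for i in range(k):
        train = [n for n in range(len(X)) if n % k != i]
        test = [n for n in range(len(X)) if n % k == i]
        tree = build([X[n] for n in train], [y[n] for n in train], max_depth)
        scores.append(r_squared(tree, [X[n] for n in test], [y[n] for n in test]))
    return sum(scores) / k

# Synthetic stand-in data (an assumption, not the housing data set):
# columns are "latitude", "longitude", "median income"; only income drives price.
rng = random.Random(0)
X = [[rng.uniform(32, 42), rng.uniform(-124, -114), rng.uniform(0, 10)]
     for _ in range(120)]
y = [100 + 50 * (row[2] > 5) + rng.gauss(0, 5) for row in X]

tree = build(X, y, max_depth=3)
root_feature, root_threshold = tree[0], tree[1]
train_r2 = {d: r_squared(build(X, y, d), X, y) for d in range(4)}
held_out_r2 = {d: cv_r2(X, y, d) for d in range(4)}

print("root split: column", root_feature, "at", round(root_threshold, 2))
for d in range(4):
    print("depth", d, "train R2", round(train_r2[d], 3),
          "held-out R2", round(held_out_r2[d], 3))
```

Training R-squared can only rise as the depth limit is relaxed, which is why it is a poor guide to tree size. The held-out score from cv_r2 is the figure one would compare across depths when choosing the limit.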