This document analyzes a dataset of 200,000 mortgage loan applications with the goal of predicting the interest rate spread. Key findings include:
- The most important predictive features were loan amount, loan type, property type, preapproval status, loan purpose, median family income, applicant income, and minority population percentage.
- A boosted decision tree regression model gave the best predictions, with an R-squared of 0.77 on test data, outperforming the linear regression and random forest models.
- The analysis covered exploration of relationships between numerical features, feature selection, model training, tuning, and validation.
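
For readers unfamiliar with the metric, the R-squared quoted for the test data is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below is a minimal illustration in plain Python. The function name `r_squared` and the sample values are hypothetical and are not taken from the analysis itself.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out rate spreads and predictions (made-up numbers).
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.0, 2.5, 4.5, 5.0]
score = r_squared(actual, predicted)
print(round(score, 3))
```

A score of 1 means the predictions match the held-out values exactly, while 0 means the model does no better than always predicting the mean. On that scale, 0.77 indicates the model explains roughly 77% of the variance in the test-set rate spread.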