How Much Is Your Home Worth? We Optimized a Machine Learning Model

Joshua Gao · 8 Jun 2023

Zillow Dataset

I collect 9,750 listings from Zillow using the Realty Mole Property API. Each listing includes attributes such as address, number of bedrooms and bathrooms, city, county, days on market, lot size, square footage, and year built. I filter the data down to single-family homes with a price below $1M, a lot size under 20,000 sqft, and a square footage under 3,000 sqft.
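The filtering step can be sketched roughly as follows. This is a minimal illustration, not the code used for the post: the field names (`propertyType`, `price`, `lotSize`, `squareFootage`) and the `"Single Family"` label are assumptions about how each listing is represented, and every listing is assumed to carry all four fields.

```python
# Thresholds come from the text above; the field names are hypothetical.
MAX_PRICE = 1_000_000        # dollars
MAX_LOT_SIZE = 20_000        # square feet
MAX_SQUARE_FOOTAGE = 3_000   # square feet


def keep_listing(listing: dict) -> bool:
    """Return True if a listing passes all four filters."""
    return (
        listing["propertyType"] == "Single Family"
        and listing["price"] < MAX_PRICE
        and listing["lotSize"] < MAX_LOT_SIZE
        and listing["squareFootage"] < MAX_SQUARE_FOOTAGE
    )


def filter_listings(listings: list) -> list:
    """Keep only the listings that satisfy keep_listing."""
    return [listing for listing in listings if keep_listing(listing)]
```

Keeping the thresholds as named constants makes it easy to rerun the experiment with a different price or size cutoff.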

The plots below show square footage and lot size against price.

Figure: Zillow Square Footage vs Price
Figure: Zillow Lot Size vs Price

Finding the Best Model: Gradient Descent

To find the best-fit line for the data, I implement a multivariable linear regression model trained with stochastic gradient descent. The input variables are number of bathrooms, number of bedrooms, days on market, lot size, square footage, and year built.

I train with a learning rate of 1e-10 over 5 iterations.
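For readers who want to see the mechanics, here is a minimal sketch of multivariable linear regression trained with stochastic gradient descent. It is an illustration under stated assumptions rather than the code behind this post: the function name, the zero initialization, the per-sample squared-error loss, and reading "5 iterations" as 5 passes over the data are all assumptions.

```python
import random


def sgd_linear_regression(X, y, learning_rate=1e-10, epochs=5, seed=0):
    """Fit y ~ bias + sum(weights[j] * X[i][j]) with per-sample SGD.

    X is a list of feature rows and y is a list of target prices.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    weights = [0.0] * n_features  # assumed starting point
    bias = 0.0
    indices = list(range(len(X)))

    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new random order
        for i in indices:
            prediction = bias + sum(w * x for w, x in zip(weights, X[i]))
            error = prediction - y[i]
            # Gradient of 0.5 * error**2 with respect to each parameter.
            for j in range(n_features):
                weights[j] -= learning_rate * error * X[i][j]
            bias -= learning_rate * error

    return weights, bias
```

The learning rate has to be tiny here because the features are unscaled: square footage and lot size are in the thousands, so each gradient step is multiplied by large values.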

However, the coefficients for every variable except square footage and lot size come out close to zero, which effectively removes those terms from the linear equation. Square footage is the variable that most affects the predicted listing price.

I train again with just the square footage and lot size variables.

Figure: SGD Animation

Valuation Model

After training, we get our final valuation model:

Valuation = 0.07 + (11.081 * lot size) + (148.837 * square footage)
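As a quick sketch, the formula translates directly into a small function. The function and parameter names are hypothetical, and it assumes both sizes are in square feet and the result is in dollars.

```python
def estimate_valuation(lot_size_sqft: float, square_footage: float) -> float:
    """Apply the fitted model: intercept + lot size term + square footage term."""
    return 0.07 + (11.081 * lot_size_sqft) + (148.837 * square_footage)


# Example: a 1,500 sqft home on a 5,000 sqft lot.
print(f"{estimate_valuation(5_000, 1_500):,.2f}")  # prints 278,660.57
```

In that example the square footage term contributes 223,255.50 and the lot size term 55,405.00, with the intercept adding almost nothing.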

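As you can see, square footage has the largest coefficient: each additional square foot of living area adds about $149 to the valuation, compared with about $11 for each additional square foot of lot. Since both variables are measured in square feet, this means square footage affects the valuation the most.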