APC Australia

An Introduction to R – Part 3

Machine-learning can be applied to almost anything – even house prices. Darren Yates introduces matrices, data frames and regression analysis to predict home sale prices.


If there’s a topic that grabs attention, it’s real estate. Whether you’re trying to up-size, down-size or just plain get into any size, real estate in Australia isn’t for the faint-hearted. If you’ve ever been to a home auction, even if you’re just there to stickybeak, no doubt you’ve tried to predict the final sale price. Real estate agents probably get closer than most of us, thanks to their background or ‘domain’ knowledge. The question we ask this month is: can we predict a house sale price using just machine-learning?

CLASSIFICATION VS REGRESSION

So far in this series, the prediction tasks we’ve worked on have all been about predicting a category or class. For example, in the Kaggle Titanic competition dataset, we were looking to predict whether a particular passenger survived or not. If we were in the weather forecasting business, we’d be trying to predict whether or not it will rain tomorrow. These are forms of machine-learning known as ‘classification’ – we’re trying to learn the pattern that most accurately categorises data into various classes. This time, we’re not interested in a category-based prediction result like ‘yes/no’, but a numerical one – the house sale price. For this, we need a technique called ‘regression analysis’.
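To make the distinction concrete, here is a minimal sketch in R. The tiny ‘homes’ table and its column names are invented purely for illustration and have nothing to do with the Kaggle data; the point is only that a classification target is a category, while a regression target is a number that a function such as R’s built-in lm() can model.

```r
# Toy data invented for illustration only; not taken from any Kaggle dataset.
homes <- data.frame(
  area_sqm = c(90, 120, 150, 180, 210, 240),
  has_pool = factor(c("no", "no", "yes", "no", "yes", "yes")),
  price_k  = c(410, 520, 640, 730, 850, 960)
)

# A classification target is a category: a factor with a fixed set of levels.
print(levels(homes$has_pool))

# A regression target is a number on a continuous scale.
print(class(homes$price_k))

# Simple linear regression: model price as a straight-line function of area.
fit <- lm(price_k ~ area_sqm, data = homes)
print(coef(fit))

# Predict a numeric price (in thousands) for an unseen 200 sqm home.
new_home <- data.frame(area_sqm = 200)
pred <- predict(fit, newdata = new_home)
print(pred)
```

The prediction comes back as a number rather than a label, which is exactly what separates regression from the yes/no answers of classification.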

THE DATA

This question of predicting house prices with machine-learning isn’t new – in fact, it dates back at least 40 years, to when the ‘Boston Housing’ dataset was published in a 1978 research paper. The data science competition site, Kaggle, has a beginner’s competition that uses a more recent housing dataset based on home sales in the U.S. town of Ames, Iowa. It consists of nearly 3,000 records in two datasets for houses sold in the town, each with 80 factors or ‘attributes’ describing the house, everything from land zoning to the number of kitchens(!) to the type of roofing material. The last of those attributes in the training dataset is the sale price of the house. Our job is to find the pattern amongst the roughly 1,500 records of the training dataset that links one or more of those attributes to the sale price and to turn that into a mathematical equation or ‘model’. We then run that model against the second set of roughly 1,500 records in the test dataset to predict each sale price, submit our predictions to Kaggle and see how we stack up on the leaderboard.
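The train-then-predict workflow described here can be sketched in a few lines of R. This is only an outline under stated assumptions, not the script solution itself: the data below is randomly generated stand-in data so the code runs without any downloads, the column names Id, GrLivArea and SalePrice are assumed to mirror those in the Kaggle files, and the file names in the comments are likewise assumptions to check against your own download.

```r
# Stand-in data so this sketch runs without the Kaggle files. With the real
# downloads you would instead load them with something like:
#   train <- read.csv("train.csv"); test <- read.csv("test.csv")
# (file names assumed; check what your download actually contains).
set.seed(42)
n <- 100
train <- data.frame(
  Id        = 1:n,
  GrLivArea = round(runif(n, 800, 3000))   # living area, made-up values
)
# Made-up prices: a straight-line trend plus random noise.
train$SalePrice <- round(50000 + 90 * train$GrLivArea + rnorm(n, sd = 15000))

# The test set has the same attributes but no SalePrice column.
test <- data.frame(
  Id        = (n + 1):(n + 20),
  GrLivArea = round(runif(20, 800, 3000))
)

# Fit a linear model on the training records, where the sale price is known.
model <- lm(SalePrice ~ GrLivArea, data = train)

# Apply the model to the test records, where the sale price is missing.
test_pred <- predict(model, newdata = test)

# Assemble one predicted price per test Id, ready to be saved as a CSV.
submission <- data.frame(Id = test$Id, SalePrice = test_pred)
print(head(submission))
# write.csv(submission, "submission.csv", row.names = FALSE)
```

A single attribute is used here only to keep the sketch short; the real dataset offers dozens more to choose from.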

QUICK SETUP

If you’ve missed the previous parts of our R intro, don’t worry – setting up your system for basic machine learning is easy. We’re using a combination of the R programming language plus the RStudio integrated development environment (IDE). They’re both free and open-source, plus you’ll find versions for Windows, Linux and macOS. The only thing you must do is make sure you install the R language first, followed by RStudio. The R language installer is available from the CSIRO’s CRAN mirror at cran.csiro.au, while the download for RStudio Desktop is at tinyurl.com/apc462rstudio. Once they’re installed, fire up RStudio.
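As a quick sanity check that RStudio has found your R installation, you could type something like the following into the RStudio console. This is just a suggested check using base R, not a required setup step.

```r
# Show which version of R this RStudio session is running.
r_version <- R.version.string
print(r_version)

# Run a trivial calculation to confirm the console is working.
check <- mean(c(1, 2, 3, 4))
print(check)   # the mean of 1, 2, 3 and 4 is 2.5
```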

Sign up to Kaggle, select the House Prices comp, choose ‘Data’ to get the data.
Download our simple R script solution at
Kaggle’s House Prices competition will skill you up on regression analysis.
Don’t forget to install the R language before you install RStudio Desktop.
