Another awesome framework (a.k.a library) in Python is scikit-learn. The framework provides a simple and consistent API for data modeling and analytics. With scikit-learn, it is easy to implement all the key machine learning algorithms e.g. Linear Regression, Support Vector Machines, Naives Bayes, Neural Network, Decision Tree, K-means … etc.

In this case study, we will build a model to predict the price of a condo unit using a Multivariate Regression algorithm. Actual condo transactions data from Q1 2019 was downloaded from Urban Redevelopment Authority (https://www.ura.gov.sg) into a csv file. The downloaded data is used as a baseline to construct a training dataset for our model.

Pandas’ DataFrame to hold the dataset comprising 1495 condo transactions (original 14 columns)
Stats for the ‘Sale Price’ of all the condo transactions
DataFrame is modified to remove non-relevant variables and also to incorporate a new variable ‘Age’ (10 columns instead of 14)

Prior to training the model, an exploratory data analysis is done on the dataset to determine the relationships among the different variables (a.k.a. features). This analysis would yield useful information for selecting the appropriate features as the Dependent Variables to predict the Sale Price (i.e. Target). See the histograms and pair-plots of the variables below.

From the exploratory data analysis, we select the appropriate dependent variables to predict Sale Price. The selected variables are ‘Floor Area’, ‘Age’, ‘Tenure’, ‘Near MRT’, ‘District’ and ‘Floor Level’. In order for the algorithm to compute, we transform categorical variables ‘Tenure’, ‘Near MRT’, ‘District’ and ‘Floor Level’ variables into numerical variables. See the transformation of the condo DataFrame below.

BEFORE Transformation: Top of 5 rows of the condo’s DataFrame
AFTER Transformation: Top 5 rows of condo’s DataFrame

Once the training of the model is completed, we could use it to predict the Sale Price of a condo unit. See the result of a test run to predict the Sale Price of a condo unit having specified features.

Result of a test run to predict the Sale Price of a condo unit having specified features

The Python code is shown below: