Decision Trees For Objective House Price Prediction
Zhishuo Zhang 1, *
1Jinan New Channel
Abstract-Different people buy houses of the same value at different prices, which often leads to dissatisfaction with housing prices and to perceptions of unfair pricing. To address this problem, we designed an objective housing price prediction scheme based on a decision tree. First, we used the decision tree to select 5 important features for subsequent modeling. Then we built a housing price prediction model based on a decision tree and used grid search to obtain its optimal parameters. The results showed that the number of houses is the most important factor affecting housing prices, followed by the local population's quality, geographic location, education, and crime rate. To verify the effectiveness of the decision tree scheme, we compared it with some other advanced machine learning models. The results show that our scheme achieves the best results.

Keywords-Decision trees; machine learning; house price forecasting.

I. INTRODUCTION

With the rapid development of the country's economy in the past few years, housing prices, which touch on many livelihood issues, have become a pressing domestic economic problem. People buy houses at different prices because they do not thoroughly understand how houses are priced. Moreover, house prices are not evaluated objectively, because they are influenced by many factors, such as politics and population [1]. House price forecasting is therefore particularly important: it promotes fairness in house prices, eases buyers' sense of being treated unequally, and provides an objective way to assess house prices. Currently, machine learning methods provide superior performance for predicting house prices [2, 3]. Existing house price forecasting systems are already quite strong, but their accuracy and prediction schemes can still be improved. As instability in the housing market increases, traditional statistical analysis has become less applicable. Computer-based house price forecasting, which offers specific quantitative analysis, has become the main trend. It can be divided into two lines of research:

1) Traditional statistical methods: these methods predict house prices based on related economic principles [4, 5]. In the early days, it was more reliable to use various mathematical and statistical techniques to predict house prices [6]. GDP, currency, and population can be quantified, so using these indicators in statistical regression forecasts remains a popular method today. However, data-driven statistics may yield one-sided, inaccurate, and biased correlations.

2) Machine learning-based house price prediction: machine learning has shown its strength in many fields [7-9]. This approach uses regression-based methods to predict housing prices. Azadeh et al. [10] introduced a hybrid algorithm based on fuzzy linear regression and fuzzy cognitive maps to solve the problem of prediction and optimization of housing market fluctuations. Their experimental results showed that machine learning-based prediction can retrieve and combine more features and thereby produce more reasonable and accurate house price predictions.

To provide accurate and objective house price predictions, we designed a decision tree method, in which the Boston Housing Information data were used to train and test the model and to evaluate its performance. To build a more effective decision tree model, we screened the important features based on the information gain of the decision tree and then built the housing price model on these important features.

The remainder of this article is organized as follows. Section II describes the designed methodologies for house price prediction. Section III describes the experimental results. Section IV concludes this article.

II. METHODS

A. Dataset and data preprocessing

In this work, we used the Boston house price dataset [11] to verify our methods. The dataset contains information about housing prices in Boston, Massachusetts, USA, collected by the US Census Bureau. It is a small dataset of 506 cases. Each sample has 14 attributes: the first 13 are used as input features to predict Boston house prices, and the 14th is the label to be predicted, as shown in Table I.

TABLE I. THE FEATURE DESCRIPTION OF BOSTON HOUSE PRICE DATASET

Feature   Description
CRIM      Crime rate per capita for each town
ZN        The proportion of residential land zoned for lots over 25,000 square feet
INDUS     The proportion of non-retail commercial acres in each town
CHAS      Charles River dummy variable (1 if the block is connected to the river; 0 otherwise)
NOX       The concentration of nitric oxides (parts per 10 million)
RM        The average number of rooms per dwelling
AGE       The proportion of owner-occupied units built before 1940
DIS       The weighted distance to Boston's five employment centers
RAD       Radial Access Highway Accessibility Index
TAX       The full property tax rate per $10,000
Authorized licensed use limited to: ULAKBIM UASL - Osmangazi Universitesi. Downloaded on November 11,2024 at 19:32:43 UTC from IEEE Xplore. Restrictions apply.
often used to measure the prediction results of machine
learning models.
(4)
[Figure 2 panels: SVR, Linear Regression, Decision Tree; legend: y_verify, y_predict]
Figure 1. The feature importance of the Boston house price dataset based on
the decision tree
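As a rough illustration of how a tree-based feature ranking like the one in Figure 1 can be obtained, the sketch below grows a small regression tree and credits each feature with the reduction in squared error achieved by its splits. It is not the authors' implementation: the data are synthetic, the helper names (`sse`, `best_split`, `grow`, `feature_importance`) are hypothetical, and variance reduction is assumed as the regression counterpart of the "information gain" criterion mentioned in the text.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (n times the variance)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(X, y, min_samples_leaf):
    """Return (gain, feature, threshold) for the split that most reduces SSE."""
    parent, best = sse(y), (0.0, None, None)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if min(len(left), len(right)) < min_samples_leaf:
                continue  # split would create a leaf that is too small
            gain = parent - sse(left) - sse(right)
            if gain > best[0]:
                best = (gain, f, t)
    return best

def grow(X, y, depth, max_depth, min_samples_leaf, importance):
    """Recursively split and add each split's SSE reduction to its feature."""
    if depth >= max_depth:
        return
    gain, f, t = best_split(X, y, min_samples_leaf)
    if f is None:  # no admissible split improves the fit
        return
    importance[f] += gain
    for keep_left in (True, False):
        idx = [i for i, row in enumerate(X) if (row[f] <= t) == keep_left]
        grow([X[i] for i in idx], [y[i] for i in idx],
             depth + 1, max_depth, min_samples_leaf, importance)

def feature_importance(X, y, max_depth=3, min_samples_leaf=5):
    """Normalized importance per feature (sums to 1 if any split was made)."""
    importance = [0.0] * len(X[0])
    grow(X, y, 0, max_depth, min_samples_leaf, importance)
    total = sum(importance)
    return [v / total for v in importance] if total else importance

# Synthetic stand-in data (NOT the Boston dataset): feature 0 drives the
# target strongly, feature 1 weakly, feature 2 not at all.
rng = random.Random(0)
X = [[rng.random() for _ in range(3)] for _ in range(120)]
y = [10 * a + 2 * b + rng.gauss(0, 0.1) for a, b, _ in X]

importance = feature_importance(X, y)
ranking = sorted(range(3), key=lambda f: importance[f], reverse=True)
print("ranking:", ranking, "importance:", [round(v, 3) for v in importance])
```

Selecting the top-k entries of `ranking` would correspond to keeping the k most important features for the subsequent model. Library implementations such as scikit-learn's `DecisionTreeRegressor` expose a comparable normalized score through the `feature_importances_` attribute.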
C. Model evaluation
Based on the Boston housing price data set, we compared
the decision tree model with SVR and linear regression, and
the experimental results are shown in Table II.
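A comparison of this kind is usually scored with an error metric computed on held-out data. The sketch below assumes root mean square error (RMSE) and mean absolute error (MAE) as the metrics, which is an assumption: the metric definitions themselves are not recoverable from the text. The target values, the predictions, and the model names `model_a` and `model_b` are made-up placeholders, not results from this work.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: penalizes large deviations more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the deviations."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical held-out targets and predictions (illustrative numbers only).
y_verify = [20.0, 22.0, 30.0, 35.0]
predictions = {
    "model_a": [21.0, 21.0, 31.0, 34.0],
    "model_b": [22.0, 20.0, 32.0, 33.0],
}

# Score every model with both metrics, then pick the lowest RMSE.
scores = {name: {"rmse": rmse(y_verify, pred), "mae": mae(y_verify, pred)}
          for name, pred in predictions.items()}
best = min(scores, key=lambda name: scores[name]["rmse"])

for name, s in scores.items():
    print(f"{name}: RMSE={s['rmse']:.3f}  MAE={s['mae']:.3f}")
print("lowest RMSE:", best)
```

In practice each entry of `predictions` would come from a fitted model (decision tree, SVR, linear regression) evaluated on the same test split, so that the scores are directly comparable.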
From Figure 2, it is easy to see that the predicted and true values of the decision tree model are by and large the closest, although the deviations at certain points are still relatively large. The linear regression and SVR models show larger deviations than the decision tree. Where the decision tree deviates substantially, the other two models tend to deviate even more.

D. Tuning by grid search

We then tuned the decision tree to see whether better results could be obtained. As the decision tree has a large number of parameters, we chose the three parameters shown in Table III.

TABLE III. THE TUNING PARAMETERS FOR DECISION TREE

Parameter               Description
max_depth               Maximum depth of the tree
min_impurity_decrease   Minimum impurity decrease required to split a node
min_samples_leaf        Minimum number of samples required for a leaf node to exist

Grid search is an exhaustive search method that takes specified candidate parameter values and optimizes the parameters of the estimator by cross-validation to obtain the optimal learning algorithm [14]. The possible values of each parameter are arranged and combined to produce a "grid" of all possible combinations. Each combination is then used to train a decision tree, and its performance is evaluated using cross-validation. After the fitting function has tried all the parameter combinations, the estimator configured with the best combination of parameters is returned. The best combination is shown in Table IV.

TABLE IV. THE OPTIMAL PARAMETERS FOR DECISION TREE

Parameter               Value
max_depth               5
min_impurity_decrease   0.267
min_samples_leaf        11
random_state            4

IV. CONCLUSION

This work designed a house price prediction scheme based on important-feature selection and a prediction model built on the decision tree. Screening the important features shows that the number of houses is the most important feature in determining the housing price. In addition, the basic quality of the local population, education, public security, and geographic location are also very important features that affect housing prices. We selected 5 important features and then designed the decision tree housing price prediction scheme. To demonstrate the effectiveness of our scheme, we compared it with SVR and a linear regression model. The experimental results show that decision trees provide an effective solution for house price forecasting. However, this experiment used only the features of the reference dataset, while many more factors influence house prices. Many unknowns remain to be explored in the future.

REFERENCES

[1] E. Ahmed and M. Moustafa, "House price estimation from visual and textual features," arXiv preprint arXiv:1609.08399, 2016.
[2] Y. Kang et al., "Understanding house price appreciation using multi-source big geo-data and machine learning," Land Use Policy, p. 104919, 2020.
[3] M. Thamarai and S. Malarvizhi, "House Price Prediction Modeling Using Machine Learning," International Journal of Information Engineering & Electronic Business, vol. 12, no. 2, 2020.
[4] K. Adam, P. Kuang, and A. Marcet, "House price booms and the current account," NBER Macroeconomics Annual, vol. 26, no. 1, pp. 77-122, 2012.
[5] M. Berlemann, J. Freese, and S. Knoth, "Dating the start of the US house price bubble: an application of statistical process control," Empirical Economics, vol. 58, no. 5, pp. 2287-2307, 2020.
[6] P. 1. S. Chang, "House price, income and finance structure's mathematical model research," Special Zone Economy, p. 06, 2013.
[7] M. I. Jordan and T. M. Mitchell, "Machine learning: Trends, perspectives, and prospects," Science, vol. 349, no. 6245, pp. 255-260, 2015.
[8] B. T. Pham, T.-A. Hoang, D.-M. Nguyen, and D. T. Bui, "Prediction of shear strength of soft soil using machine learning methods," CATENA, vol. 166, pp. 181-191, 2018.
[9] D.-C. Feng et al., "Machine learning-based compressive strength prediction for concrete: An adaptive boosting approach," Construction and Building Materials, vol. 230, p. 117000, 2020.
[10] A. Azadeh, B. Ziaei, and M. Moghaddam, "A hybrid fuzzy regression-fuzzy cognitive map algorithm for forecasting and optimization of housing market fluctuations," Expert Systems with Applications, vol. 39, no. 1, pp. 298-315, 2012.
[11] Boston House Price Dataset. Available: http://t.cn/RfHTAgY
[12] W. W. Koczkodaj et al., "On normalization of inconsistency indicators in pairwise comparisons," International Journal of Approximate Reasoning, vol. 86, pp. 73-79, 2017.
[13] M. Adelino, A. Schoar, and F. Severino, "Credit supply and house prices: evidence from mortgage market segmentation," National Bureau of Economic Research, 2012.
[14] B. Hssina, A. Merbouha, H. Ezzikouri, and M. Erritali, "A comparative study of decision tree ID3 and C4.5," International Journal of Advanced Computer Science and Applications, vol. 4, no. 2, pp. 13-19, 2014.
[15] L. Rutkowski, M. Jaworski, L. Pietruczuk, and P. Duda, "The CART decision tree for mining data streams," Information Sciences, vol. 266, pp. 1-15, 2014.
[16] T. Chai and R. R. Draxler, "Root mean square error (RMSE) or mean absolute error (MAE)? - Arguments against avoiding RMSE in the literature," Geoscientific Model Development, vol. 7, no. 3, pp. 1247-1250, 2014.