A roadmap to determine the important factors of the house value: a case study by using actual price registration data of Taipei housing transactions


Pei-De Wang
Mingchin Chen


While many studies have applied data mining techniques to estimate housing prices, few have identified the important attributes and ranked them at the same time. This paper applies five data mining techniques to discover the important attributes for three major types of real estate in Taipei City. The datasets, comprising 22,480 transactions in total, are publicly available from the Taiwan Actual Price Registration and cover July 2013 to August 2015. The five models are decision trees, random forests, model trees, artificial neural networks and multiple regression. Forecasting accuracy is measured by the mean absolute percentage error (MAPE), the coefficient of determination (R²), the root mean squared error (RMSE), the mean absolute error (MAE) and the correlation coefficient (COR). The best-performing model for all houses is the model tree, with a MAPE of 27.59. For apartments, random forests perform best, while artificial neural networks perform best for suites and for buildings with elevators. Different housing types therefore call for different models. Furthermore, the attribute importance measures allow us to identify and rank the truly critical attributes, which include floor area, administrative district, parking area and land area. The variable ranking and selection procedure proposed in this research can also be adopted to improve prediction efficiency in most big data applications beyond housing transactions.
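As an illustration of the five accuracy criteria named above, the following minimal sketch computes MAPE, R², RMSE, MAE and COR from a list of observed and predicted prices. It is not the authors' code: the function name `regression_metrics` and the sample prices are hypothetical, and the formulas are the common textbook definitions (R² taken as 1 minus the ratio of residual to total sum of squares, COR as the Pearson correlation), which may differ in detail from those used in the paper.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAPE (in percent), R2, RMSE, MAE and COR for two equal-length lists.

    Assumes textbook definitions; the paper's exact formulas are not stated
    in the abstract. Requires non-zero actual values (for MAPE) and
    non-constant inputs (for R2 and COR).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    # Absolute and squared error summaries.
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Mean absolute percentage error, expressed in percent.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    ss_pred = sum((p - mean_p) ** 2 for p in predicted)

    # Coefficient of determination: 1 - residual SS / total SS.
    r2 = 1.0 - ss_res / ss_tot

    # Pearson correlation between observed and predicted values.
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    cor = cov / math.sqrt(ss_tot * ss_pred)

    return {"MAPE": mape, "R2": r2, "RMSE": rmse, "MAE": mae, "COR": cor}


if __name__ == "__main__":
    # Hypothetical prices, for illustration only (not data from the study).
    observed = [100.0, 200.0, 300.0, 400.0]
    forecast = [110.0, 190.0, 310.0, 390.0]
    for name, value in regression_metrics(observed, forecast).items():
        print(f"{name}: {value:.4f}")
```

Because MAPE divides each error by the observed price, it is scale-free and allows comparison across housing types with very different price levels, whereas RMSE and MAE remain in the units of the price itself.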




