Abstract
Analysing the financial data of stocks and deriving insight from them is not easy for many stock investors, particularly individual investors. Building a good stock portfolio from a pool of stocks therefore often requires Herculean effort. This paper proposes a stock profiling framework, StockProF, for building stock portfolios rapidly. StockProF uses two data mining approaches: (1) Local Outlier Factor (LOF) and (2) Expectation Maximization (EM). LOF first detects outliers, that is, stocks whose financial performance is superior or poor. After the outliers are removed, EM clusters the remaining stocks. Investors can then profile the resulting clusters using the mean and the five-number summary. This study used the financial data of the plantation stocks listed on Bursa Malaysia. The authors used 1-year stock price movements to evaluate the performance of the outliers as well as the clusters. The results showed that StockProF is effective, as the profiling corresponded to the average capital gain or loss of the plantation stocks.
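The three steps named in the abstract (LOF outlier detection, EM clustering of the remaining stocks, and profiling by mean and five-number summary) can be illustrated with a minimal sketch. It is not the authors' implementation: the data are hypothetical values of a single financial ratio, the function names, the neighbourhood size `k=3`, the LOF cut-off of 1.5, the choice of two clusters and the one-dimensional Gaussian mixture are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
import statistics

def lof_scores(points, k=3):
    """Local Outlier Factor of each point; values near 1 indicate inliers."""
    n = len(points)
    # Sorted (distance, index) pairs from each point to every other point.
    dists = [sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
             for i, p in enumerate(points)]
    k_dist = [d[k - 1][0] for d in dists]  # distance to the k-th nearest neighbour
    # k-distance neighbourhood (ties at the k-distance are included).
    neigh = [[(dist, j) for dist, j in dists[i] if dist <= k_dist[i]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance.
    # Assumes the data contain no heavy exact duplication (the sum would be zero).
    lrd = [len(nb) / sum(max(k_dist[j], dist) for dist, j in nb) for nb in neigh]
    return [sum(lrd[j] for _, j in nb) / (len(nb) * lrd[i]) for i, nb in enumerate(neigh)]

def em_cluster_1d(values, n_clusters=2, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM and return a hard cluster label per value."""
    ordered = sorted(values)
    # Deterministic start: initial means spread evenly over the sorted data.
    means = [ordered[(2 * c + 1) * len(ordered) // (2 * n_clusters)]
             for c in range(n_clusters)]
    variances = [statistics.pvariance(values) or 1.0] * n_clusters
    weights = [1.0 / n_clusters] * n_clusters
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for c in range(n_clusters):
            nc = max(sum(r[c] for r in resp), 1e-12)
            means[c] = sum(r[c] * x for r, x in zip(resp, values)) / nc
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, values)) / nc
            variances[c] = max(var, 1e-6)  # floor keeps the density finite
            weights[c] = nc / len(values)
    return [max(range(n_clusters), key=lambda c: r[c]) for r in resp]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least two values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

# Hypothetical values of one financial ratio (for example, return on equity in %).
ratio = [-15.0, 4.8, 5.0, 5.1, 5.3, 5.6, 11.7, 12.0, 12.2, 12.5, 12.9, 30.0]

# Step 1: LOF flags stocks that sit far from any dense group (assumed cut-off).
LOF_CUTOFF = 1.5
scores = lof_scores([(v,) for v in ratio], k=3)
outliers = [v for v, s in zip(ratio, scores) if s > LOF_CUTOFF]
inliers = [v for v, s in zip(ratio, scores) if s <= LOF_CUTOFF]

# Step 2: EM clusters the stocks that remain after outlier removal.
labels = em_cluster_1d(inliers, n_clusters=2)

# Step 3: profile each cluster by its mean and five-number summary.
profiles = {}
for c in sorted(set(labels)):
    members = [v for v, lab in zip(inliers, labels) if lab == c]
    profiles[c] = {"mean": statistics.fmean(members),
                   "summary": five_number_summary(members)}

print("outliers:", outliers)
for c, prof in profiles.items():
    print("cluster", c, prof)
```

In practice the input would be several financial ratios per stock rather than one, which calls for a multivariate mixture model; a library implementation of LOF and Gaussian mixtures would usually be preferable to this hand-written version.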



Cite this article
Ng, KH., Khor, KC. StockProF: a stock profiling framework using data mining approaches. Inf Syst E-Bus Manage 15, 139–158 (2017). https://doi.org/10.1007/s10257-016-0313-z
DOI: https://doi.org/10.1007/s10257-016-0313-z