Abstract
This paper presents a new application for discovering knowledge from purchase history that is useful for creating effective marketing strategies. It uses BONSAI, a machine learning algorithm proposed by Shimozono et al. in 1994, which was originally developed for analyzing string patterns in order to discover knowledge from amino acid sequences. To adapt BONSAI to our purpose, we translate the purchase history of each customer into a character string in which each symbol represents a brand purchased by that customer. We also extend BONSAI in the following respects: 1) While the original BONSAI generates a decision tree over regular patterns that are limited to substrings, we extend the patterns to subsequences. 2) We generate rules that contain not only regular patterns but also numerical attributes such as age, number of visits, and profit. 3) We extend the regular expressions so that we can test whether a certain pattern occurs in a latter part of the whole string. 4) We implement majority voting based on 1-D and 2-D region rules on top of the decision trees.
Applying the extended BONSAI to real customers’ purchase history from a drugstore chain in Japan, we succeeded in generating interesting business rules that practitioners had not yet recognized.
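The string encoding and the pattern extensions described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the brand-to-symbol mapping, the function names, and in particular the reading of "latter part" as a fixed fraction of the string are assumptions made only for illustration, and the decision-tree learning and majority voting steps are not shown.

```python
def encode_history(purchases, brand_symbols):
    """Turn an ordered list of purchased brands into a string, one symbol per purchase."""
    return "".join(brand_symbols[brand] for brand in purchases)


def matches_substring(pattern, s):
    """True if the pattern occurs contiguously in s (the substring case)."""
    return pattern in s


def matches_subsequence(pattern, s):
    """True if the pattern's symbols occur in s in order, gaps allowed."""
    remaining = iter(s)
    # 'ch in remaining' consumes the iterator up to the first match,
    # so each pattern symbol must be found after the previous one.
    return all(ch in remaining for ch in pattern)


def matches_in_latter_part(pattern, s, start_fraction=0.5):
    """True if the pattern is a subsequence of the tail of s.

    Assumption: 'latter part' means the suffix starting at start_fraction
    of the string length; the paper may define it differently.
    """
    tail = s[int(len(s) * start_fraction):]
    return matches_subsequence(pattern, tail)


# Hypothetical brands and symbols, not taken from the paper's data.
brand_symbols = {"brand_A": "a", "brand_B": "b", "brand_C": "c"}
history = ["brand_A", "brand_C", "brand_B", "brand_A", "brand_B"]
encoded = encode_history(history, brand_symbols)

print(encoded)
print(matches_substring("aa", encoded))
print(matches_subsequence("aa", encoded))
print(matches_in_latter_part("ab", encoded))
```

In this toy history the pattern "aa" (brand_A bought twice) is missed by the substring test because another purchase falls in between, while the subsequence test finds it. That difference is the motivation for extension 1).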
References
R. Agrawal, T. Imielinski, and A. Swami, Database mining: A performance perspective, IEEE Transactions on Knowledge and Data Engineering, Vol.5, pp. 914–925, 1993.
S. Arikawa, S. Miyano, A. Shinohara, S. Kuhara, Y. Mukouchi and T. Shinohara, A machine discovery from amino acid sequences by decision trees over regular patterns, New Generation Computing, Vol.11, pp. 361–375, 1993.
P.B. Chou, E. Grossman, D. Gunopulos, and P. Kamesam, Identifying Prospective Customers, Proc. KDD 2000, pp. 447–456, 2000.
C. Fishman, This is a Marketing Revolution, Fast Company, pp. 206–218, 1999.
Y. Hamuro, N. Katoh, Y. Matsuda and K. Yada, Mining Pharmacy Data Helps to Make Profits, Data Mining and Knowledge Discovery, Vol.2 No.4, pp. 391–398, 1998.
M. Hirao, H. Hoshino, A. Shinohara, M. Takeda, and S. Arikawa, A Practical Algorithm to Find the Best Subsequence Patterns, Proc. of 3rd International Conference on Discovery Science, LNAI 1967, pp. 141–154, 2000.
N. Horiguchi, K. Yada, Y. Hamuro, N. Katoh, and Y. Kambayashi, An Optimized Weighted Majority Decision, Proc. of INFORMS-KORMS Seoul 2000, pp. 1663–1669, 2000.
E. Ip, K. Yada, Y. Hamuro, and N. Katoh, A Data Mining System for Managing Customer Relationship, Proc. of the 2000 Americas Conference on Information Systems, pp. 101–105, 2000.
B. Kitts, D. Freed, and M. Vrieze, Cross-sell: A Fast Promotion-Tunable Customer-item Recommendation Method Based on Conditionally Independent Probabilities, Proc. KDD 2000, pp. 437–446, 2000.
A. Nakaya, H. Furukawa, and S. Morishita, Weighted Majority Decision among Several Region Rules for Scientific Discovery, Proc. of Second International Conference on Discovery Science, LNAI 1721, Springer-Verlag, pp. 17–29, 1999.
J. R. Quinlan, Induction of Decision Trees, Machine Learning, Vol.1, pp. 81–106, 1986.
J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
J. R. Quinlan, See5/C5.0, http://www.rulequest.com, Rulequest Research, 1999.
G. Piatetsky-Shapiro (Editor), Knowledge Discovery in Databases, AAAI Press, 1991.
S. Shimozono, A. Shinohara, T. Shinohara, S. Miyano, S. Kuhara and S. Arikawa, Knowledge Acquisition from Amino Acid Sequences by Machine Learning System BONSAI, Trans. Information Processing Society of Japan, Vol.35, pp. 2009–2018, 1994.
W. E. Spangler, J. H. May and L. G. Vargas, Choosing Data-mining Methods for Multiple Classification: Representational and Performance Measurement Implications for Decision Support, Journal of Management Information Systems, Vol.16 No.1, pp. 37–62, 1999.
T. K. Sung, H. M. Chung and P. Gray, Special Section: Data Mining, Journal of Management Information Systems, Vol.16 No.1, pp. 11–16, 1999.
R. Uthurusamy, U.M. Fayyad, and S. Spangler, Learning Useful Rules from Inconclusive Data, In G. Piatetsky-Shapiro (Editor), Knowledge Discovery in Databases, AAAI Press, pp. 141–157, 1991.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Hamuro, Y., Kawata, H., Katoh, N., Yada, K. (2002). A Machine Learning Algorithm for Analyzing String Patterns Helps to Discover Simple and Interpretable Business Rules from Purchase History. In: Arikawa, S., Shinohara, A. (eds) Progress in Discovery Science. Lecture Notes in Computer Science, vol 2281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45884-0_43
DOI: https://doi.org/10.1007/3-540-45884-0_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43338-5
Online ISBN: 978-3-540-45884-5
eBook Packages: Springer Book Archive