Abstract
In recent years, researchers have begun to study inductive databases, a new generation of databases for leveraging decision support applications. In this context, the user interacts with the DBMS using advanced, constraint-based languages for data mining where constraints have been specifically introduced to increase the relevance of the results and, at the same time, to reduce its volume. In this paper we study the problem of mining frequent itemsets using an inductive database. We propose a technique for query answering which consists in rewriting the query in terms of union and intersection of the result sets of other queries, previously executed and materialized. Unfortunately, the exploitation of past queries is not always applicable. We then present sufficient conditions for the optimization to apply and show that these conditions are strictly connected with the presence of functional dependencies between the attributes involved in the queries. We show some experiments on an initial prototype of an optimizer which demonstrates that this approach to query answering is viable and in many practical cases it drastically reduces the query execution time.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Abiteboul, S., & Duschka, O. M. (1998). Complexity of answering queries using materialized views. In Proceedings of the seventeenth ACM PODS conference, June 1–3, 1998, Seattle, Washington, (pp.) 254–263, ACM Press.
Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., & Verkamo, A. I. (1995). Fast Discovery of Association Rules. In U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (Eds.), Knowledge discovery in databases, Vol. 2. AAAI/MIT Press.
Chaudhuri, S., & Dayal, U. (1997). An overview of data warehousing and olap technology. Sigmod Record, 26 (1), 65–74.
Chaudhuri, S., Krishnamurthy, R., Potamianos, S., & Shim, K. (1995). Optimizing queries with materialized views. P. S. Yu and A. L. P. Chen (Eds.), In Proceedings of the eleventh international conference on data engineering, March 6–10, 1995, Taipei, Taiwan, (pp. 190–200), IEEE Computer Society.
Chaudhuri, S., Narasayya, V., & Sarawagi, S. (2002). Efficient evaluation of queries with mining predicates. In Proceedings of the 18th int'l conf. on data engineering, USA.
Fang, M., Shivakumar, N., Garcia-Molina, H., Motwani, R., & Ullman, J. (1998). Computing iceberg queries efficiently. In Proceeding of VLDB '98.
Gallo, A., Esposito, R., Meo, R., & Botta, M. (2005). Optimization of association rules extraction through exploitation of context dependent constraints. In Proceedings of the ninth AI*IA conf., lecture notes in artificial intelligence (to appear).
Goldberg, D. E. (1989). Genetic algorithms, in search, optimization and machine learning. Reading, MA: Addison-Wesley.
Günzel, H., Albrecht, J., & Lehner, W. (2000). Use and reuse of association rules in an OLAP environment. In Proceedings of the 2000 inf. resources management association int. conf. on challenges of information technology management in the 21st century, (pp. 566–569). Idea Group Publishing.
Gupta, A., Harinarayan, V., & Quass, D. (1995). Aggregate query processing in data warehousing environments. In Proceedings of VLDB '95, (pp. 358–369).
Hamming, R. W. (1950). E rror-D etecting and E rror-C orrecting C odes. Bell System Technical Journal, 29, 147–160.
Han, J., Fu, Y., Wang, W., Koperski, K., & Zaiane, O. (1996). DMQL : A data mining query language for relational databases. In Proceedings of SIGMOD-96 workshop on research issues on data mining and knowledge discovery.
Imielinski, T., & Mannila, H. (1996). A database perspective on knowledge discovery. Communications of the ACM, 39(11), 58–64.
Imielinski, T., Virmani, A., & Abdoulghani, A. (1996). DataMine: application programming interface and query language for database mining. KDD-96, (pp. 256–260).
Lenz, H.-J., & Shoshani, A. (1997). Summarizability in OLAP and statistical data bases. In Proceedings ninth international conference on scientific and statistical database management, (pp. 132–143). IEEE Computer Society.
Leung, C. K.-S., Lakshmanan, L. V. S., & Ng, R. T. (2002). Exploiting succinct constraints using fp-trees. ACM SIGKDD Explorations, 4(1), 40–49.
Levy, A.Y., Mendelzon, A.O., Sagiv, Y., & Srivastava, D. (1995). Answering queries using views. In Proceedings of the 14th ACM SIGACT -SIGMOD -SIGART symposium on principles of database systems, San Jose, Calif, (pp. 95–104).
Malvestuto, F. M. (1988). The derivation problem for summary data. In Proceedings of the ACM SIGMOD international conference on management of data, (pp. 82–89).
Meo, R. (2003). Optimization of a language for data mining. In Proceedings of the 2003 ACM symposium on applied computing. Melbourne, Florida.
Meo, R., Psaila, G., & Ceri, S. (1996). A new SQL -like operator for mining association rules. In Proceedings of the 22st VLDB conference. Bombay, India.
Ng, R. T., Lakshmanan, L. V. S., Han, J., & Pang, A. (1998). Exploratory mining and pruning optimizations of constrained associations rules. In Proceedings of 1998 ACM SIGMOD international conference management of data, (pp. 13–24).
Nutt, W., Sagiv, Y., & Shurin, S. (1998). Deciding equivalence among aggregate queries. In Proceedings of ACM PODS, (pp. 214–223).
Padmanabhan, B., & Tuzhilin, A. (1999). Unexpectedness as a measure of interestingness in knowledge discovery. Decis. Support Syst, 27 (3), 303–318.
Padmanabhan, B., & Tuzhilin, A. (2002). Knowledge refinement based on the discovery of unexpected patterns in data mining. Decis. Support Syst, 33 (3), 309–321.
Park, C.-S., Kim, M. H., & Lee, Y.-J. (2001). Rewriting OLAP queries using materialized views and dimension hierarchies in data warehouses. In Proceeding of ICDE'01.
Perng, C.-S., Wang, H., Ma, S., & Hellerstein, J. L. (2002). Discovery in multi-attribute data with user-defined constraints. ACM SIGKDD Explorations, 4(1), 56–64.
Pottinger, R., & Levy, A.Y. (2000). A scalable algorithm for answering queries using views. In The VLDB Journal, (pp. 484–495).
Quass, D., & Widom, J. (1997). On-line warehouse view maintenance. In Proceedings of ACM SIGMOD international conference on management of data, (pp. 393–404).
Raedt, L. D. (2002). A perspective on inductive databases. ACM SIGKDD Explorations, 4(2), 69–77.
Raedt, L. D., Jaeger, M., Lee, S. D., & Mannila, H. (2002). A theory of inductive query answering. In Proceedings of IEEE int'l conf. on data mining, (pp. 123–130).
Sebag, M., & Rouveirol, C. (2000). Resource-bounded relational reasoning: induction and deduction through stochastic matching. Machine Learning, 38(1/2), 41–62.
Srikant, R., Vu, Q., & Agrawal, R. (1997). Mining association rules with item constraints. In Proceedings of 1997 ACM KDD, (pp. 67–73).
Tsur, D., Ullman, J. D., Abiteboul, S., Clifton, C., Motwani, R., Nestorov, S., & Rosenthal, A. (1998). Query flocks: a generalization of association-rule mining. In Proceedings of 1998 ACM SIGMOD international conference management of data.
Wang, H., & Zaniolo, C. (1998). User defined aggregates for logical data languages. In Proceedings of DDLP (pp. 85–97).
Wang, X.S., & Li, C. (1999). Deriving orthogonality to optimize the search for summary data. Information Systems 24(1), 47–65.
Widom, J. (1995). Research problems in data warehousing. In Proceedings of fourth int. conf. on information and knowledge management, (pp. 25–30).
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Esposito, R., Meo, R. & Botta, M. Answering constraint-based mining queries on itemsets using previous materialized results. J Intell Inf Syst 26, 95–111 (2006). https://doi.org/10.1007/s10844-006-5453-z
Issue Date:
DOI: https://doi.org/10.1007/s10844-006-5453-z