Improved EAV-Based Algorithm for Decision Rules Construction
Figure 1. General idea of the approach for decision rules construction.
Figure 2. Data table T transformed to matrix form.
Figure 3. EAV table and ranking of attributes for data presented in Figure 2.
Figure 4. Separable subtables T(f1, high), T(f2, good), T(f3, big) of decision table T.
Figure 5. The average length of rules relative to the number of attributes in a given dataset, obtained for 100%, 80% and 60% of the best attributes from the ranking.
Figure 6. Wilcoxon test results: comparison of the average rule length.
Figure 7. Wilcoxon test results: comparison of the average rule support.
Figure 8. The average accuracy of classification, obtained for 100%, 80% and 60% of the best attributes from the ranking.
Figure 9. Wilcoxon test results: comparison of the average classification accuracy.
Abstract
1. Introduction
2. Selection of Attributes for Rule Construction
3. Decision Rules Construction Approach
3.1. Data Transformation and Attribute Selection
Algorithm 1: Conversion of symbolic values of attributes into numerical equivalents.
Input: decision table T with condition attributes f1, ..., fn and rows r1, ..., rm
Output: TM, the matrix (binary) form of T
D := empty set; // D is the set of unique (attribute, value) pairs occurring in T
for each row r of T do
  add each descriptor (fi, v) of r to D;
end for
for each descriptor (fi, v) from D do
  add a column named fi = v to TM, filled with 0's;
end for
for each row r of T do
  set the value to 1 in each column named fi = v where row r has value v on attribute fi;
end for
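Algorithm 1 amounts to one-hot (indicator) encoding driven by the set of unique (attribute, value) descriptors. A minimal Python sketch, under the assumption that the decision table is a list of dicts (the names `f1`, `f2` and the `decision` key are illustrative):

```python
def to_matrix_form(table, condition_attrs):
    """Algorithm 1: replace symbolic values with binary indicator columns."""
    # D: set of unique (attribute, value) descriptors occurring in T.
    descriptors = sorted({(a, row[a]) for row in table for a in condition_attrs})
    matrix = []
    for row in table:
        # One column per descriptor, named "a=v" and filled with 0's ...
        encoded = {f"{a}={v}": 0 for a, v in descriptors}
        # ... then set to 1 where the row actually takes that value.
        for a in condition_attrs:
            encoded[f"{a}={row[a]}"] = 1
        encoded["decision"] = row["decision"]
        matrix.append(encoded)
    return matrix

table = [
    {"f1": "high", "f2": "good", "decision": "yes"},
    {"f1": "low",  "f2": "good", "decision": "no"},
]
matrix = to_matrix_form(table, ["f1", "f2"])
# matrix[0] == {"f1=high": 1, "f1=low": 0, "f2=good": 1, "decision": "yes"}
```

Sorting the descriptor set only fixes a reproducible column order; the algorithm itself does not depend on it.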
3.2. Construction of Decision Rules
Algorithm 2: Induction of decision rules.
Input: decision table T with numerical values of attributes, number p of best attributes to be taken into consideration
Output: set of unique rules R
R := empty set;
convert T into EAV form;
calculate the ranking measure grouped by attribute and create a ranking of attributes;
select the p best attributes from the ranking, and select from the EAV table the descriptors containing the selected attributes;
while not all selected descriptors have been processed do
  create the separable subtable T(fi, v) for the next descriptor (fi, v);
  if T(fi, v) is degenerate OR all selected descriptors have been used then
    add to R the rule built from the used descriptors, with decision d, where d is the most common decision in T(fi, v);
  else
    continue constructing the rule on T(fi, v) with the next descriptor;
  end if
end while
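The first steps of Algorithm 2, flattening T into EAV form and ranking the attributes, can be sketched as follows. The concrete ranking measure is computed over the EAV table; here the conditional Shannon entropy of the decision stands in for it (an assumption made for this sketch), with lower entropy ranking an attribute higher:

```python
from collections import Counter
from math import log2

def to_eav(table, condition_attrs):
    """Flatten a decision table into (entity, attribute, value) triples."""
    return [(i, a, row[a]) for i, row in enumerate(table) for a in condition_attrs]

def rank_attributes(table, condition_attrs):
    """Rank attributes best-first; the measure here (conditional entropy of
    the decision) is a stand-in for the paper's ranking measure."""
    n = len(table)
    score = {}
    for a in condition_attrs:
        h = 0.0
        for v in {row[a] for row in table}:
            group = [row["decision"] for row in table if row[a] == v]
            # Shannon entropy of decisions within the group, weighted by group size.
            h += len(group) / n * -sum(
                c / len(group) * log2(c / len(group))
                for c in Counter(group).values()
            )
        score[a] = h
    return sorted(condition_attrs, key=lambda a: score[a])

table = [
    {"f1": "high", "f2": "good", "decision": "yes"},
    {"f1": "high", "f2": "bad",  "decision": "yes"},
    {"f1": "low",  "f2": "good", "decision": "no"},
    {"f1": "low",  "f2": "bad",  "decision": "no"},
]
ranking = rank_attributes(table, ["f1", "f2"])
# f1 alone determines the decision, so it ranks first: ["f1", "f2"]
```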
4. Selected Greedy Heuristics
Algorithm 3: Greedy heuristic (M, RM or Log) for induction of decision rules.
Input: decision table T with condition attributes f1, ..., fn and row r
Output: decision rule for T and the given row r
Q := T;
rule := empty premise;
while Q is not degenerate do
  select an attribute fi as follows:
    • heuristic M selects fi which minimizes the value M(fi);
    • heuristic RM selects fi which minimizes the value RM(fi);
    • heuristic Log selects fi which maximizes the value Log(fi);
  add the descriptor fi = v to the rule, where v is the value of fi on row r;
  Q := Q(fi, v);
end while
assign to the rule the decision attached to the rows of Q;
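The loop of Algorithm 3 can be sketched as a generic greedy covering procedure. The M, RM and Log measures are not reproduced in this sketch, so the scoring function below is a caller-supplied stand-in:

```python
def is_degenerate(rows):
    """A (sub)table is degenerate when all of its rows share one decision."""
    return len({r["decision"] for r in rows}) <= 1

def greedy_rule(table, row, condition_attrs, score):
    """Skeleton of Algorithm 3: greedily extend the rule premise until the
    separable subtable becomes degenerate. `score` is a stand-in for the
    M / RM / Log measures (here: the lower the better)."""
    rows, attrs, premise = table, list(condition_attrs), []
    while not is_degenerate(rows) and attrs:
        # Pick the attribute optimizing the heuristic on the current subtable.
        best = min(attrs, key=lambda a: score(rows, a, row[a]))
        premise.append((best, row[best]))
        # Separable subtable: rows agreeing with the chosen descriptor.
        rows = [r for r in rows if r[best] == row[best]]
        attrs.remove(best)
    conds = " AND ".join(f"{a}={v}" for a, v in premise)
    return f"IF {conds} THEN decision={rows[0]['decision']}"

table = [
    {"f1": "high", "f2": "good", "decision": "yes"},
    {"f1": "low",  "f2": "good", "decision": "no"},
]
target = table[0]
# Toy score: rows matching the descriptor but disagreeing with the target decision.
score = lambda rows, a, v: sum(
    1 for r in rows if r[a] == v and r["decision"] != target["decision"]
)
rule = greedy_rule(table, target, ["f1", "f2"], score)
# rule == "IF f1=high THEN decision=yes"
```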
5. Experimental Results
- balance-scale,
- breast-cancer,
- cars,
- flags,
- hayes-roth-data,
- house-votes,
- lymphography,
- tic-tac-toe.
5.1. Comparison from the Point of Data Representation
5.2. Comparison from the Point of Data Classification
6. Conclusions
Data Set | Rows | Attributes | Min Length | Avg Length | Max Length | Min Support | Avg Support | Max Support
---|---|---|---|---|---|---|---|---
balance-scale | 625 | 4 | 3 | 3.64 | 4 | 1 | 2.44 | 5 |
breast-cancer | 266 | 9 | 1 | 5.61 | 9 | 1 | 2.61 | 11 |
cars | 128 | 6 | 2 | 3.90 | 6 | 1 | 79.31 | 192 |
flags | 194 | 26 | 2 | 8.88 | 20 | 1 | 1.78 | 6 |
hayes-roth-data | 69 | 5 | 1 | 2.64 | 4 | 1 | 3.81 | 12 |
house-votes | 279 | 16 | 3 | 6.14 | 16 | 1 | 31.21 | 81 |
lymphography | 148 | 18 | 1 | 8.40 | 16 | 1 | 2.85 | 6 |
tic-tac-toe | 958 | 9 | 3 | 5.71 | 8 | 1 | 6.43 | 38 |
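Each result table reports the minimum, average and maximum rule length and support over the induced rule set. A small helper reproducing those summary statistics (the dict-based rule representation is an assumption of this sketch):

```python
def summarize(rules):
    """Min/avg/max of rule length and support, as reported in the tables."""
    stats = lambda xs: (min(xs), round(sum(xs) / len(xs), 2), max(xs))
    return {
        "length": stats([r["length"] for r in rules]),
        "support": stats([r["support"] for r in rules]),
    }

rules = [{"length": 3, "support": 1}, {"length": 4, "support": 5}]
summary = summarize(rules)
# summary == {"length": (3, 3.5, 4), "support": (1, 3.0, 5)}
```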
Data Set | Rows | Attributes | Min Length | Avg Length | Max Length | Min Support | Avg Support | Max Support
---|---|---|---|---|---|---|---|---
balance-scale | 625 | 4 | 3 | 3.64 | 4 | 1 | 2.44 | 5 |
breast-cancer | 266 | 9 | 1 | 5.58 | 8 | 1 | 2.61 | 11 |
cars | 128 | 6 | 2 | 3.52 | 5 | 2 | 79.88 | 192 |
flags | 194 | 26 | 2 | 8.88 | 20 | 1 | 1.78 | 6 |
hayes-roth-data | 69 | 5 | 1 | 2.64 | 4 | 1 | 3.81 | 12 |
house-votes | 279 | 16 | 3 | 6.09 | 13 | 1 | 31.23 | 81 |
lymphography | 148 | 18 | 1 | 8.39 | 15 | 1 | 2.85 | 6 |
tic-tac-toe | 958 | 9 | 3 | 5.71 | 8 | 1 | 6.43 | 38 |
Data Set | Rows | Attributes | Min Length | Avg Length | Max Length | Min Support | Avg Support | Max Support
---|---|---|---|---|---|---|---|---
balance-scale | 625 | 4 | 3 | 3.00 | 3 | 2 | 3.94 | 5 |
breast-cancer | 266 | 9 | 1 | 5.28 | 6 | 1 | 3.01 | 11 |
cars | 128 | 6 | 2 | 3.11 | 4 | 6 | 82.78 | 192 |
flags | 194 | 26 | 2 | 8.79 | 16 | 1 | 1.78 | 6 |
hayes-roth-data | 69 | 5 | 1 | 2.51 | 3 | 1 | 3.94 | 12 |
house-votes | 279 | 16 | 3 | 5.94 | 10 | 1 | 31.46 | 81 |
lymphography | 148 | 18 | 1 | 8.17 | 11 | 1 | 3.01 | 6 |
tic-tac-toe | 958 | 9 | 3 | 5.29 | 6 | 1 | 6.73 | 38 |
Data Set | Rows | Attributes | Min Length | Avg Length | Max Length | Min Support | Avg Support | Max Support
---|---|---|---|---|---|---|---|---
balance-scale | 625 | 4 | 3 | 3.41 | 4 | 1 | 3.38 | 5 |
breast-cancer | 266 | 9 | 1 | 2.97 | 6 | 1 | 2.81 | 24 |
cars | 128 | 6 | 1 | 5.57 | 6 | 1 | 6.69 | 576 |
flags | 194 | 26 | 1 | 2.04 | 4 | 1 | 2.04 | 18 |
hayes-roth-data | 69 | 5 | 1 | 2.88 | 4 | 1 | 2.33 | 12 |
house-votes | 279 | 16 | 2 | 3.17 | 6 | 1 | 22.86 | 95 |
lymphography | 148 | 18 | 1 | 2.32 | 4 | 1 | 5.34 | 32 |
tic-tac-toe | 958 | 9 | 3 | 4.12 | 5 | 1 | 7.32 | 90 |
Data Set | Rows | Attributes | Min Length | Avg Length | Max Length | Min Support | Avg Support | Max Support
---|---|---|---|---|---|---|---|---
balance-scale | 625 | 4 | 3 | 3.41 | 4 | 1 | 3.38 | 5 |
breast-cancer | 266 | 9 | 1 | 3.52 | 8 | 1 | 3.25 | 24 |
cars | 128 | 6 | 1 | 5.44 | 6 | 1 | 8.14 | 576 |
flags | 194 | 26 | 1 | 2.23 | 9 | 1 | 2.59 | 18 |
hayes-roth-data | 69 | 5 | 1 | 2.92 | 4 | 1 | 2.56 | 12 |
house-votes | 279 | 16 | 2 | 3.29 | 5 | 1 | 32.22 | 95 |
lymphography | 148 | 18 | 1 | 2.56 | 5 | 1 | 7.70 | 32 |
tic-tac-toe | 958 | 9 | 3 | 4.32 | 7 | 1 | 13.21 | 90 |
Data Set | Rows | Attributes | Min Length | Avg Length | Max Length | Min Support | Avg Support | Max Support
---|---|---|---|---|---|---|---|---
balance-scale | 625 | 4 | 3 | 3.41 | 4 | 1 | 3.38 | 5 |
breast-cancer | 266 | 9 | 1 | 3.29 | 6 | 1 | 4.10 | 25 |
cars | 128 | 6 | 1 | 5.45 | 6 | 1 | 8.11 | 576 |
flags | 194 | 26 | 1 | 3.26 | 6 | 1 | 5.68 | 22 |
hayes-roth-data | 69 | 5 | 1 | 2.90 | 4 | 1 | 2.87 | 12 |
house-votes | 279 | 16 | 2 | 3.56 | 7 | 2 | 40.02 | 95 |
lymphography | 148 | 18 | 1 | 2.85 | 5 | 1 | 10.83 | 32 |
tic-tac-toe | 958 | 9 | 3 | 4.20 | 6 | 2 | 13.04 | 90 |
Data Set | Accuracy (100%) | Std | Accuracy (80%) | Std | Accuracy (60%) | Std
---|---|---|---|---|---|---
balance-scale | 0.89 | 0.05 | 0.93 | 0.05 | 0.73 | 0.05 |
breast-cancer | 0.92 | 0.03 | 0.94 | 0.03 | 0.88 | 0.03 |
cars | 0.95 | 0.06 | 0.86 | 0.04 | 0.84 | 0.07 |
flags | 0.93 | 0.05 | 0.92 | 0.05 | 0.91 | 0.05 |
hayes-roth-data | 0.92 | 0.07 | 0.88 | 0.07 | 0.90 | 0.05 |
house-votes | 0.98 | 0.03 | 0.98 | 0.03 | 0.97 | 0.04 |
lymphography | 0.95 | 0.11 | 0.94 | 0.11 | 0.92 | 0.04 |
tic-tac-toe | 0.95 | 0.06 | 0.95 | 0.06 | 0.88 | 0.06 |
Data Set | M | Std | RM | Std | Log | Std
---|---|---|---|---|---|---
balance-scale | 0.94 | 0.06 | 0.95 | 0.05 | 0.95 | 0.05 |
breast-cancer | 0.94 | 0.03 | 0.95 | 0.03 | 0.95 | 0.03 |
cars | 0.97 | 0.11 | 0.97 | 0.11 | 0.97 | 0.11 |
flags | 0.97 | 0.08 | 0.99 | 0.08 | 0.99 | 0.08 |
hayes-roth-data | 0.94 | 0.07 | 0.94 | 0.07 | 0.94 | 0.07 |
house-votes | 0.99 | 0.11 | 0.99 | 0.11 | 0.99 | 0.11 |
lymphography | 0.94 | 0.05 | 0.98 | 0.06 | 0.98 | 0.06 |
tic-tac-toe | 0.97 | 0.04 | 0.98 | 0.05 | 0.98 | 0.05 |
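The Wilcoxon signed-rank test mentioned in the figure captions compares paired per-dataset averages. A self-contained sketch of the statistic (in practice `scipy.stats.wilcoxon` would be used, since it also returns the p-value); the two vectors reuse the average-length columns of two of the tables above purely to illustrate the pairing:

```python
def wilcoxon_signed_rank(x, y):
    """Wilcoxon signed-rank statistic W for paired samples.
    Assumes no ties among the |differences|; scipy.stats.wilcoxon
    handles the general case and also computes the p-value."""
    diffs = [b - a for a, b in zip(x, y) if b != a]  # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    w_plus = sum(rank + 1 for rank, i in enumerate(order) if diffs[i] > 0)
    w_minus = sum(rank + 1 for rank, i in enumerate(order) if diffs[i] < 0)
    return min(w_plus, w_minus)

# Average rule lengths over the eight datasets, taken from two of the
# result tables above (used here only as example paired data).
avg_a = [3.41, 2.97, 5.57, 2.04, 2.88, 3.17, 2.32, 4.12]
avg_b = [3.64, 5.61, 3.90, 8.88, 2.64, 6.14, 8.40, 5.71]
w = wilcoxon_signed_rank(avg_a, avg_b)
# w == 6; compare against the critical value for n = 8, or use scipy for a p-value
```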
Share and Cite
Żabiński, K.; Zielosko, B. Improved EAV-Based Algorithm for Decision Rules Construction. Entropy 2023, 25, 91. https://doi.org/10.3390/e25010091