Abstract
The increasing volume of generated data and the shortage of professionals trained to extract value from it raise the question of how to automate data analysis processes. This work investigates how to increase automation in the data interpretation process by proposing a heuristic relevance classification model, which can be used to express which views over the data are potentially meaningful and relevant. The model uses as input features a combination of semantic types derived from the data attributes and visual cues for human interpretation. The evaluation shows that these features improve the prediction of data relevance; the best classification model achieves an F1 score of 0.906.
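For readers unfamiliar with the metric reported above, the F1 score is the harmonic mean of precision and recall. The following is a minimal sketch of that standard definition only; the function name `f1_from_counts` and the confusion counts are hypothetical and are not taken from the paper's evaluation.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall).

    tp, fp, fn are true-positive, false-positive and false-negative counts.
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration; not the paper's data.
print(round(f1_from_counts(tp=8, fp=2, fn=2), 3))
```

Because the harmonic mean is pulled toward the lower of the two values, a high F1 score requires precision and recall to be high at the same time.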
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Kamioka, E.H., Freitas, A., Caroli, F., Handschuh, S. (2016). Determining Data Relevance Using Semantic Types and Graphical Interpretation Cues. In: Boström, H., Knobbe, A., Soares, C., Papapetrou, P. (eds) Advances in Intelligent Data Analysis XV. IDA 2016. Lecture Notes in Computer Science, vol 9897. Springer, Cham. https://doi.org/10.1007/978-3-319-46349-0_29
DOI: https://doi.org/10.1007/978-3-319-46349-0_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46348-3
Online ISBN: 978-3-319-46349-0
eBook Packages: Computer Science; Computer Science (R0)