Determining Data Relevance Using Semantic Types and Graphical Interpretation Cues

  • Conference paper
Advances in Intelligent Data Analysis XV (IDA 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9897)

Abstract

The increasing volume of generated data and the shortage of professionals trained to extract value from it raise the question of how to automate data analysis processes. This work investigates how to increase automation in the data interpretation process by proposing a relevance classification heuristic model that expresses which views over the data are potentially meaningful and relevant. The model uses a combination of semantic types derived from the data attributes and visual human interpretation cues as input features. The evaluation shows the impact of these features in improving the prediction of data relevance; the best classification model achieves an F1 score of 0.906.
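For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1 score is computed for a binary relevant/not-relevant classifier. It is not the authors' evaluation code; the function name and the example counts are hypothetical and chosen only for illustration.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the F1 score: the harmonic mean of precision and recall.

    The counts are assumed to come from comparing predicted relevance
    labels against ground-truth labels (illustrative, not from the paper).
    """
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    # Guard against division by zero when a class is never predicted or absent.
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 views correctly flagged as relevant,
# 2 flagged wrongly, 2 relevant views missed.
example = f1_score(true_pos=8, false_pos=2, false_neg=2)
```

Because F1 combines precision and recall, it is less easily inflated by class imbalance than plain accuracy.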

Notes

  1. http://wordnet.princeton.edu

  2. http://github.com/ekamioka/unipassau-ada

  3. http://archive.ics.uci.edu/ml/

  4. http://plot.ly/

  5. https://www.edx.org/course/analytics-edge-mitx-15-071x-2#!

  6. http://ww2.coastal.edu/kingw/statistics/R-tutorials/multregr.html

Author information

Corresponding author

Correspondence to Eduardo Haruo Kamioka.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Kamioka, E.H., Freitas, A., Caroli, F., Handschuh, S. (2016). Determining Data Relevance Using Semantic Types and Graphical Interpretation Cues. In: Boström, H., Knobbe, A., Soares, C., Papapetrou, P. (eds) Advances in Intelligent Data Analysis XV. IDA 2016. Lecture Notes in Computer Science, vol 9897. Springer, Cham. https://doi.org/10.1007/978-3-319-46349-0_29

  • DOI: https://doi.org/10.1007/978-3-319-46349-0_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46348-3

  • Online ISBN: 978-3-319-46349-0

  • eBook Packages: Computer Science, Computer Science (R0)
