A Semantics-Aware Classification Approach for Data Leakage Prevention

Sultan Alneyadi,
Elankayer Sithirasenan &
Vallipuram Muthukkumarasamy

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 8544)

Included in the following conference series:

Australasian Conference on Information Security and Privacy

Abstract

Data leakage prevention (DLP) is an emerging subject in the field of information security. It deals with tools that work under a central policy and analyze networked environments to detect sensitive data, prevent unauthorized access to it, and block channels associated with data leaks. This requires special data classification capabilities to distinguish between sensitive and normal data. The task requires not only prior knowledge of the sensitive data but also knowledge of potentially evolved and unknown data. Most current DLP systems use content-based analysis to detect sensitive data, mainly through regular expressions and data fingerprinting. Although these content analysis techniques are robust in detecting known, unmodified data, they usually become ineffective if the sensitive data is not previously known or has been largely modified. In this paper we study the effectiveness of N-gram based statistical analysis, supported by the use of stem words, in classifying documents according to their topics. The results are promising, with an overall classification accuracy of 92%. We also discuss how classification deteriorates when the text is exposed to multiple spins that simulate data modification.
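
As a rough illustration of the idea in the abstract, the sketch below classifies a text by comparing its stemmed word N-gram profile against per-topic profiles. It is a minimal sketch under stated assumptions, not the authors' implementation: the suffix-stripping `stem` function is a toy stand-in for a real stemmer, cosine similarity over raw bigram counts is an assumed scoring choice, and the two topic profiles are invented sample data.

```python
from collections import Counter
import math
import re

# Toy suffix stripper (assumption): a stand-in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def ngram_profile(text, n=2):
    """Count word N-grams over the lowercased, stemmed tokens of a text."""
    tokens = [stem(t) for t in re.findall(r"[a-z]+", text.lower())]
    return Counter(zip(*(tokens[i:] for i in range(n))))


def cosine(a, b):
    """Cosine similarity between two N-gram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, topic_profiles, n=2):
    """Return the topic whose profile is most similar to the text's profile."""
    profile = ngram_profile(text, n)
    return max(topic_profiles, key=lambda topic: cosine(profile, topic_profiles[topic]))


# Invented sample data (assumption): one short training text per topic.
profiles = {
    "finance": ngram_profile(
        "the bank approved loans and the bank reported quarterly earnings"
    ),
    "sports": ngram_profile("the team scored goals and the team won the match"),
}

# "approves" and "approved" share the stem "approv", so the bigrams still match.
print(classify("The bank approves loans", profiles))  # finance
```

Stemming is what lets the reworded query match here: without it, "approves loans" and "approved loans" would produce different bigrams. Heavier rewording would still erode the overlap, which is the kind of deterioration the abstract refers to.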

Author information

Authors and Affiliations

School of Information and Communication Technology, Griffith University, Gold Coast Campus, Australia
Sultan Alneyadi, Elankayer Sithirasenan & Vallipuram Muthukkumarasamy

Editor information

Editors and Affiliations

Centre for Computer and Information Security Research, School of Computer Science and Software Engineering, University of Wollongong, Northfields Avenue, 2522, Wollongong, NSW, Australia
Willy Susilo & Yi Mu

About this paper

Cite this paper

Alneyadi, S., Sithirasenan, E., Muthukkumarasamy, V. (2014). A Semantics-Aware Classification Approach for Data Leakage Prevention. In: Susilo, W., Mu, Y. (eds) Information Security and Privacy. ACISP 2014. Lecture Notes in Computer Science, vol 8544. Springer, Cham. https://doi.org/10.1007/978-3-319-08344-5_27

DOI: https://doi.org/10.1007/978-3-319-08344-5_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08343-8
Online ISBN: 978-3-319-08344-5
eBook Packages: Computer Science, Computer Science (R0)
