[go: up one dir, main page]

Skip to main content

Use Fewer Instances of the Letter “i”: Toward Writing Style Anonymization

  • Conference paper
Privacy Enhancing Technologies (PETS 2012)

Part of the book series: Lecture Notes in Computer Science ((LNSC,volume 7384))

Included in the following conference series:

Abstract

This paper presents Anonymouth, a novel framework for anonymizing writing style. Without accounting for style, anonymous authors risk identification. This framework is necessary to provide a tool for testing the consistency of anonymized writing style and a mechanism for adaptive attacks against stylometry techniques. Our framework defines the steps necessary to anonymize documents and implements them. A key contribution of this work is this framework, including novel methods for identifying which features of documents need to change and how they must be changed to accomplish document anonymization. In our experiment, 80% of the user study participants were able to anonymize their documents in terms of a fixed corpus and limited feature set used. However, modifying pre-written documents were found to be difficult and the anonymization did not hold up to more extensive feature sets. It is important to note that Anonymouth is only the first step toward a tool to acheive stylometric anonymity with respect to state-of-the-art authorship attribution techniques. The topic needs further exploration in order to accomplish significant anonymity.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Abbasi, A., Chen, H.: Writeprints: A stylometric approach to identity-level identification and similarity detection in cyberspace. ACM Trans. Inf. Syst. 26(2), 1–29 (2008)

    Article  Google Scholar 

  2. Narayanan, A., Paskov, H., Gong, N., Bethencourt, J., Stefanov, E., Shin, R., Song, D.: On the feasibility of internet-scale author identification. In: Proceedings of the 33rd Conference on IEEE Symposium on Security and Privacy. IEEE (2012)

    Google Scholar 

  3. Wayman, J., Orlans, N., Hu, Q., Goodman, F., Ulrich, A., Valencia, V.: Technology assessment for the state of the art biometrics excellence roadmap (March 2009), http://www.biometriccoe.gov/SABER/index.htm

  4. Brennan, M., Greenstadt, R.: Practical attacks against authorship recognition techniques. In: Proceedings of the Twenty-First Innovative Applications of Artificial Intelligence Conference (2009)

    Google Scholar 

  5. Afroz, S., Brennan, M., Greenstadt, R.: Detecting hoaxes, frauds, and deception in writing style online. In: Proceedings of the 33rd Conference on IEEE Symposium on Security and Privacy. IEEE (2012)

    Google Scholar 

  6. Eckersley, P.: How Unique Is Your Web Browser? In: Atallah, M.J., Hopper, N.J. (eds.) PETS 2010. LNCS, vol. 6205, pp. 1–18. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  7. Narayanan, A., Shmatikov, V.: Robust de-anonymization of large sparse datasets. In: Proceedings of the 29th IEEE Symposium on Security and Privacy, pp. 111–125. IEEE (2008)

    Google Scholar 

  8. Calandrino, J.A., Kilzer, A., Narayanan, A., Felten, E.W., Shmatikov, V.: You might also like: Privacy risks of collaborative filtering. In: Proceedings of the 32nd IEEE Symposium on Security and Privacy, pp. 231–246. IEEE (2011)

    Google Scholar 

  9. Dingledine, R., Mathewson, N., Syverson, P.: Tor: The second-generation onion router. In: Proceedings of the 13th Conference on USENIX Security Symposium, vol. 13, pp. 21–21. USENIX Association (2004)

    Google Scholar 

  10. Rao, J.R., Rohatgi, P.: Can pseudonymity really guarantee privacy. In: Proceedings of the Ninth USENIX Security Symposium, pp. 85–96 (2000)

    Google Scholar 

  11. Kacmarcik, G., Gamon, M.: Obfuscating document stylometry to preserve author anonymity. In: Proceedings of the COLING/ACL on Main Conference Poster Sessions, pp. 444–451. Association for Computational Linguistics (2006)

    Google Scholar 

  12. Juola, P.: Authorship attribution. Foundations and Trends in information Retrieval 1(3), 233–334 (2008)

    Article  Google Scholar 

  13. Uzuner, Ö., Katz, B.: A Comparative Study of Language Models for Book and Author Recognition. In: Dale, R., Wong, K.-F., Su, J., Kwong, O.Y. (eds.) IJCNLP 2005. LNCS (LNAI), vol. 3651, pp. 969–980. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  14. Juola, P.: Jgaap, a java-based, modular, program for textual analysis, text categorization, and authorship attribution

    Google Scholar 

  15. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The weka data mining software: an update. ACM SIGKDD Explorations Newsletter 11(1), 10–18 (2009)

    Article  Google Scholar 

  16. Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, Philadelphia, PA, USA, pp. 1027–1035 (2007)

    Google Scholar 

  17. Clark, J.H., Hannon, C.J.: A Classifier System for Author Recognition Using Synonym-Based Features. In: Gelbukh, A., Kuri Morales, Á.F. (eds.) MICAI 2007. LNCS (LNAI), vol. 4827, pp. 839–849. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

McDonald, A.W.E., Afroz, S., Caliskan, A., Stolerman, A., Greenstadt, R. (2012). Use Fewer Instances of the Letter “i”: Toward Writing Style Anonymization. In: Fischer-Hübner, S., Wright, M. (eds) Privacy Enhancing Technologies. PETS 2012. Lecture Notes in Computer Science, vol 7384. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31680-7_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-31680-7_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31679-1

  • Online ISBN: 978-3-642-31680-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics