MODULE - 2
Chapter 1
Data Hiding in Text
Instructor :
Manjula .M
CSE(DATA SCIENCE)
CONTENTS
What is Data Hiding in Text?
Applications of Data Hiding
Watermarking in Data Hiding
Intuitive Methods
Simple Digital Methods
Innocuous Text
Mimic Function
What is Data Hiding in Text?
Data Hiding in Text is the concealment of secret data within text using
steganography or encoding techniques.
Data hiding is also known as steganography.
Basic Features :
1. Encoding Methods – Using whitespace, font variations, or invisible
characters to hide information.
2. Substitution Techniques – Replacing characters or words with predefined
symbols or encoded text.
3. Compression & Encryption – Ensuring hidden data remains compact and
secure.
4. Retrieval Mechanism – Decoding hidden data accurately without altering
the original text format.
5. Robustness – Resisting detection and preserving hidden data after
transformations like copy-pasting or reformatting.
Applications of Data Hiding:
Watermarking – Protecting intellectual property by embedding ownership
information.
Secure Data Transmission – Concealing sensitive information in everyday
documents.
Authentication & Tamper Detection – Verifying document integrity by
embedding hidden markers.
Digital Rights Management (DRM) – Preventing unauthorized copying and
distribution of digital content.
Forensic & Intelligence Services – Covertly marking and tracking data for
security investigations.
Medical Data Security – Protecting confidential patient records in digital
healthcare systems.
Watermarking in Data Hiding:
Watermarking is a technique used to embed hidden information within digital
content (such as images, videos, audio, and text) to protect ownership, authenticate
sources, or prevent unauthorized copying.
The embedded watermark is usually imperceptible but can be extracted or verified
when needed.
Watermarking can be classified in two ways: based on visibility and based on robustness.
Watermarking Based on Visibility
Visible Watermarking: A clearly visible mark (e.g., logo or text) is added to
the content.
Example: A company logo on stock images.
Invisible Watermarking: Embedded data is hidden within the content and
can only be detected using special tools.
Example: Copyright protection in digital documents.
[Figure: original image and invisibly watermarked image]
Intuitive Methods in Data Hiding in Text
Data hiding in text involves embedding secret information within textual content in a way
that it remains unnoticed while ensuring secure communication.
Several intuitive methods are used to achieve this, ranging from simple formatting tricks to
complex steganographic techniques.
1. Whitespace Manipulation
This method hides information by using spaces, tabs, or newlines in a specific pattern.
Since extra whitespace is usually ignored by readers, it provides an effective way to
conceal data.
Example:
A sentence in which the number of spaces between two words encodes a value (the space count written in binary):
"Hello   world." (Three spaces = binary 11)
"Hello  world." (Two spaces = binary 10)
Hidden binary sequences can later be extracted by analyzing the whitespace pattern.
Applications:
Secure message transmission
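The whitespace idea can be sketched in Python. This is a minimal illustration rather than a standard scheme: it assumes a simpler convention than the example above (one space between words = bit 0, two spaces = bit 1), and the names `hide_bits` and `reveal_bits` are hypothetical.

```python
import re

def hide_bits(cover: str, bits: str) -> str:
    """Encode bits in the gaps between words: one space = 0, two spaces = 1."""
    words = cover.split()
    gaps = len(words) - 1
    if len(bits) > gaps:
        raise ValueError(f"cover text has only {gaps} gaps for {len(bits)} bits")
    out = [words[0]]
    for i, word in enumerate(words[1:]):
        # Gaps beyond the end of the message stay as a single space.
        gap = "  " if i < len(bits) and bits[i] == "1" else " "
        out.append(gap + word)
    return "".join(out)

def reveal_bits(stego: str, n_bits: int) -> str:
    """Read the first n_bits gaps back as bits."""
    gaps = re.findall(r" +", stego.strip())
    return "".join("1" if len(g) == 2 else "0" for g in gaps[:n_bits])

cover = "The quick brown fox jumps over the lazy dog."
stego = hide_bits(cover, "1011")
print(repr(stego))            # repr makes the doubled spaces visible
print(reveal_bits(stego, 4))
```

The receiver must know how many bits to read, because unused gaps look the same as 0 bits. Anything that collapses repeated spaces (for example HTML rendering) destroys the message.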
2. Invisible Characters (Zero-Width Encoding)
Certain Unicode characters (like Zero-Width Space \u200B and Zero-Width Joiner \u200D)
can be used to hide messages without affecting visible text.
Example:
Text with hidden characters:
Normal text: "HelloWorld"
Encoded text (with zero-width characters): "HelloWorld" (\u200B after "Hello")
A decoding tool can extract the hidden message by detecting these special characters.
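A possible encoder and decoder for this idea is sketched below. The assignment of Zero-Width Space to bit 0 and Zero-Width Joiner to bit 1 is an assumption made for illustration, as are the function names; the secret is converted to UTF-8 bytes and written out as 8 bits per byte.

```python
ZERO = "\u200b"  # Zero-Width Space, used here for bit 0 (assumed convention)
ONE = "\u200d"   # Zero-Width Joiner, used here for bit 1 (assumed convention)

def hide(cover: str, secret: str, position: int) -> str:
    """Insert the secret as invisible characters at the given index of the cover."""
    bits = "".join(f"{byte:08b}" for byte in secret.encode("utf-8"))
    payload = "".join(ONE if b == "1" else ZERO for b in bits)
    return cover[:position] + payload + cover[position:]

def reveal(stego: str) -> str:
    """Collect the zero-width characters and turn them back into text."""
    bits = "".join("1" if ch == ONE else "0" for ch in stego if ch in (ZERO, ONE))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

def strip_hidden(stego: str) -> str:
    """Remove the payload, leaving only the visible text."""
    return stego.replace(ZERO, "").replace(ONE, "")

stego = hide("HelloWorld", "hi", 5)
print(len("HelloWorld"), len(stego))  # the stego text is longer but looks identical
print(reveal(stego))
```

Comparing string lengths, as in the first `print`, is also a simple way to detect this kind of hiding.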
3. Text Substitution (Lexical Steganography)
Words are replaced with synonyms, abbreviations, or slight spelling variations to encode
secret messages while maintaining readability.
Example:
Original sentence:
"The weather is cold today."
Encoded sentence:
"The weather is chilly today."
If mapped to a predefined dictionary, different words can represent encoded values.
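The predefined dictionary can be sketched as a small code book of synonym pairs. Everything here is hypothetical (the word pairs, the names `SYNONYMS`, `hide`, and `reveal`): within each pair, the first word stands for bit 0 and the second for bit 1.

```python
import re

# Hypothetical code book: in each pair, the first word means 0 and the second means 1.
SYNONYMS = [("cold", "chilly"), ("big", "large"), ("quick", "fast")]
PAIR_OF = {word: pair for pair in SYNONYMS for word in pair}
BIT_OF = {word: str(bit) for pair in SYNONYMS for bit, word in enumerate(pair)}

def hide(cover: str, bits: str) -> str:
    """Replace each code-book word in the cover with the synonym for the next bit."""
    remaining = iter(bits)

    def swap(match):
        word = match.group(0)
        bit = next(remaining, None) if word in PAIR_OF else None
        return word if bit is None else PAIR_OF[word][int(bit)]

    stego = re.sub(r"[A-Za-z]+", swap, cover)
    if next(remaining, None) is not None:
        raise ValueError("cover text has too few code-book words")
    return stego

def reveal(stego: str, n_bits: int) -> str:
    """Read one bit from each code-book word, in order."""
    found = [BIT_OF[w] for w in re.findall(r"[A-Za-z]+", stego) if w in BIT_OF]
    return "".join(found[:n_bits])

stego = hide("The weather is cold today and the big dog is quick.", "101")
print(stego)
print(reveal(stego, 3))
```

Sender and receiver must share the same code book, and the synonyms must fit the sentence well enough that the substitution does not look odd.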
4. Capitalization Patterns
Data is encoded by selectively capitalizing certain letters in a sentence to represent binary
values.
Example:
Original text:
"The quick brown fox jumps over the lazy dog."
Encoded text (capitalizing key letters to encode data):
"The Quick brown Fox jumps Over the lazy Dog."
Here, a word that starts with a capital letter represents a binary 1, while a word that starts with a lowercase letter represents 0.
Applications:
Secret data encoding in formal documents
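A minimal sketch of capitalization encoding follows, assuming one bit per word: a capitalized first letter means 1 and a lowercase first letter means 0. The function names are hypothetical.

```python
def hide(cover: str, bits: str) -> str:
    """Set the case of each word's first letter from the bit string."""
    words = cover.split()
    if len(bits) > len(words):
        raise ValueError("cover text has too few words")
    out = []
    for i, word in enumerate(words):
        if i < len(bits):
            first = word[0].upper() if bits[i] == "1" else word[0].lower()
            word = first + word[1:]
        out.append(word)
    return " ".join(out)

def reveal(stego: str, n_bits: int) -> str:
    """Read one bit per word from the case of its first letter."""
    return "".join("1" if w[0].isupper() else "0" for w in stego.split()[:n_bits])

stego = hide("the quick brown fox jumps over the lazy dog.", "110101001")
print(stego)
print(reveal(stego, 9))
```

With this bit string the output matches the encoded sentence in the example. A practical scheme would probably skip the first word, since sentence-initial capitals are expected and a lowercase sentence start stands out.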
5. Font & Formatting Tricks
Data can be hidden using variations in font size, color, bold/italic styles, or spacing, which
are visually subtle but detectable via software.
Example:
Using bold letters to represent hidden characters:
"**T**his is **a** secret **m**essage."
Extracted message: "TAM"
Applications:
Hidden text in official documents
6. Misspelling & Special Symbols
Intentional misspellings, extra punctuation, or special symbols can embed secret data.
Example:
Original:
"Hello friend."
Encoded:
"Hell0 fri3nd."
Here, numbers replace specific letters, following a predefined encoding scheme.
Applications:
Covert messaging
7. Context-Based Encoding
Secret information is embedded within structured text patterns, making it retrievable
based on context and predefined rules.
Example:
Original:
"Alice and Bob will meet at the park."
Encoded with context rules (e.g., extracting the words at agreed positions, here the 1st, 5th, and 8th):
"Alice meet park" (which may represent a hidden message).
Applications:
Cryptographic messaging
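Position-based extraction is easy to sketch. The helper names are hypothetical: `pick_words` takes explicit 1-based positions, while `every_nth` applies a strict every-nth-word rule. On the sample sentence, a strict every-third-word rule selects "Bob at", whereas positions 1, 5, and 8 reproduce "Alice meet park".

```python
def split_words(text: str) -> list[str]:
    """Split on whitespace and drop surrounding punctuation."""
    return [w.strip(".,;:!?") for w in text.split()]

def pick_words(text: str, positions: list[int]) -> str:
    """Return the words at the given 1-based positions."""
    words = split_words(text)
    return " ".join(words[p - 1] for p in positions)

def every_nth(text: str, n: int) -> str:
    """Return every n-th word, starting with the n-th."""
    return " ".join(split_words(text)[n - 1::n])

sentence = "Alice and Bob will meet at the park."
print(pick_words(sentence, [1, 5, 8]))
print(every_nth(sentence, 3))
```

In practice the sender works in the other direction: the cover sentence is written so that the agreed positions spell out the intended message.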
Simple Digital Methods
Data hiding in text involves embedding secret information within textual content
using simple digital techniques.
These methods are widely used in steganography, watermarking, and secure
communication to protect sensitive data while maintaining the original readability
of the text.
1. Whitespace Encoding
Data is hidden using spaces, tabs, or newlines in a predefined pattern. Since
whitespace is typically ignored by readers, it can store binary or encoded
information without affecting the visible text.
Example:
The number of spaces between two words can carry a binary value (the space count written in binary):
"Hello  world." (Two spaces = binary 10)
"Hello   world." (Three spaces = binary 11)
By counting the spaces, the hidden value can be extracted.
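Whitespace can also be hidden at the ends of lines, where it is even less visible than between words. The sketch below assumes a convention that is not from the slides: a trailing space means bit 0, a trailing tab means bit 1, and lines without trailing whitespace carry no data. The function names are hypothetical.

```python
def hide(cover_lines: list[str], bits: str) -> list[str]:
    """Append one invisible character per line: space = 0, tab = 1."""
    if len(bits) > len(cover_lines):
        raise ValueError("cover text has too few lines")
    out = []
    for i, line in enumerate(cover_lines):
        line = line.rstrip()  # remove any existing trailing whitespace first
        if i < len(bits):
            line += "\t" if bits[i] == "1" else " "
        out.append(line)
    return out

def reveal(lines: list[str]) -> str:
    """Read a bit from every line that ends in a tab or a space."""
    bits = []
    for line in lines:
        if line.endswith("\t"):
            bits.append("1")
        elif line.endswith(" "):
            bits.append("0")
    return "".join(bits)

lines = ["Dear team,", "the report is ready.", "Regards"]
stego = hide(lines, "10")
print([repr(line) for line in stego])
print(reveal(stego))
```

Editors and version-control hooks that strip trailing whitespace will erase data hidden this way.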
2. Zero-Width Characters
Certain Unicode characters like Zero-Width Space (\u200B), Zero-Width Joiner (\u200D),
and Zero-Width Non-Joiner (\u200C) are invisible to the human eye but can store encoded data.
Example:
A hidden message encoded with zero-width characters:
"HelloWorld" (Contains a zero-width space between "Hello" and "World")
Decoding tools can extract these characters to reveal the hidden message.
3. Capitalization Pattern
Data can be embedded by selectively capitalizing letters in a structured way, allowing for
extraction based on predefined rules.
Example:
A binary encoding system in which a word starting with a capital letter represents 1 and a word starting with a lowercase letter represents 0:
Text: "The Quick brown Fox jumps Over the lazy Dog."
Binary Extraction: 110101001 (one bit per word, read left to right)
Applications:
Encoding hidden data in formal text
4. Font Manipulation
Data can be hidden by altering text appearance, such as changing font styles, sizes,
colors, or spacing.
Example:
Using bold letters to encode a message:
"This is a **S**ecret **M**essage" → Extracted letters: "SM"
Alternatively, different font sizes or colors can be used to represent data patterns.
Applications:
Watermarking sensitive documents
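Extraction by software can be illustrated with the `**...**` bold markup used in the example above. This is a sketch only: real documents store formatting in their own file formats, and the function names are hypothetical.

```python
import re

def mark_bold(text: str, indexes: set[int]) -> str:
    """Wrap the characters at the given 0-based indexes in ** markers."""
    return "".join(f"**{ch}**" if i in indexes else ch for i, ch in enumerate(text))

def bold_letters(marked: str) -> str:
    """Collect everything that appears between ** markers."""
    return "".join(re.findall(r"\*\*(.+?)\*\*", marked))

marked = mark_bold("This is a Secret Message", {10, 17})
print(marked)
print(bold_letters(marked))
```

The same pattern applies to other formatting carriers such as font size or color: the decoder scans for characters whose formatting differs from the surrounding text.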
5. Character Substitution
This method replaces certain characters with visually similar alternatives to encode
hidden messages.
Example:
"Hello Friend" → "H3ll0 Fri3nd" (E → 3, O → 0, etc.)
This can be used with a predefined mapping to encode and decode messages.
Applications:
Steganographic text communication
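One way to turn the mapping into a data channel is to let every substitutable letter carry one bit: substituted means 1, left unchanged means 0. The sketch below assumes that convention, uses only the lowercase mappings e → 3 and o → 0 from the example, and assumes the cover text contains no digits of its own. The function names are hypothetical.

```python
SUBS = {"e": "3", "o": "0"}              # mapping taken from the example above
BACK = {v: k for k, v in SUBS.items()}   # reverse mapping for decoding

def hide(cover: str, bits: str) -> str:
    """Substitute a letter for bit 1, keep it for bit 0 (missing bits count as 0)."""
    remaining = iter(bits)
    out = []
    for ch in cover:
        if ch in SUBS:
            out.append(SUBS[ch] if next(remaining, "0") == "1" else ch)
        else:
            out.append(ch)
    if next(remaining, None) is not None:
        raise ValueError("cover text has too few substitutable letters")
    return "".join(out)

def reveal(stego: str) -> str:
    """Read 1 for each substituted symbol and 0 for each untouched candidate letter."""
    return "".join("1" if ch in BACK else "0" for ch in stego if ch in BACK or ch in SUBS)

def restore(stego: str) -> str:
    """Undo the substitutions to recover the readable cover text."""
    return "".join(BACK.get(ch, ch) for ch in stego)

stego = hide("Hello Friend", "101")
print(stego)
print(reveal(stego), restore(stego))
```

With the bit string "111" every candidate letter is substituted, which gives the "H3ll0 Fri3nd" form shown in the example.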
6. Metadata Embedding
Metadata fields in text files (e.g., Word documents, PDFs) can store hidden
information without altering the actual text.
Example:
Hidden information can be stored in:
Document properties (e.g., author name, title, or comments)
Applications:
Secure watermarking in official documents
7. Steganographic Encoding
Custom encoding schemes can be applied within a text body using invisible or
semi-visible modifications.
Example:
Encoding binary data within structured text:
Original Text: "Alice and Bob met at noon."
Encoded: Extracting the words at agreed positions (here the 1st, 4th, and 6th) → "Alice met noon"
(represents hidden data).
Alternatively, specific character positions or punctuation marks can carry encoded
data.
Innocuous Text
Innocuous text refers to ordinary-looking, harmless text that does not raise
suspicion but may secretly contain hidden information.
It is often used in steganography, covert communication, and watermarking to
embed data without attracting attention.
Innocuous Text in Data Hiding:
1. Capitalization Encoding
Hidden data is embedded by capitalizing specific letters in a sentence.
A predefined rule determines how to extract the hidden message.
Example:
"The Quick brown Fox jumps Over the lazy Dog."
Extracting the capital letters (skipping the sentence-initial "T") → "QFOD" (which could
represent a secret message).
2. Whitespace and Zero-Width Character Encoding
Extra spaces, tabs, or zero-width characters (e.g., \u200B Zero-Width Space)
are inserted to encode data.
The hidden message is extracted by analyzing these spaces or invisible
characters.
Example:
"Hello world. " (Extra space after the period carries encoded data).
"Helloworld." (Contains an invisible Zero-Width Space).
Without specialized tools, the hidden message remains undetected.
3. Lexical Steganography (Synonym Substitution)
Specific words in the text are replaced with synonyms that follow a hidden
encoding scheme.
A decoder maps these words back to the original secret message.
Example:
Original sentence:
"The meeting is important."
Encoded sentence:
"The meeting is crucial."
If "crucial" is predefined as a code for a hidden message, it can be
extracted later.
4. Misspelling and Typographic Errors
Deliberate misspellings or added typos can carry encoded information.
The pattern of misspellings helps reconstruct the hidden message.
Example:
"I have a grreat plan." (Extra "r" could indicate encoded data).
5. Sentence Structure and Word Positioning
Secret messages can be embedded by selecting specific words based on
position rules.
Example:
Original text:
"Alice and Bob will meet at the coffee shop tomorrow."
Hidden message extraction (e.g., the words at agreed positions, here the 1st, 3rd, 5th, and 8th):
"Alice Bob meet coffee." (Encodes a secret meaning).
Mimic Function in Data Hiding:
A Mimic Function is a technique used in text-based steganography where a
hidden message is transformed into innocuous-looking text that mimics a
particular writing style, structure, or format. The goal is to make the text
appear natural while secretly carrying encoded information.
How Mimic Functions Work
1. Input (Secret Message) – The actual data to be hidden.
2. Transformation Process – The message is encoded into harmless-looking
text that resembles normal communication.
3. Output (Mimicked Text) – The generated text appears innocuous,
avoiding suspicion while containing hidden data.
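The three steps above can be sketched with a toy template. This is an illustrative assumption, not a full mimic function: the sentence template `SLOTS` and the function names are hypothetical, and each slot's word choice encodes a few bits of the secret. Practical systems would use much larger grammars or statistical language models.

```python
# Hypothetical template: each slot offers 2 or 4 choices, so it carries 1 or 2 bits.
SLOTS = [
    ["The meeting", "The review", "The workshop", "The call"],
    ["starts", "ends"],
    ["on Monday", "on Tuesday", "on Wednesday", "on Thursday"],
    ["as planned.", "as usual."],
]

def width(options: list[str]) -> int:
    """Number of bits a slot can carry (its option count is a power of two)."""
    return len(options).bit_length() - 1

def mimic(bits: str) -> str:
    """Transformation: consume bits slot by slot and emit the chosen phrases."""
    out, pos = [], 0
    for options in SLOTS:
        chunk = bits[pos:pos + width(options)].ljust(width(options), "0")
        out.append(options[int(chunk, 2)])
        pos += width(options)
    if pos < len(bits):
        raise ValueError("message too long for this template")
    return " ".join(out)

def unmimic(text: str) -> str:
    """Decoding: match each slot's phrase and recover its index as bits."""
    bits, rest = "", text
    for options in SLOTS:
        for index, option in enumerate(options):
            if rest.startswith(option):
                bits += format(index, f"0{width(options)}b")
                rest = rest[len(option):].lstrip()
                break
        else:
            raise ValueError("text does not fit the template")
    return bits

cover = mimic("100110")
print(cover)
print(unmimic(cover))
```

This template holds only 6 bits per sentence; longer messages would need more slots or several sentences.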
Techniques Used in Mimic Functions:
1. Contextual Rewriting
Converts the hidden message into text that imitates natural language.
Example: A secret phrase may be converted into a casual news article or a
story.
🔹 Example:
Secret Message: "Meet at midnight"
The receiver uses a predefined decoding rule to extract the message.
2. Structured Mimicry (Imitating Documents or Emails)
The secret message is embedded inside an email, letter, or article that
appears normal.
Uses predefined templates to hide data within typical formats.
🔹 Example:
Hidden Data: "Urgent meeting at 10 PM"
Mimicked Email:
Subject: Project Update
Dear Team,
The **update** is scheduled at **10 PM**, and we expect **urgent** feedback.
Best,
John
The bold words form a hidden message.
3. Sentence Expansion and Paraphrasing
The original text is modified with extra words, synonyms, or
restructured sentences while encoding the hidden message.
🔹 Example:
Secret Text: "Attack at dawn"
Mimic Output:
"The early morning brings a breathtaking sunrise, filling the sky with
golden light."
Specific words are mapped to hidden meanings.
Applications of Mimic Functions:
Covert Communication – Sending hidden messages through harmless text.
Steganographic Encoding – Concealing sensitive data within natural
language.
Digital Watermarking – Embedding ownership details in articles or reports.
Avoiding Detection – Bypassing censorship by making hidden messages
undetectable.