Multimedia Data Hiding
Springer Science+Business Media, LLC
Min Wu Bede Liu
Multimedia Data Hiding
With 92 Illustrations
Springer
Min Wu
Department of Electrical Engineering
University of Maryland
College Park, MD 20742
USA
minwu@eng.umd.edu

Bede Liu
Department of Electrical Engineering
Princeton University
Princeton, NJ 08544
USA
liu@ee.princeton.edu
Library of Congress Cataloging-in-Publication Data
Wu, Min, 1974-
Multimedia data hiding / Min Wu, Bede Liu.
p. cm.
Includes bibliographical references and index.
ISBN 978-1-4419-2994-5 ISBN 978-0-387-21754-3 (eBook)
DOI 10.1007/978-0-387-21754-3
1. Multimedia systems. 2. Data encryption (Computer science) 3. Computer
security. I. Liu, Bede. II. Title.
QA 76.575 .W85 2002
006.7-dc21 2002030240
Printed on acid-free paper.
© 2003 Springer Science+Business Media New York
Originally published by Springer-Verlag New York, Inc in 2003
Softcover reprint of the hardcover 1st edition 2003
All rights reserved. This work may not be translated or copied in whole or in part without the
written permission of the publisher (Springer Science+Business Media New York), except for
brief excerpts in connection with reviews or scholarly analysis. Use in connection with any
form of information storage and retrieval, electronic adaptation, computer software, or by
similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even
if they are not identified as such, is not to be taken as an expression of opinion as to whether
or not they are subject to proprietary rights.
9 8 7 6 5 4 3 2 1
www.springer-ny.com
To Our Families
Preface
The digital information revolution has brought about profound changes in
our society and our life. New devices and powerful software have made it
possible for consumers worldwide to create, manipulate, share, and enjoy
multimedia information. The Internet and wireless networks offer ubiquitous
channels to deliver and to exchange multimedia information for such pur-
poses as remote collaboration, distant learning, and entertainment. With
all these advances in multimedia coding and communication technologies
over the past decade, the major hurdle to allowing much broader access to
multimedia assets and deployment of multimedia services no longer lies with
bandwidth-related issues, but with how to make sure that content is used
for its intended purpose by its intended recipients. The core issue then be-
comes the development of secure management of content usage and delivery
across communication networks.
Data hiding and digital watermarking are promising new technologies for
multimedia information protection and rights management. Secondary data
can be embedded imperceptibly in digital multimedia signals for a variety
of applications, including ownership protection, authentication, access con-
trol, and annotation. Data hiding can also be used to send side information
in multimedia communication for providing additional functionalities or for
enhancing performance. The extraction of the embedded data may or may
not need knowledge of the original host media data. In addition to im-
perceptibility, robustness against moderate processing such as compression
is also an important consideration. The requirements of imperceptibility,
robustness, and the hiding of a maximum number of bits are basic for many
data hiding applications. To satisfy these conflicting requirements, attention
must be paid to the visual and auditory perception models and the types of
media data, viz. speech, music, line drawings, signatures, natural images, etc.
In addition, different parts of the media data may have significantly differ-
ent embedding capacity. How to handle this uneven distribution of capacity
is also a challenge. Another concern of data hiding is the protection against
intentional attacks, that is, attempts aimed at removing or obliterating the hidden data
or watermark.
This book, based on the Ph.D. dissertation of the first author [46], ad-
dresses both theoretical and practical aspects of multimedia data hiding,
and tackles both design and attack problems. It is organized in three parts:
Fundamental Issues, Algorithm and System Designs, and Attacks and Coun-
termeasures.
In Part I, we identify the key elements of data hiding through a layered
structure. Data hiding is modelled as a communication problem where the
embedded data is the signal to be transmitted. The tradeoff of robustness
versus capacity is studied for two major categories of embedding mecha-
nisms. In addition, a comprehensive solution is proposed to address the
problem caused by the unevenly distributed embedding capacity. The ques-
tion of constant bit rate versus variable bit rate hiding is also addressed.
In Part II, we present new data hiding algorithms for binary images,
grayscale and color images, and videos. These algorithms can be applied to
a variety of problems, including annotation, tamper detection, copy/access
control, fingerprinting, and ownership protection. The designs presented
here provide concrete examples regarding the choice of embedding mecha-
nisms, the selection of modulation/multiplexing techniques for hiding mul-
tiple bits, and the handling of uneven embedding capacity. The use of
data hiding in video communication to convey side information for addi-
tional functionalities or better performance is demonstrated by the novel
approaches of real-time transcoding and error concealment.
Many data hiding applications operate in a competitive environment
where an adversary has incentives to remove or obliterate the embedded
data. Thus the testing of the robustness and security via attacks is im-
portant. In the last part of the book, we discuss a number of attacks and
countermeasures for data hiding systems. The discussion begins with attack-
ing three specific types of watermarking schemes, in which full knowledge
of the watermarking algorithms is available. Attention is then turned to
the watermark attack problems for digital music under a unique compet-
itive environment, in which the watermarking algorithms are unknown to
attackers. This work is based on our participation in the recent public chal-
lenge in the form of attacking four audio watermarking technologies of the
Secure Digital Music Initiative (SDMI).
Acknowledgement Several works included in this book are in collab-
oration with our colleagues: Jeffrey Bloom (Sarnoff Corporation), Inge-
mar Cox (NEC Research Institute), Scott Craver (Princeton University),
Ching-Yung Lin (IBM T.J. Watson Research Center), Y-M. Lui (Signafy
Inc.), Matt Miller (NEC Research Institute), Peng Yin (Thompson Multime-
dia Laboratory), and Heather Yu (Panasonic Information and Networking
Laboratory). We have also benefited from the discussions and suggestions
from Perry Cook (Princeton University), Persi Diaconis (Stanford Univer-
sity), Bradley Dickinson (Princeton University), Edward Felten (Princeton
University), Adam Finkelstein (Princeton University), S-Y. Kung (Prince-
ton University), Nasir Memon (Polytechnic University), Shu Shimizu (IBM
Japan), Harold Stone (NEC Research Institute), Edward Tang (Johns Hop-
kins University), and Wenjun Zeng (Packet Video Corporation). We are
grateful to the Electronic Frontier Foundation and its legal team for their
effort enabling the inclusion of our work on the SDMI challenge into this
book. Special thanks to Gino J. Scarselli and Grayson Barber.
Peter Ramadge and Sanjeev Kulkarni of Princeton University read through
an earlier version of the manuscript and offered many helpful comments
and suggestions. The first author would also like to thank her colleagues
at the University of Maryland, College Park. Special thanks to K.J. Ray
Liu, Steve Marcus, Andre Tits, and Kawthar Zaki for their support and
encouragement.
We are grateful to the State of New Jersey for an R&D Excellence Grant,
to the National Science Foundation for Grant MIP-9408462 and CAREER
Award CCR-0133704, and to Intel Corporation for a Technology for Educa-
tion 2000 Grant. These grants supported the research work reported in this
book.
We have enjoyed working with the staff at Springer-Verlag, New York, es-
pecially the Executive Editor for Computing & Information Science, Wayne
Yuhasz, the Associate Editor, Wayne Wheeler, and the Production Editor,
Antonio D. Orrantia.
The first author would like to thank her parents, Xianli Wu and Yiqi Sun,
and her husband, Xiang Yu, for their love, support, and encouragement.
Min Wu and Bede Liu
October 2002
Contents
Preface vii
List of Figures xv
List of Tables xix
1 Introduction 1
1.1 Overview of Multimedia Data Hiding. 2
1.2 Book Organization . . . . . . . . . . . 6
1.2.1 Fundamental Issues and Solutions 6
1.2.2 Algorithm and System Designs 7
1.2.3 Attacks and Countermeasures . . . 10
I Fundamental Issues 13
2 Preliminaries 15
2.1 Data Hiding Framework . . . . . . . 15
2.2 Key Elements and A Layered View . 16
3 Basic Embedding Mechanisms 19
3.1 Two Basic Embedding Mechanisms. 20
3.1.1 Probability of Detection Errors 23
3.2 Embedding Capacity . . . . . . . . . . 26
3.2.1 Capacity for Type-I Embedding. 27
3.2.2 Capacity of Type-II Embedding. . . . . . . 27
3.2.3 Capacity Comparison for Type-I & Type-II 29
3.2.4 Extensions and Discussions . . . . . . . . 31
3.3 Techniques for Embedding Multiple Bits. . . . . 33
3.3.1 Modulation and Multiplexing Techniques 33
3.3.2 Comparison................. 35
3.4 Chapter Summary . . . . . . . . . . . . . . . . . 37
3.5 Appendix - Derivations of Type-II Embedding Capacity 38
4 Handling Uneven Embedding Capacity 41
4.1 Quantitative Model for Uneven Embedding Capacity. 42
4.2 Constant Embedding Rate (CER) . . . . . . . . . . . 44
4.2.1 Backup Embedding . . . . . . . . . . . . . . . 45
4.2.2 Equalizing Embedding Capacity Via Shuffling . 46
4.2.3 Practical Considerations . 50
4.2.4 Discussions . . . . . . . . . . . . . . . . 50
4.3 Variable Embedding Rate (VER) . . . . . . . . 51
4.3.1 Conveying Additional Side Information 52
4.4 Outline of Examples . . . . . . . . . . . . . . . 53
4.5 Chapter Summary . . . . . . . . . . . . . . . . 53
4.6 Appendix - Generating Shuffling Table From A Key 54
4.7 Appendix - Analysis of Shuffling . . . . 55
4.7.1 Joint Probability of Histogram 55
4.7.2 Mean and Variance of Each Bin. 56
4.7.3 More About E[!ff] . . . . . . . . 58
4.7.4 Approximations for Hypergeometric Distribution 59
4.7.5 More About Var[!ff] . . . . . . . . . . . . . . . 61
II Algorithm and System Designs 63
5 Data Hiding in Binary Image 65
5.1 Proposed Scheme. . . . . . . 67
5.1.1 Flippable Pixels ... 67
5.1.2 Embedding Mechanism 69
5.1.3 Uneven Embedding Capacity and Shuffling 70
5.2 Applications and Experimental Results. . . . . 74
5.2.1 "Signature in Signature" . . . . . . . . . 74
5.2.2 Invisible Annotation for Line Drawings 74
5.2.3 Tamper Detection for Binary Document 75
5.3 Robustness and Security Considerations . . . . 76
5.3.1 Analysis and Enhancement of Robustness 77
5.3.2 Security Considerations . . . . . . . . . . 81
5.4 Chapter Summary . . . . . . . . . . . . . . . . . 83
5.5 Appendix - Details of Determining Flippability Scores 84
5.6 Appendix - Recovering Images After Printing & Scanning 88
6 Multilevel Data Hiding for Image and Video 93
6.1 Multi-level Embedding. . . . . 94
6.2 Multi-level Image Data Hiding 96
6.2.1 Spectrum Partition. . . 97
6.2.2 System Design . . . . . . 101
6.2.3 Refined Human Visual Model . 103
6.2.4 Experimental Results . . 108
6.3 Multi-level Video Data Hiding. . . . . 109
6.3.1 Embedding Domain . . . . . . 109
6.3.2 Variable vs. Constant Embedding Rate . 112
6.3.3 User Data vs. Control Data . . . . . . . . 113
6.3.4 System Design and Experimental Results . 115
6.4 Chapter Summary . . . . . . . . . . . . . . . . . . 116
7 Data Hiding for Image Authentication 119
7.1 Review of Prior Art . . . . . . . . . . . . 121
7.2 Framework for Authentication Watermark. . . 122
7.3 Transform-domain Table Lookup Embedding . 123
7.3.1 Considerations for Imperceptibility & Security . 125
7.3.2 Estimating Embedded Data and Changes . 128
7.4 Design of Embedded Data . . . . . . . 129
7.4.1 Visually Meaningful Pattern. . 129
7.4.2 Content-based Features . 129
7.5 Experimental Results. . 131
7.6 Extensions..... . 134
7.7 Chapter Summary . . . 136
8 Data Hiding for Video Communications 137
8.1 Transcoding by Downsizing Using Data Hiding . 138
8.1.1 Overview of Proposed Approach . . . . . 138
8.1.2 Embedding Subblock Motion Information . 139
8.1.3 Advantages of Data Hiding . . 140
8.1.4 Experimental Results . . . . . 141
8.2 Error Concealment and Data Hiding . 143
8.2.1 Related Works . . . . . 143
8.2.2 Proposed Techniques. . 145
8.3 Chapter Summary . . . . . . . 146
III Attacks and Countermeasures 147
9 Attacks on Known Data Hiding Algorithms 149
9.1 Block Replacement Attack on Robust Watermark. . 150
9.1.1 Existing Attacks on Robust Watermarks. . . 151
9.1.2 Attack via Block Replacement . 151
9.1.3 Analysis and Countermeasures . 153
9.2 Countermeasures Against Geometric Attacks . 155
9.2.1 Basic Idea of RST Resilient Watermarking . 157
9.2.2 Embedding and Detection Algorithms . 158
9.2.3 Implementation Issues . . . . . . . . . . . . . 160
9.2.4 Experimental Results . . . . . . . . . . . . . 164
9.2.5 Concluding Remarks on RST Resilient Watermarking 168
9.3 Double Capturing Attack on Authentication Watermark . 171
9.3.1 Proposed Attack . . . . . . . . . . . . . . . . 171
9.3.2 Countermeasures Against Proposed Attack . 172
10 Attacks on Unknown Data Hiding Algorithms 175
10.1 Introduction. . . . . . . . . . . . . . 175
10.1.1 SDMI Attack Setup . . . . . . . . . . . . . 176
10.1.2 Comments on Attack Setup . . . . . . . . . 177
10.2 Attacks and Analysis on SDMI Robust Watermarks . 179
10.2.1 General Approaches to Attacks . . 179
10.2.2 Attacks on Watermark-C . . . . . 180
10.2.3 Attacks on Watermark A, B & F . 184
10.2.4 Remarks. . . . . . . . . . . . . . . 187
10.3 Attacks and Analysis on SDMI Fragile Watermarks. . 188
11 Conclusions and Perspectives 191
References 193
Index 211
About the Authors 219
List of Figures
2.1 General framework of data hiding systems. 16
2.2 Layered structure of a data hiding system. . 17
3.1 Channel model for Type-I and Type-II embedding 21
3.2 Distribution of detection statistics (Type-I) .... 24
3.3 Decision boundaries of Type-II embedding . . . . . 25
3.4 Computing MSE distortion by odd-even embedding. 26
3.5 Binary symmetric channel (BSC) . . . . . . . . . . . 27
3.6 Capacity of DICO and DIDO channels . . . . . . . . 29
3.7 Capacity of Type-I & Type-II embedding (AWGN noise) . 30
3.8 Illustration of the bit re-allocation nature of data hiding 33
3.9 Comparison of orthogonal modulation vs. TDM/CDM 36
4.1 An original unmarked image of 640-by-432. . . . . . . 43
4.2 Smooth blocks of Fig. 4.1 (shown in black) ...... 44
4.3 Histogram of embeddable coefficients before shuffling . 45
4.4 Symmetric backup embedding. ............ 46
4.5 Incorporate shuffling in an embedding mechanism. . . 47
4.6 Histogram of embeddable coefficients after shuffling. . 49
4.7 Illustration of random shuffling in terms of a ball game . 56
4.8 Various approximations to hypergeometric distribution . 60
4.9 Analytic, approximated, and simulated variance . 62
5.1 Block diagram of data hiding in binary images 68
5.2 Examples of high and low flippability scores . . . . . . . 68
5.3 Boundary pixel becoming "non-flippable" after flipping. 69
5.4 Odd-even mapping and table lookup mapping . . . . . . 70
5.5 Pixels with high flippability scores . . . . . . . . . . . . 71
5.6 Distribution of flippable pixel before and after shuffling 71
5.7 Histogram of flippable pixel before and after shuffling 72
5.8 Analysis and simulation of shuffling for binary image 73
5.9 "Signature in signature" . . . . . . . . . 75
5.10 Invisible annotation for line drawings. . . . . . 76
5.11 Data hiding in binary document image. . . . . 77
5.12 Improving robustness against small translation 78
5.13 Recovering image from printing & scanning (1) 80
5.14 Recovering image from printing & scanning (2) 81
5.15 Illustration of transitions in four directions 85
5.16 Examples of regular patterns . . . . . . . 85
5.17 Connectivity criteria between pixels . . . . 86
5.18 Graph representation of pixel connectivity. 86
5.19 One possible flippability lookup table for 3 x 3 pattern. 89
5.20 Illustration of registration marks . . . . . . . . . . . . . 89
5.21 Determining the cross point of a registration mark . . . 90
5.22 Coordinate conversion for performing scaling & de-skewing 91
6.1 Extractable data by single- & multi-level embedding . 95
6.2 Zig-zag ordering of DCT coefficients in an 8 x 8 block. 99
6.3 Comparison of different correlator detectors . . . . . . 100
6.4 Block diagram of multi-level data hiding for images. . 102
6.5 Two-level Data Hiding in Block-DCT Domain. . 102
6.6 2-D DCT basis images of 8 x 8 blocks . . . . . . . . 105
6.7 Block diagram of the refined 3-step HVS model. . . 106
6.8 Images watermarked by the proposed HVS model . . 108
6.9 Multi-level data hiding for Lenna image (512x512) . 110
6.10 Multi-level data hiding for the Baboon image (512x512) . 110
6.11 Methods for handling frame jittering. . . . . . . . . .. . 111
6.12 Block diagram of the proposed video data hiding system. . 115
6.13 Multi-level data hiding for flower garden video sequence . . 116
7.1 Block diagram for embedding authentication watermarks. . 123
7.2 Frequency-domain Embedding Via Table Lookup. . 125
7.3 Markovian property of restricted LUT generation. . 127
7.4 A binary pattern as part of the embedded data . . . 129
7.5 An original unmarked image of 640-by-432. . . . . . 131
7.6 Watermarked image without shuffling during embedding . 132
7.7 Authentication result without shuffling. . . . . . . . . . 133
7.8 Watermarked image using shuffling during embedding . 134
7.9 Authentication result using shuffling . . . . . . . . . . . 135
8.1 Motion vectors for downsized video. . . . . . . . . . . .. . 140
8.2 Comparison of two ways to send side information. . . .. . 141
8.3 Performance comparison of various transcoding methods. . 142
8.4 Edge directed interpolation for concealing lost blocks. . 143
8.5 An example of edge directed block concealment. . . .. . 144
9.1 Block diagram of the proposed block replacement attack . 152
9.2 A watermarked image and the attacked versions . 152
9.3 Spectrum analysis of block-based transform . 155
9.4 Rectilinear tiling and image rotation . . . . . . . . 160
9.5 An image and its 2-D DFT . . . . . . . . . . . . . 161
9.6 A rotated image with zero padding and its 2-D DFT . 161
9.7 Images with dominant structure and their DFTs . . . 163
9.8 SNR histogram of watermarked images. . . . . . . . . 164
9.9 Image watermarked by proposed RST resilient algorithm. . 165
9.10 False alarm probability of RST resilient watermark . 166
9.11 Geometric attacks tested in our experiments. . 167
9.12 Detection results under rotation attack. . . . 168
9.13 Detection results under upscaling attack . . . 169
9.14 Detection results under down scaling attack . 169
9.15 Detection results under translation attack . . 170
9.16 Detection results under JPEG compression . 170
9.17 Double capturing attack on authentication watermark . 171
9.18 Countermeasure against double capturing attack . 172
10.1 Illustration of SDMI attack problem . . . . . . . . 176
10.2 Watermark detectability and perceptual quality . . 177
10.3 Waveform and spectrum analysis of SDMI Technology-C. . 181
10.4 I/O time index for time-domain jittering/warping. . 182
10.5 Graphics user interface of GoldWave shareware . 183
10.6 A 2nd order notch filter . . . . . . . . . . . . . . 184
10.7 Spectrum observation for SDMI Technology-A. . 185
10.8 Spectrum observation for SDMI Technology-B. . 186
List of Tables
3.1 Comparison of two types of embedding mechanisms . 23
3.2 Comparison of modulation/multiplexing techniques. 38
5.1 Analysis and simulation of shuffling for binary image 73
6.1 Comparison of HVS models . . . . . . . . . . . . 107
6.2 Adaptive embedding rate for a video frame . . . 113
6.3 Experimental results of multi-level data hiding . 117
6.4 Annotated excerpt of detection log . . . . . . . . 118
7.1 Generating look-up table with constrained runs . 126
8.1 List of three schemes for experimental comparison 141
9.1 Experimental results of block-replacement attack . 153
9.2 Block replacement attack on global and local embedding . 154
9.3 Detecting watermarks embedded in DFT magnitude . . . 156
1
Introduction
The digital information revolution has brought about profound changes in
our society and our life. The many advantages of digital information have
generated new opportunities for innovation and new challenges. Along with
powerful software, new devices, such as digital cameras and camcorders, high-
quality scanners and printers, digital voice recorders, MP3 audio players, and
multimedia personal digital assistants (PDAs), have reached consumers world-
wide, enabling them to create, manipulate, and enjoy multimedia data. The Internet
and wireless networks offer ubiquitous channels to deliver and exchange information. The
security and fair use of multimedia data, as well as the fast delivery of mul-
timedia content to a variety of end users or devices with guaranteed QoS,
are important yet challenging problems. The solutions to these problems
will not only contribute to more intellectual knowledge and understanding,
but also offer new business opportunities. This book addresses the issues
of multimedia data hiding and its applications in multimedia security and
communications.
With the ease of editing and perfect reproduction in the digital domain, the
protection of ownership and the prevention of unauthorized tampering of
multimedia data have raised serious concerns. Digital watermarking and data
hiding are schemes to embed secondary data in digital media. Considerable
progress has been made on data hiding in recent years, and it has at-
tracted attention from both academia and industry [28]-[47]. Techniques have been
developed for a variety of applications, including ownership protection, au-
thentication, access control, and annotation. Data hiding is also a useful
general tool for sending side information during multimedia communications
for achieving additional functionalities or enhancing performance. Impercep-
tibility, robustness against moderate processing such as compression, and
the ability to hide many bits are the basic but conflicting requirements for
many data hiding applications. In addition, a few other important prob-
lems encountered in practice, such as the uneven embedding capacity for
image/video and the perceptual models for binary images, have received
little attention in the literature. The book is intended to provide a general un-
derstanding of multimedia data hiding by addressing both theoretical and
practical aspects, and tackling both design and attack problems. A num-
ber of important issues of data hiding are addressed, and new principles
and techniques are proposed. This introductory chapter first gives a brief
overview of the recent technologies and advances of data hiding, then out-
lines the problems addressed in this book and summarizes its major original
contributions.
1.1 Overview of Multimedia Data Hiding
The ideas of information hiding can be traced back to a few thousand years
ago. As surveyed in [32], simply obscuring the content of a message by
encryption is not always adequate in practice. In many competitive sit-
uations, concealing the existence of communications is desirable to avoid
suspicion from adversaries. The word "steganography", which originated
from Greek and is still in use today, literally means "covered writing". Sto-
ries of covert communications have been passed down for generations, but such
techniques were mainly used by military and intelligence agencies. Information hiding
began receiving wide attention from the research community and industry in
the recent decade. Many publications and patents have appeared in the past
few years. The digital information revolution and the thriving progress in
network communications are the major driving forces of this change. The
perfect reproduction, the ease of editing, and the Internet distribution of
digital multimedia data have brought about concerns of copyright infringe-
ment, illegal distribution, and unauthorized tampering. The imperceptible
embedding of data in multimedia sources has appeared as a promising solution
for alleviating these concerns. Interestingly, while most such techniques em-
bed data imperceptibly to retain the perceptual quality and the value of the
host multimedia source, many of them are referred to as digital watermarking,
whose traditional counterpart is not necessarily imperceptible. The analogy
emphasizes the applications: as a technique in the art of paper mak-
ing, paper watermarks usually indicate the origin, the ownership, and/or
the integrity of the document printed on the associated pieces of paper, in
addition to their roles for artistic decoration. As the application domain
of embedding data in digital multimedia sources has broadened, several
terms have become popular, including steganography, digital watermarking, and
data hiding. Explanations and comparisons of terminologies related to infor-
mation hiding were presented in [30][32]. To avoid unnecessary confusion
with terminologies, this book uses the two terms data hiding and digital
watermarking interchangeably, referring to embedding secondary data into
the primary multimedia sources. The embedded data, usually called water-
mark(s), can be used for various purposes, each of which is associated with
different robustness, security, and embedding capacity requirements. The
principal advantage of data hiding versus other solutions is its ability to
associate secondary data with the primary media in a seamless way. As we
shall see later in this book, the seamless association is desirable in many ap-
plications. For example, compared with cryptographic encryption [25][26],
the embedded watermarks can travel with the host media and assume their
protection functions even after decryption. With the exception of visible
watermarks that will be discussed below, the secondary data are expected
to be imperceptible.
There are many ways to categorize data hiding techniques. A straightfor-
ward classification is according to the type of primary multimedia sources,
leading to data hiding systems for perceptual and non-perceptual sources.
This book is primarily concerned with perceptual multimedia sources, in-
cluding audio, binary image, color or grayscale image, video, and 3-D graph-
ics. Among digital sources, the major difference between perceptual and
non-perceptual data is that the non-perceptual data, such as text and exe-
cutable code, usually requires lossless processing, transmission, and storage.
Flipping a single bit may lead to a different meaning. Perceptual data, how-
ever, has a perceptual tolerance range, which allows minor changes before be-
ing noticed by humans. This perceptual property enables data embedding as
well as lossy compression either imperceptibly or with a controllable amount
of perceptual degradation. Although many general techniques of data hid-
ing can be applied to audio, image, video, and 3-D graphics [30][116], there
are unique treatments associated with each type of perceptual sources. The
main reason is that they are related to particular senses, and the way we
see things is quite different from the way we hear. Proper perceptual models
have to be exploited to ensure the host data is changed in such a way that
no noticeable difference is introduced [14][88]. Dimensionality and causality
are two other reasons leading to different treatments. The techniques and
resources required for processing 1-D data would be quite different from
those for 2-D and 3-D data. A similar argument holds for non-progressive data
(such as images) versus progressive data (such as audio and video). We shall
clarify that the perceptual property is not a necessity for hiding data. There
are changes that can be made to non-perceptual data while preserving the
semantic meaning. For example, a word can be changed to one of its synonyms,
special patterns of spaces and blank lines can be added to computer source
code, and jump instructions in assembly code can be rearranged [121].
These changes can be used to enforce certain relations (either deterministi-
cally or statistically) to encode secondary data, as we do for hiding data in
perceptual sources. This book focuses on data hiding in audio, image, and
video. Interested readers may refer to the literature for detailed discussions
on data hiding in 3-D graphics data [114, 115, 116, 117] and non-perceptual
sources [120][121].
In terms of perceptibility, data hiding techniques can be classified into
two groups, perceptible and imperceptible hiding. Perceptible watermarks
are mainly used in image and video. A visually meaningful pattern, such as
a logo, is overlaid on an image or video, which is essentially an image editing
or synthesis problem. The visible watermarks explicitly indicate the copy-
right, ownership information, or access control policies so as to discourage
the misuse of watermarked content. For example, semi-transparent logos are
commonly added to TV programs by broadcasting networks, and to the pre-
view images accessible via the World Wide Web by copyright holders. In [118],
a visible watermarking technique is proposed by modifying the luminance
of the original image according to a binary or ternary watermark pattern.
The amount of modification is adaptive to the local luminance to give a
consistent perceptual contrast [14]. In addition, the modification is modu-
lated by a random sequence to make it difficult to systematically remove
the visible marks via an automated algorithm. Video can be visibly marked
using similar techniques [119]. The majority of current data hiding research,
however, concerns imperceptible watermarking. It is also the focus of
this book. As mentioned earlier, perceptual models need to be exploited to
ensure the changes imposed by an embedding system are imperceptible to
retain the perceptual quality and the value of multimedia content.
Application domain is another criterion to categorize data hiding tech-
niques. Classic applications include ownership protection, authentication,
fingerprinting, copy/access control, and annotation. We shall briefly ex-
plain the design requirements of each application:
• Ownership Protection: a watermark indicating ownership is embedded
in the multimedia source. The watermark, known only to the copy-
right holder, is expected to survive common processing and intentional
attack so that the owner can show the presence of this watermark
in case of dispute to demonstrate his/her ownership. The detection
should have as little ambiguity and false alarm as possible. The total
embedding capacity, namely, the number of bits that can be embedded
and extracted reliably, does not have to be high in most scenarios.
• Authentication or Tampering Detection: a set of secondary data is
embedded in the multimedia source beforehand, and later is used to
determine whether the host media has been tampered with or not. The robust-
ness against removing the watermark or making it undetectable is not
a concern, as there is no such incentive from the attacker's point of view.
However, forging a valid authentication watermark in an unauthorized
or tampered multimedia signal must be prevented. In many practical
applications, it is also desirable to locate the tampered regions and
distinguish some changes (such as the non-content change incurred by
moderate lossy compression) from some other changes (such as con-
tent tampering). In general, the embedding capacity has to be high to
accommodate these needs. The detection should be performed with-
out the original unwatermarked copy because either this original is
unavailable or its integrity has not been established yet. This kind of
detection is usually known as non-coherent detection or blind detec-
tion.
• Fingerprinting or Labelling: the watermark in this application is used
to trace the originator or recipients of a particular copy of multimedia
content. For example, different watermarks are embedded in different
copies of a multimedia signal before distributing to a number of re-
cipients. The robustness against obliteration and the ability to convey
a non-trivial number of bits are required. In addition, digital finger-
printing techniques should also be robust against collusion when users
having access to the same host image embedded with different finger-
print IDs get together and try to remove the fingerprints through such
operations as averaging [140, 141, 142, 143].
• Copy Control & Access Control: the embedded watermark in this ap-
plication represents certain copy control or access control policy. A
watermark detector is often integrated in a recording/playback sys-
tem, such as the proposed DVD copy control [106] and the on-going
SDMI activities [160]. Upon detection, the policy is enforced by direct-
ing certain hardware or software actions such as enabling or disabling
a recording module. The robustness against removal, the ability of
blind detection, and the capability of conveying a non-trivial number
of bits are required.
• Annotation: the embedded watermark in this application is expected
to convey as many bits as possible without the use of the original un-
marked copy in detection. While the robustness against intentional
attack is not required, a certain degree of robustness against common
processing such as lossy compression may be desired.
More generally, data hiding is a tool to convey side information while retain-
ing the original appearance. This property is also found useful in multimedia
communications [147] to achieve additional functionalities or better perfor-
mance.
Data hiding can be considered as a communication problem where the
watermark is the signal to be transmitted. Many communication theories
and techniques are found useful in studying data hiding. A fundamental
problem is the total embedding capacity. It is impossible to answer how
many bits can be embedded without specifying the required robustness.
This is not hard to understand from an information theory perspective, where
the capacity is tied with a specific channel model and is a function of the
channel parameters. The classic results of channel capacity in informa-
tion theory [6], including the capacity theorem in terms of an optimization
of mutual information between the channel input and output, the AWGN
channel capacity, the capacity of parallel channels, and the zero-error ca-
pacity, have been found beneficial toward the understanding of data hiding
capacity [42, 65, 75, 76, 77, 78, 79, 80, 83]. However, there are many im-
portant differences between data hiding and conventional communications.
First, the types of noise incurred by processing or intentional attack are
diverse and rather complicated to model. Second, the shape and parameter
constraints of watermark signals are determined by the human perceptual sys-
tem, which is far more sophisticated than a simple L2 model and has not
been completely understood. These differences limit the direct application
of information theoretical results to practical data hiding problems.
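As a point of reference for the AWGN result mentioned above, the capacity of a power-constrained additive white Gaussian noise channel is the familiar expression below; reading the watermark as the channel input with power budget sigma_w^2 (set by the perceptual distortion constraint) and the processing or attack distortion as Gaussian noise with power sigma_n^2 is a simplifying modelling assumption, not a description of any scheme in this book.

```latex
% AWGN channel capacity, in bits per host sample:
C = \frac{1}{2}\log_2\!\left(1 + \frac{\sigma_w^2}{\sigma_n^2}\right)
```

Under this idealization, doubling the watermark-to-noise power ratio buys at most half a bit of capacity per host sample, which is one way to see why the achievable payload is so tightly tied to the assumed noise condition.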
In addition to the total embedding capacity, we notice another funda-
mental problem associated with data hiding. Due to the non-stationary
nature of perceptual sources, the amount of data that can be embedded
varies significantly from region to region. Such uneven embedding capacity
adds great difficulty to high-rate embedding. This problem has not received
much attention in the literature, as a highly suboptimal approach is often used in
practice: embedding a predetermined small number of bits in each region.
Although the low constant-rate embedding seems to work well in experiments
involving only a few test sources, where the embedding rate can be tuned to-
ward this small test set, it encounters serious difficulties in practical systems
that need to accommodate much more diverse sources. The simple constant
rate embedding not only wastes much embedding capacity in regions that
are capable of hiding many bits, but also creates a dilemma in regions that
can hardly embed any bits without introducing noticeable artifacts. Solu-
tions to this problem would substantially improve the performance of many
practical systems.
1.2 Book Organization
This book is organized into three parts: Fundamental Issues (Part I), Al-
gorithm and System Designs (Part II), and Attacks and Countermeasures
(Part III). We conclude the book with final remarks and suggestions for
further study in Chapter 11.
1.2.1 Fundamental Issues and Solutions
We begin our discussion with a general framework and a list of key ele-
ments shared by almost all data hiding problems in Chapter 2. A layered
view analogous to network communications is presented to show the rela-
tions among those key elements. This viewpoint motivates the divide-and-
conquer strategies for the data hiding problem so that the general approaches
for each element can be pursued, based on which solutions to specific appli-
cations can be systematically found.
In Chapter 3, we consider data hiding as a communication problem where
the embedded data is the signal to be transmitted. Different embedding
mechanisms target different robustness-capacity tradeoffs. We study this
tradeoff for two major categories of embedding mechanisms, including the
embedding capacity of simplified channel models and the set-partitioning
nature. This study serves as a guideline for selecting an appropriate embed-
ding algorithm given the design requirements of an application, such as the
proposed data hiding algorithms for binary images (Chapter 5) and data
hiding applications in video communications (Chapter 8). It also serves as
a foundation of multi-level data hiding (Chapter 6), leading to a new em-
bedding paradigm with improved performance. In addition, we discuss a
number of modulation/multiplexing techniques for embedding multiple bits
in multimedia signals. While many data hiding publications use one or sev-
eral modulation techniques, there is little systematic study and justification
regarding how to embed multiple bits given a set of design requirements.
Our work compares the advantages and disadvantages of various techniques
in a quantitative way. The principles discussed here are used extensively in
our algorithm and system designs.
Due to the non-stationary nature of natural multimedia sources such as
digital images, video, and audio, the number of bits that can be embedded
varies significantly from segment to segment. This unevenly distributed em-
bedding capacity adds difficulty in data hiding: using constant embedding
rate generally wastes embedding capacity, while using variable embedding
rate requires sending additional side information that can be an expensive
overhead. There have been few solutions in the literature. In Chapter 4, we ad-
dress this problem and propose a comprehensive solution. Specifically, when
the total number of bits that can be embedded is much larger than the
number of bits needed to convey how many bits are embedded, we choose
variable embedding rate and hide the side information via appropriate em-
bedding and multiplexing techniques to facilitate detection. When the two
bit numbers are comparable, we hide data at a constant rate and incorpo-
rate shuffling. We will show via analysis and experiments that shuffling is
an efficient and effective tool to equalize the uneven embedding capacity.
The solutions to the uneven embedding capacity problem are applied to many
of our designs presented in Part II.
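To make the constant-rate-plus-shuffling option concrete, the sketch below derives a permutation from a key and measures how evenly the embeddable coefficients fall into fixed-size segments after shuffling. The function names, the SHA-256 seeding, and the use of Python's random module are illustrative assumptions; the book's own construction of a shuffling table from a key is given in an appendix of Chapter 4.

```python
import hashlib
import random

def shuffle_indices(num_coeffs, key):
    """Derive a key-dependent permutation of coefficient indices.

    Hashing the key to seed a pseudo-random generator lets the embedder
    and the detector reproduce exactly the same shuffle from a shared key.
    """
    seed = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    indices = list(range(num_coeffs))
    rng.shuffle(indices)
    return indices

def per_segment_capacity(embeddable, key, segment_size):
    """Count embeddable coefficients per fixed-size segment after shuffling.

    Shuffling spreads the unevenly distributed embeddable coefficients
    roughly uniformly over the segments, so a constant per-segment
    embedding rate wastes little capacity and rarely meets an empty segment.
    """
    perm = shuffle_indices(len(embeddable), key)
    shuffled = [embeddable[i] for i in perm]
    return [sum(shuffled[i:i + segment_size])
            for i in range(0, len(shuffled), segment_size)]

# Example: 6400 coefficients of which only the first 10% are embeddable,
# mimicking an image whose smooth regions can hide very few bits.
flags = [1] * 640 + [0] * 5760
print(per_segment_capacity(flags, "shared-secret", segment_size=64)[:5])
```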
1.2.2 Algorithm and System Designs
In Part II, we present new data hiding algorithms for binary images, grayscale
and color images, and videos. For each design, we follow the list of key ele-
ments discussed in the fundamental part, explaining how these elements are
handled. We shall see concrete examples regarding the choice of embedding
mechanism, the selection of modulation/multiplexing technique(s) for hid-
ing multiple bits, and the handling of uneven embedding capacity via such
techniques as random shuffling.
We begin with designing data hiding algorithms for binary images in
Chapter 5. Embedding data in binary images is generally considered diffi-
cult because there is little room to make invisible changes. There has been
very little work in the literature on human visual models for binary images.
The few existing data hiding works for binary images are usually only ap-
plicable to a specific type of binary image, and the number of bits that
can be embedded is limited. We propose a new algorithm to hide data in
a wide variety of binary images, including digitized signatures, text docu-
ments, and drawings. The proposed algorithm can be used to annotate and
authenticate binary images. The algorithm consists of several modules, ad-
dressing respectively (1) how to identify flippable pixels while maintaining
visual quality, (2) how to use flippable pixels to embed data, and (3) how
to handle uneven embedding capacity from region to region. The embedded
data can be extracted not only from a digital copy, but also from a printed
hard copy. The conditions and the method to recover the original digital
image after printing and scanning are discussed, along with security issues,
practical considerations, and three sample applications.
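As a minimal sketch of the modules just listed, the code below hides one bit per block by flipping at most one pixel so that the parity of the black-pixel count matches the bit. The flippability score used here is a crude stand-in that merely favors pixels with mixed 3x3 neighborhoods; it is not the scoring procedure of Chapter 5, and the shuffling step that equalizes capacity across blocks is omitted.

```python
import numpy as np

def flippability(img, r, c):
    """Crude stand-in score: pixels whose 3x3 neighborhood is neither all
    black nor all white are treated as safer to flip (score > 0)."""
    patch = img[r - 1:r + 2, c - 1:c + 2]
    black = int(patch.sum())
    return min(black, 9 - black)

def embed_bit_in_block(block, bit):
    """Enforce (number of black pixels) mod 2 == bit by flipping at most
    one high-flippability pixel inside the block (odd-even embedding)."""
    if int(block.sum()) % 2 == bit:
        return block                              # block already carries the bit
    h, w = block.shape
    scores = [(flippability(block, r, c), r, c)
              for r in range(1, h - 1) for c in range(1, w - 1)]
    s, r, c = max(scores)
    if s == 0:
        raise ValueError("no flippable pixel in this block")
    out = block.copy()
    out[r, c] ^= 1                                # flip the chosen pixel
    return out

# Usage: embed one bit into a 16x16 binary block (1 = black, 0 = white).
rng = np.random.default_rng(0)
block = (rng.random((16, 16)) > 0.7).astype(np.uint8)
marked = embed_bit_in_block(block, bit=1)
print(int(marked.sum()) % 2)                      # -> 1
```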
We have discussed in the fundamental part the tradeoff between robust-
ness and embedding capacity for specific embedding mechanisms. When
designing a data hiding system, considering a single tradeoff setting, which
is common in practice, may either overestimate or underestimate the actual
distortions. We propose multi-level embedding in Chapter 6 to allow the
amount of extractable data to be adaptive according to the actual noise
condition. When the actual noise condition is weak, many bits can be ex-
tracted from a watermarked image; when the noise is strong, a small number
of bits can still be extracted with very small probability of error. Analysis is
presented to support this idea. A multi-level data hiding system for grayscale
and color images is designed, with experimental results presented. The
design also uses a refined human visual model that provides reduced arti-
facts with a reasonable amount of computation. We then extend the work
to hide a large amount of data in video. Our multi-level data hiding system
for video presents concrete examples of handling the uneven embedding ca-
pacity from region to region within a frame and also from frame to frame.
Furthermore, a small amount of side information, the so-called control bits, is
crucial for handling uneven embedding capacity and for combating frame
jitter that may occur during transcoding or intentional attacks. We shall
explain how to convey these bits via various modulation/multiplexing tech-
niques.
We mentioned earlier that editing digital multimedia data is much easier
than editing its traditional analog counterpart. On many occasions, it is important to de-
termine whether a digital copy has been tampered with. In Chapter 7, we discuss
data hiding algorithms for tamper detection of grayscale and color images.
Many general data hiding techniques, such as the embedding mechanism
and shuffling, are used in this specific application. In the mean time, issues
that are unique to authentication need to be addressed, including issues on
what to authenticate, how to authenticate, and security considerations. We
present a framework for watermark-based authentication covering these as-
pects. Following this framework, we design a specific image authentication
system, aiming at signaling and locating tampering as well as at distinguish-
ing non-content changes (such as moderate lossy compression) from content
tampering. This distinguishability is important because many digital im-
ages and videos are stored in lossy compressed format for efficient storage
and transmission, and excessive fragility of an authentication system that
is unable to tolerate the change incurred by compression is undesirable.
Our design uses a transform domain table look-up embedding mechanism
to embed a visually meaningful pattern and a set of content features in pre-
quantized DCT coefficients. The detection of tampering utilizes both the
semi-fragility of the embedding mechanism and the information about the
key image features conveyed by the embedded data. This provides adjusta-
bility in the degree of distinguishability for content vs. non-content changes,
hence is suitable to accommodate a variety of authentication applications.
Our experimental results also show that applying shuffling helps to embed
more data, enabling better distinguishability between non-content and con-
tent changes while preserving visual quality. Extensions to color image and
video are discussed at the end of the chapter.
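To illustrate the table look-up idea in isolation, the sketch below builds a key-dependent lookup table over quantized coefficient values, with runs of identical entries capped so that enforcing a bit never requires a large change, and then moves each coefficient minimally to a value whose table entry equals the bit. The value range, the run limit, and the helper names are illustrative assumptions and do not reproduce the design of Chapter 7 (cf. Table 7.1).

```python
import random

def make_lut(key, lo=-64, hi=64, max_run=3):
    """Key-dependent lookup table mapping quantized coefficient values to
    bits, with runs of identical bits limited to max_run entries."""
    rng = random.Random(key)
    lut, run_bit, run_len = {}, None, 0
    for v in range(lo, hi + 1):
        bit = rng.randint(0, 1)
        if bit == run_bit and run_len >= max_run:
            bit ^= 1                      # break an overly long run
        run_len = run_len + 1 if bit == run_bit else 1
        run_bit = bit
        lut[v] = bit
    return lut

def embed_bit(q_coeff, bit, lut):
    """Keep the coefficient if its table entry already equals the bit;
    otherwise move it to the closest value whose entry does."""
    if lut[q_coeff] == bit:
        return q_coeff
    candidates = [v for v, b in lut.items() if b == bit]
    return min(candidates, key=lambda v: abs(v - q_coeff))

# Usage: embed a few bits into a row of pre-quantized coefficients and
# read them back through the same (key-shared) table.
lut = make_lut("shared-secret")
coeffs = [5, -3, 0, 12, -7, 2]
bits = [1, 0, 1, 1, 0, 0]
marked = [embed_bit(c, b, lut) for c, b in zip(coeffs, bits)]
print(marked, [lut[c] for c in marked])
```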
Besides the classic uses in ownership protection, authentication, and copy/
access control, data hiding serves as a general tool to convey side informa-
tion. In Chapter 8, we propose novel applications of data hiding in video
communications, where the embedded side information helps to achieve ad-
ditional functionalities or better performance. We start with the problem
of real-time transcoding, where a new video bitstream with a lower bit rate
is generated from an existing high bit-rate one to cope with the bandwidth
limitation. The reduction of spatial resolution can significantly reduce the
bit rate, but the processing, mostly used for motion estimation and motion
compensation, is rather involved. We propose a fast compressed-domain ap-
proach to obtain from an MPEG stream a new MPEG stream with half
the spatial resolution. The key idea to alleviating the bottleneck of motion
estimation and motion compensation is to directly use as much information
as possible from the original full size video. Our solution is supported by a
novel standard-and-customized decoding framework based on data hiding.
That is, the transcoded bit stream still maintains a standard-compliant ap-
pearance and can be decoded by a standard decoder with reasonable visual
quality; in the meantime, better image quality will be obtained if a cus-
tomized decoder that can extract the embedded information is available. We
present justifications regarding the advantage of data hiding versus other
methods for conveying the side information. We then move on to error
concealment, which is commonly used to compensate for the perceptual
quality reduction caused by transmission errors. After discussing the con-
nections between error concealment and data hiding and reviewing a few
related works, we present an error concealment system that consists of a
data hiding module to protect P-frame motion vectors by embedding mo-
tion parity bits in the DCT coefficients of I-frames.
1.2.3 Attacks and Countermeasures
Many applications of data hiding, such as ownership protection, copy/access
control, and authentication, operate in a competitive environment where an
adversary has incentives to obliterate the embedded data. Testing the ro-
bustness and security of a data hiding system via attacks is as important as
the design process and can be regarded as an inseparable element of a good
design process. In Part III, we discuss a number of attacks and countermea-
sures for data hiding systems, aiming at not only identifying the weaknesses
of existing design algorithms and suggesting improvements, but also obtain-
ing a better understanding of what data hiding can and cannot do for
the above-mentioned applications.
We begin our study with three specific types of watermarking schemes in
Chapter 9, for which analysts have full knowledge of the watermarking algo-
rithms and are able to perform attack experiments without much limitation.
The novel block replacement attack in Section 9.1 targets the removal of ro-
bust watermarks embedded locally in an image. The attack has uncovered
an important weakness of the block-based embedding mechanism that has been
neglected in the literature. Possible causes of the vulnerability to the pro-
posed attack are analyzed, along with a discussion of countermeasures. In
Section 9.2, we shall present a countermeasure against geometric attacks on
robust image watermarks, which have been considered a big challenge.
Our solution embeds and detects watermarks in a domain that is related to
special properties of Fourier transform and is resilient to rotation, scale, and
translation. Many important implementation issues are discussed, followed
by experimental results on thousands of images. The chapter is concluded
with a double capturing attack for forging fragile watermarks in Section 9.3.
This attack touches a fundamental aspect of image authentication, namely,
that authenticity is always relative with respect to a reference. Counter-
measures of embedding additional data are proposed, aiming at detecting
multiple captures or non-natural captures.
Chapter 10 discusses attacks under a unique emulated competitive envi-
ronment, in which analysts have no knowledge of the watermarking algo-
rithms. This interesting study and experimental results are based on our
participation in the recent public challenge in the form of attacking four
audio watermarking technologies organized by the Secure Digital Music Ini-
tiative (SDMI). We begin our discussion with the challenge setup, com-
menting on a few unrealistic aspects that made the challenge much more
difficult than a real-world scenario. General approaches for tackling the at-
tack problem are proposed. Following this general framework, we use two
successful attacks as examples to demonstrate our attack strategies, to de-
scribe the specific implementation, and to present analysis in detail. For
completeness, other successful attacks are also briefly explained. While the
challenge is designed to test robust watermarks, we notice that an SDMI
system may consist of both robust and fragile watermarks. Having found
that the fragile watermark is for a special use of tamper detection and that
its security is important to an SDMI system, we present a potential attack
on fragile watermarks and a countermeasure to conclude the chapter.
Part I
Fundamental Issues
2
Preliminaries
2.1 Data Hiding Framework
A typical data hiding framework is illustrated in Fig. 2.1. Starting with
an original digital media (I_0), which is also commonly referred to as the
host media or cover media, the embedding module inserts in it a set of
secondary data (b), which is referred to as embedded data or watermark,
to obtain the marked media (I_1). The insertion or embedding is done such
that I_1 is perceptually identical to the original I_0. The difference between
I_1 and I_0 is the distortion introduced by the embedding process.
In most cases, the embedded data is a collection of bits, which may come
from an encoded character string, a pattern, or some executable agents,
depending on the applications. For generic hidden data, we concern the
bit-by-bit accuracy when extracting them from the marked media. The em-
bedded data can also come from a perceptual source, such as the application
of "image in image" and "video in video" [107] [108]. Moderate decay of the
hidden data is tolerable in this case.
The embedded data b is to be extracted from the marked media I_1 by
a detector, often after I_1 has gone through various processing and attacks.
The input to the detector is referred to as the test media (I_2), and the difference
between I_2 and I_1 is called noise. The extracted data from I_2 is denoted by
b̂. In such applications as ownership protection, fingerprinting / recipient
tracing, and access control, accurate decoding of hidden data from distorted
test media is preferred. These are commonly referred to as robust data hiding.
In other applications such as authentication and annotation, robustness
against processing and attacks is not a principal requirement in general.
We will discuss the design requirements for a few specific applications in
later chapters.
FIGURE 2.1. General framework of data hiding systems. [Figure: the data to be hidden (b) and the host media (I_0) enter the embed module to produce the marked media (I_1); after compression, other processing, and attacks, the resulting test media (I_2) is passed to the extract module, possibly inside a customized player, to recover the hidden data.]
2.2 Key Elements and A Layered View
The key elements in many data hiding systems include [86]:
1. A perceptual model that ensures imperceptibility,
2. A mechanism for embedding one bit,
3. Techniques for embedding multiple bits via appropriate modulation/
multiplexing techniques,
4. What data to embed,
5. How to handle the parts of host media in which it is difficult to embed
data, and
6. How to enhance robustness and security.
These elements can be viewed in layers (Fig. 2.2), analogous to the lay-
ered structure in network communication [9]. The lower layers deal with
how one or multiple bits are embedded imperceptibly in the original media.
FIGURE 2.2. Layered structure of a data hiding system. [Figure: layers from bottom to top are imperceptible embedding of one bit; multiple-bit embedding; equalization of uneven capacity; error correction; security; compression and encoding.]
The three related key elements are: (1) the mechanism for embedding one
bit, (2) the perceptual model to ensure imperceptibility, and (3) the mod-
ulation/multiplexing techniques for hiding multiple bits. Upper layers for
achieving additional functionalities can be built on top of these lower layers,
for example, to handle uneven embedding capacity, to enhance robustness
and approach capacity via error correction coding, and to incorporate ad-
ditional security measures. In the remaining chapters of Part I, we shall use
data hiding in images as an example to discuss a few elements in detail.
3
Basic Embedding Mechanisms
As discussed in Chapter 1, data hiding can be considered as a commu-
nication problem where the embedded data is the signal to be conveyed.
Communication theories and techniques have been found helpful in study-
ing data hiding. A fundamental problem is the embedding capacity. That
is, how many bits can be embedded in a host signal. The answer depends
on the required robustness.
Earlier works regarding the embedding capacity focused on spread spec-
trum additive watermarking, by which a noise-like watermark is added to a
host image and is later detected via a correlator [48, 49, 50]. This embed-
ding can be modelled as communication over a channel with additive white
Gaussian noise (AWGN) [80][83]. Other researchers studied the bounds of
embedding capacity under blind detection [73, 75, 79]. Zero-error capac-
ity has been studied for a watermark-based authentication system under
magnitude-bounded noise [42], using the principles originally proposed by
Shannon [6][81]. In [74], Costa showed that the channel capacity under two
additive Gaussian noises, with one known to the sender, equals the capac-
ity in the absence of the known noise. This result has been incorporated in
information theoretical formulations of data hiding [64, 76, 77, 78].
The gap between the theoretical embedding capacity in data hiding and
what is achievable in practice can be bridged by investigation of such issues
as basic embedding mechanisms for embedding one bit and modulation/
multiplexing techniques for embedding multiple bits. In this chapter, we
study these issues in detail. We pay particular attention to the following
problems [86]:
• Distortion during and after embedding: The distortion introduced by
watermarking must be imperceptibly small for commercial or artistic
reasons. However, an adversary intending to obliterate the watermark
may be willing to tolerate a higher degree of distortion.
• Actual noise conditions: An embedding system is generally designed
to survive certain noise conditions. The watermarked signal may en-
counter a variety of legitimate processing and malicious attacks, so
the actual noise can vary significantly. Targeting conservatively at sur-
viving severe noise would lead to the waste of actual payload, while
targeting aggressively at light noise could result in the corruption of
embedded bits. In addition, some bits, such as the ownership informa-
tion, are required to be more robust than others.
• Non-stationarity: The amount of data that can be embedded often
vary widely from region to region in image and video. This uneven
embedding capacity causes serious difficulty to high-rate embedding.
The commonly adopted solution of embedding a predetermined small
number of bits to each region is not suitable for practical systems that
need to accommodate diverse signals whose embedding capabilities
vary in a wide range.
In this chapter, we study the robustness vs. capacity tradeoff for two ma-
jor types of embedding mechanisms. The embedding capacities of simplified
channel models for these two embedding types are compared. These studies
serve as a guideline for selecting an appropriate embedding algorithm given
the design requirements of an application, and as a foundation of a new em-
bedding framework known as multi-level data hiding (Chapter 6). We also
discuss in this chapter a number of modulation/multiplexing techniques
for embedding multiple bits, quantitatively comparing their advantages and
disadvantages. The discussion of the uneven embedding capacity problem
will be presented in the next chapter.
3.1 Two Basic Embedding Mechanisms
The embedding of one bit in original media is basic to every data hiding
system. There are many ways to classify embedding schemes, for example,
some schemes work with the multimedia signal samples while others work
with transformed data. We found it beneficial to use the following classi-
fication of embedding mechanisms, which was proposed independently in
[64], [76], and [110]. Many embedding approaches belong to one of these
two general types.
In Type-I, the secondary data, possibly encoded, modulated, and/or scaled,
is added to the host signal, as illustrated in Fig. 3.1(a). The addition can
be performed in a specific domain or on specific features.

FIGURE 3.1. Channel model for Type-I (a) and Type-II (b) embedding.

To embed one
bit b, the difference between the marked signal I_1 and the original host signal
I_0 is a function of b, i.e., I_1 - I_0 = f(b). I_0 can be regarded as a major
noise source in detection. Although it is possible to detect b directly from
I_1 [61], the knowledge of I_0 will enhance detection performance by elimi-
nating the interference. Cox et al. also modelled the additive embedding as
communication with side information and proposed techniques of "informed
embedding" to reduce (but not completely eliminate) the negative impact
from host interference [51][56]. Additive spread spectrum watermarking such
as that in [49, 58, 102, 105] is representative of this category.
In Type-II embedding, the signal space is partitioned into subsets which
are mapped by a function g(.) to the set of values taken by the secondary
data (e.g., {0, 1} for binary hidden data), as illustrated in Fig. 3.1(b). The
marked value I_1 is then chosen from the subset that maps to b, so that
the relationship b = g(I_1) is deterministically enforced. To minimize
perceptual distortion, I_1 should be as close to I_0 as possible. That is,
    I_1 = arg min_{I : g(I) = b} D(I_0, I),                           (3.1)
where the distance measure D(.,.) depends on the perceptual model. Unlike
Type-I, the detectors for this type do not need knowledge of the original
value I_0 because the information regarding b is solely carried in I_1. Note that
there may be other constraints imposed on I_1 for robustness considerations,
for example, the enforcement may be done in a quantized domain with
uniform quantization step size Q [46][110].
A simple example of Type-II is odd-even embedding, whereby the clos-
est even number is used as I_1 to embed a "0" and the closest odd number
is used to embed a "1". Extracting the embedded data is straightforward
through an odd-even parity check¹. Data can also be embedded by enforc-
ing relations on a group of components, for example, by enforcing the sum of
several host components to a nearby even number to encode a "0", and to
an odd number to encode a "1". When keeping the total distortion fixed
and extending to a higher dimensional space, the distortion introduced per
dimension is reduced. Also, more choices are available for selecting a new signal
vector with the desired bits embedded in it, which allows embedding to be per-
formed in such a way that the human-visual-model-weighted distortion is
minimized. The cost here is a reduced embedding bit rate. This is a tradeoff
between embedding rate and invisibility².
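As a concrete illustration of odd-even enforcement, the following sketch is our own
minimal Python example (not from the original text; the quantization step Q and the
sample values are arbitrary choices). It embeds one bit by moving a scalar feature to
the closest multiple of Q with the matching parity, and extracts the bit by a parity
check after rounding.

    import numpy as np

    def embed_odd_even(feature, bit, Q):
        # move `feature` to the closest multiple of Q whose index parity equals `bit`
        # (even multiple -> "0", odd multiple -> "1")
        j = int(np.round(feature / Q))
        if j % 2 != bit:
            # step to the nearer adjacent multiple that has the correct parity
            j = j + 1 if feature >= j * Q else j - 1
        return j * Q

    def extract_odd_even(feature, Q):
        # recover the bit as the parity of the nearest multiple of Q
        return int(np.round(feature / Q)) % 2

    # usage: embed "1" into a host value of 17.3 with Q = 4
    marked = embed_odd_even(17.3, 1, 4)        # -> 20.0 (index 5, odd)
    bit = extract_odd_even(marked + 1.2, 4)    # noise within +/- Q/2 is tolerated

Any detector that knows Q can run the same parity check without the original value,
which is exactly the property noted above: the information about b is carried solely
in the marked value.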
The odd-even embedding can be viewed as a special case of table-
lookup embedding [137][139], which provides an additional level of security
by using a random lookup table as the mapping g(.). There are many other
ways to partition the space to enforce a desired relationship. For exam-
ple, from a pair of host samples or coefficients v_1 and v_2, we may generate
marked coefficients v'_1 and v'_2 that are close to v_1 and v_2 such that v'_1 > v'_2
to embed a "1" and v'_1 ≤ v'_2 to embed a "0" [69]. One can also enforce
signs for embedding [67][71]. Extending these basic ways of enforcement, more
sophisticated schemes can be designed and analyzed [65]. Many schemes proposed
in the literature that claim to be capable of non-coherent detec-
tion³ belong to this Type-II category. It is the deterministically enforced
relationship on I_1 that removes the need for the original signal I_0. For
convenience, we shall refer to the collection of image pixels or coefficients on
which the relation is enforced as an embedding unit. If the enforcement is
performed on a quantity derived from the embedding unit (e.g., the sum of
a few coefficients, the sign of a coefficient, etc.), we shall refer to the quantity
as a feature.
¹Odd-even embedding is not equivalent to replacing the least-significant-bit (LSB)
with the data to be embedded [131], because LSB embedding does not always produce
the closest I_1 satisfying the relationship b = g(I_1). If the probabilistic distribution of I_0
in each quantization interval is approximately uniform, the MSE of odd-even embedding
is Q²/3, while the MSE of embedding by replacing the LSB is 7Q²/12.
²Equivalently, if the embedding distortion per dimension is fixed, the total distortion
that can be introduced increases when moving to higher dimensions. This aggregated
energy allows more reliable embedding via quantization, as will be discussed in Sec. 3.1.1.
³Non-coherent detection in data hiding refers to being able to detect the embedded
data without the use of the original unwatermarked copy. It is also called "blind detec-
tion".
3.1.1 Probability of Detection Errors
The two types of embedding schemes have different characteristics in terms
of robustness, capacity and embedding distortion, as outlined in Table 3.1.
In this section, we shall consider and compare their probability of detection
errors. For both types, properly constructed channel codes can be applied
to enhance the reliability of embedding.
TABLE 3.1. Comparison of two types of embedding mechanisms

                  Type-I                      Type-II
                  (Additive)                  (Relation Enforcement)
  Capacity        low                         high
                  (host interference)
  Robustness      high                        low
                  (rely on long sequence)     (rely on quantization
                                               or tolerance zone)
  Example         spread-spectrum             odd-even
                  embedding                   embedding
The detection of hidden data for Type-I schemes can be formulated as
a hypothesis testing problem, where the hidden data is considered as sig-
nal and the host media as noise. For the popular spread spectrum embed-
ding [49][58], the detection performance can be studied via the following
simplified additive model:
    y_i = -s_i + d_i    (i = 1, ..., n)   if b = -1,
    y_i = +s_i + d_i    (i = 1, ..., n)   if b = +1,                  (3.2)
where {s_i} is a deterministic sequence (often called the watermark), b is a bit to
be embedded and is used to antipodally modulate s_i, d_i represents the total
noise and interference, and n is the number of samples or coefficients carrying
the hidden information. We further assume b is equally likely to be "-1"
and "+1". In coherent detection where the original source is available, d_i
comes from processing and/or attacks; in non-coherent detection, d_i consists
of the host media as well as processing and attacks. If d_i is modelled as i.i.d.
Gaussian N(0, σ_d²), the optimal detector is essentially a correlator [7]. The
normalized detection statistic T_N is given by
    T_N = (y · s) / (σ_d ||s||),                                      (3.3)
where y and s are column vectors of {y_i} and {s_i}, respectively. It is Gaus-
sian distributed with unit variance and a mean value of

    E[T_N] = b ||s|| / σ_d.                                           (3.4)

T_N is compared with the threshold "zero" to decide H_0 against H_1. The
probability of error is Q(|E[T_N]|), where Q(x) is the probability P(X > x) for
a Gaussian random variable X ~ N(0, 1). As illustrated in Fig. 3.2, the error
probability can be reduced by raising the ratio of the total watermark energy
||s||² to the noise power σ_d². The maximum watermark power is generally determined
by perceptual models so that the changes introduced by the watermark are
below the just-noticeable-difference (JND). If both the watermark power
and the noise power per component are constant, E[T_N] can only be raised
by increasing n, that is, by using a longer signal to represent 1 bit. This
reduces the total number of bits that can be embedded in the host data.
FIGURE 3.2. Illustration of the distribution of detection statistics (Type-I).
Larger absolute value of mean detection statistic under both hypotheses leads
to smaller probability of detection error.
Another model, used often for conveying ownership information [49][58],
leads to a similar hypothesis testing problem described by:

    y_i = d_i          (i = 1, ..., n)   if s is absent,
    y_i = s_i + d_i    (i = 1, ..., n)   if s is present.             (3.5)
Similar results and conclusions can be drawn. For example, the detection
threshold can be set according to the Bayesian rule to minimize the overall
probability of error as in the previous case, or according to the Neyman-
Pearson criterion to minimize the miss-detection probability P(choose H_0 | H_1
is true) while keeping the false alarm probability P(choose H_1 | H_0 is true)
below a specified level.
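To make the Type-I model concrete, here is a minimal Python sketch (our own
illustration, not from the book) of antipodal spread-spectrum embedding following
Eq. 3.2 and blind detection with the normalized correlator of Eq. 3.3; the sequence
length, watermark strength, and host statistics are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 1000                              # coefficients carrying one bit
    s = rng.choice([-1.0, 1.0], n)        # watermark sequence, known to the detector
    host = rng.normal(0.0, 10.0, n)       # host coefficients: the dominant interference
    b = +1                                # bit to embed, in {-1, +1}

    marked = host + b * s                 # Type-I additive embedding: y_i = b*s_i + d_i

    # blind (non-coherent) detection: the host acts as noise
    sigma_d = marked.std()                # rough estimate of the total noise power
    T_N = (marked @ s) / (sigma_d * np.linalg.norm(s))
    decision = +1 if T_N > 0 else -1      # compare T_N with the threshold "zero"
    # E[T_N] is about b*||s||/sigma_d, so a longer sequence n raises reliability

Doubling n doubles the total watermark energy ||s||² and raises E[T_N] by a factor
of about √2, which is precisely the robustness-versus-payload tradeoff described above.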
FIGURE 3.3. Decision boundaries of Type-II embedding: (a) single-sided detection
decision for sign enforcement with tolerance zone A and decision threshold "0";
(b) two-sided detection decision for odd-even enforcement with quantization step
size Q.
In contrast, Type-II schemes are free from the interference from the host me-
dia, as mentioned earlier. They can be used to code one bit in a small num-
ber of host components. Their robustness against processing and attacks
generally comes from quantization and/or tolerance zones. For schemes en-
forcing order or sign, the embedding mechanism may force |v'_1 - v'_2| ≥ A
or |v'| ≥ A, respectively, where A represents the size of a tolerance zone. In
this case, the decision boundary is single-sided, as shown in Fig. 3.3(a). For
other enforcements, quantization may be used to achieve robustness [110]⁴.
For example, if we use enforcement to an odd or even multiple of Q to
represent one bit of side information, as illustrated in Fig. 3.3(b), then any
further distortion within (-Q/2, +Q/2) will not cause errors in detection.
A larger Q leads to more tolerance, at the cost of a larger distortion in-
troduced by embedding. This is because the mean squared error introduced
by embedding, as illustrated in Fig. 3.4, is
    MSE = (1/2) · Q²/12
          + (1/2) · [ (1/Q) ∫_{-Q/2}^{0} (x + Q)² dx + (1/Q) ∫_{0}^{Q/2} (x - Q)² dx ]
        = Q²/3,                                                        (3.6)
⁴An alternative formulation of enforcement with quantization is known as Quanti-
zation Index Modulation (QIM), proposed in [64]. Dithered modulation was also proposed
as a practical case of QIM [62, 63].
where the host components within ±Q/2 of kQ are assumed to (approxi-
mately) follow a uniform distribution, giving an overall MSE quadratic with
respect to Q.
FIGURE 3.4. Computing MSE distortion introduced by odd-even embedding
In general, an error will be incurred if the noise falls in [(4j + 1)Q/2, (4j +
3)Q/2) for some integer j. If we assume the noise is uniformly distributed
between -M/2 and M/2 with M > Q, the probability of error can be
expressed on an interval basis:
    P_e = 1 - (2k-1)Q/M     if M ∈ [(4k-3)Q, (4k-1)Q),
    P_e = 2kQ/M             if M ∈ [(4k-1)Q, (4k+1)Q),                (3.7)
where k is a positive integer. We can see that P_e fluctuates around 1/2
and converges to 1/2 as M goes to infinity. While a more sophisticated set
partition may achieve a better tradeoff between the perceptual distortion
introduced by embedding and the tolerance against certain processing or
attacks, one can see that the tolerance is always limited and comes at the cost
of the pre-distortion in the embedding step. By incorporating a proper hu-
man visual model, Type-II schemes are suitable for high-rate data hiding
applications that do not have to survive severe noise.
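The Q²/3 distortion of Eq. 3.6 and the error behavior of Eq. 3.7 are easy to check
numerically. The following Monte Carlo sketch is our own illustration (the choices of
Q, the uniform-noise width M, and the number of trials are arbitrary); it applies
odd-even embedding to uniformly distributed host values and measures both quantities.

    import numpy as np

    rng = np.random.default_rng(0)
    Q, M, trials = 4.0, 10.0, 200_000     # step size, noise width (M > Q), sample size

    host = rng.uniform(0, 64 * Q, trials)
    bits = rng.integers(0, 2, trials)

    # odd-even embedding: nearest multiple of Q with the desired parity
    j = np.round(host / Q).astype(int)
    mismatch = (j % 2) != bits
    j = np.where(mismatch & (host >= j * Q), j + 1,
                 np.where(mismatch, j - 1, j))
    marked = j * Q

    mse = np.mean((marked - host) ** 2)                     # close to Q^2/3 (Eq. 3.6)
    noisy = marked + rng.uniform(-M / 2, M / 2, trials)
    p_err = np.mean((np.round(noisy / Q).astype(int) % 2) != bits)
    print(mse / Q**2, p_err)   # about 0.333; error rate matches Eq. 3.7 (0.6 for M=10, Q=4)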
3.2 Embedding Capacity
In this section, we study the embedding capacity of the two types of embed-
ding. The capacity depends on the channel model, the noise distribution,
and the constraints on watermark signal. For simplicity, we consider addi-
tive white Gaussian noise (AWGN), although other models such as additive
white uniform noise (AWUN) and colored noise can be handled in a similar
way.
3.2.1 Capacity for Type-I Embedding
The channel model of Type-I embedding shown in Fig. 3.1(a) has continu-
ous input and continuous output (CICO). The additive noise consists of two
parts: the interference from the host signal and the noise due to other pro-
cessing and distortion. Under the simplified assumptions that the host sig-
nal is independent of the processing noise, and that both are i.i.d. Gaussian
distributed, the embedding capacity is achieved with Gaussian distributed
input and is given by [6]

    C = (1/2) log₂( 1 + E² / (σ_I² + σ²) ),                            (3.8)

where E² is the power of the embedded signal, σ_I² is the power of the
original host signal, and σ² is the power of the additional additive noise. In gen-
eral, the interference from the host signal is much stronger than the additional
processing noise, i.e., σ_I² >> σ².
FIGURE 3.5. A binary symmetric channel (BSC) with a flipping probability of p
3.2.2 Capacity of Type-II Embedding
The channel of Type-II schemes has discrete input with either single sided
or double sided decision boundary, as illustrated in Fig. 3.3. We shall first
study the single-sided case shown in Fig. 3.3(a), in which the enforcement to
±A conveys 1-bit information, and extend the result to the double-sided case
later. For hard-decision detection, Type-II embedding can be modelled as
a discrete-input discrete-output (DIDO) binary symmetric channel (BSC)
shown in Fig. 3.5. The capacity of this type of channel is given by [6]

    C_BSC = 1 - h_p,                                                   (3.9)
and is achieved by equiprobable input, where h_p is the binary entropy

    h_p = p · log(1/p) + (1 - p) · log(1/(1 - p)).                     (3.10)
The flipping probability p is

    p_AWGN = Q(A/σ)                                                    (3.11)

under AWGN noise N(0, σ²), and is

    p_AWUN = 1/2 - A/M     if A < M/2,
    p_AWUN = 0             if A ≥ M/2,                                 (3.12)

under AWUN noise that is uniformly distributed between -M/2 and +M/2
and has variance σ² = M²/12.
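As a quick numerical illustration (our own sketch, not from the book; the helper
gaussian_Q implements the Gaussian tail probability used above), the hard-decision
capacities of Eqs. 3.9-3.12 can be evaluated as follows.

    import math

    def binary_entropy(p):
        # h_p of Eq. 3.10 (base-2 logs), with the convention 0*log(0) = 0
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    def gaussian_Q(x):
        # tail probability P(X > x) for X ~ N(0, 1)
        return 0.5 * math.erfc(x / math.sqrt(2))

    def capacity_hard_awgn(A, sigma):
        return 1.0 - binary_entropy(gaussian_Q(A / sigma))       # Eqs. 3.9 and 3.11

    def capacity_hard_awun(A, M):
        p = 0.5 - A / M if A < M / 2 else 0.0                    # Eq. 3.12
        return 1.0 - binary_entropy(p)

    # usage: tolerance zone A at WNR = 10*log10(A^2/sigma^2) = 5 dB
    print(capacity_hard_awgn(A=1.0, sigma=10 ** (-5 / 20)))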
For soft-decision detection, which offers knowledge of how wrong the
decision on a particular element would be, the channel is discrete-input
continuous-output (DICO), having a capacity of

    C_AWGN,DICO = 1 + (A²/σ²) log₂ e - E[ log₂( e^(-2AY/σ²) + 1 ) ],   (3.13)

where the expectation E[·] is taken with respect to a random variable Y,
whose probability density function is

    f(y) = (1/(2√(2πσ²))) e^(-(y+A)²/(2σ²)) + (1/(2√(2πσ²))) e^(-(y-A)²/(2σ²)).   (3.14)
For AWUN between -M/2 and +M/2, it can be shown that

    C_AWUN,DICO = 2A/M = A/(√3 σ)     if A < M/2,
    C_AWUN,DICO = 1                   if A ≥ M/2.                      (3.15)
We plot the capacity versus watermark-to-noise ratio (WNR) for the four
cases in Fig. 3.6. The soft decision shows an advantage of 2-5 dB over
hard decision. The derivations of CAWGN,DICO and CAWUN,DICO are given in
Appendix 3.5.
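The expectation in Eq. 3.13 has no simple closed form, but it is easy to evaluate
numerically. The sketch below is our own illustration (the sample size and the test
point A = σ are arbitrary); it draws Y from the bimodal density of Eq. 3.14 with
equiprobable inputs and estimates the soft-decision capacity.

    import numpy as np

    def capacity_soft_awgn(A, sigma, samples=200_000, seed=0):
        # Eq. 3.13: C = 1 + (A^2/sigma^2)*log2(e) - E[log2(exp(-2AY/sigma^2) + 1)]
        rng = np.random.default_rng(seed)
        x = rng.choice([-A, A], size=samples)           # equiprobable channel input
        y = x + rng.normal(0.0, sigma, size=samples)    # output with AWGN, per Eq. 3.14
        expectation = np.mean(np.log2(np.exp(-2 * A * y / sigma**2) + 1))
        return 1 + (A**2 / sigma**2) * np.log2(np.e) - expectation

    # at A = sigma (WNR = 0 dB) this exceeds the hard-decision
    # value 1 - h(Q(1)), which is roughly 0.37 bit per channel use
    print(capacity_soft_awgn(1.0, 1.0))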
For practical schemes that enforce signs with a tolerance zone A, a signal
is generally enforced to be greater than +A or less than -A to encode one
bit, rather than being enforced exactly to ±A as in our simplified model.
This implies that the actual capacity would be higher than that of the model
discussed above.
For odd-even embedding, the error regions are two-sided rather than
single-sided, as shown in Fig. 3.3. Suppose the channel input is kQ, where
Q is the quantization step size. Error may occur when the output Y is in
the regions Y > (k + 1/2)Q or Y < (k - 1/2)Q.

FIGURE 3.6. Capacity of DICO and DIDO channels under AWGN and AWUN
noise.

Using the DIDO channel model with AWGN noise, the bit error probability p is
    p = min{ 1/2, 2 Σ_{k=0}^{∞} [ Q((4k+1)Q/(2σ)) - Q((4k+3)Q/(2σ)) ] }

      = min{ 1/2, 2 Σ_{k=0}^{∞} ∫_{(4k+1)Q/(2σ)}^{(4k+3)Q/(2σ)} (1/√(2π)) e^(-t²/2) dt }   (3.16)

      ≈ min{ 1/2, 2 Q( Q/(2σ) ) }.                                     (3.17)
Inserting this p_AWGN into Eq. 3.9 yields the channel capacity for odd-even
embedding with quantization.
3.2.3 Capacity Comparison for Type-I & Type-II
By fixing the mean squared error introduced by embedding to be E², we can
compare the capacity of Type-I and Type-II schemes under AWGN noise.
For Type-I, we consider a Continuous-Input-Continuous-Output (CICO)
channel model and assume that the AWGN noise consists of Gaussian pro-
cessing noise (with variance σ²) and host interference (with standard de-
viation 10 times the amplitude of the watermark signal, i.e., σ_I = 10E)⁵.
For Type-II, we consider a Discrete-Input-Discrete-Output
(DIDO) Binary-Symmetric-Channel for odd-even embedding with quanti-
zation step Q = √3·E. The capacities for Type-I and Type-II under these
assumptions are
    C_I  = (1/2) log₂( 1 + E² / ((10E)² + σ²) ),                       (3.18)

    C_II = 1 - h( min{ 1/2, 2 Σ_{k=0}^{∞} [ Q((4k+1)Q/(2σ)) - Q((4k+3)Q/(2σ)) ] } ).   (3.19)
Defining E²/σ² as the watermark-to-noise ratio (WNR), we plot the capacity
vs. WNR for the two embedding types in Fig. 3.7. It shows that the capacity
of Type-II is much higher than that of Type-I until the WNR becomes negative (in dB).
The comparison suggests that Type-II is useful under low-noise conditions
while Type-I is suitable for severe noise, especially when the additional noise is
stronger than the watermark.
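For a concrete comparison, the sketch below is our own illustration (the truncation of
the infinite sum and the example WNR are arbitrary choices); it evaluates Eqs. 3.18
and 3.19 under the stated assumptions σ_I = 10E and Q = √3·E.

    import math

    def h(p):
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    def gaussian_Q(x):
        return 0.5 * math.erfc(x / math.sqrt(2))

    def capacity_type1(E, sigma):
        # Eq. 3.18: CICO channel with host interference sigma_I = 10*E
        return 0.5 * math.log2(1 + E**2 / ((10 * E)**2 + sigma**2))

    def capacity_type2(E, sigma, terms=50):
        # Eq. 3.19: DIDO channel for odd-even embedding with Q = sqrt(3)*E
        Q = math.sqrt(3) * E
        p = 2 * sum(gaussian_Q((4 * k + 1) * Q / (2 * sigma))
                    - gaussian_Q((4 * k + 3) * Q / (2 * sigma)) for k in range(terms))
        return 1 - h(min(0.5, p))

    # at WNR = 10 dB, Type-II approaches 1 bit per feature while Type-I
    # stays small because of the strong host interference
    E, sigma = 1.0, 10 ** (-10 / 20)
    print(capacity_type1(E, sigma), capacity_type2(E, sigma))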
FIGURE 3.7. Capacity of Type-I (CICO channel) and Type-II (DIDO channel)
embedding under AWGN noise
⁵In general, the magnitude ratio between the host signal and the watermark depends
on the content of the host signal and human perceptual models [58]. A ratio around 10
is typical and is used here [49]. Small changes in the ratio will not lead to significant
changes in the capacity curve.
3.2.4 Extensions and Discussions
Extensions of Channel Models We have discussed two possible chan-
nel models for Type-II embedding, namely, DICO and DIDO channels, and
have compared them with the CICO channel model for Type-I embedding.
Throughout the above discussion, we assumed that the noise is additive and
white, and the watermark signal power is the same on all media components.
Generalization is possible on two aspects. First, we should consider the so-
called unembeddable components which have to be left untouched by the
embedding mechanism to meet the imperceptibility requirement. As we shall
see in Chapter 4, these unembeddable components reduce the data hiding
capacity. Second, we should consider that each media component may incur
different host interference, may be able to be watermarked with different
strength, and may sustain different noise. The channel model can thus be
modified into a parallel channel and the capacity studied using informa-
tion theory [6]. A noteworthy issue regarding the parallel channels is that in
classic communication literature, the capacity of L parallel AWGN channels
follows a so-called Water-filling Theorem, where a constraint on total power
is imposed and the power is shared among all channels. Though meaningful
in telecommunication, the constraint on the total power may not always be
valid in watermarking problems where the power constraint for each indi-
vidual channel (possibly in the form of a frequency band or a local region)
is determined by perceptual models such as those in [47, 59, 92]. While
the perceptual constraints may have dependencies among several channels,
they do not appear to fit the simple constraint on the total power. How to
better handle this problem is a direction to be explored.
Zero-error Capacity Also note from the above analysis that the discrete-
input channel model for Type-II has zero probability of detection error if
the support of noise distribution is within the decision boundary shown as
shaded area in Fig. 3.1. Under the specific model considered there, the chan-
nel is able to convey 1 bit per channel use with no error. If we revise the
channel model to allow freedom in choosing the input alphabet, the embed-
ding capacity is then determined by Shannon's zero-error capacity [6][81],
as suggested in [42]. The capacity may exceed 1 bit per channel use for a
specific noise distribution and a specific power constraint on the watermark.
A Unified View on Two Types A unified view of the two embedding
types can be obtained in terms of set partitioning: both partition the signal
space into several subsets, each of which represents a particular value of hid-
den data. We have already explained the set partitioning for Type-II. For
Type-I, such as the antipodal additive spread spectrum scheme, we can see
that the signal space for detection is also partitioned into two parts accord-
ing to the sign of the detection statistic T_N: the positive part represents a "1"
and the negative part represents a "0". The difference is that, for Type-I, the
enforcement is not done deterministically on the marked media. Under
non-coherent detection, the additive embedding alone still leaves a non-zero
probability that watermarked components are not enforced to the desired
set. Because the host media is a major noise source, we have to rely on a
statistical approach (e.g., spreading watermark signal to many components
and taking sum or average) to suppress noise and obtain detection result
with small probability of error. This effort is needed even when there is no
noise coming from processing or attack.
Motivated by Costa's information theoretical result [74], distortion com-
pensation has been proposed to be incorporated into quantization-based
enforcement embedding [39, 64, 66, 68], where the enforcement is combined
linearly with the host signal to form a watermarked signal. The optimal scal-
ing factor is a function of WNR and will increase the number of bits that
can be embedded. This distortion compensated embedding can be viewed
as a combination of Type-I and Type-II embedding. Among the practical
embedding schemes, Type-II can reach comparable embedding rate to its
distortion compensated counterpart at high WNR, while Type-I can reach
similar rate to distortion compensated embedding at very low WNR.
No Free Bandwidth via Data Hiding A specious argument about data
hiding is that it could provide additional bandwidth to convey secondary
information. In fact, depending on whether the quality of host media is
reduced or not, the bandwidth for conveying secondary information comes at
the expense of either reducing the bandwidth for conveying the host media or
increasing the total bandwidth for conveying the watermarked media. Con-
sider the simplest case of embedding one bit in an image: the entire image
space S is partitioned into two disjoint subsets S_1 and S_2 with S = S_1 ∪ S_2,
regardless of the specific embedding algorithm being used. When a detector
sees an image belonging to S_1, it will output the embedded bit value as "0";
when it sees an image belonging to S_2, it will output the embedded bit value
as "1". Assuming the embedded bit is equiprobable to be "0" or "1", the
probability of an image falling into the first or the second subset equals
1/2, respectively.
used to specify to which subset the image belongs. In other words, one of
the bits used in coding an image is actually for conveying the embedded bit.
For odd-even embedding in a single image element, the two subsets are
obtained by a partition according to the least significant bit (LSB); for an
odd-even enforcement applied to the sum of two components, the embedded
bit is related to the LSB of both components. For spread spectrum additive
embedding, the boundary between the two subsets is commonly determined
by a correlator-type detector. If the total number of bits for representing an
image does not change during the embedding process, one bit is reallocated in
a logical sense from representing the host image to representing the embed-
ded data, even though there may be more than one bit physically related to
the embedded data. In this case, the absolute quality of the image is reduced
because fewer bits are effectively used in the image representation (unless there
is redundancy in the previous representation).

FIGURE 3.8. Illustration of the bit re-allocation nature of data hiding
While the above argument indicates that data hiding does not have an
advantage in terms of saving bit rate when compared with attaching the
secondary data separately to the host media, it does have advantages in
other aspects, including the ability to associate the secondary data with the
host media in a seamless way, the standard compliant appearance, and the
potentially low computation complexity in some practical applications. We
will discuss more on this issue in Chapter 8.
3.3 Techniques for Embedding Multiple Bits
Most techniques used to extend the single-bit embedding to multiple-bit em-
bedding [85] have evolved from the modulation and multiplexing methods
in classic communications [8]. The applicability of a particular technique
depends on the type of multimedia sources and the embedding mechanism
being used. In this section, we give a brief review of four typical approaches,
namely, amplitude modulo modulation, orthogonal and biorthogonal modu-
lation, time division modulation and multiplexing (TDM), and code division
modulation and multiplexing (CDM). We then compare their applicability
and performance.
3.3.1 Modulation and Multiplexing Techniques
Amplitude Modulo Modulation Type-I embedding using antipodal or
on-off modulation can be viewed as a simple amplitude modulation. It is
uncommon in practice to use amplitude modulation with Type-I embedding
to convey more than one bit under blind detection and HVS constraints.
34 3. Basic Embedding Mechanisms
This is mainly because the power of a watermark signal is kept small to
satisfy imperceptibility and the detection works in a signal-to-noise ratio
range as low as -20dB when the host signal is not available. As a result, we
have very limited dynamic range of detection statistics and cannot afford
amplitude modulation beyond two symbols. We therefore focus on Type-II
embedding in the discussion of amplitude modulation.
In general, B bits can be embedded in each embedding unit by enforcing a
feature derived from this unit into one of K = 2^B subsets. A straightforward
extension of odd-even embedding is to enforce the relation via a modulo-K
operation to hide B bits per feature. That is,

    I_1 = arg min_{I : I = jQ, j ∈ Z, mod(j, K) = m} |I - I_0|,        (3.20)

where m ∈ {0, 1, ..., K-1} represents the B-bit information to be embed-
ded. As before, I_0 is the original image feature, I_1 the watermarked feature,
and Q the quantization step size. Assuming that the distribution of I_0 is
approximately uniform over an interval of length KQ, the MSE distortion
introduced by embedding is approximately K²Q²/12. This indicates that
with a fixed minimal separation Q between the K subsets, larger embed-
ding distortion will be introduced by a larger K. For a fixed amount of MSE
embedding distortion, the enforced relation with a larger K has a smaller sep-
aration and hence can tolerate less distortion. The idea is easily extensible to
table lookup embedding [137] (Section 7.3) or other enforcement schemes,
and the analysis is similar.
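A minimal sketch of this modulo-K enforcement is shown below (our own Python
illustration generalizing the earlier odd-even example; the feature value, Q, and K
are arbitrary choices).

    import numpy as np

    def embed_modulo_K(feature, m, Q, K):
        # Eq. 3.20: move `feature` to the closest multiple j*Q with mod(j, K) == m
        j = int(np.floor(feature / Q))
        candidates = [i for i in range(j - K, j + K + 1) if i % K == m]
        best = min(candidates, key=lambda i: abs(feature - i * Q))
        return best * Q

    def extract_modulo_K(feature, Q, K):
        return int(np.round(feature / Q)) % K

    # hide the 2-bit symbol m = 2 (K = 4) in a feature value of 103.7
    marked = embed_modulo_K(103.7, m=2, Q=4, K=4)       # -> 104.0 (index 26; 26 mod 4 = 2)
    symbol = extract_modulo_K(marked + 1.5, Q=4, K=4)   # correct as long as |noise| < Q/2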
Orthogonal & Biorthogonal Modulation Orthogonal modulation is
mainly used for Type-I embedding. K orthogonal signals are generated be-
forehand, and one of the K signals is added to the host media to represent
B = log2 K bits. A detector computes the correlation between the test signal
and each of the K signals. The signal that gives the largest correlation and
exceeds a threshold is decided as the embedded signal and the correspond-
ing B-bit value determined accordingly. A variation, known as biorthogonal
modulation, encodes log₂(2K) = B + 1 bits by adding or subtracting one
of K signals [8]. Using the classic detection strategies mentioned above,
orthogonal and biorthogonal modulations are inefficient except for small
K, because the computational complexity of detection grows exponentially
with the number of bits being conveyed⁶.
There is considerable freedom in selecting the K orthogonal signals, but
letting the embedder and the decoder agree on the K high-dimensional sig-
nals is often non-trivial. In practice, we can use K approximately orthogonal
random signals generated from a set of keys, and make the keys known to
the embedder and the decoder.
⁶A divide-and-conquer detection algorithm for orthogonal modulation recently pro-
posed in [142] can reduce the computational complexity from O(2^B) to O(B) at the
expense of detection accuracy.
Time Division Modulation and Multiplexing (TDM) This type
of modulation/multiplexing partitions the host media into non-overlapped
segments and hides one or more bits in each segment. The term "TDM"
in this book includes both temporal division for sequential data and spa-
tial division for visual data. TDM is a special case of CDM to be discussed
next. It offers a simple way to realize orthogonal embedding for both Type-I
and Type-II, as the bits embedded in different regions or segments do not
interfere with one another. However, different regions/segments can tolerate
different amounts of change without causing perceptible artifacts. For ex-
ample, very few bits can be embedded into a smooth area of an image,
whereas more bits can be embedded into areas with significant amount of
details. The difficulty arising from this uneven embedding capacity can be
handled by applying random shuffling before embedding, as to be addressed
in Chapter 4.
Code Division Modulation and Multiplexing (CDM) For Type-I
embedding, B bits are encoded into a watermark signal w via

    w = Σ_{k=1}^{B} b_k · u_k,                                         (3.21)

where b_k ∈ {±1} and {u_k} are orthogonal vectors. As in the case of orthogo-
nal and biorthogonal modulation, there is considerable freedom in selecting
{u_k}. But unlike orthogonal modulation, the total signal energy here is the
sum of the energy allocated for each bit. If a fixed total amount of energy
is uniformly allocated to each bit, the energy per bit will be reduced as B
increases, causing a decrease in detection reliability and implying a limit on
the total number of bits that can be hidden.
For Type-II, the embedding of multiple bits can be done by enforcing rela-
tions deterministically along different directions that are orthogonal to each
other. For example, relations on the projections of a feature vector along
several orthogonal directions can be enforced in an image block [72][168].
The total modification introduced by embedding is the sum of the change
along each direction.
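To illustrate Eq. 3.21, the following sketch is our own Python example (the key-seeded
±1 carriers are only approximately orthogonal, and the sequence length, strengths,
and host statistics are arbitrary choices); it embeds B bits by code division and
recovers each bit with a separate correlation.

    import numpy as np

    rng = np.random.default_rng(0)
    B, n = 8, 4096
    key_rng = np.random.default_rng(7)                # carriers derived from a shared key
    u = key_rng.choice([-1.0, 1.0], size=(B, n))      # approximately orthogonal carriers u_k
    bits = rng.choice([-1, 1], B)

    host = rng.normal(0.0, 10.0, n)
    marked = host + bits @ u                          # Eq. 3.21: w = sum_k b_k * u_k

    # blind detection: correlate with each carrier; the host and the other
    # B-1 carriers act as noise for every bit
    decoded = np.sign(marked @ u.T)
    print(int((decoded == bits).sum()), "of", B, "bits recovered")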
3.3.2 Comparison of Modulation/Multiplexing Techniques
Applicable Media Types Amplitude modulo modulation is applicable
to most media including audio, image, and video, as long as the features
participating in the embedding are properly chosen. TDM can be used in
the temporal domain for audio and video, as well as in spatial domain for
image and video. For both general CDM⁷ and orthogonal/biorthogonal
modulation, we need multiple mutually orthogonal directions in the embed-
ding domain, which can be a non-trivial task. For example, it is difficult
to find in a binary image many overlapped but orthogonal directions that
produce features manipulable within the just-noticeable-difference
(JND) range [99][100]; obtaining such directions for audio also requires a
large window of samples, which could lead to significant processing delay.

⁷TDM can be regarded as a special case of CDM. Here, by "general" we mean to
exclude the special case of TDM.
TDM vs. CDM Approaches TDM and CDM are equivalent in terms
of energy allocation. TDM is a special case in which the supports of the u_k are
non-overlapping with each other in the sample domain. Alternatively, one
can choose orthogonal but overlapped {u_k}, similar to CDMA communica-
tions [8][10]. The confidentiality of {u_k} can potentially add an additional
layer of security. In addition, unlike the TDM approach, uneven embedding ca-
pacity is no longer a concern for CDM because the {u_k} can be chosen so
that each bit is spread over all the media data. However, the B orthogonal
sequences have to be generated and shared with the detector(s), which may
be non-trivial for large B. The TDM and CDM approaches can be combined
to encode multiple bits.
TDM/CDM vs. Orthogonal Modulation The orthogonal modulation
and TDM/CDM-type modulation can be compared by studying the dis-
tances between signal constellation points that represent the hidden data
(Fig. 3.9). This distance, in many cases, is directly related to the likeli-
hood of detection errors [8]. Consider the case of conveying B bits with
total energy ℰ. The minimum distance between signal points is √(2ℰ) for
orthogonal modulation, and is 2√(ℰ/B) for TDM/CDM. When B > 2, or-
thogonal modulation gives a smaller probability of detection error at a cost
of complexity in computation and bookkeeping.
FIGURE 3.9. Comparison of distance between signal constellation points for or-
thogonal modulation (left) vs. TDM/CDM-type modulation (right) with the total
signal energy fixed at ℰ.
By combining orthogonal modulation with TDM or CDM, it can be shown
that the embedding rate will increase considerably. In fact, we can double the
embedding rate with little increase in complexity. For example, the watermark
can be constructed as

    w = Σ_{k=1}^{B} b_k · [ I(b_{B+k} = 1) · u_k^(1) + I(b_{B+k} ≠ 1) · u_k^(2) ],    (3.22)

where b_i ∈ {+1, -1}, I(·) is an indicator function, and all vectors in the two
sets {u_k^(1)} and {u_k^(2)} are orthogonal. Here TDM/CDM is used to convey
B bits and the orthogonal modulation is used to double the payload. The
resulting total watermark energy is the same as using TDM or CDM alone.
Energy Efficiency We define a quantity W = Y/(X·Z²) to measure the en-
ergy efficiency of embedding, where X is the number of embedded bits per
element, Y is the MSE distortion per element introduced by embedding, and
Z is the minimum separation between the enforced constellation points and
is a measure of robustness against noise. A smaller W value is preferable.
We consider applying modulation/multiplexing to one embedding
unit of n elements and summarize the comparison of different techniques⁸
in Table 3.2. We can see that except for very small n and B, the biorthogonal
technique has the smallest W value, while the amplitude modulo tech-
nique gives the largest W value: it equals 1/3 for B = 1 and 2/3 for B = 2.
This suggests that to embed multiple bits with limited watermark energy,
orthogonal and biorthogonal modulation should be used, at a cost of com-
putation and bookkeeping. On the other hand, TDM and CDM techniques,
being applicable to both Type-I and Type-II embedding under blind de-
tection as well as having a constant W value of 1/4 and linear complexity,
show broad applicability and a good balance between energy efficiency and
detection complexity.
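As a quick check of the W column of Table 3.2 (referenced above), the following
sketch is our own illustration; it evaluates the energy-efficiency values as a function
of the number of bits B per embedding unit.

    def W_values(B):
        # W = Y/(X*Z^2) from Table 3.2, per modulation/multiplexing technique
        return {
            "amplitude modulo": 2 ** (2 * B) / (12 * B),
            "TDM/CDM": 1 / 4,
            "orthogonal": 1 / (2 * B),
            "biorthogonal": 1 / (2 * (B + 1)),
        }

    print(W_values(1))   # amplitude modulo: 1/3; biorthogonal ties TDM/CDM at 1/4
    print(W_values(4))   # amplitude modulo grows quickly; (bi)orthogonal keeps shrinking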
3.4 Chapter Summary
This chapter discussed fundamental problems associated with embedding
mechanisms. We first studied two major categories of embedding mecha-
nisms, namely, an additive embedding (Type-I) and a deterministic enforce-
ment embedding (Type-II). Our quantitative investigation of their proba-
bility of detection errors and embedding capacity suggested that Type-II is
useful under low-noise conditions while Type-I is suitable for dealing with
severe noise that has stronger power than the watermark itself. We also
compared various modulation/multiplexing techniques for hiding multiple
bits and found that CDM/TDM techniques have broad applicability and
a good balance between energy efficiency and detection complexity. In the
next chapter, we will address the problem of unevenly distributed embed-
ding capacity.
⁸The modulo-K modulation extended from odd-even embedding is taken as a repre-
sentative of amplitude modulation.
TABLE 3.2. Comparison of modulation/multiplexing techniques
(n elements per embedding unit)

                               Amplitude         TDM/CDM      Orthogonal    Biorthogonal
                               Modulo            (B ≤ n)      (2^B ≤ n)     (2^B ≤ n)
  Type-I embedding             --                Applicable   Applicable    Applicable
  Type-II embedding            Applicable        Applicable   --            --
  X: # embedded bits           B/n               B/n          B/n           (B+1)/n
     per element
  Y: MSE distortion            2^(2B)·Q²/(12n)   ℰ/n          ℰ/n           ℰ/n
     per element
  Z: minimum separation        Q                 2√(ℰ/B)      √(2ℰ)         √(2ℰ)
  W = Y/(X·Z²): energy         2^(2B)/(12B)      1/4          1/(2B)        1/(2(B+1))
     efficiency of embedding
  Computational complexity     const             O(B)         O(2^B)        O(2^B)
     for detecting B bits
3.5 Appendix - Derivations of Type-II Embedding
Capacity
In this appendix section, we derive the capacity under the DICO channel model
for Type-II embedding. We shall consider AWGN and AWUN noise, and
show that the capacities under these channels follow Eq. 3.13 and Eq. 3.15, re-
spectively.
According to information theory [6], the channel capacity is

    C = max_{p(x)} I(X; Y),                                            (3.23)
where I(X; Y) is the mutual information between two random variables X
and Y. For a channel with continuous outputs, we have

    I(X; Y) = h(Y) - h(Y|X)                                            (3.24)
            = h(Y) - h(X + Z | X)                                      (3.25)
            = h(Y) - h(Z),                                             (3.26)
where h(·) is the differential entropy of a continuous random variable, h(·|·)
is the conditional differential entropy, and Z is additive noise that is in-
dependent of the channel input. Consider first the case of AWGN noise
N(0, σ²), whose differential entropy is known to be (1/2) log(2πeσ²). We have

    I(X; Y) = E[ -log f_Y ] - (1/2) log(2πeσ²),                        (3.27)
where the expectation E[·] is with respect to the random variable Y whose
probability density function (p.d.f.) f_Y is a bimodal Gaussian, i.e.,

    f_Y(y) = P(X = -A) · (1/√(2πσ²)) e^(-(y+A)²/(2σ²))
           + P(X = +A) · (1/√(2πσ²)) e^(-(y-A)²/(2σ²)).               (3.28)
By symmetry, the capacity is achieved by equiprobable input, i.e., P(X =
-A) = P(X = +A) = 1/2. We now have

    -log f_Y(y) = log 2 + (1/2) log(2πσ²) + ((y² + A²)/(2σ²)) log e
                  - log( e^(-B) + e^(B) ),                             (3.29)

where B = yA/σ². The term log(e^(-B) + e^(B)) can be simplified as

    log(e^(-B) + e^(B)) = log e^(B)(e^(-2B) + 1)
                        = log(e^(-2B) + 1) + (yA/σ²) log e.            (3.30)
We take the expectation with respect to Y of every term in -log f_Y(y) and
obtain

    h(Y) = log 2 + (1/2) log(2πσ²) + (A²/(2σ²)) log e
           + (log e/(2σ²)) E(Y²) - E[ log( e^(-2AY/σ²) + 1 ) ],        (3.31)

where the term E[(YA/σ²) log e] vanishes because Y has zero mean. With E(Y²) =
σ² + A² and some more rearrangement, we arrive at
    C_AWGN,DICO = log 2 + (A²/σ²) log e - E[ log( e^(-2AY/σ²) + 1 ) ].   (3.32)

Therefore, the channel capacity in units of bits per channel use under AWGN
noise is

    C_AWGN,DICO = 1 + (A²/σ²) log₂ e - E[ log₂( e^(-2AY/σ²) + 1 ) ].     (3.33)
For AWUN noise between -M/2 and +M/2 (noise variance σ² = M²/12),
the differential entropy of the noise is

    h(Z) = ∫_{-M/2}^{M/2} (1/M) log M dz = log M.                      (3.34)
The shape of the output Y's distribution depends on the relation between
M and A. We can show that

    h(Y) = (2A/M) h(p) + log M     if A < M/2,
    h(Y) = h(p) + log M            if A ≥ M/2,                         (3.35)

where p is the probability of the channel input, p = P(X = -A), and h(p) is
the binary entropy defined as h(p) = -p·log p - (1-p)·log(1-p). Noting
that h(p) assumes its maximum at p = 1/2, we have
    C_AWUN,DICO = 2A/M     if A < M/2,
    C_AWUN,DICO = 1        if A ≥ M/2,                                 (3.36)

where the capacity is achieved by equiprobable inputs.
4
Handling Uneven Embedding Capacity
We have pointed out in previous chapters that the design of a data hiding
system involves several conflicting requirements, such as imperceptibility,
robustness / security, and capacity. Depending on the specific applications,
these requirements are given different weights, and in general a tradeoff
has to be made. Compared with this widely discussed tradeoff, another
challenge, known as uneven embedding capacity, has received little attention
in the literature. It is, however, an important and unavoidable problem that
every data hiding system has to consider. This chapter discusses how to
handle uneven embedding capacity.
The unevenly distributed embedding capacity for multimedia data hiding
comes from the non-stationary nature of the host signal. Taking an image
as an example, changes made in smooth regions are more easily noticed
than those made in textured regions. In terms of data hiding, this means
fewer bits can be embedded in smoother regions, resulting in unevenly dis-
tributed embedding capacity from region to region. We shall refer to a pixel
or coefficient of the media source as embeddable if it can be modified by
more than a predetermined amount without introducing perceptible distor-
tion. The predetermined amount of modification usually depends on both
robustness and imperceptibility requirements. For example, a DCT coeffi-
cient whose magnitude is smaller than a threshold may be considered as
unembeddable [47]. The uneven distribution of embeddable coefficients from
region to region is a reflection of the uneven embedding capacity.
While it is desirable to embed as many bits as possible in each region,
the number of actually embedded bits would vary significantly from region
to region, and this side information has to be conveyed to a detector for
accurate decoding. Under blind detection where a detector does not have
the original unwatermarked copy, an accurate estimation of how many bits
are embedded in each region is not always easy, especially when the wa-
termarked image may have been subjected to distortion. An error in this
estimation can cause not only errors in extracting the embedded data in
the associated region but also synchronization errors that affect the data
extracted from the following regions. Unless the number of bits that can be
embedded in each region is large, conveying side information would intro-
duce large overhead, and may even exceed the number of bits that can be
reliably embedded in the first place.
A common way to overcome this difficulty is to embed a fixed number of
bits in each region, thereby eliminating the need of side information. For
this approach to work, the fixed number of bits must be small and the size
of each region must be large enough to ensure that each region has the
capacity for embedding this fixed number of bits. Large region size reduces
the total number of bits that can be embedded. Furthermore, this will cause
significant waste of embedding capability in regions that are able to hide
more bits.
In this chapter, we propose an adaptive solution to handling uneven em-
bedding capacity. If the total number of bits that can be embedded is much
larger than the number of bits needed to convey how many bits are embed-
ded in each region, we adopt variable embedding rate and use a multiplexing
technique to hide the side information as control bits to facilitate the detec-
tion without introducing large overhead. If the two bit numbers are com-
parable, we adopt constant rate embedding and use shuffling to overcome
uneven embedding capacity. We will show the efficacy of using shuffling via
analysis and experimentation. Later in Part-II Algorithms and System De-
signs, we will demonstrate how the proposed solutions are applied to specific
design problems.
We begin our discussion with a quantitative model of the uneven embed-
ding capacity in a natural image in Section 4.1. We then discuss constant
and variable rate embedding in Section 4.2 and Section 4.3, respectively.
Three examples from Part II are briefly outlined in Section 4.4 to demon-
strate how the proposed approaches can be used for designing practical data
hiding systems.
4.1 Quantitative Model for Uneven Embedding
Capacity
We consider the blockwise DCT transform of an image of size S = M₁ × M₂,
with each transform coefficient labelled as "embeddable" or "unembed-
dable", as determined by a human visual model. The block size
of the transform is fixed at 8 × 8. DC coefficients and the AC coefficients
whose magnitude is smaller than a perceptual threshold are left unchanged
to avoid artifacts [47][137]. Under this labelling, a smooth block could have
no embeddable coefficients. In a typical natural image such as the one shown
in Fig. 4.1, about 20% of the 8 x 8 blocks are smooth and have no embed-
dable coefficients at all. This is illustrated in Fig. 4.2.
Suppose n of the S coefficients are embeddable. Then the fraction of
embeddable coefficients is p = n/ S. The coefficients from all blocks can be
concatenated into a single string of length S, and this string is divided into N
segments of equal length q = S / N. Let mr be the number of segments having
r embeddable coefficients, where r = 0,1,2, ... , q. In particular, mo/N is the
fraction of segments having no embeddable coefficients. For the image in
Fig. 4.1 with segment size q = 8 x 8 = 64, the histogram of mr/N vs. r is
shown as a solid line in Fig. 4.3. It is seen that about 20% of the segments
have no embeddable coefficients, while a small number of segments have as
many as 25 embeddable coefficients. This demonstrates that there can be
a large variation in the distribution of embeddables in a natural image. By
increasing the segment size q from 64 to 256, a similar shaped histogram
is obtained, where the fraction of blocks with no embeddable coefficient is
only decreased to 15%. This indicates that to embed a constant number
of bits in each segment, simply increasing the segment size is ineffective in
reducing the number of segments having zero embeddable coefficients. At
the same time, embedding capability is wasted in other regions that could
potentially hide many bits.
FIGURE 4.1. An original unmarked 640 x 432 image Alexander Hall. The image
is stored in JPEG format with a quality factor of 75%; watermarking and related
studies are performed on its luminance components.
FIGURE 4.2. Smooth blocks of Fig. 4.1 (shown in black)
4.2 Constant Embedding Rate (CER)
At the beginning of this chapter, we have explained the dilemma in choosing
embedding rate under uneven embedding capacity. On one hand, using vari-
able embedding rate generally requires sending side information about the
embedding rate, which could be an expensive overhead; on the other hand,
using a constant embedding rate may waste embedding capabilities. In this
section, we shall focus on constant embedding rate and explore approaches
that can increase the total amount of data embedded.
The simplest case of constant embedding rate is to embed one bit in each
segment by either Type-I or Type-II mechanisms discussed in Chapter 3.
For images, the blocks can be obtained by a regular partition, which retains
the original geometric layout of the image. As illustrated earlier in Fig. 4.3,
unless the block size is large, blocks in a smooth area may have no embed-
dable coefficients at all. Under a constant embedding rate, a large block size
reduces the total number of bits that can be embedded and wastes a large
amount of embedding capability. Approaches that can embed more data via
a smaller block size are therefore more desirable. In the following, we will
discuss two ways to achieve this, namely, backup embedding and shuffling.
FIGURE 4.3. Histogram of embeddable coefficients per block for the luminance
components of Fig. 4.1 before shuffling: 8 x 8 blocks (solid line) and 16 x 16 blocks
(line with dots).
4.2.1 Backup Embedding
The idea of backup embedding is to embed the same data in multiple loca-
tions. The locations are identified deterministically by a rule known to both
the embedder and the detector. Illustrated in Fig. 4.4 is a special case where
we embed one bit in block (i, j) and also put a backup copy in block (i, j + H/2),
which is half way apart, where H is the number of blocks along the
vertical direction. We shall call this symmetric backup.
Assuming that a block consists of q components and the number of lo-
cations holding the same information is L, the equivalent block size for
embedding one bit is Lq, implying that an increase in L will reduce the
total number of bits being embedded. The difference between backup em-
bedding with L locations and simply increasing the block size by L times is
that backup embedding is more likely to allow most bits to be embedded.
This is because if the multiple locations are sufficiently far apart from each other,
the probability of each location being smooth tends to be independent of
the others; therefore the probability that all of them are smooth is greatly
reduced, making it possible to hide more data than the approach that simply enlarges
the block size. With a proper choice of the location patterns, the indepen-
dence condition is most likely to hold. In addition to helping overcome the
uneven embedding capacity [137], backup embedding has also been adopted
by "self-recovery" systems that use embedded data to recover corrupted
regions [42][123].

FIGURE 4.4. Symmetric backup embedding for handling smooth regions. One bit
is embedded in a block and its companion block half an image apart. The effective
embedding rate is two bits per 16 × 16 macroblock that consists of four blocks.
The shuffling approach discussed next can be viewed as a generalization
of backup embedding where each block is reduced to contain only one coef-
ficient and the multiple locations are specified by a permutation function.
4.2.2 Equalizing Embedding Capacity Via Shuffling
The effectiveness of simple backup embedding such as the symmetric backup
shown in Fig. 4.4 may still depend on the structure of the host image. For
example, an entire column or row of an image may be smooth and there-
fore data cannot be hidden in that column or row. To achieve statistical
independence with respect to the image structure, we consider shuffling
the coefficients. Shuffling is a bijective mapping of the coefficient indices
f: {1, 2, ..., S} → {1, 2, ..., S}, where S is the total number of image coeffi-
cients. As illustrated in Fig. 4.5, a shuffle is applied to the top string formed
by the concatenation mentioned previously, resulting in the second string.
Embedding is done on this string to produce the third string. For exam-
ple, the second number "-74" is changed to "-73", the third number "24"
is changed to "25", and so on. The third string is then reverse shuffled
to get the fourth string, which is the watermarked signal. The same shuffle
needs to be performed at detection.
FIGURE 4.5. Incorporate shuffling in an embedding mechanism
Shuffling can be considered as a permutation, which can be either (pseudo)
random or non-random. A simple case of a non-random shuffle is an inter-
leaving process similar to the backup embedding discussed earlier, i.e., to
embed the i-th bit of a total of B bits in the (kB + i)-th coefficients, where k
is a positive integer [137]. We shall focus on the case of complete random
permutation, where all permutations are equiprobable, hence the probabil-
ity of each permutation is 1/S! [84]. We will show the effectiveness of this
approach by examining the distribution of embeddables before and after a
random shuffling.
Analysis As defined earlier, m_r/N is the fraction of segments having
r embeddable coefficients. Computing the marginal distribution P(m_r)
from the joint probability distribution of {m_0, m_1, ..., m_q} is quite involved
unless q = S/N is small. We adopt instead a moment-based approach [3] to
study the mean and variance of each normalized bin m_r/N of the histogram.
For each bin m_r with r = 0, ..., q, it is shown in the appendix that the mean
E[m_r/N] follows the hypergeometric distribution

    E[m_r/N] = C(pS, r) · C(S - pS, q - r) / C(S, q),                  (4.1)

where C(·,·) denotes the binomial coefficient; the exact expression for
Var[m_r/N] (Eq. 4.2) is derived in Appendix 4.7.
The distribution of embeddable coefficients per segment after shuffling there-
fore depends only on two global parameters, p (the percentage of embed-
dable coefficients) and q (the segment size taken after shuffling), and does
not depend on the distribution before shuffling.
The expected histogram {E(m_r/N)} is an arch-shaped hypergeometric
distribution function [1] and can be approximated well by the binomial distri-
bution with mean pq:

    E[m_r/N] ≈ b(r; q, p) = C(q, r) p^r (1 - p)^(q-r).                 (4.3)

It can also be approximated well by Poisson and normal distributions with
mean pq. An excellent approximation of Var[m_r/N] is given by

    Var[m_r/N] ≈ (1/N) · b(r; q, p) · [1 - b(r; q, p)].                (4.4)
The detailed derivation and further analysis of these two quantities can be
found in Appendix 4.7.
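The effect of shuffling is easy to reproduce numerically. The sketch below is our own
illustration (the image size, embeddable fraction, and segment size mirror the example
that follows, but the embeddable map here is synthetic); it applies random permutations
to a binary "embeddable" map and compares the fraction of empty segments with the
binomial approximation of Eq. 4.3.

    import numpy as np
    from math import comb

    rng = np.random.default_rng(0)
    S, q, p = 640 * 432, 64, 0.1549        # total coefficients, segment size, embeddable fraction
    N = S // q
    n = int(p * S)

    # synthetic embeddable map: n embeddable coefficients out of S (their positions
    # are irrelevant after a complete random shuffle)
    embeddable = np.zeros(S, dtype=bool)
    embeddable[:n] = True

    empty_fractions = []
    for _ in range(20):                    # 20 random shuffles
        shuffled = rng.permutation(embeddable)
        per_segment = shuffled[:N * q].reshape(N, q).sum(axis=1)
        empty_fractions.append(np.mean(per_segment == 0))

    print(np.mean(empty_fractions))        # close to the analytic value of about 0.002%
    print((1 - p) ** q * comb(q, 0))       # binomial approximation b(0; q, p) of Eq. 4.3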
Simulation and Verification We use the image shown in Fig. 4.1 to
illustrate the effectiveness of shuffling. Among a total of S = 640 x 432 co-
efficients, p = 15.49% of all coefficients are embeddable. We choose segment
size as q = 8 × 8 = 64, which coincides with the block size of the DCT
transform. From Eq. 4.1 and Eq. 4.2, we have

    E[m₀/N] ≈ 0.002%,    Var[m₀/N] ≈ 4.85 × 10⁻⁹.

The very small value of Var[m₀/N] suggests that very few shuffles among
the S! possibilities will result in a histogram which deviates appreciably from
the mean, and that higher moments will not contribute much to this investi-
gation. The mean value indicates that the average fraction of segments with
no embeddable coefficients is reduced by four orders of magnitude, from the
original 20% to 0.002%. The expected number of segments with no embed-
dable coefficient after shuffling is only 0.002% × N = 0.002% × 640 × 432/64 ≈
0.086.
We have also performed 1000 random permutations on the block DCT of
the image of Fig. 4.1, and compute the mean and variance of each bin of the
histogram {m_r/N}. As we can see from Fig. 4.6(b), the dash-dot line is the
histogram of {m_r/N} before shuffling, showing that 20% of the segments
have no embeddable coefficients, the solid line is a plot of Eq. 4.1, and the
dotted line is a plot of the square root of Eq. 4.2 (standard deviation -
std). The circles are the average fraction of blocks having a given number of
embeddable coefficients from simulation, and the crosses are the standard
deviation from simulation. Fig. 4.6 shows that the agreement of simulation
and analysis is excellent. We also see that after shuffling, the number of seg-
ments that have no embeddable coefficients has been significantly reduced
and that most segments have between 5 and 15 embeddable coefficients.
We should point out that q does not have to be the same as the block size
of the transform (8 × 8). Instead, q should be chosen to give the desired
mean, pq, of the histogram {E(m_r/N)}, and to make sure that the left tail
FIGURE 4.6. Histogram of embeddable coefficients per 8 × 8 block for the luminance component of Fig. 4.1 before shuffling (dash-dot line) and after shuffling (all others): (solid line) - mean from analytic study; (dotted line) - std from analytic study; (circles) - mean of simulation; (crosses) - std of simulation.
of the histogram is smaller than a desired bound. For images that contain a
large fraction of embeddable coefficients (i.e., large p), the segment size can
be chosen to be small; while for images in which most regions are smooth,
the segment size should be large enough to ensure enough decay at the left
tail.
Shuffling in most cases can lead to all segments having at least one em-
beddable coefficient. This allows one bit to be embedded in every shuffled
segment, including those segments in smooth regions. The embedding in
smooth regions via shuffling is logical rather than physical: no bits are actually
inserted in smooth regions. Instead, a number of embeddable coefficients from
non-smooth regions are dynamically allocated through shuffling to hold the
data that are intended to be put in smooth regions. This also indicates that as
long as the criterion for identifying embeddable coefficients is unchanged,
shuffling will not compromise the perceptual quality.
The equalization of embedding capacity via shuffling requires little additional
side information. The detector only needs to know the segment size
and the shuffle table, which can be generated from a key. The use of a
key-generated shuffle also has the added benefit of enhancing security.
4.2.3 Practical Considerations
Generating Shuffle Table A shuffling table can be generated from a key,
and the generation has linear complexity, proportional to the number of
entries [2]. An algorithm of this kind is discussed in the appendix (Section 4.6).
Handling Bad Shuffle While our analysis shows that the probability for
getting a bad shuffle is extremely small, it is still possible for a given image.
This problem can be handled in two ways.
The first approach is to generate a set of candidate shuffles which are
significantly different from each other, then select and use the best shuffle
when hiding data in a given image. It addresses the problem that a spe-
cific instance of random shuffle could be good for most images and bad for
some images. Notice that this approach allows the candidate shuffles to be
image-independent and such independence is desirable for marking many
images without the need of conveying much additional side information of
the shuffling. The probability that all candidate shuffles are bad for a given image should
decrease exponentially from the already low probability in the single-shuffle
case, so even two shuffles would be adequate in practice. We can use one as
a primary shuffle, and switch to the secondary one when the primary one is
not suitable for a specific image. How to convey to the detector the information
of which shuffle is actually used is similar to conveying side information in
variable rate embedding, and will be discussed further in Section 4.3.
The second approach targets the case where all but a very few
blocks have embeddable coefficients/pixels. In this case, the bits to be
embedded in the blocks with no embeddables can be treated by a detector
as erasures. Applying error correction coding [5] with moderate correction
capability before embedding can handle this problem.
Adaptive Segment Size The segment size q determines how many bits
will be embedded in an image and is in turn dependent on p, the percentage
of embeddable coefficients/pixels. If p is small, the segment size has to be
large to ensure that a sufficient number of embeddable coefficients is present in
each shuffled block. Because the percentage of embeddable coefficients can
vary considerably from image to image, it is desirable to choose the segment
size adaptively according to the content and the type of each image. As in
dealing with bad shuffles, a key problem for using an adaptive segment size is
how to convey such side information to the detector. This is discussed further
in Section 4.3.
4.2.4 Discussions
Uneven embedding capacity occurs when multiple bits are embedded in
non-overlapped segments. This falls in the TDM category discussed in Sec-
tion 3.3. An alternative way is to embed multiple bits using CDM approach,
possibly combined with spread spectrum embedding. Discussion on the pros
and cons of TDM and CDM approaches can be found in Section 3.3.2.
We also notice that shuffling may increase the sensitivity to intentional
attacks aimed at rendering the watermark undetectable. Geometric distortion
is one class of such attacks. While this is a potential shortcoming for some
applications, it is not a major concern for applications in which users
benefit from the hidden data and/or are not motivated to make the hidden data
undetectable, such as using watermarks to detect tampering or to convey
bilingual audio tracks. Furthermore, the robustness against sample drop-
ping, warping, scaling, and other distortions or attacks has been identified
as a major challenge for robust data hiding [173][174], regardless of whether
the shuffling is performed or not. The sensitivity of image data hiding to
geometric distortions can be alleviated through registration with respect to
a known watermark that serves as reference, or through embedding in a
resilient domain. This problem will be addressed later in Section 9.2.
In addition to applying shuffling to the entire set of samples or coefficients,
we can shuffle on a block basis by permuting all samples/coefficients in
the same block as a whole [70]. We can also apply a different shuffle to
each frequency band of a block-based transform, so that the coefficients of
a particular frequency band remain in their original frequency band but are
permuted to different blocks.
4.3 Variable Embedding Rate (VER)
In this section, we explore issues associated with variable embedding rate.
Compared with CER, VER may enable embedding more data by better
utilizing the embedding capability. However, the side information regarding
how many bits are embedded in each segment must be conveyed. The gain
of VER over CER is significant under the following two conditions: (1) the
total number of bits that can be hidden should exceed the amount of side
information for almost all segments - this ensures there is sufficient room to
convey side information, and (2) the average overhead for side information is
relatively small compared with the average embedding capacity per segment.
A key issue for VER is how to tell a detector the number of bits being
embedded in each segment. More generally, we would like to explore mech-
anisms to convey additional side information to a detector so as to facilitate
the extraction of the embedded data payload. The side information could be
the number of bits being embedded in each segment, or could be an index
signaling which shuffle and/or what segment size is used in the constant-
rate embedding discussed in Section 4.2.3. The latter scenario also indicates
a connection between CER and VER: while for a given set of segments, such as
all the blocks of an image, we may use CER in each segment, the parameter
settings such as the segment size could be different for different sets of segments
(e.g., different images) because the embedding capacity
of different sets of segments may vary significantly. On the set level, it is
more suitable to apply VER rather than CER because the two conditions
described above are likely to hold.
4.3.1 Conveying Additional Side Information
The additional side information can be conveyed using either the same em-
bedding mechanism as for the user payload or different embedding mech-
anisms. In both cases, the side information consumes part of the energy
by which the host image can be changed imperceptibly. The difference lies
only on the specific way to achieve orthogonality, similar to the discussion
of TDM/CDM multiplexing and orthogonal modulation in Section 3.3.
Consider first the embedding of side information via the same embedding
mechanism as that for the user payload. We use a strategy similar to the
training sequence in classic communications. That is, part of the embedded
data (such as the first several bits) is pre-determined or designed to be
self-verifiable. The self-verifiability can be obtained by a hash function (message
digest function) or by error detection/correction codes. For example, in
order to let a detector know which shuffle is used for each image, one may
choose the beginning bits of hidden data to be a predetermined label, or a
label plus its hash. The detector tries to decode the hidden data using all
candidate shuffles. The shuffle that leads to accurately decoding the begin-
ning bits is identified as the one used by the embedder. When we decode
the embedded data using a shuffle table that is significantly different from
the one used by the embedder, the decoded bits are approximately independent
of each other and equiprobable to be "1" or "0". The probability
of matching the pre-determined pattern or passing the verification test de-
creases exponentially with the number of bits used for identifying which
shuffle is used. Similarly, to let the detector know what segment size is used by
the embedding process, we can select a finite number of candidate segment
sizes, and choose a suitable one to embed data. Again, part of the embedded
data is pre-determined or self-verifiable. A detector will try out candidate
segment sizes and find the one that successfully passes the verification. To
limit the searching complexity, we can choose one primary segment size that
is suitable for a large number of images, and a couple of secondary sizes for
handling special images. We have applied these strategies of conveying side
information to data hiding in binary images (Chapter 5).
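A minimal sketch of this detector-side search is given below. It assumes a hypothetical extract_bits(key, count) routine that decodes the first bits of the payload under a given candidate shuffle, and it uses a short SHA-1 digest as one possible way of making the beginning bits self-verifiable; both are illustrative assumptions rather than the book's specific design.

```python
import hashlib

def bits_to_bytes(bits):
    """Pack a list of 0/1 integers into bytes (most significant bit first)."""
    return bytes(int(''.join(str(b) for b in bits[i:i + 8]), 2)
                 for i in range(0, len(bits), 8))

def identify_shuffle(extract_bits, candidate_keys, label):
    """Try each candidate shuffle key; return the key whose decoded beginning
    bits match the predetermined label plus a short hash of it."""
    target = label + hashlib.sha1(label).digest()[:2]     # label plus 2-byte digest
    for key in candidate_keys:
        decoded = extract_bits(key, len(target) * 8)      # first bits under this shuffle
        if bits_to_bytes(decoded) == target:
            return key                                    # verification passed
    return None                                           # no candidate verified
```

With a wrong shuffle the decoded bits behave like coin flips, so the chance of a false match decreases exponentially in the number of verification bits, as discussed above.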
For grayscale/color images and videos, it is possible to find some other
domains or mechanisms to hide the additional side information. These mech-
anisms are often orthogonal to those used for embedding the user payload.
The popular spread spectrum additive embedding is one feasible approach
for this purpose because its statistical properties make it easy to generate
additional "watermarks" orthogonal or approximately orthogonal to the
watermarks for the user payload. In addition, spread spectrum embedding
has been proven to be robust against a number of distortions. The robust-
ness is necessary since the accuracy in determining the side information
(such as how many bits are embedded and what shuffle is used) is crucial
to correctly extract the user payload. The watermarks for conveying side
information often share part of the total energy that can be allocated to all
embedded data while preserving perceptual quality. Allocating more energy
to the side information gives higher robustness in extracting it but reduces
the amount of user payload. It is desirable to both limit the amount
of side information and use energy efficient modulation techniques to embed
multiple bits of side information. Commonly used energy-efficient modula-
tion techniques such as orthogonal and biorthogonal modulation have been
discussed and compared in Section 3.3.
4.4 Outline of Examples
Several design examples in the following chapters will be used to explain
how the approaches described in the previous sections are used in designing
practical watermarking algorithms. Experimental results are reported to
demonstrate the effectiveness of our proposed approaches. More specifically,
the data hiding in binary images (Chapter 5) and the watermark-based
authentication for grayscale/color images (Chapter 7) show the effectiveness
of shuffling in equalizing uneven embedding capacity from region to region.
The multi-level data hiding in video (Chapter 6) is a prototype system
incorporating almost all solutions we have discussed in Part-I. It adopts
CER within a video frame and uses VER from frame to frame with adaptive
embedding rate.
4.5 Chapter Summary
In summary, this chapter addresses the problem of unevenly distributed
embedding capacity and proposes a set of feasible solutions. Depending on
the overhead relative to the total embedding capacity, we choose between
a constant embedding rate and a variable embedding rate. For a constant
embedding rate, shuffling is proposed to dynamically equalize the distribu-
tion of embeddable coefficients, allowing for the hiding of more data. We
demonstrated, via analysis and experiments, that shuffling is effective and is
applicable to many data hiding schemes and applications. For variable em-
bedding rate, we discussed how to convey the additional side information to
a detector to ensure the reliable extraction of the embedded user payload.
Three design examples and experimental results will be presented in the
following chapters to illustrate the handling of uneven embedding capacity
in practical data hiding systems.
Acknowledgement Fig. 4.1 was edited from a Princeton HomePage photograph at http://www.princeton.edu/Siteware/Images/Cornerpictures/cornerpixs.shtml taken by Robert P. Matthews, as of 1997.
4.6 Appendix - Generating Shuffling Table From A
Key
The generation of a shuffling table relies on a random number generator. The
security strength of the generator determines that of the shuffling table.
For efficient implementation, we adopt a pseudo-random number generator
with key(s) or seed(s) determining its output sequence. A simple way of
generating an N-entry shuffling table is to sort N random numbers and to
use the sorting index to construct the table. This approach was used by
MathWorks for its Matlab function "randperm" [18]. More specifically, let
{r_k} denote the sequence of N random numbers (k = 1, ..., N), and {r'_k}
denote the sorted sequence with r'_k = r_{i_k} and r'_{k_1} \le r'_{k_2} for any k_1, k_2 \in
{1, ..., N} such that k_1 < k_2. The mapping T of the shuffling table is then
obtained as T(k) = i_k. Since the best sorting algorithm has complexity
O(N log N), the complexity of this method for generating the shuffling
table is O(N log N).
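A short sketch of this sorting-based construction, using NumPy's argsort as the sorting step (an implementation choice, not one prescribed by the text):

```python
import numpy as np

def shuffle_table_by_sorting(N, key):
    """O(N log N) shuffle table: sort N keyed pseudo-random numbers and
    use the sorting index as the permutation, i.e., T(k) = i_k."""
    rng = np.random.default_rng(key)
    r = rng.random(N)            # N pseudo-random numbers determined by the key
    return np.argsort(r) + 1     # 1-based indices, as in the text
```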
A better algorithm quantizes the random numbers with monotonically
increasing step sizes and takes advantage of a carefully selected data structure
[2], reducing the complexity to O(N). The basic idea is as follows: we
start with a set S_1 = {1, ..., N}, and generate one random number per step.
At the k-th step, we uniformly partition the output range of the random number
generator into N - k + 1 non-overlapping segments; if the random number
generated at this step falls in the j_k-th segment, we pick the j_k-th element of
the set S_k, fill that value into the k-th entry of the shuffling table, and cross
the element out of the set, denoting the new set of N - k elements as S_{k+1}.
We continue the process until the shuffling table is fully filled. To achieve
linear complexity and to allow in-place storage (i.e., no additional storage
is needed for each new S_k), we implement the set S_k by hashing and
swapping the elements in an array. The detailed algorithm is summarized
below:
(1) Initialization. Set up two N-element arrays T[i] and s[i] (i = 1, ..., N)
for storing the shuffling table and for keeping the above-mentioned set S_k,
respectively. Let s[i] = i and the step index k = 1.
(2) Generate a random number r_k, and denote by j_k the index of the
segment in which it falls. More specifically, if the range of the random
number is [L, U), then

j_k = \left\lfloor \frac{r_k - L}{U - L} \times (N - k + 1) \right\rfloor + 1.

(3) Let T[k] = s[j_k], then swap the contents of s[N - k + 1] and s[j_k].
(4) k = k + 1. If k \ge N, let T[N] = s[1] and stop; otherwise, go back to (2).
Notice that after the above process, T[k] = s[N - k + 1], implying that
s[\cdot] contains an inversely ordered version of T[\cdot]; therefore the array T[\cdot] is not
even needed. The following example further illustrates the algorithm, assuming
the output of the random number generator is within [0, 1) and N = 10.
s[·] = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
r_1 = 0.46 → j_1 = 5:  s[·] = [1, 2, 3, 4, 10, 6, 7, 8, 9, 5],  T[1] = 5
r_2 = 0.70 → j_2 = 7:  s[·] = [1, 2, 3, 4, 10, 6, 9, 8, 7, 5],  T[2] = 7
r_3 = 0.51 → j_3 = 5:  s[·] = [1, 2, 3, 4, 8, 6, 9, 10, 7, 5],  T[3] = 10
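The following Python sketch implements steps (1)-(4) above; the random source is passed in as a callable so that the worked example can be reproduced. The use of a NumPy generator to pad out the remaining random numbers is an illustrative assumption.

```python
import numpy as np

def shuffle_table_linear(N, rand):
    """Linear-time shuffle table generation following steps (1)-(4) above;
    rand() returns one pseudo-random number in [0, 1) per call."""
    s = list(range(1, N + 1))            # s holds the set S_k (1-based element values)
    T = [0] * (N + 1)                    # T[1..N] is the shuffling table
    for k in range(1, N):
        j = int(rand() * (N - k + 1)) + 1          # segment index j_k (step 2)
        T[k] = s[j - 1]                            # step (3): T[k] = s[j_k]
        s[j - 1], s[N - k] = s[N - k], s[j - 1]    # swap s[j_k] and s[N-k+1]
    T[N] = s[0]                          # step (4): last remaining element
    return T[1:]

# Reproducing the worked example (first three steps, N = 10):
seq = iter([0.46, 0.70, 0.51] + list(np.random.default_rng(1).random(6)))
print(shuffle_table_linear(10, lambda: next(seq))[:3])   # -> [5, 7, 10]
```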
4.7 Appendix - Analysis of Shuffling
The detailed analytic study of shuffling introduced in Section 4.2.2 is pre-
sented in this appendix section. For simplicity of discussion, we formulate
the problem of analyzing the distribution of embeddable coefficients
after shuffling in terms of a ball game illustrated in Fig. 4.7. Suppose we
have a total of S balls, and a fraction p of them, or a total of n = pS balls, are
blue which represent the embeddable coefficients in our data hiding prob-
lem. The balls are to be placed in S holes randomly with one ball for each
hole. Further, every q holes are grouped together to form a cluster, and the
total number of clusters is N = S/q. It is important to note that S is a
very large number and q ≪ S. For simplicity, we assume n and N are
integers. We are interested in studying m_r/N, the percentage of clusters
each of which has exactly r blue balls, for r = 0, ..., q. Since the blue balls
are the center of focus, we can also view the game as putting all blue balls
in a bag, then taking out one ball at a time and throwing it randomly into
the unfilled holes, with equal probability of landing in each unfilled hole.
The game continues until all blue balls have been thrown.
4.7.1 Joint Probability of Histogram
Traditionally, we would start with the joint probability of the histogram
{m_0, m_1, ..., m_q}, which can be found as

P(m_0 = y_0, ..., m_q = y_q) = \frac{N!}{y_0!\, y_1! \cdots y_q!} \times \frac{\left[\binom{q}{0}\right]^{y_0}\left[\binom{q}{1}\right]^{y_1}\cdots\left[\binom{q}{q}\right]^{y_q}}{\binom{S}{n}}.   (4.5)
FIGURE 4.7. Illustration of random shuffling in terms of a ball game: S balls in total, of which n = pS are blue (the changeable pixels/coefficients, the rest being unchangeable); the balls are picked without replacement and placed into N = S/q blocks of q holes each, and m_r denotes the number of blocks each having r blue balls out of q balls.
The denominator \binom{S}{n} is the number of ways to throw n balls into S holes,
while the numerator indicates how many of them result in the same histogram
[y_0, ..., y_q]. While it is possible to sum the distribution of the
histogram under the constraints

\sum_{r=0}^{q} y_r = N, \qquad \sum_{r=0}^{q} r\, y_r = n   (4.6)

to get the marginal distribution of each bin P(m_r), the computation may
involve high complexity unless q is very small. For this reason, we adopt a
moment-based approach [3] suggested by Kolchin et al. to study the mean
and variance of each normalized bin of the histogram {m_r/N}.¹
4.7.2 Mean and Variance of Each Bin
Considering the bin m_r where r is an integer between 0 and q, we perform
the following decomposition

m_r = \theta_{r,1} + \theta_{r,2} + \cdots + \theta_{r,N},   (4.7)

where \theta_{r,i} is an indicator function defined as

\theta_{r,i} = \begin{cases} 1 & \text{if the } i\text{-th cluster has } r \text{ blue balls}, \\ 0 & \text{otherwise}. \end{cases}   (4.8)

Computing the mean of \theta_{r,i} is equivalent to getting the probability that the
i-th cluster has r balls, i.e.,

E[\theta_{r,i}] = P(\theta_{r,i} = 1) = \frac{\binom{q}{r}\binom{S-q}{n-r}}{\binom{S}{n}}.   (4.9)
1 Interested readers may refer to [3] for the analysis strategies and results of several
related random allocation problems.
Since the mean of \theta_{r,i} is independent of i, we have

E\left[\frac{m_r}{N}\right] = E\left[\frac{\sum_{i=1}^{N}\theta_{r,i}}{N}\right] = E[\theta_{r,1}] = \frac{\binom{q}{r}\binom{S-q}{n-r}}{\binom{S}{n}}.   (4.10)

This quantity indicates the average portion of clusters each having exactly
r balls.
The variance is obtained by observing the following relationship for the
indicator variables:

\theta_{r,i}^2 = \theta_{r,i},   (4.11)

from which we obtain

m_r^2 = \left(\sum_{k=1}^{N}\theta_{r,k}\right)^2 = \sum_{k=1}^{N}\theta_{r,k}^2 + \sum_{i \ne j}\theta_{r,i}\theta_{r,j}   (4.12)
      = \sum_{k=1}^{N}\theta_{r,k} + \sum_{i \ne j}\theta_{r,i}\theta_{r,j}
      = m_r + \sum_{i \ne j}\theta_{r,i}\theta_{r,j}.   (4.13)
For i \ne j,

E[\theta_{r,i}\theta_{r,j}] = P(\theta_{r,i} = \theta_{r,j} = 1) = \frac{\binom{q}{r}\binom{q}{r}\binom{S-2q}{n-2r}}{\binom{S}{n}},   (4.14)

indicating that the expected value of \theta_{r,i}\theta_{r,j} is the probability that two
different clusters, the i-th and the j-th, each have r blue balls. Since this
probability is independent of i and j, we have

E\left[\sum_{i \ne j}\theta_{r,i}\theta_{r,j}\right] = N(N-1)\, E[\theta_{r,1}\theta_{r,2}].   (4.15)
Therefore,

Var\left[\frac{m_r}{N}\right] = \frac{1}{N^2}E[m_r^2] - \left[E\left(\frac{m_r}{N}\right)\right]^2   (4.16)
= \frac{1}{N^2}\left\{E[m_r] + N(N-1)E[\theta_{r,1}\theta_{r,2}]\right\} - \left[E\left(\frac{m_r}{N}\right)\right]^2   (4.17)
= \frac{1}{N}E\left[\frac{m_r}{N}\right] + \left(1-\frac{1}{N}\right)E[\theta_{r,1}\theta_{r,2}] - \left[E\left(\frac{m_r}{N}\right)\right]^2   (4.18)
= \frac{1}{N}\cdot\frac{\binom{q}{r}\binom{S-q}{n-r}}{\binom{S}{n}} + \left(1-\frac{1}{N}\right)\frac{\binom{q}{r}\binom{q}{r}\binom{S-2q}{n-2r}}{\binom{S}{n}} - \left[\frac{\binom{q}{r}\binom{S-q}{n-r}}{\binom{S}{n}}\right]^2.   (4.19)
In summary, the mean and variance of the r-th bin are

E\left[\frac{m_r}{N}\right] = \frac{\binom{q}{r}\binom{S-q}{n-r}}{\binom{S}{n}}, \qquad
Var\left[\frac{m_r}{N}\right] = \frac{1}{N}E\left[\frac{m_r}{N}\right] + \left(1-\frac{1}{N}\right)\frac{\binom{q}{r}^2\binom{S-2q}{n-2r}}{\binom{S}{n}} - \left[E\left(\frac{m_r}{N}\right)\right]^2.   (4.20)

We have presented the simulation results in the main text (Fig. 4.6), showing
that the analytic study and the simulation of the mean and variance match
very well.
4.7.3 More About E[m_r/N]
The relation of E[m_r/N] with respect to r, which in our data hiding problem
describes the spread of embeddable coefficients after shuffling, is what we are
most interested in. We noted that the distribution P(\theta_{r,i} = 1) in Eq. 4.9,
to which the mean is equal, is known as a hypergeometric distribution [1].
Given a population of S balls with n of them blue, a hypergeometric
distribution H(r; S, n, q) describes the probability of getting r blue balls
among a sample of q balls, where the sampling is performed without replacement.
Denoting a random variable following this distribution as Y, we
have seen that its probability mass function takes the form

P(Y = r) = H(r; S, n, q) = \frac{\binom{q}{r}\binom{S-q}{n-r}}{\binom{S}{n}},   (4.21)

for r = 0, ..., min(q, n). In our problem, r takes values from 0 to q with
q \ll n. Noticing the relationship

\sum_{r=0}^{q}\binom{q}{r}\binom{S-q}{n-r} = \binom{S}{n},   (4.22)

which follows from \sum_{r=0}^{q} P(Y = r) = 1, we compute the mean of Y:

E[Y] = \frac{1}{\binom{S}{n}}\sum_{r=1}^{q} r\cdot\binom{q}{r}\binom{S-q}{n-r} = q\cdot\frac{\binom{S-1}{n-1}}{\binom{S}{n}} = p \cdot q,   (4.23)

where p is the portion of blue balls in the population, p = n/S. Similarly,
we obtain the second moment of Y:

E[Y^2] = \frac{1}{\binom{S}{n}}\sum_{r=1}^{q} r^2\binom{q}{r}\binom{S-q}{n-r}   (4.24)
= \frac{1}{\binom{S}{n}}\left[\sum_{r} r(r-1)\binom{q}{r}\binom{S-q}{n-r} + \sum_{r} r\binom{q}{r}\binom{S-q}{n-r}\right]
= q(q-1)\cdot\frac{\binom{S-2}{n-2}}{\binom{S}{n}} + E[Y]   (4.25)
= p\cdot q\cdot\left[\frac{(n-1)(q-1)}{S-1} + 1\right],   (4.26)
from which the variance of Y can be computed:

Var[Y] = E[Y^2] - (E[Y])^2 = p\cdot q\cdot\frac{(S-q)(1-p)}{S-1}.   (4.27)
To study the behavior of H(r; S, n, q) with respect to r, we simplify the
notation to H_r and study the ratio

\frac{H_r}{H_{r-1}} = \frac{\binom{q}{r}\binom{S-q}{n-r}}{\binom{q}{r-1}\binom{S-q}{n-r+1}}   (4.28)
= \frac{(q-r+1)\cdot(n-r+1)}{r\cdot(S-q-n+r)}
= 1 + \frac{(q+1)(n+1) - r(S+2)}{r(S-q-n+r)}.   (4.29)

Defining r_0 as

r_0 = \frac{(q+1)(n+1)}{S+2} = p\cdot q + p + \frac{(q+1)(1-2p)}{S+2},   (4.30)
we have

\begin{cases} H_r > H_{r-1} & \text{if } r < r_0, \\ H_r < H_{r-1} & \text{if } r > r_0, \\ H_r = H_{r-1} & \text{if } r_0 \in \mathbb{Z} \text{ and } r = r_0. \end{cases}   (4.31)
This indicates that as r varies from 0 to q, H_r first monotonically
increases and then monotonically decreases, achieving its maximum value at
r = \lfloor r_0 \rfloor, except that if r_0 is an integer, the maximum value is achieved at
both r_0 and (r_0 - 1). This relation has been confirmed by a numerical
evaluation of H_r shown in Fig. 4.6 and Fig. 4.8 with the parameter setting
taken from the image in Fig. 4.1. In this case, S = 640 × 432, q = 64, and
p = 15.49%, so r_0 = 10.0687, implying that H_r reaches its maximum at r = 10.
This is the same as what we have observed in the numerical evaluation.
4.7.4 Approximations for Hypergeometric Distribution
In [1], Feller pointed out the close relations among the hypergeometric dis-
tribution, the binomial distribution, the Poisson distribution, and the nor-
mal (Gaussian) distribution. Their probability mass functions or probability
FIGURE 4.8. Various approximations to the hypergeometric distribution: exper-
iments are performed on the Alexander Hall image (Fig. 4.1).
density function (in the case of the normal distribution) are summarized as follows:

hypergeometric:  H(r; S, n, q) = \frac{\binom{q}{r}\binom{S-q}{n-r}}{\binom{S}{n}}
binomial:        b(r; q, p) = \binom{q}{r} p^r (1-p)^{q-r}
Poisson:         P(r; \lambda) = \frac{\lambda^r}{r!} e^{-\lambda}                     (4.32)
normal:          N(r; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(r-\mu)^2/(2\sigma^2)}
More specifically, as the population S goes to infinity, H(r; S, n, q) is
approximated by the binomial distribution b(r; q, p), as long as q is much
smaller than S so that \lim_{S\to\infty} q/S = 0. This approximation can be seen as
follows:

H(r; S, n, q) = \frac{\binom{q}{r}\binom{S-q}{n-r}}{\binom{S}{n}}   (4.33)
= \binom{q}{r}\prod_{j=0}^{r-1}\frac{n-j}{S-j}\cdot\prod_{k=0}^{q-r-1}\frac{S-n-k}{S-r-k},   (4.34)

where each factor in the first product tends to p = n/S and each factor in the
second tends to 1 - p as S \to \infty, giving b(r; q, p) = \binom{q}{r}p^r(1-p)^{q-r}.
Intuitively, this means that when taking a small number of samples from a
large population, the statistical outcomes of sampling without replacement
are approximately the same as those of sampling with replacement (or, equivalently,
sampling from an infinite population). Since the above-mentioned conditions hold in our
data hiding problem, we obtain a very good binomial approximation
to E[m_r/N], as shown in Fig. 4.8.
Because the behavior of the binomial distribution is well studied [1], the
binomial approximation enables us to make use of many existing results to
understand the behavior of random shuffling. For large q and small p, a
binomial distribution b(r; q, p) can be approximated by a Poisson distribution
with mean \lambda = p\cdot q. A binomial distribution can also be approximated by a
normal distribution (with mean p\cdot q and variance p(1-p)q) for large q.
While the q in our data hiding problem is generally not very large,
the Poisson approximation and the Gaussian approximation still give good
numerical matches. Fig. 4.8 shows the hypergeometric distribution with the
parameters taken according to the Alexander Hall image (Fig. 4.1) and the
corresponding binomial, Poisson, and normal approximations. In addition,
the tail of a binomial distribution is known to be bounded by

\sum_{k=0}^{r} b(k; q, p) \le \frac{(q-r)p}{(pq-r)^2} \quad \text{if } r < pq,
\qquad
\sum_{k=r}^{q} b(k; q, p) \le \frac{r(1-p)}{(r-pq)^2} \quad \text{if } r > pq.   (4.35)
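As a numerical illustration of these approximations, the following sketch evaluates the four distributions around the peak for the Alexander Hall parameters; the SciPy distribution objects and the printed index range are illustrative choices, not part of the original text.

```python
import numpy as np
from scipy.stats import hypergeom, binom, poisson, norm

S, q, p = 640 * 432, 64, 0.1549
n = round(p * S)
r = np.arange(0, 36)

h  = hypergeom.pmf(r, S, n, q)                                 # exact E[m_r/N]
b  = binom.pmf(r, q, p)                                        # binomial b(r; q, p)
po = poisson.pmf(r, p * q)                                     # Poisson with mean pq
g  = norm.pdf(r, loc=p * q, scale=np.sqrt(p * (1 - p) * q))    # normal approximation

for name, v in [("hypergeom", h), ("binomial", b), ("poisson", po), ("normal", g)]:
    print(name, np.round(v[8:13], 4))        # values near the peak r ~ pq ~ 10
```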
4.7.5 More About Var[m_r/N]
The joint probability P(\theta_{r,i} = \theta_{r,j} = 1), which is an important term for computing
the variance, can also be approximated by sampling with replacement
(or, equivalently, by sampling from an infinitely large population). That is,

P(\theta_{r,i} = \theta_{r,j} = 1) \approx [b(r; q, p)]^2 = \left[\binom{q}{r}p^r(1-p)^{q-r}\right]^2.   (4.36)

Therefore,

Var\left[\frac{m_r}{N}\right] \approx \frac{1}{N}b(r; q, p) + \left(1-\frac{1}{N}\right)[b(r; q, p)]^2 - [b(r; q, p)]^2   (4.37)
= \frac{1}{N}\cdot b(r; q, p)\cdot[1 - b(r; q, p)].   (4.38)
This approximation takes the form f(x) = C·x(1-x), with x replaced by
b(r; q, p) and C a constant. The function f(x) describes an arch-shaped
curve with its maximal value at x = 1/2. Because the binomial
distribution b(r; q, p) has small values over its support except for very small
q, b(r; q, p) is generally smaller than 1/2 for all r. Therefore, Var[m_r/N] is
monotonically increasing with b(r; q, p) in all practical cases. This implies
that the trend of Var[m_r/N] with respect to r is the same as that of b(r; q, p),
which has an arch shape with its maximum value around pq. Our numerical
evaluation shown in Fig. 4.9 confirms this analysis. Note that special
care should be taken in the numerical evaluation of Var[m_r/N] because it
involves taking the difference between two comparable terms (Eq. 4.20).
FIGURE 4.9. Comparison of analytic, approximated, and simulated variance of the histogram of the embeddable coefficients after shuffling. Experiments are performed on the Alexander Hall image (Fig. 4.1).
Part II
Algorithm and
System Designs
5
Data Hiding in Binary Image
Over the past decade, an increasingly large number of digital binary images
have been used in business. Handwritten signatures captured by electronic
signing pads are digitally stored and used as records for credit card pay-
ment by many department stores and for parcel delivery by major courier
services such as the United Parcel Service (UPS). Word processing software
such as Microsoft Word allows a user to store his/her signature in a binary
image file for inclusion at specified locations of an electronic document. The
documents signed in such a way can be sent directly to a fax machine or
distributed across a network. The unauthorized use of a signature, such as
copying it onto an unauthorized payment, is becoming a serious concern. In
addition, a variety of important documents, such as social security records,
insurance information, and financial documents, have also been digitized
and stored. Because of the ease of copying and editing digital images, annotation
and authentication of binary images have become very important.
This chapter discusses data hiding techniques for these authentication and
annotation purposes, possibly as an alternative to or in conjunction with
the cryptographic authentication approach. Such targeted applications call
for fragile or semi-fragile embedding of many bits. It should be stressed that
while it is desirable for the embedded data to have some robustness against
minor distortion and preferably to withstand printing and scanning, the ro-
bustness of embedded data against intentional removal or other obliteration
is not a primary concern. This is because an adversary in authentication ap-
plications would have much more incentive to counterfeit valid embedded
data than to remove them, and there is no obvious threat of removing em-
bedded data in many annotation applications.
Most prior works on image data hiding are for color or grayscale images
in which the pixels take on a wide range of values. For most pixels in these
images, changing the pixel values by a small amount may not be noticeable
under normal viewing conditions. This property of human visual system
plays a key role in watermarking of perceptual data [49, 58]. However, for
images in which the pixels take value from only a few possibilities, hiding
data without causing visible artifacts becomes more difficult. In particular,
flipping white or black pixels that are not on the boundary is likely to
introduce visible artifacts in binary images. Before we present our solutions
to the challenging issues of hiding data in binary images, we shall give a
brief review of the prior art.
Prior Art Several methods for hiding data in specific types of binary
images have been proposed in literature. Matsui et al. [97] embedded infor-
mation in dithered images by manipulating the dithering patterns and in fax
images by manipulating the run-lengths. Maxemchuk et al. [98] changed line
spacing and character spacing to embed information in textual images for
bulk electronic publications. These approaches cannot be easily extended to
other binary images and the amount of data that can be hidden is limited.
In [95], Koch and Zhao proposed a data hiding algorithm that enforces the
ratio of black vs. white pixels in a block to be larger or smaller than 1. Al-
though the algorithm aims at robustly hiding information in binary image, it
is vulnerable to many distortions/attacks, and it is not secure enough to be
directly applied for authentication or other fragile use. The number of bits
that can be embedded is limited because the particular enforcing approach
has difficulty in dealing with blocks that have low or high percentage of
black pixels. In spite of these weaknesses, the idea of enforcing properties of
a group of pixels via the local manipulation of a small number of pixels can
be extended as a general approach of data embedding. Another approach
of marking a binary document was proposed in [93] by treating a binary
image as a grayscale one and manipulating the luminance of dark pixels
slightly so that the change is imperceptible to human eyes yet detectable
by scanners. This approach, targeted at intelligent copier systems, is not
applicable to bi-Ievel images hence is beyond the scope of this chapter. The
bi-Ievel constraint also limits the extension of many approaches proposed for
grayscale or color images to binary images. For example, applying the spread
spectrum embedding, a transform-domain additive approach proposed by
Cox et al. [49], to binary image could not only cause annoying noise on the
black-white boundaries, but also have reduced robustness hence limited em-
bedding capacity due to the post-embedding binarization that ensures the
marked image is still a bi-Ievel one [96]. For these additive embedding ap-
proaches, hiding a large amount of data and detecting without the original
binary image is particularly difficult. In summary, these previously proposed
approaches either cannot be easily extended to other binary images, or can
only embed a small amount of data.
Chapter Organization We propose a new approach that can hide a
moderate amount of data in general binary images, including scanned text,
figures, and signatures. The hidden data can be extracted without using
the original unmarked image, and can also be extracted after high qual-
ity printing and scanning with the help of a few registration marks. The
approach can be used to verify whether a binary document has been tam-
pered with or not, and to hide annotation labels or other side information.
We shall first discuss three key issues of hiding data in binary image in
Section 5.1, along with our proposed solutions. In Section 5.2, we use three
applications and experimental results to illustrate the proposed data hiding
approach. Further discussions on robustness and security are given in Sec-
tion 5.3, including such issues as recovering hidden data from high quality
printing-and-scanning.
5.1 Proposed Scheme
There are two basic ways to manipulate binary images for the purpose of
data hiding. The first class of approaches changes low-level features such as
flipping a black pixel to white or vice versa. The second class of approaches
changes high-level features such as modifying the thickness of strokes, cur-
vature, spacing, and relative positions. Since the number of parameters that
can be changed by the second class of approaches is limited, especially un-
der the requirements of invisibility and blind detection (i.e., without using
the original image in detection), the amount of data that can be hidden is
usually limited except for special types of images [98].
We focus in this chapter on the first class of approaches. An image is par-
titioned into blocks and a fixed number of bits are embedded in each block
by changing some pixels in that block. For simplicity, we shall show how to
embed one bit in each block. Three issues will be discussed below: (1) how
to select pixels for modification so as to introduce as few visual artifacts as
possible, (2) how to embed data in each block using these flippable pixels,
and (3) why to embed the same number of bits in each block and how to
enhance its efficiency. The entire process of embedding and extraction is
illustrated in Fig. 5.1.
5.1.1 Flippable Pixels
There is little discussion in the literature on a human visual model for binary
images. A simple criterion, proposed in [95], is to flip boundary pixels for
high contrast images such as text images and to only create rather isolated
pixels for dithered images. We take the human perceptual factor into account
by studying each pixel and its immediate neighbors to establish a score of
how unnoticeable a change to that pixel would be for the binary image at hand.
[Block diagram: for embedding, the binary image is shuffled, data (possibly related to the document content) are embedded in the shuffled domain, and inverse shuffling yields the marked image; for extraction, the test image is shuffled, the hidden data are extracted, and the result is verified.]
FIGURE 5.1. Block diagram of the embedding and extraction process in binary images for authentication and/or annotation.
The score is between 0 and 1, with 0 indicating no flipping. Flipping pixels with
higher scores generally introduces fewer artifacts than flipping pixels with lower scores.
Assigning flippability scores manually according only to neighborhood patterns
has the shortcomings that the storage of every pattern can be huge
except for small neighborhoods, and that such a fixed assignment offers no
flexibility when the characteristics of the binary image change. Our
approach in this chapter is to determine the scores dynamically by observ-
ing the smoothness and connectivity. The smoothness is measured by the
horizontal, vertical, and diagonal transitions in a local window (e.g., 3 x 3),
and the connectivity is measured by the number of the black and white
clusters. For example, the flipping of the center pixel in Fig. 5.2(b) is more
noticeable than that in Fig. 5.2(a) due to the significant change in connec-
tivity of (b). We order all 3 x 3 patterns in terms of how unnoticeable the
change of the center pixel will be. We then examine a larger neighborhood,
such as 5 x 5, to refine the score. Special cases are also handled in larger
neighborhoods so as to avoid introducing noise on special patterns such as
sharp corners. By changing the parameters in our procedure, we can easily
adjust the intrusiveness of different kinds of artifacts and tailor the procedure
to different types of binary images. The details of our score assignment are
presented in the appendix (Section 5.5).
FIGURE 5.2. Two examples of 3×3 neighborhoods, for which flipping the center pixel to white in (a) is less noticeable than in (b).
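A greatly simplified sketch of such a score is given below. It uses only transition counts and cluster counts within the 3×3 window, whereas the actual procedure (Section 5.5) also considers diagonal transitions, refines the score over larger neighborhoods, and handles special patterns; the particular weighting and the 0.25 penalty are illustrative assumptions, not the book's values.

```python
import numpy as np
from scipy.ndimage import label

def flippability_score(window3x3):
    """Simplified flippability score for the center pixel of a 3x3 window
    (1 = black, 0 = white)."""
    w = np.asarray(window3x3)
    flipped = w.copy()
    flipped[1, 1] = 1 - flipped[1, 1]

    def clusters(x):      # connectivity: number of black plus white clusters
        return label(x)[1] + label(1 - x)[1]

    def transitions(x):   # smoothness: horizontal + vertical transitions
        return np.abs(np.diff(x, axis=0)).sum() + np.abs(np.diff(x, axis=1)).sum()

    if clusters(flipped) != clusters(w):      # flipping merges or splits a cluster
        return 0.0                            # such a change is very noticeable
    change = abs(int(transitions(flipped)) - int(transitions(w)))
    return max(0.0, 1.0 - 0.25 * change)      # fewer transition changes -> higher score
```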
5.1.2 Embedding Mechanism
Directly encoding the hidden information in flippable pixels (e.g., setting a
pixel to black to embed a "0" and to white to embed a "1") may not allow
the extraction of embedded data without the original image. The reason
is that the embedding process may change a flippable pixel in the original
image to a pixel that may no longer be considered as flippable. As a simple
example, suppose only black pixels that are immediately adjacent to white
pixels are considered as "flippable". If one such flippable pixel, marked by
a thick boundary in Fig. 5.3(a), is changed to white to carry a "1", as shown
in Fig. 5.3(b), then after embedding this pixel is no longer
considered flippable under the same rule. This simple example shows
the difficulty for the detector to correctly identify which pixels carry hidden
information without knowledge of the original image.
FIGURE 5.3. Illustration of a boundary pixel becoming "non-flippable" after flipping.
Instead of encoding the hidden information directly in flippable pixels,
we apply the Type-II embedding discussed in Chapter 3. That is, we embed
data by manipulating pixels with high flippability scores to enforce a certain
relationship on low-level features of a group of pixels. One possibility is to
use the odd-even parity of the total number of black pixels in a block as
a feature. To embed a "0" in a block, some pixels are changed so that the
total number of black pixels in that block is an even number. Similarly, to
embed a "1", the number of black pixels is enforced to an odd number.
We may also choose a quantization step size Q and force the total number
of black pixels in a block to be 2kQ for some integer k in order to embed
a "0", and to be (2k+1)Q to embed a "1". As discussed in Chapter 3, a
larger Q gives higher tolerance to noise at the expense of decreased image
quality. This "odd-even" method can be viewed as a special case of table
lookup similar to that in [137, 139] and Section 7.3. These two approaches
are illustrated in Fig. 5.4, where each possible quantized number of black
pixels per block is mapped to 0 or 1. While other relationship enforcing
techniques are possible, we shall use in this chapter the enforcing of odd or
even number of black pixels in a block for proof-of-concept.
FIGURE 5.4. Illustration of odd-even mapping and table lookup mapping: each quantized number of black pixels per block, 2kQ, (2k+1)Q, (2k+2)Q, (2k+3)Q, ..., is mapped to a hidden bit value; the odd-even mapping assigns 0, 1, 0, 1, ..., while a lookup table assigns the 0's and 1's according to a prescribed table.
The above approaches can be characterized more generally by

v'_i = \arg\min_{x:\, T(x) = b_i,\; x = kQ} |x - v_i|,   (5.1)

where v_i is the i-th feature to be enforced¹, v'_i is the feature value after embedding,
b_i is the bit to be embedded in the i-th feature, and T(\cdot) is a prescribed
mapping from feature values to hidden data values {0, 1}. Detection is done
by checking the enforced relationship:

\hat{b}_i = T(v''_i),   (5.2)

where v''_i is the feature extracted from the i-th block of a test image, and \hat{b}_i
is the estimated value of the embedded bit in the i-th block.
If the same bit is repeatedly embedded in more than one block, majority
voting is performed to determine which bit has been hidden. As discussed
in Chapter 3, more sophisticated coding than simple repetition can also be
used to achieve a certain level of robustness against decoding errors.
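A minimal sketch of the odd-even enforcement for a single shuffled block (black = 1) might look as follows. The 0.1 score threshold follows the choice mentioned later in the analysis of this chapter, while the policy of flipping the single highest-score pixel is an illustrative simplification rather than the book's exact procedure.

```python
import numpy as np

def embed_bit_odd_even(block, scores, bit, threshold=0.1):
    """Enforce the odd-even parity of the number of black pixels in a shuffled
    block to carry one bit, flipping the most flippable pixel if needed."""
    block = block.copy()
    if int(block.sum()) % 2 != bit:               # parity does not yet encode the bit
        idx = int(np.argmax(scores))              # pixel with the highest flippability score
        if scores.flat[idx] >= threshold:
            block.flat[idx] = 1 - block.flat[idx] # flip it to fix the parity
        # else: no flippable pixel in this block; treat the bit as an erasure
    return block

def extract_bit_odd_even(block):
    """Detection simply checks the enforced odd-even relationship."""
    return int(block.sum()) % 2
```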
5.1.3 Uneven Embedding Capacity and Shuffling
As outlined earlier, we embed multiple bits by dividing an image into blocks
and hiding one bit in each block via the enforcement of odd-even relation-
ship. However, the distribution of flippable pixels may vary dramatically
from block to block. For example, no data can be embedded in the uni-
formly white or black regions, while regions with text and drawing may
have quite a few flippable pixels, especially on the non-smooth boundary.
This uneven embedding capacity can be seen from Fig. 5.5 where the pix-
els with high flippability scores, indicated by black dots, are on the rugged
boundaries.
General approaches to handling uneven embedding capacity have been
discussed in Chapter 4. Regarding the uneven embedding capacity in a
binary image, using variable embedding rate from block to block is not
feasible for the following reasons. First, a detector has to know exactly
¹In the above case, the feature is the total number of black pixels in the i-th block.
FIGURE 5.5. A binary image (top) and its pixels with high flippability scores (bottom, shown in black).
how many bits are embedded in each block. Any mistake in estimating the
number of embedded bits is likely to cause errors in decoding the hidden data
for the current block, and the error can propagate to the following blocks.
Second, the overhead for conveying this side information via embedding is
significant and could be even larger than the actual number of bits that
can be hidden. We therefore adopt a constant embedding rate to embed the
same number of bits in each region and use shuffling to equalize the uneven
embedding capacity from region to region.
FIGURE 5.6. Distributions of flippable pixels per 16×16-pixel block of the binary image in Fig. 5.5, before shuffling (top) and after shuffling (bottom).
Fig. 5.6 shows the flippable pixels before and after a random permutation
of all pixels, and Fig. 5.7 shows the histogram of the number of flippable pixels
per 16 × 16-pixel block. It is seen that the distribution before shuffling
extends from 0 to 40 flippables per block and that about 20% of the blocks
do not have any flippable pixels. The distribution after shuffling, shown by
the dash-dot line, concentrates between 10 and 20, and all shuffled blocks have
flippable pixels. This equalization capability of shuffling has been analyzed
FIGURE 5.7. Histogram of flippable pixels per 16×16-pixel block of the binary image in Fig. 5.5, before shuffling (solid line) and after shuffling (dash-dot line).
in Chapter 4. Plugging into Eq. 4.1 and Eq. 4.2 the parameters² of the
binary signature image of Fig. 5.5:

    block size             q = 16 × 16 = 256
    image size             S = 288 × 48
    block number           N = S/q = 18 × 3 = 54
    flippable percentage   p = 5.45%

we compute the mean and the standard deviation of the histogram. The
analytic results are shown in Fig. 5.8, along with the simulation results
from 1000 random shuffles. The analysis and simulation are seen to agree
well, and the percentage of blocks with no or few flippables is extremely
low. The statistics of blocks with no or few flippables are also shown in
Table 5.1. Error correction coding can be applied to the embedded data to
deal with those few blocks that have no flippable pixels.
the block diagram in Fig. 5.1, the embedding of one bit per block described
in Section 5.1.2 is performed in the shuffled domain, and inverse shuffling is
performed to get a marked image.
2In this analysis, we set a threshold of 0.1 on the flippability score and consider the
pixels with score higher than this as flippable.
TABLE 5.1. Analysis and simulation of the blocks with no or few flippable pixels
before and after shuffling for the binary image in Fig. 5.5.

                        before      mean after shuffle            std after shuffle
                        shuffle     analysis        simulation    analysis      simulation
  m0/N (0th bin)        20.37%      5.16×10⁻⁵%      0%            9.78×10⁻⁵     0
  m1/N (1st bin)        1.85%       7.77×10⁻⁴%      0%            3.79×10⁻⁴     0
  m2/N (2nd bin)        5.56%       5.81×10⁻³%      5.56×10⁻³%    0.0010        0.0010
FIGURE 5.8. Analysis and simulation of the statistical behavior of shuffling for
the binary image in Fig. 5.5.
We have also discussed in Chapter 4 that shuffling does not produce more
flippable pixels. Instead, it dynamically assigns the flippable pixels in active
regions and along rugged boundaries to carry the data intended for less active
regions. This is done without the need to specify much image-dependent side
information. Shuffling also enhances security, since the receiver needs
the shuffling table, or a key for generating the table, to correctly extract
the hidden data.
5.2 Applications and Experimental Results
The proposed data hiding for binary image is targeted mainly at authenti-
cation and annotation. In this section, we present three specific applications
along with experimental results.
5.2.1 "Signature in Signature"
Unauthorized use is a potential concern for the increasingly popular use
of digitized signatures. A "Signature in Signature" system annotates the
signer's signature with the data that is related to the signed documents,
so that the unauthorized use of a signature can be detected [94]. Here the
second "signature" refers to the actual digital version of a person's signature,
while the first "signature" refers to a checksum related to the document
content or other annotation information.
The data hiding method proposed in this chapter can be applied to an-
notating a signature in such applications as faxing signed documents and
storing digitized signatures as transaction records. Compared with the tra-
ditional cryptographic authentication approach [25] that has been used in
secure communications, the proposed data embedding based approach has
the advantage of being user-friendly, easy to visualize, and integrating the
authentication data with the signature image in a seamless way.
An example is demonstrated in Fig. 5.9, in which 7 characters (49 bits)
are embedded in a 287 x 61 signature. The embedding rate is 1 bit per
block of 320 pixels. The top is the original signature; the middle is the
signature after embedding, which is indistinguishable from the original; and
the bottom shows where the altered pixels are 3.
5.2.2 Invisible Annotation for Line Drawings
Artists may wish to annotate their work with information, such as the cre-
ation date and location, in such a way that the annotation data interfere
minimally with perceptual appreciation. Our proposed data hiding approach
can be used to invisibly annotate line drawings such as the 120 x 150 picture
of Fig. 5.10. In this example, a character string of the date "01/01/2000" is
embedded in Fig. 5.10(b). We can see that the annotation does not interfere
with perceptual appreciation in any perceivable way.
³The gray areas in Fig. 5.9 (bottom) and Fig. 5.10(c) visualize the strokes, and those in
Fig. 5.11(d) visualize the background. We show them in gray to assist viewers associating
the difference between the original and the marked image with their precise location in
the images. The pixelwise differences are indicated by black pixels.
FIGURE 5.9. "Signature in Signature": (top) the original image, (middle) a marked copy with 7 letters (49 bits) embedded, (bottom) the difference between the original and the marked image (shown in black).
5.2.3 Tamper Detection for Binary Document
A large number of important documents have been digitized and stored
for records. The authentication of these digital documents as well as the
detection of possible tampering are important concerns. The data hiding
techniques proposed in this chapter can be applied for such purposes, as
an alternative to or in conjunction with the cryptographic authentication
approach. More specifically, data are be embedded in an image in such a
fragile way that it will be obliterated if the image is altered and/or that it
no longer matches some properties of the image, indicating the tampering
of content 4. The hidden data may be an easily recognized pattern, or some
features or their digest version related to the content of host image.
Shown in Fig. 5.11(a) is a part of a U.S. Patent, consisting of 1000 x 1000
pixels. This binary image contains texts, drawings, lines, and bar codes.
Fig. 5.11 (b) is a visually identical figure, but with 976 bits embedded in it
using the proposed techniques. In this particular example, 800 bits of the
embedded data come from a "PUEE" pattern shown in Fig. 5.11(g). If the
4More discussions on authentication via data hiding can be found in Chapter 7.
FIGURE 5.10. Invisible annotation for line drawings: (a) the original image; (b) a marked copy with 10-letter date information (70 bits) embedded; (c) the difference between the original and the marked image (shown in black).
date "1998" on the top is changed to "1999", the extracted data will be the
random pattern shown in Fig. 5.11(g), indicating that an alteration was made.
5.3 Robustness and Security Considerations
In this section, we discuss the robustness and security issues of the proposed
scheme. Other considerations associated with shuffling, such as the methods
for handling bad shuffles and for adaptively choosing the block size, can be
found in Chapter 4.
FIGURE 5.11. Data hiding in a binary document image: (a) original copy, (b) a marked copy with 976 bits embedded, (c) magnified original image, (d) difference between original and marked (shown in black), (e) magnified marked image, (f) a portion of the image where alteration is done (on the marked image) by changing "1998" to "1999", (g) among the 976-bit hidden data, 800 bits form a "PUEE" pattern; the 800-bit pattern extracted after alteration is visually random and significantly different from the originally embedded "PUEE".
5.3.1 Analysis and Enhancement of Robustness
As with other Type-II embedding discussed in Chapter 3, the robustness
against noise of the embedding mechanism presented in Section 5.1.2 is
quite limited, and generally depends on whether and how much quantization
or tolerance zone is applied. Let us consider the simple odd-even case
with no quantization, i.e., the total number of black pixels is enforced to
an even number to embed a "0", and to an odd number to embed a "1" .
When a single pixel is changed due to noise, the bit embedded in the block
to which the pixel belongs will be decoded in error. When several pixels in
an embedding block are subject to change, whether or not the bit can
be decoded correctly depends on how many pixels are flipped; if the change
is independent from pixel to pixel and occurs with probability p for each of n
pixels, where n ≥ 1, the probability of getting a wrongly decoded bit is

P_{e1} = \sum_{k=1,\, k\ \mathrm{odd}}^{n} \binom{n}{k} p^{k}(1-p)^{n-k} = \frac{1 - (1-2p)^{n}}{2}.   (5.3)
The error probability P_{e1} is small for small p and small n. In this case, error
correction coding can be applied to correct errors if accurate decoding
of the hidden data is preferred. When p is close to 0.5, so is P_{e1}, implying the
difficulty of embedding and extracting data reliably. Notice that because of
shuffling, the assumption of independent change is likely to hold even if the
noise involves nearby pixels since adjacent pixels in the original image will
be distributed to several blocks. If the total number of changed pixels in
the whole image is small (no matter whether they are close to each other
in the original image or far away), it is likely that most of those pixels
are involved in different embedding blocks hence the extracted bits from
those blocks will be wrong. On the other hand, if many pixels have been
changed, each embedding block may include several of these pixels and the
decoded bit from each block is wrong with approximately 0.5 probability.
This implies that the decoded data are rather random, as what we have seen
in Fig. 5.11(g). The case of incorporating quantization or tolerance zone can
be analyzed similarly.
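The closed form in Eq. 5.3 is easy to verify numerically; a small sketch follows, with the values in the comments being approximate.

```python
from math import comb

def p_e1(p, n):
    """Probability that a block's bit decodes in error when each of its n
    pixels is independently flipped with probability p (Eq. 5.3)."""
    direct = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(1, n + 1, 2))
    closed_form = (1 - (1 - 2 * p) ** n) / 2
    return direct, closed_form

print(p_e1(0.01, 5))   # small p, small n: both terms about 0.048
print(p_e1(0.5, 5))    # p near 0.5: the error probability approaches 0.5
```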
Besides the noise involving flipping of individual pixels, misalignment is
another cause of decoding errors. In this regard, using shuffling has the
disadvantage of increasing the sensitivity to geometric distortions such
as translation. This is due to the shift-variant property of the shuffling operation,
i.e., the shuffling result of a shifted image is very different from that of
the non-shifted one. To alleviate the sensitivity with respect to translation,
we can hide data in a cropped part of the image, as shown in Fig. 5.12. With-
out loss of generality, we consider the case of black foreground and white
background. The upper-left point of the data hiding region is determined
by the uppermost and leftmost black pixel, and the lower-right point is by
the lowermost and rightmost black pixel. The data hiding region therefore
covers all black pixels. This approach can reduce the sensitivity to shifts as
long as both the embedding and detection systems agree on the protocol and no
cropping or addition of the outermost black pixels is involved.
FIGURE 5.12. Improving robustness against small translation. Here we use the
outermost black pixel to determine a data hiding region (indicated by a dash box)
covering all black pixels.
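The cropping protocol just described amounts to computing the bounding box of the black pixels. The following sketch assumes the binary image is given as a NumPy array with 1 for black foreground pixels; the function name and the toy image are ours, and the final check simply confirms that a translated copy yields the same cropped content.

import numpy as np

def data_hiding_region(img):
    """Return (top, left, bottom, right) of the tightest box covering all black pixels.
    `img` is a 2-D array with 1 for black (foreground) and 0 for white (background)."""
    rows = np.flatnonzero(img.any(axis=1))   # rows that contain at least one black pixel
    cols = np.flatnonzero(img.any(axis=0))   # columns that contain at least one black pixel
    if rows.size == 0:
        return None                          # blank image: nothing to embed in
    return rows[0], cols[0], rows[-1], cols[-1]

# A translated copy yields the same cropped content, so the embedded data survives the shift.
img = np.zeros((10, 12), dtype=np.uint8)
img[3:6, 4:9] = 1
shifted = np.roll(img, shift=(2, 1), axis=(0, 1))
t, l, b, r = data_hiding_region(img)
ts, ls, bs, rs = data_hiding_region(shifted)
assert (img[t:b + 1, l:r + 1] == shifted[ts:bs + 1, ls:rs + 1]).all()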
In addition to the above approach, adding registration marks such as a
signature line helps the hidden data survive high-resolution printing and scanning.
Recovering an image from printing and scanning with precision as fine as one
pixel is a non-trivial task, because this D/A-A/D process may result in
small rotation, up-scaling by an unknown factor, and noisy boundaries. If one
pixel in the original image corresponds to one pixel or less in the
scanned version, it will be very difficult to combat the distortion introduced
by the D/A-A/D process. On the other hand, if significant oversampling is
performed so that one original pixel corresponds to a number of pixels in
the scanned version, it is possible to sample at the center of each
"original" pixel, avoiding the noise introduced on the boundary and/or by
the rounding errors in the de-skewing process. The registration marks help to
identify the boundary and the size of the original image as well as to correct
skewing. We note that while the size of one original pixel represented in
the scanned image may be estimated from a well-designed registration mark
(e.g., we may estimate that one original pixel corresponds to 8 x 8 pixels
in a scanned image), minor errors in such an estimation can accumulate
when determining the width and height of the original image to single-pixel
precision. To overcome this problem, we impose constraints on the
width and height of original images, for example, requiring them to be multiples of 50.
Fig. 5.13 shows one possible design of registration marks, accompanied
by a visualization of the estimated pixel centers. A more natural and less
intrusive design is shown in Fig. 5.14(a), which adds a dotted signature box
that resembles what is commonly used in the current practice of signing.
The four corner marks of the signature box and the dash line segments
on the four sides, placed at an interval of 50 pixels horizontally and of 25 pixels
vertically, serve as a ruler to facilitate the recovery. In this experiment, we
imported the signature image into the Microsoft Word 2000 program at a resolution
of 72dpi, printed the image using an HP LJ4100DTN laser printer,
and scanned it back at 600dpi with 256 gray levels using a Microtek
3600 scanner. The image is binarized with a threshold equal
to the mean of the maximum and minimum scanned luminance values.
We use the registration marks to determine the image boundary, to perform
de-skewing, and to compute the proper scaling factor. The estimated center
of each original pixel in the scanned version is shown in Fig. 5.14(b). Sampling
at those centers can recover the original digital image perfectly from
the scanned one, hence allowing the embedded data to be extracted correctly.
A more detailed discussion of the recovery of binary images from printing and
scanning can be found in the Appendix, Section 5.6.
The above-mentioned approaches fall in the category of visible registration.
The key purpose is to use the marks as a reference to determine the
boundary of the binary image and the scaling factor after the printing-and-scanning
process, and in turn, to recover the image accurately. It should be
noted that the accurate recovery of the image is needed regardless of how the
authentication data is stored (invisibly or visibly). Attaching authentication
data separately (e.g., putting a message digest or a cryptographic digital signature
in the form of a text string or a bar code next to the image to be
authenticated) does not solve the authentication problem under printing-and-scanning.
This is because even though the authentication
data can then be easily and accurately obtained, one still has to recover the digital
version of the image to be authenticated, compute the digest or signature from
this recovered image, and then compare it with the attached authentication
data.
Recovery from printing-and-scanning using fewer or no visible registration
marks is desirable and is a direction for future work. For example, we may
embed a bit sequence known to the detector in addition to the main payload.
If the detector cannot successfully extract this sequence from a test image,
it will perform an extensive search to estimate the distortion parameters.
These parameters can then be used to produce an undistorted image from which
we can correctly extract the known sequence as well as the other embedded
data.
FIGURE 5.13. Example No. 1 of recovering a binary image from high quality printing
and scanning. (a) Cross-shaped marks are added at the four corners and on the four
sides at an interval of 50 pixels horizontally and of 25 pixels vertically, helping to
determine the boundary, the scale, and the skewing angle of a scanned image; in
addition, the width and height of original images are constrained to be multiples
of 50; the image is imported to Microsoft Word at 72dpi, printed via a laser
printer, and scanned in at 600dpi with 256 grey levels; the size of the scanned
image is 2028 x 444. (b) The estimated centers of the original pixels are shown
in a light color; sampling at those centers can recover the original binary image
perfectly from a scanned one.
FIGURE 5.14. Example No. 2 of recovering a binary image from high quality printing
and scanning. (a) A specially designed dotted signature box helps determine
the boundary, the scale, and the skewing angle of a scanned image; in addition,
the width and height of original images are constrained to be multiples of 50; the
image is imported to Microsoft Word 2000 at 72dpi, printed via a laser printer,
and scanned in at 600dpi with 256 grey levels; the size of the scanned image is
2731 x 643. (b) The estimated centers of the original pixels are shown in a light
color; sampling at those centers can recover the original binary image perfectly
from a scanned one.
5.3.2 Security Considerations
In Chapter 2, we drew an analogy between data hiding and communication.
The embedding methods serve as the physical communication layer, on
top of which other functionalities and features can be built. For instance,
security issues may be handled by upper layers in authentication applications,
in which the major objective of an adversary is to forge the authentication data
so that an altered document can still pass the authentication test.
The security issues can be addressed by using traditional cryptography-based
authentication to produce a cryptographic digital signature and by
embedding it in the binary image. This traditional approach relies on a
cryptographically strong hash function to produce a digest of the document to
be signed, as well as on public-key encryption to enable verification without
giving up the encryption keys, so that only an authorized person can produce
a correct signature [25]. By using embedding, we not only save the room for
storing and/or displaying the cryptographic signature separately, but also
associate the authentication data with the media source in a seamless
way.
Although a cryptographic signature can be adopted to form (part of) the
embedded data, the embedding approach proposed in this chapter has the
potential of allowing plaintext to be embedded, since secret information such
as keys/seeds has already been incorporated via shuffling and/or the lookup
table. However, envisioning potential malicious attacks such as those studied
in [158], it is important to study the following two problems for authentica-
tion applications, assuming that the attacker has no knowledge about any
secret keys: (1) the probability of making content alterations while preserv-
ing the m-bit embedded authentication data, and (2) the possibility for an
adversary to hide specific data in an image.
For the first problem, we discussed in Section 5.3.1 that an n-pixel
alteration of a marked image would change the decoded data. If n is small
compared to the total number of blocks m, approximately n bits of
the decoded data will differ from the originally embedded data; if
n is large, the probability that the decoded data is exactly the same
as the originally embedded data is approximately 2^{-m}, which is very small
as long as m is reasonably large. Therefore, the threat of making content
alterations while preserving the m-bit embedded authentication data is very
low.
The answer to the second problem depends on whether or not multiple watermarked
versions of the same image with different data embedded are available
to an adversary. When multiple copies are not available, it is extremely
hard for an adversary to embed specific data in an image, even if he/she
knows the algorithm. This is attributable to the secrecy of the shuffling table.
However, in such applications as "signature in signature", an adversary
may be able to obtain multiple copies, for example, signatures embedded
with different signing dates or different payment amounts. This is similar to
the plaintext attack in cryptography [25]. We would like to know whether
he/she can derive information about which pixels carry which bit by
studying the difference between those copies and hence create new images
embedded with specific data (e.g., a specific date or payment amount). If the
embedding imposes the minimal necessary changes to enforce the desired
relationship (for example, in the odd-even case, at most one pixel will be
flipped in each embedding block), the pixels that differ among the multiple
copies are exactly those used to embed hidden information. If an adversary
collects sufficiently many copies and knows what data is embedded in each
copy, he/she will be able to identify which pixels carry which bit and to
hide his/her desired data by manipulating the corresponding pixels.
To prevent the above-mentioned attack, we have to introduce more uncertainty.
One approach is to use a different shuffling table, for example, choosing
one table from K candidates, similar to the approach used for handling
bad shuffles in Section 4.2.3. Another approach is that, instead of making
minimal changes for hiding one bit in each embedding block, we also flip,
with probability 0.5 in each block, an additional pair of flippable pixels.
Consider an example of embedding a "0": if the number of black pixels in
a shuffled block is already even, then with probability 0.5
we flip an additional pair of pixels selected arbitrarily from three highly
flippable pixels; if the number of black pixels is odd, then with probability 0.5
each we either flip all three highly flippable pixels or flip one
pixel selected arbitrarily from those three. When more than three
highly flippable pixels are available, we may make the above selection from
a larger pool. Now if we look at two image copies whose hidden data differ
in just one bit, the difference between the two images under minimal-change
embedding is just one pixel, while the difference under the above-mentioned
randomization involves many other pixels in a random fashion. In the latter
case, if a total of N bits are embedded, we can show that on average there
will be (4N + 1)/3 differing pixels. When N is sufficiently large, it is
very difficult for an adversary to identify which pixels are associated with
which bits. As a tradeoff, the randomization requires three flippable pixels
to be available for each shuffled block and changes more pixels at the embedding
step. Note that this countermeasure assumes that, for any given hidden
data, only one copy of a marked image is available to an attacker. Otherwise,
he/she may be able to average out the randomization and compromise
our solution.
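A minimal sketch of the randomized odd-even enforcement described above, assuming a shuffled block is given as a list of 0/1 pixel values together with the indices of at least three highly flippable pixels (both representations are ours for illustration); in the actual system the choice of which pixels to flip also depends on the flippability scores of Section 5.5.

import random

def embed_bit_randomized(block, bit, flippable_idx, rng=random):
    """Enforce the odd-even relationship on a shuffled block with extra randomization.

    `block` is a list of 0/1 pixel values, `bit` is 0 or 1, and `flippable_idx`
    lists (at least three) indices of highly flippable pixels, ordered by score.
    Returns a new block whose number of black pixels has parity `bit`.
    """
    out = list(block)
    parity = sum(out) % 2
    top3 = flippable_idx[:3]
    if parity == bit:
        # Parity already correct: with probability 0.5 flip an arbitrary pair
        # of the highly flippable pixels (the parity is preserved).
        if rng.random() < 0.5:
            for i in rng.sample(top3, 2):
                out[i] ^= 1
    else:
        # Parity wrong: flip either all three highly flippable pixels or a single
        # one of them (each choice changes the parity), with probability 0.5 each.
        if rng.random() < 0.5:
            for i in top3:
                out[i] ^= 1
        else:
            out[rng.choice(top3)] ^= 1
    assert sum(out) % 2 == bit
    return out

Because the set of flipped pixels now varies randomly even for identical hidden data, comparing two marked copies no longer reveals directly which pixels carry which bit.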
5.4 Chapter Summary
This chapter addresses the problem of data hiding for binary images. We
propose a new fragile or semi-fragile data hiding method for the authentica-
tion and annotation of binary images. The method manipulates "flippable"
pixels to enforce a specific block-based relationship to embed a significant
amount of data without causing noticeable artifacts. Shuffling is applied
before embedding to equalize the uneven embedding capacity. The hidden
data can be extracted without using the original image. With the help of
a few registration marks, they can also be accurately extracted after high
quality printing and scanning. The algorithm can be applied to detect unau-
thorized use of signatures in binary image format, to detect alterations on
documents, and to annotate signatures and drawings.
Some directions for future investigation include the refinement of the flippability
model for different types of binary images (text, drawings, dithered
images, etc.), and the recovery of binary images from high quality printing
and scanning using fewer or no visible registration marks.
Acknowledgement We would like to thank Prof. Adam Finkelstein of
Princeton University for the enlightening discussion on data hiding in binary
images and for proposing its application of "signature in signature", and
Ed Tang of Princeton Summer Institute '99 for his contribution to the con-
nectivity criterion for generating flippability scores. The signature image in
Fig. 5.9 was edited from http://www.whitehouse.gov/WH/EOP/OP/html/OP_Home.html
as of the Year 2000, the artistic line drawing in Fig. 5.10 was
from the Clip Art collections of Microsoft Office software, and the patent
image in Fig. 5.11 was edited from http://www.patents.ibm.com/details?&pn=US05825892__
as of Year 2000.
5.5 Appendix - Details of Determining Flippability
Scores
In this Appendix, we describe a procedure for computing the flippability scores
of pixels in non-dithered binary images. The scores are used to determine
which pixels to flip with high priority during the embedding process. We
use a 3 x 3 window centered on the pixel under consideration. The procedure can be further
refined by studying a larger neighborhood and by using more extensive
analysis, especially for Step-2. The parameters and rules should be adjusted
for dithered images.
Step-1 Compute smoothness and connectivity measures of 3 x 3
patterns.
The smoothness of the neighborhood around pixel (i,j) is measured by
the total number of transitions along four directions in the 3 x 3 window:

horizontal:      N_h(i,j) = \sum_{k=-1}^{1} \sum_{l=-1}^{0} I(\{ p_{i+k,j+l} \neq p_{i+k,j+l+1} \}),

vertical:        N_v(i,j) = \sum_{k=-1}^{1} \sum_{l=-1}^{0} I(\{ p_{i+l,j+k} \neq p_{i+l+1,j+k} \}),

diagonal:        N_{d1}(i,j) = \sum_{k,l \in \{-1,0\}} I(\{ p_{i+k,j+l} \neq p_{i+k+1,j+l+1} \}),

anti-diagonal:   N_{d2}(i,j) = \sum_{k \in \{0,1\},\ l \in \{-1,0\}} I(\{ p_{i+k,j+l} \neq p_{i+k-1,j+l+1} \}),
where I(·) is an indicator function taking values in {0, 1}, and p_{i,j} denotes
the pixel value in the ith row and jth column of the image. These
computations are also illustrated in Fig. 5.15. Note that regular patterns,
such as straight lines, have zero transitions along at least one direction, as
shown in Fig. 5.16.
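The transition counts above can be computed with a few array comparisons. The sketch below is an illustrative Python/NumPy version for a single 3 x 3 window; the function name is ours.

import numpy as np

def transition_counts(win):
    """Count transitions along the four directions in a 3x3 binary window `win`.
    Returns (N_h, N_v, N_d1, N_d2)."""
    w = np.asarray(win)
    n_h = int(np.sum(w[:, :-1] != w[:, 1:]))         # horizontal neighbours
    n_v = int(np.sum(w[:-1, :] != w[1:, :]))         # vertical neighbours
    n_d1 = int(np.sum(w[:-1, :-1] != w[1:, 1:]))     # diagonal (\) neighbours
    n_d2 = int(np.sum(w[1:, :-1] != w[:-1, 1:]))     # anti-diagonal (/) neighbours
    return n_h, n_v, n_d1, n_d2

# A horizontal line through the window has zero horizontal transitions (cf. Fig. 5.16).
line = [[0, 0, 0],
        [1, 1, 1],
        [0, 0, 0]]
print(transition_counts(line))   # -> (0, 6, 4, 4)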
The connectivity is measured by the number of black and white clusters.
A commonly used criterion, illustrated in Fig. 5.17, considers pixels
that have the same value and that touch each other along the 90-degree
direction, or neighbor along the 45-degree direction, as connected. The 90-degree
touching is often known as four-way connectivity, and 90-degree or
45-degree touching is known as eight-way connectivity [12]. We use the 4-way or
8-way connectivity criterion depending on the specific constraints on visual
artifacts. A graph for black (or white) pixels can be constructed, in which
each vertex represents a black (or white) pixel, and there is an edge between
two vertices if and only if the two corresponding pixels are connected.
FIGURE 5.15. Illustration of transitions in four directions, namely, horizontal,
vertical, diagonal, and anti-diagonal. The number of transitions is used to measure
the smoothness of the 3 x 3 neighborhood.
FIGURE 5.16. Regular patterns such as straight lines have zero transitions along
at least one direction. Shown here is part of a horizontal line with zero horizontal
transitions.
An example is shown in Fig. 5.18, with five black pixels forming two clusters and
four white pixels forming one cluster. The number of clusters can be automatically
identified by traversing the graph using a depth-first search strategy.
Here we present a stack-based implementation of the non-recursive depth-first
search algorithm, adapted from [4]; a compact version in code follows the listed steps. We assume that there are M pixels
in total (counting both white and black), and the final value of "counter"
indicates the number of clusters.
(1) Initialization: let p[k] store the value of the kth pixel and q be the pixel
value of interest (i.e., q is black if we are looking for black clusters, and vice versa);
set up an empty stack and an M-element array label[·] for storing the
index of the cluster that each pixel belongs to; set label[k] = 0 for all
k = 1, ..., M; i = 1; counter = 0.
(2) If label[i] ≠ 0 (i.e., pixel i has already been visited) or p[i] ≠ q, go to (7).
(3) counter = counter + 1; push node-i onto the stack.
(4) If the stack is empty, go to (7).
(5) k = pop( ) from the stack; label[k] = counter.
(6) Find all pixels directly connected with k. For each connected pixel j
(note that, by the definition of connectedness, p[j] = q), if label[j] = 0 (i.e., it has not yet been visited or pushed onto the stack),
assign label[j] = -1 and push node-j onto the stack. Go back to (4).
(7) i = i + 1; if i > M, stop; otherwise go to (2).
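The following Python sketch implements steps (1)-(7) above. The 3 x 3 example pattern is one of our own choosing with the same cluster counts as the example of Fig. 5.18, and the 4-way neighbor function is only one possible choice of connectivity criterion.

def count_clusters(p, q, neighbors):
    """Count clusters of pixels with value `q` using the stack-based depth-first
    search of steps (1)-(7).  `p` is a list of M pixel values and `neighbors(k)`
    returns the indices directly connected to pixel k (4-way or 8-way)."""
    M = len(p)
    label = [0] * M                     # 0: unvisited, -1: queued, >0: cluster index
    counter = 0
    for i in range(M):
        if label[i] != 0 or p[i] != q:  # already visited, or not the value of interest
            continue
        counter += 1
        stack = [i]
        while stack:
            k = stack.pop()
            label[k] = counter
            for j in neighbors(k):
                if p[j] == q and label[j] == 0:
                    label[j] = -1       # mark as pushed so it is not pushed twice
                    stack.append(j)
    return counter, label

# Example pattern (our own) with five black pixels in two clusters and
# four white pixels in one cluster under 4-way connectivity.
pattern = [1, 1, 0,
           1, 0, 0,
           1, 0, 1]
def four_way(k, w=3, h=3):
    r, c = divmod(k, w)
    return [rr * w + cc for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if 0 <= rr < h and 0 <= cc < w]
print(count_clusters(pattern, 1, four_way)[0],
      count_clusters(pattern, 0, four_way)[0])   # -> 2 1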
FIGURE 5.17. Pixels that have the same pixel value and that touch each
other along the 90-degree direction (i.e., (i, j ± 1) or (i ± 1, j)) or neighbor along
the 45-degree direction (i.e., (i + 1, j ± 1) or (i - 1, j ± 1)) are considered connected to the center pixel (i, j).
FIGURE 5.18. Graph representation of the connectivity of black and white pixels.
Shown here is an example of a 3 x 3 pattern with five black pixels forming two
clusters and four white pixels forming one cluster. Four-way connectivity
(90-degree touching) is considered in this example.
Step-2 Compute flippability score.
The smoothness and connectivity measures are passed to a decision
module to produce a flippability score. The main considerations in designing
this module are: (1) whether the original pattern is very smooth, (2) whether
flipping will increase the non-smoothness by a large amount, and (3) whether flipping
will change the connectivity. Changes to such patterns generally produce
more significant artifacts. Listed below are the rules that our
decision module follows; a sketch of such a module is given after the list:
(1) The lowest score (i.e., not flippable) is assigned to uniform white or
black regions as well as to isolated single white or black pixels.
These trivial cases are handled first.
(2) If the number of transitions along the horizontal or vertical direction is
zero (i.e., the pattern is very smooth and regularly structured), assign
zero as the final score for the current pixel. Otherwise, assign to the
pixel a base score S_B and proceed to the next rule.
(3) If the number of transitions along the diagonal or anti-diagonal direction is
zero, reduce the score. Otherwise, if the minimum number of transitions
along any one of the four directions is below a given threshold T_1, which means the pattern is rather smooth, reduce the score
by a smaller amount. Note that we treat smooth horizontal/vertical
patterns and diagonal/anti-diagonal patterns differently because
artifacts along horizontal/vertical patterns are likely to attract
more attention from viewers.
(4) If flipping the center pixel does not change the number of transitions, increase the score. Otherwise, if flipping increases the number
of transitions (i.e., reduces smoothness and makes the pattern
noisier), decrease the score.
(5) If flipping changes the number of black clusters or white clusters,
reduce the score.
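The sketch below shows one way rules (1)-(5) might be combined into a scoring function, reusing transition_counts() and count_clusters()/four_way() from the earlier sketches. The base score, threshold, and adjustment amounts follow the illustrative values quoted later in this section (S_B = 0.5, T_1 = 3, steps of 0.125), but the exact adjustments and rule interactions behind the lookup table of Fig. 5.19 are not fully specified here, so this is an approximation rather than the book's exact table.

# Assumes transition_counts() and count_clusters()/four_way() from the earlier sketches.
def flippability_score(win, S_B=0.5, T1=3, step=0.125):
    """Rule-based flippability score for the center pixel of a 3x3 binary window."""
    flat = [int(v) for row in win for v in row]
    total = sum(flat)
    # Rule 1: uniform patterns and an isolated single center pixel are never flipped.
    if total in (0, 9) or (total == 1 and flat[4] == 1) or (total == 8 and flat[4] == 0):
        return 0.0
    n_h, n_v, n_d1, n_d2 = transition_counts(win)
    # Rule 2: perfectly smooth horizontal/vertical structure is never flipped.
    if n_h == 0 or n_v == 0:
        return 0.0
    score = S_B
    # Rule 3: smooth diagonal structure, or a rather smooth pattern, lowers the score.
    if n_d1 == 0 or n_d2 == 0:
        score -= 2 * step
    elif min(n_h, n_v, n_d1, n_d2) < T1:
        score -= step
    # Rule 4: compare the total number of transitions before and after flipping.
    flipped = [list(row) for row in win]
    flipped[1][1] ^= 1
    before, after = sum((n_h, n_v, n_d1, n_d2)), sum(transition_counts(flipped))
    if after == before:
        score += step
    elif after > before:
        score -= step
    # Rule 5: flipping that changes the number of black or white clusters lowers the score.
    for q in (0, 1):
        if count_clusters(flat, q, four_way)[0] != \
           count_clusters([v for row in flipped for v in row], q, four_way)[0]:
            score -= step
            break
    return max(score, 0.0)

print(flippability_score([[0, 0, 0], [1, 1, 1], [0, 0, 0]]))   # horizontal line -> 0.0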
Applying these rules produces a lookup table of all 3 x 3 patterns, ordered
by how noticeable a change of the center pixel would be. For
a small neighborhood such as 3 x 3, this table has a small number of entries
(2^{3×3} = 512) and hence can be computed off-line. The flippability score of every
pattern in an image can then be determined by looking up the stored table.
When a larger neighborhood is involved, the table size increases exponentially
and may exceed the available memory for particular applications. This
problem can be solved by a hierarchical approach, namely, obtaining a preliminary
flippability measure based on a small neighborhood (e.g., 3 x 3)
by table lookup, and then, if necessary, refining the measure by on-line
computation based on a larger neighborhood (see also Step-3).
Step-3 Handle special cases.
Some special cases involving a larger neighborhood can be handled by detecting
specific patterns, such as sharp corners, to avoid introducing annoying
artifacts on them.
Step-4 Impose a minimum distance constraint between two flippable
pixels.
Up to now, the flippability evaluation has been done independently for the pattern
revealed in a moving window centered at each pixel, assuming that any
pixels other than the center one will not be flipped. Pixels that are close
to each other may each be considered flippable by this independent evaluation, but
simultaneously flipping them could cause artifacts. We handle this problem
by imposing a constraint on the minimum distance between two pixels that
can be flipped and by pruning, within each neighborhood, the pixels with relatively
low flippability.
Step-5 Assign a predetermined score to the remaining boundary
points (optional).
Edge pixels that have not yet been assigned a non-zero score are given
a small flippability score. These pixels serve as a baseline for hiding a particular
bit when no pixel with a higher score is available to carry the
data to be embedded. Adding this step helps to achieve a high embedding
rate without significantly affecting the visual quality.
Shown in Fig. 5.19 is one possible lookup table for 3 x 3 patterns, excluding
patterns that differ only by rotation, mirroring, or complement. Here we
set the threshold T_1 = 3 and the base flippability score S_B = 0.5, and the
flippability adjustments in Step-2 are multiples of 0.125.
For dithered images, some criteria and parameters need to be revised.
For example, a pixel is given high flippability if flipping it does not cause
a large relative change in local intensity, and the connectivity is given less
consideration. Techniques in lossy bi-level image compression, such as those
in the JBIG2 activities [15], may provide further insights for data hiding, and the
methods used in data hiding may in turn contribute to compression.
5.6 Appendix - Details on Recovering Binary
Images After Printing and Scanning
In Section 5.3.1, we described adding special marks at the four corners and on
the four sides to serve as a ruler for registration purposes. Identifying the cross
points of the corner marks in a scanned image is the first step in recovering
a binary image from high quality printing and scanning. Here we propose a
projection-based approach under the assumption that the approximate region
of the registration mark to be recovered has already been specified. For a
white background, this region should include the entire mark and preferably
no other black pixels. A white outer layer of fixed width is added to the
source image to facilitate the identification of the mark regions, as shown in
Fig. 5.20. The approximate region containing the mark can be either manually
specified via an interactive interface or automatically determined via
pattern matching. For simplicity, the manual approach is used in our experiment,
and reasonable effort is made during scanning so that the skewing of
each mark is negligible.
FIGURE 5.19. One possible flippability lookup table for 3 x 3 patterns, excluding
patterns that differ only by rotation, mirroring, or complement. A larger value
indicates that a change of the center pixel is less noticeable, hence the change is more
likely to be made for hiding information.
FIGURE 5.20. Close-up view of corner registration marks used in Fig. 5.13. A
white outer layer of fixed width is added to facilitate the identification of the
approximate region of each mark after print-and-scan.
To determine the cross point of a mark, we perform horizontal and vertical
projections and obtain two profiles. Each profile has a unique "plateau"
corresponding to the horizontal or vertical stroke, respectively. As illustrated
in Fig. 5.21, the centers of the two plateaus determine the y- and x-coordinates
of the cross point, respectively.
FIGURE 5.21. Determining the cross point of a registration mark by performing
horizontal and vertical projection. The centers of the two projection plateaus are
used as y- and x- coordinates of the cross point.
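A sketch of the projection-based estimation on a synthetic "+" mark follows. The 2-D mark region, the 50% threshold used to delimit each plateau, and the function name are our own assumptions; the text above only specifies that the centers of the two plateaus give the cross-point coordinates.

import numpy as np

def cross_point(mark_region, thresh=0.5):
    """Estimate the cross point of a '+'-shaped registration mark by projection.
    `mark_region` is a 2-D array with 1 for black; the plateau of the row-wise
    profile gives the y coordinate of the horizontal stroke, and the plateau of
    the column-wise profile gives the x coordinate of the vertical stroke."""
    h_profile = mark_region.sum(axis=1)          # one value per row
    v_profile = mark_region.sum(axis=0)          # one value per column
    y_plateau = np.flatnonzero(h_profile > thresh * h_profile.max())
    x_plateau = np.flatnonzero(v_profile > thresh * v_profile.max())
    return float(x_plateau.mean()), float(y_plateau.mean())

# Synthetic '+' mark: horizontal stroke around row 20, vertical stroke around column 30.
mark = np.zeros((40, 60), dtype=np.uint8)
mark[18:23, :] = 1      # horizontal stroke, 5 pixels thick
mark[:, 28:33] = 1      # vertical stroke, 5 pixels thick
print(cross_point(mark))  # approximately (30.0, 20.0)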
Using the identified cross points of the registration marks, we can determine
the skewing angle α of the entire scanned image, as illustrated in Fig. 5.22.
The scaling factors can be estimated as follows. Assume that the original image
size has been determined as N_w x N_h and the scanned image size is W x H,
all measured in pixels.⁶ We further assume that the coordinate of the upper-left
pixel in both the scanned image and the original image is (0,0).
⁶Recall that we have imposed the constraint that the width and height of the original binary
image must be multiples of 50. The actual multiplication factor can be determined from the
registration marks, which serve as a ruler. Alternatively, the multiplication factor can be
determined by estimating, from the width of the registration marks, how many pixels in the
scanned image correspond to one pixel in the original. Any additions, such as the white
outer layer in Fig. 5.20, also need to be counted.
FIGURE 5.22. Using coordinate conversion to perform scaling and de-skewing.
Coordinate system x-y is for the scanned image, and coordinate system x'-y' is for the original
image. The skewing angle between the two coordinate systems is represented by α; the lightly
dotted squares represent original pixels, and the round dots are the corresponding
centers.
Considering a pixel (x', y') in the original image, we would like to find the center of this
pixel in the scanned version. We first perform a scaling operation:

    \tilde{x} = x' \cdot \frac{W-1}{N_w - 1}, \qquad \tilde{y} = y' \cdot \frac{H-1}{N_h - 1},    (5.4)

where (W - 1), (H - 1), (N_w - 1) and (N_h - 1) are used because the
coordinate of the first pixel starts from (0,0). We then perform a rotation by
-α degrees and get the coordinate (x, y) of the estimated pixel center:

    \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix} \begin{pmatrix} \tilde{x} \\ \tilde{y} \end{pmatrix}.    (5.5)
If the estimation is well centered in the pixel and the scanning resolution
is sufficiently high so that one original pixel corresponds to many scanned
pixels (such as those shown in Fig. 5.13 and Fig. 5.14), sampling at the estimated
centers will recover the original image. Improvement can be made by
considering the surrounding pixels as well as the grayscale information obtained
from scanning, especially when a printed image has noisy boundaries
and/or is slightly blurred.
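The scaling-plus-deskewing conversion can be written compactly. The sketch below assumes the skew angle has been estimated in degrees and that rotating by -α maps the scaled original coordinates into the scanned image (the sign convention depends on how α is measured), so it should be read as an illustration of Eqs. 5.4-5.5 rather than the exact implementation used in the experiments.

import numpy as np

def estimated_center(xp, yp, Nw, Nh, W, H, alpha_deg):
    """Map pixel (x', y') of the Nw x Nh original image to its estimated center
    in the W x H scanned image: scale first (Eq. 5.4), then rotate by -alpha
    to account for the skew (Eq. 5.5).  The first pixel has coordinate (0, 0)."""
    xs = xp * (W - 1) / (Nw - 1)
    ys = yp * (H - 1) / (Nh - 1)
    a = np.deg2rad(alpha_deg)
    x = xs * np.cos(a) + ys * np.sin(a)
    y = -xs * np.sin(a) + ys * np.cos(a)
    return x, y

# Example: a 200x100 original scanned at roughly 8x magnification with a 0.4-degree skew.
print(estimated_center(xp=120, yp=40, Nw=200, Nh=100, W=1600, H=800, alpha_deg=0.4))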
6
Multilevel Data Hiding for Image and Video
Content providers such as the movie industry and news agencies have imposed
strong demands for ownership protection, alteration detection, access
control, and source tracking (fingerprinting) of digital image and video.
For these applications, it is desirable to embed many bits in high quality
images and video while satisfying both imperceptibility and robustness requirements.
As discussed in Part-I, imperceptibility, robustness against moderate
compression and processing, and the ability to hide many bits are
the basic but competing requirements for many data hiding applications.
The traditional way is to target a specific payload-robustness pair and
follow one of two approaches: to embed just one or a few bits very
robustly [49, 58, 105, 106, 109, 113], or to embed a lot of bits but tolerate
little or no distortion [67, 69, 108, 139]. Such a single robustness target generally
overestimates the noise condition in some situations and/or underestimates
it in others. Also, some data, such as the ownership
information and the control information for facilitating the decoding of a
large amount of payload bits, should be embedded more robustly than the rest.
It is therefore desirable to design a data hiding system that is capable
of conveying secondary data at a high rate when noise is not severe and can
still convey some data reliably under severe processing [110, 111, 112]. This
is analogous to the graceful degradation provided by unequal error protection (UEP)
in communication.
In this chapter, we first propose in Section 6.1 a framework that utilizes several
embedding schemes and/or parameter settings to allow the amount of
extractable information to adapt to the actual noise conditions. We then
present in Section 6.2 a specific algorithm designed for multi-level data
hiding in images that allows graceful decay of the extractable information as
noise gets stronger. Each embedding level is associated with a different robustness
vs. payload tradeoff. Finally, in Section 6.3, we extend the multi-level
embedding to video, which exhibits a variety of non-stationary behaviors
that make data hiding difficult. We will demonstrate effective strategies for
handling the uneven embedding capacity from region to region within a
frame and also from frame to frame. We also embed control information to
facilitate the accurate extraction of the user data payload and to combat
distortions such as frame jitter.
The designs presented in this chapter can be used for such applications as
robust annotation, content-based authentication, access/copy control, and
fingerprinting. The main design objective of this work is to survive common
processing in transcoding and scalable/progressive transmission, such
as compression at different ratios and, in the case of video, frame rate conversion.
A malicious attack aimed at making the watermark undetectable is not a
primary concern here, either because there is no incentive to do so in such
applications as annotation and authentication, or because the threat can
be alleviated by other means such as a well-determined business and pricing
model.
6.1 Multi-level Embedding
An embedding scheme usually targets a specific robustness level, leading
to a specific total payload.¹ Focusing on the two types of embedding mechanisms
discussed in Chapter 3, the relation between the watermark-to-noise
ratio (WNR) x and the maximum number of bits C that can be reliably
embedded is illustrated by the solid lines in Fig. 6.1(a). The curve C(x) is
essentially a profile of the capacity curves for Type-I and Type-II in Fig. 3.7.
For a watermark targeted to survive at a specific level of WNR, say x_1, the
maximum number of payload bits that can be extracted reliably is C(x_1)
in Fig. 6.1(b), even if the actual WNR is higher than x_1. Thus the number
of reliably extractable payload bits under different actual noise conditions
follows the solid line C_{1,1}(x) in Fig. 6.1(b), which is a step function with a
jump at the design target WNR x_1. If the design target WNR is different,
say x_2, the number of reliably extractable payload bits would follow a different
step function curve, C_{1,2}(x). Therefore, using a single design target
WNR will result in no extractable data when the actual noise is stronger
than the design parameter, while the number of extractable bits does not
increase even if the actual noise is weaker than that targeted in the design.
¹The total payload is the total amount of data embedded in the host signal. It consists
of the main user payload (such as ownership information and copy control policy) and
any additional control data embedded to facilitate the data extraction.
It is possible to use two targeted values of WNR in the design, so that a
fraction α_1 of the embedded data survives a WNR of x_1, and all embedded
data survive a higher WNR of x_2. The maximum number of extractable
payload bits versus the actual noise condition of this combined embedding
would then follow the 2-step curve C_{II}(x) in Fig. 6.1(c). This approach allows
more bits to be extractable than C_{1,1}(x) when x ≥ x_2, and than
C_{1,2}(x) when x_1 < x < x_2.
FIGURE 6.1. Amount of extractable data by single-level and multi-level embedding:
(a) embedding capacity versus watermark-to-noise ratio; (b)-(d) the number
of extractable bits with a single embedding level, with two embedding levels, and with
infinitely many embedding levels, respectively.
The above 2-level embedding can be extended to M-level embedding.
By selecting M targeted WNRs [x_1, x_2, ..., x_M] and the associated fractions
[α_1, α_2, ..., α_M], where x_1 < x_2 < ... < x_M and \sum_{i=1}^{M} α_i = 1, the maximum
number of extractable bits C_M(x) is:
    C_M(x) = \begin{cases} \sum_{i=1}^{M} \alpha_i\, C_{1,i}(x_i) & \text{if } x > x_M; \\ \sum_{i=1}^{k} \alpha_i\, C_{1,i}(x_i) & \text{if } x_k < x < x_{k+1},\ k = 1, \ldots, M-1; \\ 0 & \text{if } x < x_1. \end{cases}    (6.1)
Let α_i = 1/M, x_L = x_1, x_U = x_M, and x_{i+1} - x_i = (x_U - x_L)/(M - 1) for fixed
x_L and x_U. Then as M goes to infinity, we have

    C_\infty(x) = \begin{cases} \frac{1}{x_U - x_L} \int_{x_L}^{x_U} C(u)\, du & \text{if } x > x_U; \\ \frac{1}{x_U - x_L} \int_{x_L}^{x} C(u)\, du & \text{if } x_L \le x \le x_U; \\ 0 & \text{if } x < x_L. \end{cases}    (6.2)
This is illustrated in Fig. 6.1(d). We see that combining many embedding
levels can achieve graceful degradation, so that the extractable information
decays smoothly as the actual noise gets stronger. We shall call this multi-level
embedding. In practice, both the fractions, {α_i}, and the targeted WNRs,
{x_i}, can be chosen non-uniformly to allow different emphasis on different noise
conditions.
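The step function C_M(x) of Eq. 6.1 is easy to evaluate once the single-level capacity curve, the target WNRs, and the fractions are given. The sketch below uses a toy capacity curve and a two-level design with made-up numbers, purely for illustration.

def extractable_bits(x, targets, fractions, capacity):
    """Number of reliably extractable bits C_M(x) for an M-level design (cf. Eq. 6.1).
    `targets` are the target WNRs x_1 < ... < x_M, `fractions` the corresponding
    alpha_i (summing to 1), and `capacity(w)` the single-level payload C(w)."""
    # Data targeted at WNR t is counted once the actual WNR x reaches t.
    return sum(a * capacity(t) for t, a in zip(targets, fractions) if x >= t)

# Toy capacity curve and a two-level design: half of the data targeted at 0 dB WNR,
# half at 10 dB (all values are illustrative, not from the experiments in this chapter).
capacity = lambda wnr_db: 100.0 * 2 ** (wnr_db / 10.0)
for actual in (-5, 2, 12):
    print(actual, extractable_bits(actual, targets=[0, 10], fractions=[0.5, 0.5],
                                   capacity=capacity))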
The graceful change of the amount of extractable information is desirable
in many applications. The information to be embedded often requires un-
equal error protection (UEP). Some bits, such as the ownership information
and control bits to facilitate the decoding of the actual payload bits, are
required to be embedded more robustly than others. For access control or
copy control applications in which a non-trivial number of bits are embed-
ded in audio or video to indicate usage rules, these rules cannot be enforced
until they are decoded. It is often desirable to enforce usage rules sooner on
audio/video that are lightly compressed and have high commercial value.
This can be realized by embedding the usage rules in multiple robustness
levels. When the compression is light, the rules can be decoded by pro-
cessing just a small segment of audio/video; and when the audio/video is
heavily compressed, the rules can still be robustly decoded by processing a
longer segment. In the remaining sections of this chapter, we will present
data hiding algorithms and system designs for images and videos using this
multi-level embedding idea.
6.2 Multi-level Image Data Hiding
We study in this section the problem of multilevel data hiding in grayscale
images. Extension to color images is straightforward by working separately
with the color components or with the luminance component only. The embedding
domain we have chosen is the block DCT domain with a popular block size
of 8 x 8. This domain is compatible with commonly used image and video
compression standards, making it possible to perform compressed domain
embedding and to make use of various techniques already developed for that
domain (such as human visual models for JPEG compression [90][92]). It
also allows for fine tuning of the watermark strength for each local region
to achieve a good tradeoff between imperceptibility and robustness against
distortion.
We shall present our algorithm and system design for two-level data hiding
using the two types of embedding mechanisms discussed in Part-I, noting
that it is possible to extend to more than two levels. The basis of this section
is Fig. 6.1, which demonstrates that by combining several embedding levels,
one can achieve a graceful decay of the number of bits that can be reliably
extracted as the actual noise gets stronger.
One typical scenario of multiple watermarking is to convey a small amount
of side information, usually no more than a few bits, to facilitate the extraction
of the main embedded payload. This will be discussed in Section 6.3.3.
The current section focuses on how to convey several sets of data with different
robustness, each set having a non-trivial number of bits. The data
in each set could be either identical or different, depending on the specific
applications.
In principle, frequency-domain multi-level embedding can be achieved
by applying embedding levels with different robustness-payload tradeoffs to
non-overlapped spectrum segments, or by allowing overlapped embedding.
Overlapped embedding is similar to embedding two or more watermarks
successively into a host signal to simultaneously achieve multiple
goals [129, 130, 137]. For example, a robust watermark and a fragile watermark
are added to an image for ownership protection and tampering
detection, respectively. The successive embedding should follow the order
in which more robust embedding mechanisms are applied prior to the fragile
ones to ensure the successful rendering of the embedded data [137]. We
adopt non-overlapped embedding in this chapter to avoid the interference
among different levels. A key issue is to determine what part of the
host signal is used for each embedding level. The following analysis of
the performance of non-coherent detection of Type-I spread spectrum embedding
provides a guideline for the partitioning of the host signal spectrum for
two-level data hiding.
6.2.1 Spectrum Partition
We have discussed the hypothesis testing formulation of Type-I additive
watermarking in Chapter 3:

    y_i = -s_i + d_i \quad (i = 1, \ldots, n) \quad \text{if } b = -1,
    y_i = +s_i + d_i \quad (i = 1, \ldots, n) \quad \text{if } b = +1,    (6.3)

where {s_i} is a deterministic known sequence, b is the bit to be embedded and
is equally likely to be "+1" or "-1", d_i represents the total noise and interference,
and n is the number of samples or coefficients that carry the hidden
information. Under the assumption that d_i is modelled by a simple i.i.d.
Gaussian distribution N(0, \sigma_d^2), the optimal detection statistic is essentially
a correlator with {s_i}:

    T_N = \frac{\sum_{i=1}^{n} y_i s_i}{\sigma_d \sqrt{\sum_{i=1}^{n} s_i^2}},    (6.4)
which is Gaussian distributed with unit variance and mean

    E(T_N) = b \cdot \sqrt{ n \cdot \left( \tfrac{1}{n} \|s\|^2 \right) / \sigma_d^2 }.    (6.5)

Setting the threshold to zero gives the minimum probability of detection
error Q(|E(T_N)|), where Q(x) is the probability P(X > x) for a Gaussian
random variable X ~ N(0, 1). For instance, the error probability is 10^{-3}
and 10^{-10} for E(T_N) = ±3 and E(T_N) = ±6, respectively.
Under non-coherent detection, d_i consists of the interference from the host
media and the noise from processing or attacks. The high power of the host
media contributes to a large \sigma_d^2 value, reducing E(T_N) and increasing the
probability of detection error. A popular approach to reduce the interference
from the host signal is to watermark only the mid-band coefficients [53]. It is
based on the observation that the low-band coefficients of the host media
generally have much higher power than the mid-band ones, and that the high-band
coefficients are vulnerable to processing and attacks. The detector in
this case is the commonly used correlator.
This "mark mid-band only" approach, however, conflicts with the com-
mOn understanding in detection theory that under Gaussian noise, the de-
tection performance should be enhanced with more independent observa-
tions, not less. This conflict arises from the fact that the noise is often
not Li.d. Gaussian in practical applications. For example, different bands of
block DCT coefficients have different variance. The observation that low-
band coefficients have higher power than those in the mid-band is a re-
flection. A more refined yet simple model would assume the noise being
independently Gaussian distributed but with different variance for differ-
ent frequency bands. The optimal detector is then a correlator preceded
with normalization of the observations by their corresponding sample stan-
dard deviations ad' giving more weight to less noisy components. The test
statistic becomes '
Tiv = (6.6)
Thus it is possible to embed data in all bands, although contributions from
those noisy bands are limited. One can also use a more general Gaussian
noise model in which the components of the host media and/or the noise may
be dependent. In this case, both whitening and normalization are performed
before applying the minimum Euclidean distance detector or maximum cor-
relation detector [7] [52].
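The benefit of the per-band normalization in Eq. 6.6 can be seen on synthetic data. The sketch below compares a plain correlator with the weighted correlator on an artificial observation whose noise standard deviation grows across the bands; the signal model and all numbers are synthetic, not the image data used in the experiments that follow.

import numpy as np

rng = np.random.default_rng(0)
n = 2000
s = rng.choice([-1.0, 1.0], size=n)              # known spreading sequence
sigma = np.linspace(1.0, 20.0, n)                # noise std differs from band to band
b = +1
y = b * s + sigma * rng.standard_normal(n)       # received coefficients

# Plain correlator (all components weighted equally), normalized to unit variance.
t_plain = np.dot(y, s) / np.sqrt(np.sum(sigma ** 2 * s ** 2))
# Weighted correlator of Eq. 6.6: components are divided by their noise variance.
t_weighted = np.sum(y * s / sigma ** 2) / np.sqrt(np.sum(s ** 2 / sigma ** 2))

print(t_plain, t_weighted)   # the weighted statistic has a larger mean, i.e. fewer errors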
Verification Through Experiments The above analysis is verified experimentally
using 114 natural images and the block-DCT domain spread
spectrum algorithm proposed by Podilchuk-Zeng [58]. The q-statistic proposed
by Zeng-Liu [61] is used in detection. This detection statistic, shown
in Eq. 6.7, is a correlation statistic with the variance normalized to 1 without
explicitly estimating the variance of the noise and interference \sigma_d^2. We shall
denote by q' and q the detection statistics with and without the weighting
based on an estimate of the total noise conditions, respectively:

    q = \frac{M_z}{\sqrt{V_z / n}},    (6.7)

    q' = \frac{M_{z'}}{\sqrt{V_{z'} / n}},    (6.8)
where z_i = y_i s_i and z'_i = \gamma_i y_i s_i, and M_z, V_z (M_{z'}, V_{z'}) denote the sample mean
and sample variance of {z_i} ({z'_i}), respectively.
The weights {\gamma_i} reflect the impact of the noise variance term in Eq. 6.6.
The noise variance is not easy to estimate accurately because the precise
power of the host signal is unknown in non-coherent detection, and the
variance of the processing noise varies dramatically depending on what kind of
distortion/attack is applied to the signal. For the first problem, one may
make an estimate based on the statistics of the current test media. For
the second problem, a set of known signals may be added at predetermined
locations of the host signal, serving as training data to facilitate
the noise estimation [41]. In our experiment, we choose {\gamma_i} based on the
variance of both the host signal and the potential processing noise of the frequency
band of y_i. They are empirically determined using a collection of natural
images.
0 1 5 6 14 15 27 28
2 4 7 13 16 26 29 42
3 8 12 17 25 30 41 43
9 11 18 24 31 40 44 53
10 19 23 32 39 45 52 54
20 22 33 38 46 51 55 60
21 34 37 47 50 56 59 61
35 36 48 49 57 58 62 63
FIGURE 6.2. Zig-zag ordering of DCT coefficients in an 8 x 8 block.
FIGURE 6.3. Comparison of average detection statistics for two detectors: (a)
detection statistics of the non-weighted correlator q and the weighted correlator q' under
zero additional distortion; (b) detection statistics of the weighted correlator q' under
four different distortions. In both plots, the x-axis indicates the frequency band (in
zigzag order) from which the watermark starts to be embedded.
Using q and q' as detection statistics, we studied the above-mentioned 114
natural images, each of which was tested using three different spread spectrum
watermarks. For each watermark and each image, we first order the
image coefficients in the familiar zig-zag manner (Fig. 6.2), and then vary
the frequency band beyond which the watermark is inserted. The q and q' values
are computed under several distortion conditions, including zero distortion,
JPEG compression with different quality factors, and low pass filtering. For each image,
we also normalize q and q' with respect to the number of embeddable coefficients
that can be watermarked without introducing perceptual distortion.
This ensures that the detection values for smoother images and for more
complex images are comparable. The average normalized q and q' are shown
in Fig. 6.3. It can be seen that the q value attains its maximum when the band
from which the embedding starts is around 6 to 11, and q decreases when
more or fewer frequency bands are involved. We also see that
q' gives a larger value and hence a smaller probability of error than q. In addition,
q' is monotonically decreasing when fewer bands are used in embedding,
but the decrease is insignificant when the five lowest bands are left out of the
embedding. All of these observations are consistent with our analysis.
From the above, for a two-level embedding system, one should apply
Type-I spread spectrum embedding to the mid-band coefficients for high robustness
at a cost of payload, and apply Type-II embedding to the low band for high payload
and moderate robustness. Such a multi-level embedding approach allows
the hiding of many bits and decodes them successfully when the image
experiences little or moderate distortion. When an image is distorted significantly,
this approach can still convey those bits that have been embedded
robustly.
6.2.2 System Design
Shown in Fig. 6.4 are the block diagrams of two-level data hiding in images.
The first level uses odd-even embedding, a proof-of-concept example
of Type-II, to embed the first set of data in the low band. Based on the analysis
in the previous subsection, we embed the first-level data in the first two
diagonal lines of AC coefficients, i.e., the first five AC coefficients shaded
in Fig. 6.5. In addition, we perform the embedding with quantization to
enhance robustness, as discussed in Chapter 3. The quantization step sizes
we have used are equivalent to the standard JPEG quantization table with
quality factor 50% [16][23]. That is, we produce a watermarked coefficient
v' from an original coefficient v such that
    v' = (\mathrm{round}[v/Q] + \delta) \cdot Q,    (6.9)

where \delta is determined by

    \delta = \begin{cases} 0, & \text{if } \mathrm{mod}(\mathrm{round}[v/Q], 2) = b; \\ \mathrm{sgn}(v/Q - \mathrm{round}[v/Q]), & \text{otherwise.} \end{cases}    (6.10)
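A sketch of the odd-even embedding and extraction of Eqs. 6.9-6.10 for a single coefficient. The example values of v, Q, and b are arbitrary; in the actual system Q comes from the JPEG quantization table at quality factor 50%, and embedding is skipped when the change would exceed the JND.

import numpy as np

def embed_odd_even(v, Q, b):
    """Quantized odd-even embedding of bit b in coefficient v (Eqs. 6.9-6.10):
    round v to the Q-grid and, if the parity of the quantization index does not
    match b, move one step toward the side v originally leaned to."""
    q_idx = np.round(v / Q)
    if int(q_idx) % 2 == b:
        delta = 0
    else:
        delta = 1 if (v / Q - q_idx) >= 0 else -1   # sgn(v/Q - round[v/Q]), sgn(0) = +1
    return (q_idx + delta) * Q

def extract_odd_even(v_received, Q):
    """Decode the bit as the parity of the nearest quantization index."""
    return int(np.round(v_received / Q)) % 2

v, Q, b = 37.4, 10.0, 1
v_marked = embed_odd_even(v, Q, b)
print(v_marked, extract_odd_even(v_marked, Q))   # marked value and the decoded bit
print(extract_odd_even(v_marked + 3.0, Q))       # still decodes b = 1 for noise below Q/2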
Notation used in Fig. 6.4 -- T: transform; T^{-1}: inverse transform; Enc: error correction encoding; Dec: error correction decoding; Mod: modulation; Demod: demodulation; Shuff: shuffling; Shuff^{-1}: inverse shuffling.
FIGURE 6.4. Block diagram of multi-level data hiding for images: (a) embedding
process, (b) extraction process.
FIGURE 6.5. Illustration of two-level data hiding in the block-DCT domain: Level-1 (higher payload, moderate robustness) occupies the lowest AC coefficients, and Level-2 (high robustness, moderate payload) occupies the mid-band coefficients.
where b ∈ {0, 1} is the bit to be embedded, and the binary-valued sgn(x)
function is +1 if x ≥ 0 and -1 otherwise. During the embedding process,
the just-noticeable-difference (JND) is computed according to an improved HVS
model that will be addressed later in this section. If the change from v to v' is
larger than the JND, the corresponding coefficient is regarded as unembeddable.
No changes are made to unembeddable coefficients and no hidden data
is put into them.
The second level uses Type-I spread spectrum additive embedding to hide
the second set of data in the mid-band. Antipodal modulation is used, adding
or subtracting a spread spectrum signal to denote one bit:

    v'_i = v_i + b' \cdot \alpha_i s_i,    (6.11)

where i = 1, ..., n, {v_i} are the original coefficients, {v'_i} are the marked
coefficients, and b' ∈ {-1, +1} is the antipodal mapping of b, the bit to
be embedded. The watermark strength, {\alpha_i}, is adjusted by the JND.
TDM-type multiplexing/modulation (Section 3.3) is used in both levels.
That is, each bit is embedded in a region that does not overlap with those of
other bits. For every bit to be embedded in Level-1 (high payload), we assign
to it a distinct set of low-band coefficients and use the odd-even enforcement to
embed the same bit value in those coefficients. The detector extracts the bit
by majority voting over the values extracted from the involved coefficients.
For every bit to be embedded in Level-2 (high robustness), we partition a
spreading sequence into non-overlapped segments and assign one segment
to that bit. To overcome the uneven embedding capacity of TDM, the coefficients
for each of the two embedding levels are shuffled and the embedding is performed
in the shuffled domain (Chapter 4). An inverse shuffling and an inverse
DCT transform are then applied to obtain the watermarked image. The
data embedded in each of the two levels may be further encoded using error
correction codes.
Before we present the experimental results of multilevel data hiding for
images, we shall briefly discuss the human visual model used in our system
for computing the JND. Our human visual model is refined on top of
the frequency-masking model by Podilchuk-Zeng [58]. We try to reduce
the perceivable ringing artifacts on edges introduced by block-DCT domain
embedding. We use local image statistics to distinguish texture and edge
blocks, and attenuate the JND of edge blocks. As we will demonstrate in
the next subsection, our refined HVS model produces fewer perceivable artifacts
on edges than [58] with a small sacrifice of the embedding payload.
6.2.3 Refined Human Visual Model
Almost all watermarking algorithms for grayscale and color images utilize
a human visual model to ensure imperceptibility, either implicitly or explicitly.
In a classic paper on spread spectrum watermarking, Cox et al. [49]
pointed out the importance of embedding the watermark in perceptually significant
components to achieve robustness, and made use of the perceptual tolerance
of minor changes to embed the watermark in those significant components. In
their implementation, the watermark is embedded in the DCT domain of
the image and a simplified scaling model is used to set the watermark power
about a magnitude lower than that of the cover image. By explicitly utilizing
human visual models known as frequency-domain masking, Podilchuk-Zeng [58]
and Swanson et al. [59] embed watermarks in the block-DCT domain
and use masking models to tune the watermark strength in each block.
Swanson et al. also incorporated spatial-domain masking in their design.
The block DCT domain is a popular embedding domain in the literature. It
is compatible with commonly used image and video compression techniques
such as JPEG [16], MPEG [20][21], and H.26x [13], making it possible
to perform compressed-domain embedding and to make use of various techniques
already developed for that domain (such as the human visual model
for JPEG compression [90][92]). The block-based domain also has the advantage
of fine-tuning the watermark strength for each local region to achieve
a good tradeoff between imperceptibility and robustness against distortion.
However, this popular domain has a few major weaknesses both in imperceptibility
and in robustness. We shall focus on the perceptual problem in
this subsection and postpone the discussion regarding robustness until Chapter 9.
The perceptual problem with block-DCT domain embedding is the ringing
artifacts introduced on edges. The previously proposed frequency-masking
models have not taken this issue into account [58]. The only way for those
models to reduce artifacts is to attenuate the whole watermark signal, which
leads to less robustness and a smaller data hiding payload. Tao et al. proposed to apply
block classification to reduce artifacts. They classify image blocks into
six categories (i.e., edge, uniform with moderate luminance, uniform with
either high or low luminance, moderately busy, busy, and very busy), and
adjust the watermark strength differently for each category [60]. The classification,
involving the enumeration of many possibilities, can be computationally
expensive. We propose a refined human visual model with less computational
complexity than [60] while introducing fewer artifacts than [58].
Before presenting the details of our refinement, we shall explain a bit more
about the frequency domain masking model used by Podilchuk et al. [58],
on top of which our refinement is applied. The masking model is based on
the following observations about the human visual system: first, different frequency
bands have different just-noticeable levels, and generally the just-noticeable-difference
(JND) in high frequency bands is higher than that in low bands;
second, in a specific frequency band, a stronger signal can be modified by a
larger amount than a weak signal without introducing artifacts. Because the
blocks with edges and textures have larger coefficient values (in magnitude)
than the smooth blocks, the JND of the non-smooth blocks obtained by this
model is generally larger than that of the smooth ones.
FIGURE 6.6. 2-D DCT basis images of 8 x 8 blocks. The upper-left corner is the
DC basis image.
The Podilchuk model reflects little difference between the two non-smooth
cases, namely, edge blocks and texture blocks. These two cases, however, have
a significant visual difference: with a modification of the same strength in the block
DCT domain, the artifacts are more likely to be revealed in an edge block
than in a texture one. The possible reasons are: first, a modification in the
block-DCT domain is equivalent to adding or subtracting the corresponding
2-D DCT basis images shown in Fig. 6.6; and second, a busy, non-structured
pattern that is close to a structured feature such as an edge attracts much
attention from the eyes, while many textures themselves involve more or less
random patterns, hence the added busy artifacts get swamped and become
indistinguishable to the eyes. Our refinement tries to distinguish edge and texture
blocks so that we can adjust the preliminary JND computed by the
simple masking model to achieve better invisibility. In other words, we try
to protect the edge blocks from being over-modified. Furthermore, we observe
that compared with the edges between two non-smooth regions, a block between
a smooth region and another region (either smooth or not) should be
protected more, even though the edge in that block may appear soft.
The refined HVS model, illustrated by the block diagrams in Fig. 6.7,
includes the following three steps:
Step-1: Frequency domain masking
The first step of our perceptual analysis makes use of the block-DCT domain
masking result and computes a preliminary embeddability and just-noticeable-difference
(JND) for each coefficient, which determine whether a
FIGURE 6.7. Block diagram of the refined 3-step HVS model: (top) basic modules,
(bottom) detailed procedures.
coefficient can be modified and if so, by how much amount it can be changed.
As mentioned above, this step is similar to what has been proposed in [58)
and forms a basis for further adjustment.
Step-2: Edge-block detection
We first use an edge detection algorithm (e.g., Haar filtering) to produce an
edge map. We then compute the standard deviation (STD) of the pixel values
in each block (i.e., we obtain one value per block, measuring the activeness
of the block), and compute the standard deviation of these standard
deviations in a neighborhood, for example, 3 blocks by 3 blocks. The latter
step helps eliminate many unwanted edges obtained in the first sub-step,
such as those in texture regions.
The rationale behind the double STD measure is that in a texture region,
although the STD of each block is large, the STDs of adjacent blocks in the
same texture region are similar, hence they do not deviate much when
computing the second-round STD. On the other hand, the STD of an edge
block is likely to be very different from those of the majority of its neighboring blocks.
The double STD computation can be easily implemented, as sketched below.
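A minimal sketch of the double STD computation, assuming 8 x 8 blocks and a 3-block by 3-block neighborhood; the block and neighborhood sizes are parameters of the actual system.

    import numpy as np

    # Edge-block measure of Step-2: the per-block pixel STD, followed by the
    # STD of those STDs over a small block neighborhood.  The second-round
    # STD is large for edge blocks but small inside uniformly textured regions.
    def double_std(image, block=8, nbhd=3):
        h, w = image.shape
        bh, bw = h // block, w // block
        blocks = image[:bh * block, :bw * block].reshape(bh, block, bw, block)
        block_std = blocks.std(axis=(1, 3))            # first-round STD per block

        half = nbhd // 2
        edge_measure = np.zeros_like(block_std)
        for i in range(bh):
            for j in range(bw):
                r0, r1 = max(0, i - half), min(bh, i + half + 1)
                c0, c1 = max(0, j - half), min(bw, j + half + 1)
                edge_measure[i, j] = block_std[r0:r1, c0:c1].std()  # second-round STD
        return edge_measure   # combined with an edge map to adjust the JND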
At the end of this step, we combine the edge map with the double STD
result and output an edge measure that indicates whether there is an edge
across the block and, if so, how strong the edge is. The edge measure is
then used to adjust the just-noticeable-difference. The adjusted JND is
ultimately used to control the watermark strength so that a weaker watermark
is applied to edge blocks than to texture ones.
Step-3: Identifying blocks adjacent to smooth regions
As we mentioned, artifacts from block-DCT domain embedding are more
visible in blocks adjacent to a smooth region than in other blocks,
even if such a block contains only weak edges so that the watermark may not be
attenuated sufficiently by Step-2. A relatively stronger watermark can be
added to an edge block that is not adjacent to a smooth region than in the contrary
case. To protect the blocks adjacent to smooth regions from artifacts, we
attenuate the JND of a block that is adjacent to a smooth block so that the
watermark applied there will be weaker. The smoothness of a block is
determined by the magnitudes of its AC coefficients, as sketched below.
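A minimal sketch of this Step-3 adjustment; the AC-energy threshold and the attenuation factor are illustrative assumptions rather than the values used in our experiments.

    import numpy as np

    # Step-3 sketch: a block is treated as smooth when its AC energy is below
    # an assumed threshold, and the JND of a non-smooth block adjacent to a
    # smooth one is attenuated so a weaker watermark is applied there.
    # `dct_blocks` is assumed to have shape (bh, bw, 8, 8); `jnd` (bh, bw, 8, 8).
    def attenuate_near_smooth(jnd, dct_blocks, ac_thresh=50.0, factor=0.5):
        bh, bw = dct_blocks.shape[:2]
        ac_energy = np.abs(dct_blocks).sum(axis=(2, 3)) - np.abs(dct_blocks[:, :, 0, 0])
        smooth = ac_energy < ac_thresh
        for i in range(bh):
            for j in range(bw):
                nbhd = smooth[max(0, i - 1):i + 2, max(0, j - 1):j + 2]
                if nbhd.any() and not smooth[i, j]:
                    jnd[i, j] *= factor   # weaker watermark next to smooth regions
        return jnd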
Fig. 6.8 demonstrates the difference between applying Step-1 only (similar
to the HVS model in [58], denoted as "HVSpz") and our new 3-step model
(denoted as "HVSedge"), for both the Lenna image (containing many smooth
regions and sharp edges) and the Baboon image (containing many textures and
a dark border at the bottom). The image quality and the detection statistics
of a single additive spread-spectrum watermark are summarized in Table 6.1.
All other parameters are kept the same in this experiment so that the
only difference is the HVS model. From the image details shown in Fig. 6.8,
we can see that the proposed 3-step HVS model produces fewer artifacts, yet the
detection statistics (Table 6.1) are still high enough to encode multiple bits.
TABLE 6.1. Comparison of the proposed HVS model ("HVSedge") and a model
used by Podilchuk-Zeng ("HVSpz").

Image              HVS type   Detection statistics   PSNR (dB)   Subjective image quality
------------------------------------------------------------------------------------------
Lenna (512x512)    HVSedge    25.50                  42.51       good image quality
                   HVSpz      35.96                  40.76       artifacts along edges (e.g., shoulder)
Baboon (512x512)   HVSedge    58.49                  33.59       good image quality
                   HVSpz      62.81                  33.10       obvious artifacts along bottom dark border
FIGURE 6.8. Examples of images watermarked by the proposed HVS model
("HVSedge") and a model used by Podilchuk-Zeng ("HVSpz"): (a) original Lenna
image; (b) marked Lenna using HVSedge; (c) marked Lenna using HVSpz; (d) original
Baboon image; (e) marked Baboon using HVSedge; (f) marked Baboon using HVSpz.
The artifacts by the HVSpz model are indicated by gray arrows.
6.2.4 Experimental Results
We apply the proposed two-level data hiding scheme to the 512 x 512 Lenna
image. The watermarked image has a PSNR of 42.5dB with respect to the
unmarked image and is shown in Fig. 6.9. Incorporating error correction coding
and shuffling, we embed a 32 x 32 binary pattern of the PINTL-Matsusita
logo in the low band, which can be extracted accurately when the image
experiences JPEG compression with a quality factor of 45% or higher. We also use
the spread spectrum approach to embed the ASCII code of the character string
"PINTL" in the mid-band, which can be extracted without error when the image
is blurred or JPEG compressed with a quality factor as low as 20%. The
embedding rate can be higher for images that contain larger texture
regions. For example, we can embed a longer string, "Panasonic Tech.", and a
32 x 32 pattern in the Baboon image², as shown in Fig. 6.10. Using our
refined human visual model, the marked image has no visible artifacts and
has a PSNR of 33.6dB with respect to the original image. The lower PSNR
of the Baboon image compared with that of the Lenna image is another indication
that stronger watermarks can be embedded invisibly in images with more
textures.
The large difference between the embedding rates of the two levels con-
firms the capacity comparison of the two types of embedding mechanisms
presented in Chapter 3. When the additional distortion applied to a marked
image is small, more bits can be extracted from only a few low-band coef-
ficients in Level-1 (Type-II embedding) than from mid-band coefficients in
Level-2 (Type-I embedding).
6.3 Multi-level Video Data Hiding
In this section, we extend our work to video. Besides the large data volume
and high computational complexity involved in processing video, we need to
determine an appropriate embedding domain and handle uneven embedding
capacity, both of which present interesting challenges.
6.3.1 Embedding Domain
Video has significant temporal redundancy: consecutive video frames look
similar except those at scene changes or with fast motion. Each frame can
also be viewed as a stand-alone unit. Because of this, it is possible to add
or drop some frames, or switch the order of adjacent frames, without causing
noticeable difference. In addition, new frames may be generated from a few
similar frames through averaging or temporal interpolation, and the newly
generated frames may be inserted to the sequence or replace a few original
frames. If different data are embedded in the frames that contribute to the
newly generated frame, the embedded data may not be easily detectable
from the new frame. This is known as collusion attack [163]. All these ma-
nipulations could be due to potential malicious attacks as well as common
processing involved in format conversion and transcoding [24]. They should
be considered in the design of robust data hiding for video. Adding redun-
dancy and/or searching for frame-jitter invariant domain are common ways
to handle these attacks. We focused on the redundancy approach because
of its effectiveness and computational simplicity.
We adopt two methods to handle frame jittering, as illustrated in Fig. 6.11.
The first one is to partition a video into segments, each of which consists of
² The embedding rate for the Baboon image at the Type-II level could also be higher than
that for the Lenna image. For ease of visualizing the hidden data, our experiment hid the
same PINTL-Matsusita pattern of 1024 bits in both images.
FIGURE 6.9. Multi-level data hiding for the Lenna image (512x512): (a) original
image; (b) image with hidden data; (c) the 5x amplified difference between (b)
and (a), with black denoting zero difference; (d) extracted 32x32 PINTL-Matsusita
pattern embedded via the high-payload embedding level.
FIGURE 6.10. Multi-level data hiding for the Baboon image (512x512): (a)
the original image; (b) the image with hidden data; (c) the 5x amplified difference
between (b) and (a), with black denoting zero difference.
FIGURE 6.11. Methods for handling frame jittering by the proposed video data
hiding system.
similar consecutive frames. The same data is hidden in every frame of a segment.
This approach can tolerate frame dropping that involves a small
number of isolated frames. Repeating the data across frames also provides redundancy that
helps combat noise from additional processing or attacks, offering higher
detection accuracy. Extraction can be done via weighted majority voting,
with larger weights assigned to the frames experiencing less distortion, as sketched below.
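A minimal sketch of such weighted majority voting; the weights shown are hypothetical reliability values.

    import numpy as np

    # Extract one payload bit from a video segment by weighted majority voting.
    # bits[k] is the bit (+1/-1) decoded from frame k; weights[k] is an assumed
    # reliability weight (e.g., larger for frames with less distortion).
    def vote(bits, weights):
        score = float(np.dot(weights, bits))
        return 1 if score >= 0 else -1

    # Example: three frames agree on +1, one heavily distorted frame says -1.
    print(vote(np.array([1, 1, -1, 1]), np.array([0.9, 0.8, 0.2, 0.7])))  # -> 1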
We should point out that repeatedly embedding the same data in several
consecutive frames is not equivalent to embedding data in the corresponding
averaged frame, even though the latter may reduce computational complexity. This is
because the embedding operation is non-linear in general. For Type-II
enforcement embedding, relations such as the odd-even parity enforced
on an averaged frame often do not hold in each individual frame or in the
average of a subset of these frames, hence they cannot effectively survive frame
jitter. For Type-I additive embedding, the same JND model gives significantly
different results in determining which DCT coefficients can be used to
carry hidden information (i.e., the embeddability). Averaging several consecutive
frames is equivalent to performing a low pass filtering operation
temporally. The averaged frame is smooth and sharp details are lost,
especially when there is significant motion in the original frames. Fewer DCT
coefficients in the middle band of an averaged frame will be considered
embeddable than in the original frames, which affects the capacity and
accuracy of embedding and detection. For these reasons, we adopt repeated
embedding instead of embedding in the averaged frame.
The temporal partition of video into segments should be content based.
Video frames before and after a scene change or a significant change due to
motion should belong to different segments because the embedding capabil-
ity of these frames can be quite different. As such, the lengths of segments
may not be uniform. Repetition alone is not able to handle non-uniform segments,
nor is it sufficient to combat frame reordering, frame addition,
and frame dropping of larger units. We address these issues by embedding
a shortened version of the segment index in each frame. This information is
referred to as frame sync and is part of the control data, whose details will be
discussed in Section 6.3.3. The frame sync information can assist in detecting
and locating frame jittering. When used with redundancy approaches
such as repeatedly embedding data in separate parts of a video, this method
can further enhance the robustness against frame reordering and dropping.
In summary, we handle frame jittering by temporally segmenting a video,
applying the image data hiding approach to each frame, and embedding the
same user data as well as the frame sync index in every frame of the same
segment.
6.3.2 Variable Embedding Rate (VER) vs. Constant
Embedding Rate (CER)
The embedding capacity in video varies widely from region to region within
a frame and from frame to frame. As discussed in Chapter 4, VER requires
a non-trivial amount of side information but could provide higher overall
embedding capacity if the overhead introduced by the side information oc-
cupies only a small portion of the total payload. On the other hand, CER
requires only a little one-time side information at an expense of the waste
in total embedding payload. Also discussed in Chapter 4 is the use of ran-
dom shuffle to significantly increase the total number of embedded bits by
equalizing the uneven embedding capacity.
We propose to combine VER and CER for handling the uneven embed-
ding capacity in video. Because of the potentially large overhead of VER
in each small region of a video frame, the embedding within each frame is
done using CER and shuffling, and VER is used for inter-frame unevenness
with the help of additional side information. That is, an equal number of
bits are embedded in each group of shuffled coefficients within a frame. The
group size, or equivalently, the number of bits embedded in each frame, is
different from frame to frame and depends on an estimated achievable pay-
load discussed below. The overhead is thus relatively small compared to the
total number of bits that can be embedded in most frames. The details of
this are explained below.
We have observed that the number of bits that can be embedded in each
frame may vary from very few bits for smooth frames to dozens or even
hundreds of bits for frames containing large regions of details and textures.
On average, representing the side information of how many bits are em-
bedded in each frame would need many bits. However, by using variable
length codes to represent this side information and assigning shorter codes
to those frames with a smaller number of embedded bits, the average rela-
tive overhead can be made small. For both embedding levels, we estimate
the achievable embedding payload Ĉ of a frame based on the energy of the DCT
coefficients, the number of embeddable DCT coefficients, and the detection
statistics of an embedding trial that hides a single spread spectrum watermark
in the video frame. We also set two thresholds τ1 and τ2. If Ĉ ≤ τ1,
we embed no user data. If τ1 < Ĉ < τ2, a predefined number of user bits
are embedded. If Ĉ ≥ τ2, we embed user data at a higher rate determined
by Ĉ. Table 6.2 summarizes this adaptive determination of embedding rate.
We use the spread spectrum sequences +u2, +u1, and −u1 to signal the aforementioned
three cases, respectively. In the case of Ĉ ≥ τ2, we also use
orthogonal modulation via several other spread spectrum sequences to convey
the number of embedded bits. To reduce the overhead for conveying
this side information, we limit the number of embedded bits to one of a
pre-determined finite set (e.g., {16, 32, 48, 64, ...}), which can be determined
empirically using training video clips. All of these are part of the control data
that need to be conveyed to facilitate the extraction of the user payload data.
We will say more about embedding control data in the next subsection.
The estimated achievable payload Ĉ is determined as follows. For
Type-I additive spread spectrum embedding, the detection statistic T has mean
E(T) given by Eq. 6.5 and follows a unit-variance Gaussian distribution.
The bit error probability is Q(E(T)). Given the maximum bit error
probability Pe(max) that can be tolerated, a lower bound on the mean detection
statistic required for each bit is Tth = Q^{-1}(Pe(max)). Assume that the
detection statistic when all embeddable coefficients are used to carry one
information bit is T0. The estimated number of bits that can be embedded
is thus upper bounded by

    \hat{C} = \left( \frac{T_0}{T_{th}} \right)^2 \qquad (6.12)

In our experiments, we set Tth to be around 5. Similarly, Ĉ for Type-II
enforcement embedding is estimated based on the number of embeddable
coefficients on which the relations can be enforced. A sketch of this rate decision follows.
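The following sketch combines Eq. (6.12) with the threshold test; the values of τ1, τ2, the low-rate bit count, and the allowed set of rates are assumptions for illustration only.

    # Adaptive rate decision of Section 6.3.2 under assumed parameters.
    # T0 is the detection statistic when all embeddable coefficients carry a
    # single bit; Tth = Q^{-1}(Pe_max) is the per-bit threshold (about 5 in
    # our experiments); tau1, tau2 and `allowed` are assumed values.
    def choose_rate(T0, Tth=5.0, tau1=4, tau2=16, allowed=(16, 32, 48, 64)):
        C_hat = (T0 / Tth) ** 2          # estimated achievable payload, Eq. (6.12)
        if C_hat <= tau1:
            return 0                     # zero rate: signalled with +u2
        if C_hat < tau2:
            return 8                     # low rate: predefined count, signalled with +u1
        # higher rate: largest allowed count not exceeding C_hat, signalled with -u1
        return max(b for b in allowed if b <= C_hat)

    print(choose_rate(T0=40.0))          # -> 64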
TABLE 6.2. Adaptive embedding rate for a video frame.

Estimated Achievable   Embedding Rate for User Data           Corresponding Control Data
Payload Ĉ
-------------------------------------------------------------------------------------------
Ĉ ≤ τ1                 Zero rate: no user data are embedded   add +u2
τ1 < Ĉ < τ2            Low rate: hide a small, predefined     add +u1
                       number of bits
Ĉ ≥ τ2                 Higher rate: the number of bits        add −u1, and use a few spread
                       embedded is determined by Ĉ            spectrum sequences to convey
                                                              the number of bits embedded
6.3.3 User Data vs. Control Data
The embedded data (possibly error correction encoded) for accomplishing
the purpose of data hiding is referred to as user data or user payload. For
example, a copyright label of "(c) Princeton 2001" can be part of the user
payload to be embedded. In addition, there is other information that needs
to be conveyed to facilitate the extraction of user data. These data, referred
to as control data, may include the information regarding frame synchroniza-
tion (Section 6.3.1) and the number of bits embedded in each frame (Sec-
tion 6.3.2).
The amount of control data is relatively small when compared to that of
user data, but the extraction accuracy of control data is critical. Thus robust
spread spectrum embedding and the energy-efficient orthogonal modulation
should be used for control data. The spreading sequences for hiding the
control bits are orthogonal to one another and are also orthogonal to those
used for embedding user data.
The typical control data may include the total number of embedded bits
of user data, frame sync information for combatting frame jitter, and a
constant watermark vector, which can be used for image registration when
the video frame is subject to geometric distortion [170][175]. We shall take
frame sync as an example to demonstrate the embedding of control data.
Frame sync, as introduced in Section 6.3.1, is a short version of the video segment
index that helps combat frame jittering attacks. The frame sync
index ranges from 0 to K − 1, and the i-th segment is labelled with the
index mod(i, K). A larger K requires more bits to be sent, but gives more
accurate information regarding the order of the segments, resulting in better
tolerance of frame jitter. From our experiments, we have found that K = 8
gives a good tradeoff. The video segments are then indexed in a round-robin
fashion from 0 to 7, and each index is represented by three bits. For these
three bits, the orthogonal modulation discussed in Section 3.3 is used due
to its energy efficiency: if the sync index is j, we embed the j-th sequence
out of K pre-selected orthogonal random sequences. As long as K is not too
large, we can find a sufficient number of orthogonal sequences, and keep the
detection computation and bookkeeping within a reasonable bound.
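A minimal sketch of signalling the frame sync index with orthogonal sequences; generating the sequences by orthogonalizing random vectors and detecting with knowledge of the original frame are simplifying assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    # Frame-sync embedding/detection with K = 8 orthogonal sequences of
    # length N (an assumed length); the j-th sequence signals sync index j.
    K, N = 8, 1024
    S = np.linalg.qr(rng.standard_normal((N, K)))[0].T   # K orthonormal rows

    def embed_sync(coeffs, seg_index, strength=1.0):
        return coeffs + strength * S[seg_index % K]      # add the j-th sequence

    def detect_sync(received, original):
        corr = S @ (received - original)                 # correlate with each sequence
        return int(np.argmax(corr))

    marked = embed_sync(np.zeros(N), seg_index=5)
    print(detect_sync(marked, np.zeros(N)))              # -> 5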
User data is embedded in each video frame using the multi-level approach
discussed in Section 6.2. The TDM approach with shuffling is applied for
hiding multiple bits at the high payload level via odd-even enforcement.
For hiding data at the high robustness level via spread spectrum embedding,
we combine TDM and orthogonal modulation (Section 3.3) to double the
number of embedded bits compared with using TDM alone. That is, a watermark
conveying 2B bits is formed as
    \mathbf{w} = \sum_{k=1}^{B} b_k \cdot \left[ I(b_{B+k} = 1)\, \mathbf{u}_k^{(1)} + I(b_{B+k} \neq 1)\, \mathbf{u}_k^{(2)} \right] \qquad (6.13)
where b_i ∈ {+1, −1} and I(·) is an indicator function. We generate two
orthogonal spreading sequences {u^(1)} and {u^(2)} and break each sequence
into B non-overlapping segments (TDM) to form the orthogonal spreading
vectors {u_k^(1)} and {u_k^(2)}, respectively. A sketch of this construction follows.
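A minimal sketch of the construction in Eq. (6.13); u1 and u2 stand for the two orthogonal spreading sequences, and the bits are assumed to take values ±1.

    import numpy as np

    # Convey 2B bits with TDM plus two-sequence orthogonal modulation:
    # bit b_k picks the sign of segment k, and bit b_{B+k} picks which of the
    # two orthogonal sequences carries that segment.
    def build_watermark(bits, u1, u2):
        B = len(bits) // 2
        seg = len(u1) // B
        w = np.zeros_like(u1, dtype=float)
        for k in range(B):
            sign = bits[k]                              # b_k in {+1, -1}
            carrier = u1 if bits[B + k] == 1 else u2    # selected by b_{B+k}
            w[k * seg:(k + 1) * seg] = sign * carrier[k * seg:(k + 1) * seg]
        return w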
6.3.4 System Design and Experimental Results
Combining all the elements discussed above, we arrive at the diagram in
Fig. 6.12. The details of the modules that perform embedding and data
extraction from each frame are similar to the multi-level image data hiding
in Section 6.2.
FIGURE 6.12. Block diagram of the proposed video data hiding system: (top)
embedding process, (bottom) detection process.
We test our approach on the luminance components of several video se-
quences. The same character string containing access control information
(without error correction coding) is hidden in two embedding levels. Be-
tween scene changes, we use equal-length segments, each containing 6 con-
secutive frames. One test video is the first 60 frames of "flower garden"
sequence, which has a frame size of 352 x 240 and a frame rate of 30 frames
per second. The average PSNR of the watermarked video with respect to the
original host signal is 32.5dB. After data hiding, the video is encoded using
MPEG-2. With a GOP structure of IBBPBBI and compressed to 1.5Mbps
or higher bit rate, 18 characters (132 bits) can be extracted accurately. An
additional, longer string of 91 characters (640 bits) can be successfully ex-
tracted when the video is compressed to 4.5Mbps or a higher bit rate. Fig. 6.13 shows the
1st and 30th frames of the original and watermarked videos as well as their
differences amplified by a factor of 5.
We also tested a longer and more diverse sequence with 660 frames, by
concatenating "flower garden", "football", and "table tennis" sequences. A
total of 3032 bits are embedded at high payload level and 1266 bits at high
robustness level. All 4298 bits can be extracted accurately after 4.5Mbps
MPEG-2 compression. When the video is compressed at 1.5Mbps, the 1266
bits at high robustness level can still be correctly extracted, though the
detector shows low detection confidence on 3 bits (0.2%) under such severe
distortion. In practice, error correction coding can be incorporated to correct
a small percentage of errors, if any.
An annotated excerpt of the detection log in Table 6.4 shows the extracted
control information and demonstrates the role of these control data for hiding
data in a variety of video sequences.
FIGURE 6.13. Multi-level data hiding for flower garden video sequence: (a)-(c)
the original 1st frame, the watermarked version, and their difference, respectively;
(d)-(f) the original 30th frame, the watermarked version, and their difference,
respectively. Both videos are compressed using MPEG-2 at 4.5Mbps, and the
differences are amplified by a factor of 5, with gray denoting zero difference and
black/white denoting large differences.
We can see that (1) repeatedly
embedding the same payload in a few consecutive frames, together with
embedding the frame sync index, helps combat occasional detection errors in
severely distorted frames, and (2) the adaptive embedding rate and the associated
variable-length-encoded control information provide effective means
for handling the uneven embedding capabilities across video segments. In
addition to the user data and the related control data, a constant spread-spectrum
watermark signal sharing approximately 1/4 of the JND's energy is
embedded in every frame; it can be used for indicating ownership information
and/or serve as a reference for image registration when video frames
are subject to geometric distortion. The experimental results of both image
and video data hiding are summarized in Table 6.3.
6.4 Chapter Summary
In this chapter, we demonstrated how the understanding and solutions to
the fundamental issues of data hiding presented in Part-I can be used for
specific design problems and applications. We have made extensive use of
two major types of embedding mechanisms, the modulation and multiplex-
ing techniques for embedding multiple bits, as well as shuffling for handling
uneven embedding capacity. We proposed robust multi-level data hiding
TABLE 6.3. Summary of experimental results for the proposed multi-level image
and video data hiding systems.
                         Level-1: high payload embedding      Level-2: high robustness embedding      Notes
                         rate            robustness           rate               robustness
----------------------------------------------------------------------------------------------------------------------
512x512 Lenna            32x32 pattern   JPEG Q >= 45%;       "PINTL"            JPEG Q >= 20%;       PSNR = 42.5 dB
                         (1024 bits)     moderate additive    (35 bits)          low pass filtering;
                                         noise                                   additive noise
512x512 Baboon           (same)          (same)               "Panasonic Tech."  (same)               PSNR = 33.6 dB
                                                              (105 bits)
60-frame 352x240         640 bits        MPEG-2 4.5Mbps;      132 bits           MPEG-2 1.5Mbps;      Control bits are also
flower garden sequence   (91 char.)      frame dropping       (18 char.)         frame dropping       hidden to facilitate
                                                                                                      extracting the user data.
660-frame 352x240        3032 bits       (same)               1266 bits          (same)               Average PSNR is 32.5dB
concatenated video                                                                                    for flower garden.
sequence
algorithms for still images and videos, and showed that the amount of ex-
tractable information can be adapted to the actual noise conditions, making
it attractive for unequal error protection on the embedded data and for pro-
gressive and scalable embedding.
Acknowledgement The work presented in Section 6.2 and 6.3 was per-
formed with Dr. Heather Yu while the first author was with Panasonic
Information and Networking Laboratories.
TABLE 6.4. Annotated excerpt of detection log showing the control information
extracted from 660-frame watermarked video sequence compressed at 4.5Mbps.
Frame#  Content            Rate type for   Frame synch   # of bits @       # of bits @
                           user data       index         high robustness   high payload
------------------------------------------------------------------------------------------
0       Flower garden f0   High            0             24                64
1       f1                 High            0             24                64
2       f2                 High            0             24                64
3       f3                 High            0             24                64
4       f4                 undecided       0             n/a               n/a
        [Low confidence when extracting the rate-type information from this B-frame
         due to compression.]
5       f5                 High            0             24                64
6       f6                 High            1             24                64
7       f7                 High            1             24                64
        [Synch index is updated and new sets of user data are embedded in this new
         segment: bits 25-48 @ high robustness and bits 65-128 @ high payload level.]
...
142     f142               High            7             50                64
143     f143               High            7             50                64
        [Synch index updates from 7 to 0 in an 8-stage round-robin fashion.]
144     f144               High            0             40                64
145     f145               High            0             40                64
146     f146               High            0             40                64
147     f147               High            0             40                64
148     f148               High            0             40                64
149     f149               High            0             40                64
        [The same user payload is repeatedly embedded in each frame of a segment
         (same synch index).]
150     Football f0        Zero            -1            0                 0
151     f1                 Zero            -1            0                 0
152     f2                 Zero            -1            0                 0
153     f3                 Zero            -1            0                 0
154     f4                 Zero            -1            0                 0
155     f5                 Zero            -1            0                 0
        [No user data are embedded for a rather smooth segment; nor is a frame synch
         index embedded (denoted by -1).]
156     f6                 Low             1             4                 8
157     f7                 Low             1             4                 8
        [A small, predetermined amount of user data is embedded in segments of
         moderate achievable payload to reduce overhead.]
...
364     Table tennis f4    High            1             18                32
365     f5                 High            1             18                32
366     f6                 High            2             12                32
367     f7                 High            2             12                32
...
448     f88                Zero            -1            0                 0
449     f89                Zero            -1            0                 0
450     f90                Low             7             4                 8
451     f91                Low             7             4                 8
        [Different segments of the same video sequence have different embedding
         capabilities.]
------------------------------------------------------------------------------------------
TOTAL   660 frames, 3 concatenated sequences; 3032 bits @ high payload and
        1266 bits @ high robustness.
7
Data Hiding for Image Authentication
For years, audio, image, and video have played an important role in journalism,
archiving, and litigation. A video clip captured by chance became a
crucial piece of evidence in the prosecution of the well-known 1993 Rodney
King case; a secretly recorded conversation between Monica Lewinsky
and Linda Tripp touched off the 1998 presidential impeachment; just to
name a few. Keeping our focus on still pictures, we have seen that the validity
of the old saying "a picture never lies" is seriously challenged in the
digital world of multimedia. Compared with traditional analog multimedia
signals, making seamless alterations on a digital signal is much easier with
a growing number of software editing tools. With the popularity of scanners,
printers, digital cameras, and digital camcorders, tamper detection for
images becomes an important concern [122]. In this chapter, we discuss the
use of digital watermarking techniques to partially solve this problem by
embedding authentication data invisibly into digital images.
In general, authenticity is a relative concept: whether an item is authentic
or not is relative to a reference or a certain type of representation that is
regarded as authentic. Authentication is usually done by checking whether
specific rules and relationships that are expected to be found in an authentic
copy still hold in a test signal. In traditional network communications, a
sophisticated checksum, usually known as a hash or message digest, is used to
authenticate whether the content has been altered or not. The checksum is
encrypted using such cryptographic techniques as public-key encryption to
ensure that the checksum cannot be generated or manipulated by unauthorized
parties. This is the digital signature technique in cryptography [25].
checksum is stored or transmitted separately since even minor changes on
this kind of data may lead to a different meaning. Perceptual data such as
digital images, audio, and video are different from traditional data such as
text and computer code in that perceptual data can be slightly changed
without introducing noticeable differences. This provides new room for
authenticating perceptual data. For example, we can imperceptibly modify an
image so that for each pixel, the least significant bit is set to a checksum
of the other bits. In other words, the checksum is embedded into the image
instead of being stored separately as in the case of traditional data. Such an
embedding approach falls in the category of digital watermarking / data
hiding. For example, fragile watermarking [31] can be used to insert into
an image some special data which will be altered when the host image is
manipulated.
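A minimal sketch of this idea, using the parity of the upper seven bits as an assumed choice of per-pixel checksum.

    import numpy as np

    # Set each pixel's least significant bit to a checksum of its other bits
    # (here, the XOR parity of bits 1..7 -- an assumed choice of checksum).
    # A later check of the same relation flags alterations.
    def embed_lsb_checksum(img):
        high = img >> 1                                 # bits 1..7 of each pixel
        parity = np.zeros_like(img)
        for b in range(7):
            parity ^= (high >> b) & 1                   # XOR of the upper bits
        return (img & 0xFE) | parity                    # write parity into the LSB

    def verify_lsb_checksum(img):
        return np.array_equal(img, embed_lsb_checksum(img))  # True if untampered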
Many general techniques of data hiding can be applied to this specific
application, such as the general approaches discussed in Part I and the
image data hiding approaches presented in Chapter 6. But the algorithm
design has to be aware of a few unique issues associated with authentication,
including the choice of what to embed and the security considerations of pre-
venting forgery or manipulation of embedded data. The following features
are desirable to construct an effective authentication scheme for images:
1. to be able to determine whether an image has been altered or not;
2. to be able to integrate authentication data with host image rather
than storing separately;
3. to keep the embedded authentication data invisible under normal viewing
conditions;
4. to be able to locate alterations made on the image; and
5. to allow the watermarked image to be stored in a lossy-compression format,
or more generally, to distinguish moderate distortion that does not
change the high-level content from content tampering.
This chapter presents a general framework of watermarking for authen-
tication and proposes a new authentication scheme by embedding via table
look-up a visually meaningful watermark and a set of features into the trans-
form domain of an image. The embedding is a Type-II technique discussed
in Chapter 3. Making use of the quantized versions of Type-II embedding,
our proposed approach can be applied to compressed image using JPEG
or other compression techniques, and the watermarked image can be kept
in the compressed format. The proposed approach therefore allows distin-
guishing moderate distortion that does not change the high-level content
from content tampering. The alteration made on the marked image can be
also located. These functionalities make the proposed scheme suitable for
building a "trustworthy" digital camera. We also demonstrate the use of
shuffling (Chapter 4) in this specific problem to equalize the uneven embedding
capacity as well as to enhance the embedding rate and security.
7.1 Review of Prior Art
The existing works on image authentication can be classified into several
categories: digital signature based, pixel-domain embedding, and transform-
domain embedding. The latter two categories belong to invisible fragile or
semi-fragile watermarking.
Digital signature schemes are built upon the ideas of hash (or message
digest) and public-key encryption that were originally developed for veri-
fying the authenticity of generic data in network communications. Friedman
[124] extended it to digital images as follows. A signature computed
from the image data is stored separately for future verification. This image
signature can be regarded as a special encrypted checksum. It is unlikely
that two different natural images have the same signature, and even if
a single bit of the image data changes, the signature may be totally different.
Furthermore, public-key encryption makes it very difficult to forge
a signature, ensuring a high security level. Following this work, Schneider et
al. [133] and Storck [134] proposed content-based signatures. Signatures are
produced from low-level content features, such as the mean intensity of each
block, to protect image content instead of the exact representation. Another
content-signature approach by Lin et al. developed the signature based on
a relation between coefficient pairs that is invariant before and after com-
pression [42] [126]. Strictly speaking, these signature schemes do not belong
to watermarking since the signature is stored separately instead of being
embedded into images.
Several pixel-domain embedding approaches have been proposed. In Yeung
et al.'s work, a meaningful binary pattern is embedded by enforcing
certain relationships according to a proprietary look-up table. Their work allows
tampering that is confined to some local areas to be located [139].
Walton proposed an approach that embeds data by enforcing relationships
between sets of pixels [135]. Another pixel-domain scheme was proposed
by Wong [136]. This scheme divides an image into blocks, then copies the
cryptographic digital signature of each block into the least significant bits of
the pixels for future verification. However, images marked by these pixel-domain
invisible fragile watermarking schemes cannot be stored in a lossily
compressed format such as JPEG, which is commonly used in
commercial digital cameras.
In addition to pixel-domain approaches, several block DCT-domain schemes
may be used for authentication purposes. Swanson et al. [72] round coefficients
to multiples of a just-noticeable difference or masking value, then add or
subtract a quarter of it to embed one bit in an 8 x 8 block. Koch et al. [69] embed
one bit by forcing relationships on a coefficient pair or triplet in the mid-band.
The two approaches achieve limited robustness via pre-distortion, and the
embedding is likely to introduce artifacts in smooth regions. A similar problem
exists in other approaches, including a DCT-domain quantization approach
exists in other approaches, including a DCT-domain quantization approach
by Lin et al. [42][127], and a Wavelet-domain quantization approach by
Kundur et al. [125], both of which embed a signature in transform domain
by rounding the quantized coefficients to an odd or even number. Additional
data can be embedded to recover some tampered or corrupted regions, such
as the self-recovery watermarking proposed by Fridrich et al. in [123] and
by Lin et al. in [42]. Readers may refer to [128][132] for more surveys on
fragile watermarking and watermarking for authentication.
Recalling the desirable requirements for image authentication presented
in the previous section, we find that many existing approaches in the literature
cannot satisfy all of them. The digital signature proposed in [124], as
well as the content-based signatures reported in [133] and [134], do not satisfy
requirements 2 and 4. Images marked by the pixel-domain schemes [135, 136, 139]
cannot be stored in a lossy compression format. In addition, the transform-domain
schemes [42, 69, 72, 125] do not handle the uneven embedding capacity
problem raised in Chapter 4, and therefore may either introduce artifacts in
smooth regions or embed only a small number of authentication bits.
7.2 Framework for Authentication Watermark
We propose a general framework including the following elements for wa-
termark based authentication:
1. what to authenticate,
2. what data to be embedded for authentication purpose,
3. how to embed data into an image,
4. how to handle uneven embedding capacity, and
5. how to ensure security.
The first element is fundamental and affects the other elements. We have
to decide whether to authenticate the exact representation of an image,
or to have some tolerance toward certain processing such as compression,
cropping and scaling. In addition, we need to determine other functionalities
we would like to achieve, such as the capability of locating alterations. The
designs of the next two elements, namely, what and how to embed, are based
on the answer to the first element. More specifically, we can either mainly
rely on the fragility of the embedding mechanism to detect tampering (e.g.,
to put zeros in the least significant bits of all pixel values and later to check
whether such properties still hold or not on test images), or rely on the
embedded data (e.g., to robustly embed a 64-bit checksum of image
features such as the block mean intensity or the image edge map, and later to
check whether the extracted check sum matches the one computed from the
test image), or use both. For local embedding schemes such as the TDM type
modulation discussed in Chapter 4, special handling with smooth regions,
or in general, the uneven embedding capacity, is necessary to achieve high
embedding rate and to locate alterations. Besides an appropriate design of
what and how to embed, the detailed implementation must take security
issues into account in order to meet the demands in practical applications,
for example, to make it difficult for attackers to forge valid authentication
watermarks in a tampered image.
Following the above framework, we discuss our proposed authentication
watermarking approach, based on both the fragility of the embedding mechanism
and on matching the embedded features with features extracted from a test
image. The detection of tampering relies on both the embedding mechanism
and the embedded data. The alterations can also be located unless there is
global tampering or the tampered area is too large. We shall present our
approach in the context of grayscale images with JPEG compression. The
extension to images compressed using other means such as Wavelet and to
color images will be briefly discussed in Section 7.6. A block diagram of
the embedding process is shown in Fig. 7.1. Aside from the block labelled
"embed" , it is identical to the JPEG compression process [23]. Watermarks
are inserted into the quantized DCT coefficients via a look-up table. Ex-
plained below are two aspects of watermark-based authentication, namely,
to embed what data and how to embed them.
FIGURE 7.1. Block diagram of embedding process for authentication watermark-
ing. The "Quant." module represents the quantization step.
7.3 Transform-domain Table Lookup Embedding
The data for authentication purposes is generally embedded using a Type-II
approach, discussed in Chapter 3, for its high embedding capacity and
fragility, both of which are useful in authentication. Here we present a Type-II
embedding using a look-up table in a transform domain. This transform-domain
look-up table embedding is an extension of the pixel-domain scheme
proposed by Yeung et al. [139]. The embedding is performed on the quan-
tized coefficients with a set of pre-selected quantization step sizes, which
are known to the detector because the extraction of hidden data must be
performed in the same quantized domain. As discussed in Chapter 3, this
quantization is a pre-distortion step to obtain limited robustness against
compression and other distortion.
A proprietary look-up table (LUT) is generated beforehand by the image
owner or the digital camera manufacturer. The table maps every possible value
of a JPEG coefficient randomly to "1" or "0", with a constraint that the
runs of "1" and "0" are limited in length. To embed a "1" in a coefficient,
the coefficient is unchanged if the entry of the table corresponding to that
coefficient is also a "1". If the entry of the table is a "0", then the coefficient
is changed to its nearest neighboring value for which the entry is "1".
The embedding of a "0" is similar. This process can be abstracted into the
following formula, where v_i is the original coefficient, v_i' is the marked one,
b_i is the bit to be embedded, Q(·) is the quantization operation¹, and
LUT[·] is the mapping by the look-up table:
    v_i' = \begin{cases} Q(v_i) & \text{if } LUT[Q(v_i)] = b_i \\ v_i + \delta & \text{if } LUT[Q(v_i)] \neq b_i \end{cases} \qquad (7.1)

    where \delta = \arg\min_{|d|} \{\, d = Q(x) - v_i \ \text{s.t.}\ LUT[Q(x)] = b_i \,\}.
The extraction of the signature is simply by table lookup. That is,

    \hat{b}_i = LUT[Q(v_i')] \qquad (7.2)

where \hat{b}_i is the extracted bit.
The basic idea of the embedding process is also illustrated by the example
in Fig. 7.2. Here, zeros are to be embedded in two quantized AC coefficients
with values "-73" and "24" of an 8 x 8 image block. The entry in the table
for coefficient value "-73" is "1". In order to embed a "0", we have to change
it to its closest neighbor for which the entry is "0". In this example, "-73" is
changed to "-74". Since the entry for coefficient value "24" is already "0",
it is unchanged. A sketch of this table-lookup embedding follows.
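A minimal sketch of Eqs. (7.1) and (7.2), with the LUT represented as a dictionary from quantized levels to bits; the toy table below is consistent with the example of Fig. 7.2.

    # Table-lookup embedding in the quantized domain.  `lut` maps quantized
    # levels (integers k, standing for coefficient value k*Q) to 0/1 and is
    # assumed to have been generated with constrained run lengths.
    def embed_bit(v, bit, lut, Q):
        k = int(round(v / Q))                     # Q(v_i)
        if lut[k] == bit:
            return k * Q                          # entry matches: just quantize
        d = 1
        while True:                               # nearest level whose entry matches
            if lut.get(k - d) == bit:
                return (k - d) * Q
            if lut.get(k + d) == bit:
                return (k + d) * Q
            d += 1

    def extract_bit(v_marked, lut, Q):
        return lut[int(round(v_marked / Q))]      # Eq. (7.2)

    # Toy LUT around the example of Fig. 7.2: -73 maps to 1, so embedding a 0
    # moves the coefficient to -74; the entry for 24 is already 0.
    lut = {-75: 1, -74: 0, -73: 1, -72: 1, 23: 0, 24: 0, 25: 1}
    print(embed_bit(-73, 0, lut, Q=1), embed_bit(24, 0, lut, Q=1))   # -> -74 24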
As mentioned earlier, the detection of tampering is based on both the
embedding mechanism and the embedded data. The clue provided by the
embedding mechanism is as follows: when a small part of a watermarked
image is tampered with without knowledge of the look-up table, the extracted
bit from each tampered coefficient becomes random, i.e.,

    P(\hat{b}_i = 0) = P(\hat{b}_i = 1) = \frac{1}{2},

implying that it is equally likely to be the same as or different from
the bit b_i originally embedded. For the moment, we assume that a detector
has knowledge of the originally embedded data {b_i}; the justification of
this assumption will be presented later.
¹ For a uniform quantizer with quantization step size q, the quantization operation
Q(x) rounds x to the nearest integer multiple of q.
FIGURE 7.2. Frequency-domain Embedding Via Table Lookup: zeros are embed-
ded to two quantized DCT coefficients "-73" and "24" by enforcing relationship
according to a look-up table. After embedding, the coefficients in the watermarked
image are "-74" and "24" .
From a single coefficient, it is not
reliable to determine whether tampering has occurred, because there is a
50% chance that the extracted data matches the originally embedded one,
i.e., \hat{b}_i = b_i. However, if the tampering affects several coefficients in a block
and/or coefficients in several blocks, the chance of a miss detection (i.e., all
decoded data of the altered region happen to match the originally embedded
ones),

    P_{\text{miss}} = \left( \frac{1}{2} \right)^{n},

is reduced exponentially, where n is the number of coefficients affected by the tampering.
For example, the miss detection probability is around 0.00098 when n is equal to 10.
According to Chapter 3, the table lookup embedding is a Type-II embed-
ding and relies on deterministic relationship enforcement. From a set-partition
point of view, all possible values of a quantized coefficient are divided into
two subsets, each of which conveys a specific meaning, and the partition rule
is set by the table. One subset contains the values which map to "1" according
to the lookup table, and the other subset contains those that map to "0".
The embedding process introduces the minimal necessary changes to force a
quantized coefficient to take a value from the subset that maps to the binary
data to be embedded.
7.3.1 Considerations for Imperceptibility & Security
Several steps are taken to ensure that the embedding is invisible:
• The runs of "1" and "0" entries in the LUT are constrained to avoid
excessive modification of the coefficients;
• The DC coefficient in each block is not changed to avoid blocky effect
unless the quantization step is very small 2;
• Small valued coefficients (mostly in high frequency bands) are not
modified to avoid large relative distortion.
Coefficients that are allowed to be changed according to these constraints
are called embeddable or changeable. The number of embeddable coefficients
varies significantly, and this "uneven embedding capacity" problem has been
discussed in Chapter 4. Also, extraction errors may occur due to image
format conversion, rounding, and other causes involving no content changes.
To address these issues, we first apply shuffling to equalize the unevenly
distributed embedding capacity, as discussed in Chapter 4. A proper block
size is determined according to the overall embedding capacity measured by
the total number of changeable coefficients. The side information regarding
the block size can be conveyed using the approaches discussed in Chapters 4
and 6, for example, using additive spread spectrum embedding. Then, one
bit is embedded in each shuffled block by repeatedly embedding the same
bit to all embeddable coefficients in the shuffled block. The bit is extracted
by a detector in the same shuffled domain via majority voting.
The algorithm shown in Table 7.1 is for generating an L-entry look-up
table T[·] with a maximum allowed run of r and index i ∈ {1, ..., L}.
TABLE 7.1. An algorithm for generating look-up table with constrained runs
Step-1: i = 1.
Step-2: If i > r and T[i − 1] = T[i − 2] = ... = T[i − r],
        then T[i] = 1 − T[i − 1].
        Otherwise, generate a random number out of {0, 1}
        with probability 0.5 : 0.5, and set T[i] to this value.
Step-3: Increase i by 1. If i > L, stop. Otherwise go back to Step-2.
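A direct transcription of the algorithm in Table 7.1 (with 0-based indices):

    import random

    # Generate an L-entry look-up table whose runs of identical entries never
    # exceed r, following the steps of Table 7.1.
    def generate_lut(L, r, seed=None):
        rnd = random.Random(seed)
        T = []
        for i in range(L):
            if i >= r and all(T[i - j] == T[i - 1] for j in range(1, r + 1)):
                T.append(1 - T[i - 1])        # force a transition after a run of r
            else:
                T.append(rnd.randint(0, 1))   # otherwise pick 0/1 with prob. 0.5
        return T

    lut = generate_lut(256, r=2, seed=1)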
To analyze the minimum secure value of r, we start with the case of r = 1,
which has only two possibilities:

    T[i+1] = 1 - T[i], \quad T[0] \in \{0, 1\}, \quad i \in \mathbb{N},

or equivalently,

    T[i] = \begin{cases} 0 & (i \text{ is even}) \\ 1 & (i \text{ is odd}) \end{cases}
    \quad \text{or} \quad
    T[i] = \begin{cases} 1 & (i \text{ is even}) \\ 0 & (i \text{ is odd}). \end{cases}
2This constraint may be loosened to allow the DC coefficients in texture regions to be
modified, as the change there is likely to be invisible.
This implies that the odd-even embedding discussed in Chapter 3 is a special
case of table-lookup embedding. Since there is very little uncertainty in the
table, it is easy for unauthorized persons to manipulate the embedded data,
and/or to change the coefficient values while retaining the embedded data
unchanged. Therefore, r = 1 is not a proper choice if no other security
measure, such as a careful design of what data to embed, is taken.
When r is increased to 2, the transition of the LUT entries has the Markovian
property shown in Fig. 7.3. We can show that, starting from "0" or "1",
the number of possible LUTs of length i, F_i, forms a Fibonacci series:

    F_i = F_{i-1} + F_{i-2}, \qquad F_0 = 1,\ F_1 = 1. \qquad (7.3)

The total number of possible sequences of length L = 256 is on the
order of 10^53. Although this number is smaller than the number of possible
sequences without the run-length constraint, which is 2^256 or on the order of
10^77, the table still has high uncertainty, and the probability of obtaining the
table by guessing is very small. Thus, from the embedding mechanism point of
view, the minimum secure choice of r is 2.
FIGURE 7.3. (left) Markovian property of restricted LUT generation with max-
imum run of 2, where "wp" stands for "with probability"; (right) An expansion
tree illustrating the generation of restricted LUTs of length i.
The mean squared error incurred by table-lookup embedding with r =
2 is computed as follows. First, we consider the error incurred purely by
quantization, i.e., rounding an input coefficient in the range A = [(k −
1/2)Q, (k + 1/2)Q) to kQ:

    \text{MSE(quantize to } kQ)\big|_A = \int_{(k-\frac{1}{2})Q}^{(k+\frac{1}{2})Q} (x - kQ)^2\, \frac{1}{Q}\, dx = \frac{Q^2}{12} \qquad (7.4)
This is the case if the entry corresponding to the original quantized coeffi-
cient in the table has the same value as the bit to be embedded. We then
consider the case of having to shift the coefficient to (k - 1) Q in order to
embed the desired bit:
    \text{MSE(shift to } (k-1)Q)\big|_A = \int_{(k-\frac{1}{2})Q}^{(k+\frac{1}{2})Q} [x - (k-1)Q]^2\, \frac{1}{Q}\, dx
    = \int_{\frac{Q}{2}}^{\frac{3Q}{2}} y^2\, \frac{1}{Q}\, dy = \frac{13}{12} Q^2 \qquad (7.5)
where y = x − (k − 1)Q. By symmetry, the MSE for shifting to (k + 1)Q is
the same as above. Hence the overall MSE is
    \text{overall MSE} = \frac{\text{MSE(to } (k-1)Q)\big|_A + \text{MSE(to } (k+1)Q)\big|_A}{4} + \frac{\text{MSE(to } kQ)\big|_A}{2}
    = 2 \times \frac{1}{4} \times \frac{13}{12} Q^2 + \frac{1}{2} \times \frac{Q^2}{12} = \frac{7}{12} Q^2 \qquad (7.6)
This is achievable since a look-up table with r = 2 requires at most a one-Q-step
modification away from kQ. The MSE is a little larger than in the
case of r = 1 (i.e., the odd-even embedding in Chapter 3), which gives an
MSE of Q²/3 and is equivalent to doubling the quantization step size as far as the
distortion is concerned.
7.3.2 Estimating Embedded Data and Changes
Earlier, when explaining the extraction of embedded data, we assumed
that a detector has knowledge of the originally embedded data {b_i}. As
we will show next, this knowledge is not necessary in practice, especially with
the incorporation of majority voting, error correction coding, and shuffling.
Assuming that the tampering is restricted to a small portion of an image,
the changed coefficients tend to occur in small clusters and their number
is not large. After shuffling, these coefficients will be diffused to
many blocks in the shuffled domain. Because each shuffled block is unlikely
to receive too many changed coefficients by the nature of shuffling³, the
few changed coefficients in each shuffled block will not affect {b̂_i}, which
is the bit ultimately extracted from that block via table lookup and error
correction coding. This implies that the extracted data, {b̂_i}, can be
regarded as a good estimate of the originally embedded data, {b_i}. Using
{b̂_i} as "ground truth", we can determine what bit value is supposed to be
embedded into each embeddable coefficient by an embedding system. By
comparing the supposedly embedded data with the data actually extracted
3This is supported by the same analysis on shuffling in Section 4.7, with the percentage
of blue balls (the balls of interest), p, being very small.
from each coefficient of a test image, we are able to identify the changed
coefficients. A sketch of this procedure is given below.
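A minimal sketch of this two-step identification, assuming the per-coefficient extracted bits and their shuffled-block assignments are already available:

    import numpy as np

    # (1) Estimate the embedded bit of each shuffled block by majority vote
    #     over its embeddable coefficients;
    # (2) flag coefficients whose extracted bit disagrees with that estimate.
    def locate_changes(extracted_bits, block_of):
        # extracted_bits[n]: 0/1 bit from coefficient n; block_of[n]: its block id
        blocks = np.unique(block_of)
        est = {b: int(np.round(extracted_bits[block_of == b].mean())) for b in blocks}
        changed = np.array([extracted_bits[n] != est[block_of[n]]
                            for n in range(len(extracted_bits))])
        return est, changed      # est ~ {b_i}; `changed` marks suspect coefficients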
The change identification by this two-step process relies on the fragility
of embedding, namely, the tampering may change the originally embedded
data. In the next section, we shall see a second way to detect tampering,
which relies on the embedded data. By then, we will get a more complete
picture of the authentication framework introduced in Section 7.2.
7.4 Design of Embedded Data
We mentioned earlier that the embedded data can be used to detect tam-
pering. The authentication data we propose to embed consists of a visually
meaningful binary pattern and content features. As we shall see, the combi-
nation of these two types of data is suitable for such image authentication
applications as building "trustworthy" digital cameras.
7.4.1 Visually Meaningful Pattern
The visually meaningful pattern, such as letters and logos, serves as a quick
check for signaling and locating tampering. Shown in Fig. 7.4 is a binary
pattern "PUEE" . The decision on whether an image is altered or not can be
made by (automatically) comparing the extracted pattern with the original
one, if available, or by human observation of the extracted pattern. The
latter case relies on a reasonable assumption that a human observer can
distinguish a "meaningful" pattern from a random one. It is also possible
to automatically make such decisions, for example, through a randomness
measure.
PUEE
FIGURE 7.4. A binary pattern as part of the embedded data
7.4.2 Content-based Features
Content features offer additional security to combat forgery attacks and help
distinguish content tampering from minor, non-content changes. Due to the
limited embedding bit rate, content features have to be represented using
very few bits. An example of a content feature is the most significant
bit of the macroblock mean intensity. Another example is the sign of the
intensity difference between a pair of blocks or macroblocks. These features
make the data to be embedded dependent on the image, and are therefore effective
against forgery attacks that rely on the independence [158]. Other features,
such as the edge map and the luminance/color histogram, can also be used. A
general formulation of a local feature of the block (i, j) is

    b_{i,j} = f\left( \mathbf{v}_{i-k_1,\, j-k_2},\ \ldots,\ \mathbf{v}_{i,j},\ \ldots,\ \mathbf{v}_{i+k_3,\, j+k_4} \right)

where k_1, k_2, k_3, k_4 ∈ N, \mathbf{v}_{i,j} represents the collection of all the coefficients
in the block (i, j), and f(·) is a deterministic function known to
both the embedder and the detector. The detector compares the features
computed from the test image with the ones extracted by table look-up (i.e.,
the features embedded by the embedder). A mismatch between the two sets
of features is an indication that the image has been tampered with.
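A minimal sketch of one such feature, the most significant bit of the macroblock mean intensity (16 x 16 macroblocks assumed):

    import numpy as np

    # One feature bit per macroblock: the MSB of the macroblock mean intensity.
    def macroblock_msb_feature(image, mb=16):
        h, w = image.shape
        bh, bw = h // mb, w // mb
        blocks = image[:bh * mb, :bw * mb].reshape(bh, mb, bw, mb)
        means = blocks.mean(axis=(1, 3))
        return (means.astype(np.uint8) >> 7) & 1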
Content features are especially useful in authenticating smooth regions.
As discussed in Section 7.3.1, no data can be embedded in smooth regions
without introducing visible artifacts, hence it is impossible to rely on the
embedding mechanism to signal the tampering of these regions, i.e., it is
impossible to embed data with certain regularity and later check the alter-
ation of such regularity at the detector. The tampering associated with smooth
regions includes the case of altering a smooth block to another smooth
one (e.g., changing luminance or color) and the case of altering a complex
block to a smooth one. Changing a smooth block to a complex block is easy
to detect: when the detector sees the complex block, it assumes
some data have been embedded, but the data extracted from such altered
blocks will be random since an attacker has no knowledge of the secrets used
in embedding. Although there are limited meaningful changes that can be
applied by changing original blocks (either smooth or complex) to smooth
ones, we believe that an effective authentication scheme should take smooth
region authentication into account instead of risking miss detection of pos-
sible meaningful alterations. Since the embedded data can be used to detect
tampering, alterations in smooth regions can be detected by relying on the
embedded data, i.e., we embed features derived from smooth regions in the
embeddable regions and later check the match between the features com-
puted from a test image and those extracted by the watermark detection
module. In addition, the features based on block mean intensity are useful
in detecting intentional alterations such as the possible meaningful tamper-
ing by only changing some DC coefficients, because the embedding scheme
often leaves DC coefficients untouched to avoid blocky artifacts.
7.5 Experimental Results
We first present the results of an earlier design that does not use shuffling
and embeds less data than the shuffling approach. A JPEG compressed image
of size 640 x 432 is shown in Fig. 7.5. Fig. 7.6(a) is the same image
but with a 40 x 27 binary pattern "PUEE" (Fig. 7.6(b)) and the MSB
of the macroblock mean intensity embedded in it. This image is visually
indistinguishable from the original. In terms of PSNR with respect to the original
uncompressed image, the watermarked one is only 1 dB inferior to the
JPEG compressed image. The smooth regions of the image are mainly in
the sky area. For these blocks, backup embedding is used, namely, the data
associated with the i-th block are embedded in both the i-th block and a
companion block as indicated in Fig. 4.4.
FIGURE 7.5. An original unmarked 640 x 432 image Alexander Hall (stored in
JPEG format with a quality factor of 75%). Watermarking is performed on its
luminance components.
The marked image is modified by changing "Princeton University" to
"Alexander Hall" and "(c) Copyright 1997" to "January 1998", as shown in
Fig. 7.7(a). This image is again stored in the JPEG format. For ease of
visualizing the embedded pattern, we shall denote the block that embeds "0" as
a black dot, the block that embeds "1" as a white dot, and the block with
no obvious majority found in detection as a gray dot. Similarly, to
visualize the feature matching result, we use a black dot for an unmatched
block, white for a matched one, and gray for a block within which we
have not found an obvious majority in detection and hence have low confidence in
determining a match or mismatch. With these notations, Fig. 7.7(b) and (c)
PUEE
FIGURE 7.6. Watermarked image without using shuffling during embedding:
(a) the watermarked image; (b) the embedded binary pattern.
show the extracted binary pattern and the content feature matching result of the
tampered image. The random pattern corresponding to the tampered blocks
is clearly identifiable. We can see that, using the backup embedding, there
are very few unembeddable bits, at the expense of a reduced embedding
rate. Also notice that round-off and truncation errors may occur during the
verification and tampering, which contributes to several unexpected dots
outside the altered regions.
As discussed in Chapter 4, shuffling equalizes the uneven embedding ca-
pacity and allows embedding more data. The example shown in Fig. 7.8
has a BCH encoded version of the "PUEE" pattern and multiple content
features embedded in its luminance components. There is no visual dif-
ference between this watermarked image and the original unwatermarked
copy in Fig. 7.5. The embedded content features include the most signif-
icant bit of the average macroblock intensity and the smoothness of each
macroblock. The combined result of pattern comparison and feature match-
ing provides information regarding both content tampering and minor non-
content changes such as those introduced by rounding or recompression.
FIGURE 7.7. Authentication result with watermark embedded without shuf-
fling: (a) an altered version of the watermarked image (stored in 75% JPEG);
(b) extracted binary pattern from the edited image; (c) feature matching result.
Fig. 7.9(a) shows various content alterations made to the watermarked im-
age. A comprehensive report combining both the pattern comparison and fea-
ture matching results is shown in Fig. 7.9(b), with whiter pixels indicating a
higher likelihood of content changes. We can see that the improved au-
thentication system using shuffling is able to identify content changes with
few false alarms. The quantization domain used by the embedding step is
the same as JPEG with quality factor 75%, implying that the system can
tolerate compression and other distortions that are comparable to or less
severe than JPEG at quality 75%.
FIGURE 7.8. An image with an authentication watermark embedded using shuffling.
7.6 Extensions
Multi-level Embedding & Unequal Error Protection We mentioned
in Section 7.4 that two sets of data, namely, a meaningful visual pattern and
a set of low-level content features, are embedded in the image for authentica-
tion purposes. More generally, the idea of multilevel data hiding in Chapter 6
can be incorporated to embed several sets of data with different levels of er-
ror protection and using embedding mechanisms with different robustness.
The embedded data sets could be image features at different resolutions:
the coarser the level is, the more it is protected. The multi-resolution fea-
tures with unequal error protection can help us authenticate an image in a
hierarchical way.
Other Compression Format Besides JPEG compressed images, we have
also extended our approach to wavelet-based compression and found it effective
in detecting tampering. In addition to its advantage in efficient coding, the
wavelet domain offers a convenient way to implement the above-mentioned hier-
archical authentication [125, 138]. Since the wavelet transform has been selected
as the core of the new JPEG2000 compression standard [16], building authentication
systems that are compatible with JPEG2000 is a likely trend in the near future.
Color Images For color images, we currently work in YCrCb coordinates
and use the proposed approach to mark luminance components while leaving
chrominance components unchanged. A better way is to apply the proposed
FIGURE 7.9. Authentication result with watermark embedded using shuffling:
(a) an altered version of Fig. 7.8 stored in 75% JPEG format; (b) authentication
result with whiter dot denoting higher likelihood of content manipulation for the
corresponding area in (a).
approach to the chrominance components as well, to embed more data and to
detect potential tampering of colors. We may also work in other color coor-
dinates, such as RGB. However, in practice, due to the limited computation
precision, we are likely to find a few pixels whose YCrCb or RGB values
may change after one or more rounds of color coordinate conversion [180].
This is similar to the rounding and truncation errors incurred by going back
and forth between pixel domain and transform domain. Pre-distortion via
quantization and error correction codes can help combat these errors.
Video A system for authenticating MPEG compressed digital video
can be designed by marking I-frames of video streams using our proposed
scheme. In addition, an I-frame serial number can be used as part of the embedded
data to detect modifications such as frame reordering and frame dropping.
Alternatively, these frame jitters can be detected via a spread-spectrum
watermark in each frame, embedded using the same approach as
the embedding of the frame sync index in Section 6.3. P- and B-frames can be
similarly watermarked, but with a larger quantization step size and more error
protection in embedding to survive motion compensation during moderate
compression. This practical consideration is similar to our implementation of
multilevel data hiding for video in Section 6.3. Compressed-domain embed-
ding in the predicted P- and B-frames is also possible by manipulating
the residues of motion compensation.
7.7 Chapter Summary
In this chapter, we have presented a general framework of watermarking
for authentication. We pointed out the importance of the embedding mech-
anism and the data to be embedded in authentication applications. We
proposed a new authentication scheme by embedding a visually meaning-
ful watermark and a set of features in the quantized transform domain via
table look-up. The use of this Type-II embedding in the quantized domain has
enabled us to distinguish between content tampering and moderate distor-
tion that does not change the high-level content. Alterations made on
the marked image can also be localized. In addition, we demonstrated the
use of shuffling in this specific problem for equalizing the uneven embedding
capacity and enhancing both the embedding rate and security. More discussion
of the attacks and countermeasures on watermark-based authentication
will be presented in Chapter 9.
Acknowledgement Fig. 7.5 was edited from a Princeton HomePage Pho-
tograph at http://www.princeton.edu/Siteware/Images/Cornerpictures/
cornerpixs.shtml taken by Robert P. Matthews as of 1997.
8
Data Hiding for Video Communication
Motivated by traditional watermarks on paper, ownership verification
and tampering detection were the initial purposes of embedding digital
watermarks in multimedia sources. Examples of such watermark systems
are shown in Part II. In general, data hiding provides a way to convey side
information that can also be used for many other purposes. For example,
Silverstein et al. proposed to embed into an image a map indicating the re-
gions for which an enhancement scheme effectively improves the perceptual
quality and later to use this embedded information to direct the selective
enhancement only for these regions [144][145]; Song et al. used embedding
in motion vectors to facilitate key distribution and key updating in secure
multimedia multicast [45][147]. In this chapter, we discuss the applications
of data hiding in video communications, where the side information helps
to achieve additional functionalities or better performance.
Delivery of digital video through networks and wireless channels is be-
coming increasingly common. Limited bandwidth and channel errors
are two major challenges in video communication. Transcoding a video to
a lower rate helps to cope with the bandwidth limitation by gracefully de-
grading visual quality, while concealing corrupted regions of an image/video
is commonly used to compensate for the perceptual quality reduction caused
by transmission errors. In the following, we explain how to apply data
hiding to these problems to enhance the performance.
8.1 Transcoding by Downsizing Using Data Hiding
A number of bit rate reduction techniques have been proposed for transcod-
ing, including frame dropping, suppressing color, discarding high-order DCT
coefficients, and re-quantizing DCT coefficients [193]. The reduction of spatial resolu-
tion can significantly reduce the bit rate [188], but the processing is rather
involved. More specifically, most videos are stored in a compressed form
involving the DCT and motion compensation. As many applications
require real-time transcoding, it is desirable to carry out the transcoding in
the compressed domain, and the approach should aim at reducing the com-
putational complexity while achieving reasonable visual quality. Typically,
motion estimation consumes 60% of encoding time [188], while motion com-
pensation consumes 11%. In order to transcode with low delay, we need to
concentrate on reducing the complexity of these two steps [149].
8.1.1 Overview of Proposed Approach
We propose a fast approach to obtain from an MPEG stream a new MPEG
stream with half the spatial resolution. That is, each macroblock (16 x 16) in
the original video becomes one block (8 x 8) in the reduced-resolution video.
We work directly in the compressed domain to avoid the computationally
expensive steps of converting to the pixel domain. Two problems need to
be addressed in order to generate a standard-compliant bit stream for the
reduced-resolution video: (1) For I-frames, how do we produce an 8 x 8 DCT
block for the reduced-size video from four 8 x 8 DCT blocks of the original
video? (2) For P- and B-frames, how do we produce a motion vector and residues
for the new 16 x 16 macroblock from four 16 x 16 macroblocks? For the
first problem, several computationally efficient solutions have been proposed
in [186]. For the second problem, we need to compute one motion vector
from four motion vectors and to produce the DCT of the residues for one
macroblock in the reduced-resolution video given the DCT of the residues for
four macroblocks in the original video. In [188], an algorithm was proposed
to estimate the motion vector of the reduced-resolution video by using a
weighted mean of the original motion vectors. The DCT of the residues is
computed by reconstructing the P- and B-frames for both the original and
the reduced resolution, and by using the estimated motion vectors. The
computation of this closed-loop DCT domain approach is still rather costly.
We focus on the P-frames in this section; B-frames can be treated simi-
larly. We first propose an adaptive motion estimation scheme to approximate
the motion vectors of the reduced-resolution video using the motion infor-
mation from the corresponding blocks in the original full-resolution video as
well as their neighboring blocks. This idea is similar to the overlapped block
motion compensation in [187]. We then propose a transform-domain open-
loop approach to produce the DCT of the residue, thus eliminating the need
to reconstruct the frames as in [188]. After downsizing the original four-block
residues to one block, we use subblock motion information to improve the
image quality. As the subblock motion information is not compatible with
MPEG-like standards, it is sent as side information using data hiding so as
to comply with video encoding standards. The transcoded bit stream can
thus be decoded at reasonable visual quality by a standard decoder. Better
image quality can be obtained, however, with a customized decoder after
extracting the hidden information. Because the residue is computed in an
open-loop manner, there is a tendency toward error accumulation, particularly
when there is considerable motion and/or a large GOP. To overcome this, the
GOP structure can be modified to reduce the number of frames between
two successive I-frames.
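As a rough illustration of the motion-vector downsizing described above, the sketch below forms the motion vector of a reduced-resolution macroblock from the four original motion vectors; the plain (unweighted) averaging and the 1/2 scaling are simplifying assumptions, and the adaptive weighting with neighboring blocks is omitted.

import numpy as np

def downsized_motion_vector(mvs):
    """mvs: list of four (dx, dy) motion vectors from the original 16x16
    macroblocks that form one 16x16 macroblock in the half-resolution
    video.  A simple average, scaled by 1/2 to match the new resolution;
    the adaptive scheme would weight these and also consult neighbors."""
    mvs = np.asarray(mvs, dtype=float)
    return 0.5 * mvs.mean(axis=0)

# Example: four similar motion vectors give a halved average.
print(downsized_motion_vector([(4, 2), (4, 2), (6, 2), (4, 4)]))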
In the following, we elaborate on the use of data hiding for conveying
subblock motion information.
8.1.2 Embedding Subblock Motion Information
When the motion vectors are changed, obtaining accurate residue informa-
tion usually requires reconstructing the entire frames. Directly downsizing
residues saves computation in frame reconstruction, but since it is differ-
ent from the accurate residue, motion compensation may not produce good
results if only a single motion vector is used (as shown in Figure 8.1(b)).
If all four motion vectors can be used, as shown in Figure 8.1(c), the re-
sulting motion compensation would be better and the image quality can be
significantly improved. This is similar to the use of subblock motion com-
pensation in [185]. For our current problem, we would like to send u, as well
as the differences u - u_i, where i = 1, ..., 4. However, the syntax of MPEG-
1/2 does not allow subblock motion information to be included. To maintain
compatibility with MPEG-like standards, the subblock motion information
can be sent in the user-data part of the stream. This would maintain image
quality but at the expense of increasing the bit rate. A standard decoder
would give reasonable visual quality, while a customized decoder would be
able to extract the side information and produce improved images.
We propose to send the subblock motion information as side information
using data hiding. Specifically, we embed the subblock information in the
DCT residues. Since modifications of high-frequency DCT coefficients tend
to produce less noticeable artifacts, we can embed the side information in
these coefficients, keeping the DC coefficients and low-order AC coefficients un-
changed. One way to send the difference between u and u_i is to encode it
in the highest-frequency coefficient of the i-th subblock DCT residue whose
quantized value is non-zero, i.e., we replace that coefficient with the motion
vector difference. This embedding preserves the efficiency of run-length cod-
ing and hence introduces little overhead in terms of the bit rate of the video.
The horizontal and vertical motion components are encoded separately by
splitting the coefficients of a block into two parts and encoding one motion
component in each part.
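The sketch below illustrates this replacement on one subblock's quantized DCT residue given in zig-zag order; the zig-zag layout, the helper names, and the handling of all-zero blocks are assumptions for illustration only.

import numpy as np

def embed_motion_difference(zigzag_coeffs, mv_diff):
    """Replace the highest-frequency non-zero quantized coefficient of a
    subblock residue (given in zig-zag order) with the motion-vector
    difference, preserving run-length coding efficiency."""
    coeffs = np.array(zigzag_coeffs, copy=True)
    nonzero = np.nonzero(coeffs)[0]
    if nonzero.size == 0:
        return coeffs                      # nothing to carry the data; skip
    coeffs[nonzero[-1]] = mv_diff          # overwrite the last non-zero term
    return coeffs

def extract_motion_difference(zigzag_coeffs):
    """Recover the embedded value from the last non-zero coefficient."""
    nonzero = np.nonzero(np.asarray(zigzag_coeffs))[0]
    return None if nonzero.size == 0 else int(zigzag_coeffs[nonzero[-1]])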
FIGURE 8.1. Relationship among motion vectors of an original video frame and
its downsized version: (a) four macroblocks downsized to one block; (b) a single
motion vector (standard compliant); (c) subblock motion (non-compliant).
8.1.3 Advantages of Data Hiding
We mentioned above two ways of conveying side information, namely, at-
taching it separately, such as in a user data field, and embedding it in the media
source. In Chapter 3, we discussed the bit reallocation nature of data hiding
and explained that embedding does not offer "free" bandwidth. Instead, the
most obvious advantage of data hiding is that the hidden data are carried
with the source in a seamless way. This close association enables conveying
side information while maintaining a compliant appearance even when no user
data field is available. It also enhances security since the existence and/or
the location of the hidden data can be made difficult to identify. Thus, for
robust data hiding, an unauthorized person has to distort the host media
by a significant amount to remove the hidden data.
The efficiency in encoding side information is another advantage of data
hiding. It is possible to embed the secondary data for a particular region
A in A itself or in another region that has a deterministic relation with A. Such
an association can help us save the overhead of encoding the region index if the
secondary data are sparsely distributed and separately encoded. In addition,
when the side information is put directly in the user data field (if available), the
total number of bits will increase (Fig. 8.2, top). To keep the total number
of bits almost identical to the original, the media source has to be
transcoded to a lower rate (Fig. 8.2, middle). Such transcoding is not a
trivial task because sophisticated rate control may have to be involved for a
good tradeoff between bit rate and visual quality. In contrast, as illustrated
in Fig. 8.2 (bottom), data embedding is a convenient way to convey side
information while preserving the total bit rate due to its bit re-allocation
nature.
FIGURE 8.2. Comparison of sending side information via data hiding vs. attaching
it to a user data field.
8.1.4 Experimental Results
Our implementation is performed on MPEG-1, while the extension to other
DCT-based hybrid video coding is straightforward. We tested our approach
using the two well-known sequences "football" and "table tennis" with a
15-frame GOP. The picture size is 352 x 224 for the original sequences and
176 x 112 for the reduced-resolution sequences. The quantization scaling
factor is 8 for I-frames, 10 for P-frames, and 25 for B-frames. We compare
the three schemes listed in Table 8.1, where AMVR was proposed in [188] and
the rest are our proposals.
TABLE 8.1. List of the three schemes for experimental comparison

  AMVR  (adaptive motion vector resampling):
        motion vectors: weighted average of the 4 original motion vectors;
        residues: accurate residues obtained by reconstructing the P/B frames;
        decoder: standard compliant.
  AMEC  (adaptive motion with embedding & customized decoder):
        motion vectors: weighted average of the 4 original motion vectors and their neighbours;
        residues: downsized residues from the 4 original blocks, with embedded subblock motion;
        decoder: customized, using the embedded info. to reconstruct frames.
  AMES  (adaptive motion with embedding & standard decoder):
        motion vectors: weighted average of the 4 original motion vectors and their neighbours;
        residues: downsized residues from the 4 original blocks, with embedded subblock motion;
        decoder: standard compliant, without using the embedded info.
Fig. 8.3(a) shows the average PSNR for the prior art AMVR, our approach
decoded by a standard MPEG decoder (AMES), and by a customized MPEG
decoder (AMEC). AMEC has a PSNR gain of up to 2 dB over AMES due to
the use of the embedded subblock motion vectors extracted from the DCT residues.
However, when compared with the more complex AMVR, AMEC loses up
to 2 dB. This not only shows a tradeoff among quality, complexity, and bit
rate, but also demonstrates the limitation of using the open-loop method
to compute the DCT residues. However, when the original video is encoded
FIGURE 8.3. Performance comparison of various transcoding methods: (a) average
PSNR of AMVR (Shen et al.), AMEC (customized decoder), and AMES (standard
decoder); (b) visual comparison of the corresponding decoded frames.
at a high bit rate and is of high quality, the PSNR gap between
AMVR and AMEC would be much smaller. In either case, no obvious visual
difference is observed between AMEC and AMVR, as shown in Fig. 8.3(b).
The figure also shows that some artifacts may appear in the video decoded
by a standard decoder (the AMES case) due to the motion compensation
residue not matching the motion vector. This is expected since the core
idea of the proposed scheme is not to optimize the quality obtained by a
standard decoder. Despite the artifacts, the overall visual quality from a
standard decoder is reasonable, with about 30% of the computation saved
relative to the prior art AMVR. In addition, the quality can be improved when a
customized decoder is available.
8.2 Error Concealment and Data Hiding
We mentioned earlier that error resilience is important when transmitting
images and video over unreliable networks such as the Internet and wire-
less channels. The corrupted regions usually take the form of blocks or strips
due to the block coding nature of the popular image/video codecs. Tech-
niques for combating transmission errors have been classified into three
categories [189][190]: (1) sender-side techniques to make the encoded bit
stream resilient to potential errors, (2) receiver-side techniques to conceal
or alleviate the negative effects incurred by errors, and (3) interactive tech-
niques involving both sender and receiver. Among the receiver-side tech-
niques, temporal and spatial interpolations are principal tools to conceal
small corrupted regions based on the inherent temporal and spatial corre-
lation within the multimedia sources. Various interpolation approaches for
concealment have been proposed, each with a different tradeoff between com-
plexity and recovered perceptual quality. In the case of spatial interpolation,
namely, concealing the corrupted blocks from the surrounding uncorrupted
blocks, simple bilinear or bicubic interpolation suffers from such problems as
blocky artifacts and blurred edges. Edge-directed interpolation, proposed
in [179][184], improves the perceptual quality of recovered images/videos by
estimating the major edges in the corrupted blocks and by avoiding interpo-
lation across edges. This basic idea is illustrated in Fig. 8.4, and an example
of concealing lost blocks in the Lenna image is shown in Fig. 8.5. Note that the
improvement in visual quality comes at the expense of computational complexity.
About 30% of the computation in the error concealment algorithm of [184] is spent
estimating edge information from surrounding blocks [148].
FIGURE 8.4. Illustration of edge-directed interpolation for concealing lost blocks:
edges are first estimated from the surrounding blocks, and the missing block is then
filled in by interpolating along, rather than across, the estimated edges.
8.2.1 Related Works
One of the first associations between error concealment and data hiding
is found in [47], where data hiding is used to store a score in each block
when encoding an image/video into a standard-compliant bitstream. The
score indicates how effectively the associated block can be concealed
using the surrounding information. When real-time transcoding (or dynamic
rate shaping) to a lower bit rate is needed, the blocks with high embedded
scores are dropped with high priority. Smooth blocks and blocks with simple
FIGURE 8.5. An example of edge-directed interpolation for concealing lost blocks
using the techniques in [184]: (a) original Lenna image; (b) corrupted Lenna image,
with 25% of the blocks corrupted in a checkerboard pattern; (c) concealed Lenna
image, with the corrupted blocks recovered via edge-directed interpolation.
edges that are inferable from surrounding blocks are assigned high priority
because they can easily be concealed from surrounding blocks at the receiver
side. The score can be embedded in DCT coefficients using simple Type-II
approaches such as the odd-even enforcement, since the robustness against
distortions is not critical here. While the priority could be analyzed on the
fly, embedding the scores resulting from pre-analysis into the image/video
saves computation in real-time transcoding and maintains the decodability
of the image/video by a standard decoder.
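As a reminder of how such a Type-II odd-even embedding can work, the sketch below forces the parity of a quantized coefficient to carry one bit; the quantization step and the rounding rule are illustrative assumptions.

def embed_bit_odd_even(coeff, bit, step=1):
    """Quantize a DCT coefficient and force its quantized value to be
    even for bit 0 and odd for bit 1 (Type-II odd-even enforcement)."""
    q = int(round(coeff / step))
    if (q % 2) != bit:
        q += 1 if q >= 0 else -1          # move to the nearest allowed level
    return q * step

def extract_bit_odd_even(coeff, step=1):
    """Read the bit back from the parity of the quantized coefficient."""
    return int(round(coeff / step)) % 2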
The idea of balancing computation became the motivation of a recent
error concealment work [148]. As can be seen from the earlier discussion, a
potential hurdle against adopting sophisticated error concealment schemes
in practical systems is their computational complexity, because the com-
putation power and time available at a decoder are usually quite limited
in many applications. It is desirable to shift some computation from the
receiver side to the transmitter side. In [148], it is proposed that the edge
information of an image/video block be embedded in a companion block before
transmitting the image over a lossy channel. The embedded edge informa-
tion can be extracted on the receiver side and used to recover corrupted
blocks without the need to estimate the edges from the neighbors. This
significantly reduces a decoder's burden in inferring the edge information
from neighboring blocks for the purpose of concealment. Furthermore, the
embedded edge is more accurate than the estimate made by a concealment mod-
ule. This is because the embedded edge is generated by the sender, who has
knowledge of all blocks, while the receiver has to estimate the edge
information of a lost block using the surrounding blocks only.
8.2.2 Proposed Techniques
Two novel uses of data hiding associated with error concealment are dis-
cussed in this book. One is for studying the robustness of data hiding algo-
rithms via attacks, namely, using block concealment as a tool to remove an
embedded watermark. For consistency in organization, we postpone the dis-
cussion to Part III, where the attacks and the countermeasures are addressed
in detail. Another use of data hiding is to embed parity bits to recover a
small number of bit errors in motion vectors. We summarize below the ba-
sic idea of this motion vector protection, which is one of the modules in an
error concealment system for transmitting video over the Internet. The detailed
design and experimental results of the system can be found in [150].
As we know, motion estimation and compensation are used in most video
coding standards to reduce the temporal redundancy between video frames.
A key to this redundancy reduction is to estimate and encode motion vec-
tors, which indicate the displacement between blocks in the current frame
and their best matching blocks in a reference frame. During video transmis-
sion, the accuracy of motion vectors is critical in ensuring the high quality
of the received pictures [190]. Song et al. proposed to insert parity bits
across Group of Macroblocks (GOBs) in one frame [45][146], which does
not provide sufficient protection for channels with packet loss. Considering
channels that are subject to bursty errors, we propose to generate frame-
wise parity bits from the motion vectors of P-frames and to embed them in the
successive I-frames for the purpose of recovering some errors in the received
motion vectors. More specifically, we take MPEG-1 as an example, in which
the encoded motion vectors are differentially Huffman coded [19]. Because
a video is transmitted in packets over networks, the loss of or the errors
in motion vectors can be detected by lower layer protocols and we shall
focus on correcting as many errors as possible here. The error correction is
achieved via parity bits in our work. To compute the parity bits, we first
arrange the coded bits of the motion vectors of each P-frame row by row,
and pad zeros to each row to make them equally long. For the convenience
of discussion, we denote the i-th bit of the motion vector bits in the j-th row
of the k-th P-frame within a group-of-pictures (GOP) as b_j^(k)(i). The parity
bits of the motion vectors in a GOP are computed as

    P_j(i) = b_j^(1)(i) ⊕ b_j^(2)(i) ⊕ ··· ⊕ b_j^(N_p)(i),        (8.1)

where ⊕ denotes the modulo-2 sum, and P_j(i) is the i-th parity bit of the j-th row
for a total of N_p P-frames. The modulo-2 sum enables us to correct a single
bit error among N_p bits after locating the erroneous motion vectors. We
choose to convey the parity bits to the decoder by embedding them in the DCT
coefficients of the successive I-frame. As discussed in Section 8.1.3,
using data hiding to convey the parity bits avoids increasing the video bit
rate (and hence the network load) and avoids computationally expensive
rate control.
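A minimal sketch of the frame-wise parity computation in Eq. 8.1 follows, assuming the motion-vector bits of each P-frame have already been arranged into zero-padded rows of equal length; the array layout and names are illustrative.

import numpy as np

def gop_parity_bits(mv_bits):
    """mv_bits: array of shape (Np, rows, bits_per_row) holding the coded
    motion-vector bits of the Np P-frames in a GOP, zero-padded to equal
    length.  Returns P[j, i], the modulo-2 sum over the Np frames
    (Eq. 8.1); a single bit error among the Np bits at position (j, i)
    can be corrected once the erroneous frame is located."""
    mv_bits = np.asarray(mv_bits, dtype=np.uint8)
    return np.bitwise_xor.reduce(mv_bits, axis=0)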
Due to congestion and other dynamic changes in channel conditions
during video transmission over the Internet and wireless channels, the
packet loss tends to occur in bursts. In addition to protecting motion vectors,
we also make use of interleaving during packetization to reduce the chance
that adjacent blocks get corrupted simultaneously. More specifically,
we take an N-block by M-block region as a unit and pack these blocks into N packets
in such a way that the blocks in consecutive packets are not side by side with
each other. Packetization satisfying this condition enables us to isolate
lost blocks when several consecutive packets get corrupted or dropped,
and therefore allows better recovery via interpolation using the information
from surrounding blocks.
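As an illustration of the interleaving condition, the hedged sketch below only checks whether a candidate packetization keeps blocks in the same or consecutive packets from being side by side; the actual packing pattern used in the system is not reproduced here.

def adjacent(b1, b2):
    """True if two blocks (row, col) are horizontally or vertically adjacent."""
    (r1, c1), (r2, c2) = b1, b2
    return abs(r1 - r2) + abs(c1 - c2) == 1

def satisfies_interleaving(packets):
    """packets: list of lists of (row, col) blocks, in transmission order.
    Checks that no two blocks carried in the same or in consecutive
    packets are side by side, so a short burst loss leaves the lost
    blocks isolated and recoverable by interpolation."""
    for k in range(len(packets)):
        group = packets[k] + (packets[k + 1] if k + 1 < len(packets) else [])
        for i, b1 in enumerate(group):
            for b2 in group[i + 1:]:
                if adjacent(b1, b2):
                    return False
    return True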
8.3 Chapter Summary
In this chapter, we discussed the use of data hiding beyond traditional appli-
cations. We used two specific examples, namely, transcoding via downsizing
and error concealment, to demonstrate the idea of using data hiding to send
side information. The side information helps achieve better performance with
a customized decoder and in the meantime retains decodability with
reasonably good quality by a standard-compliant decoder.
Quite a few business models can be supported by the customized decod-
ing framework. For example, the embedded data may consist of both access
control policies and electronic coupons. The embedded coupons aim at en-
couraging and rewarding customers for following the access control policies.
Data hiding can also be used in conjunction with multimedia encryption
such as those in [191][194] for access control. This is a promising direction
to be explored for the Digital Rights Management (DRM) of multimedia
contents.
Acknowledgement The transcoding and error concealment works were
jointly done with Dr. Peng Yin of Princeton University and Thompson
Multimedia Laboratory.
Part III
Attacks and
Countermeasures
9
Attacks and Countermeasures on
Known Data Hiding Algorithms
An attack on a watermarking system attempts to obliterate the watermark so
that the original goal of embedding the watermark cannot be achieved. In
general, attacks test the robustness and security of the entire data hiding
system, from the embedding mechanism to the system architecture. For
robust watermarking, an attacker's goal is to make the watermark detector
unable to detect the existence of the watermark, or to create ambiguity that prevents
the detector from making a definite decision. An effective attack does not
have to remove the watermark. For example, a jitter attack can cause loss of
synchronization and lead to missed detection of the watermark even though the
watermark is present in the multimedia signal [159].
It is important to understand that attacks are meaningful mainly for
applications in a competitive environment where incentives to obliterate the
watermarks exist. This includes: (1) ownership protection, where an adver-
sary is a pirate; (2) tampering detection, where an attacker wants to make
an authentication system consider an unauthorized multimedia signal or a
tampered one as authentic; and (3) copy/access control, where an adversary
wants to copy or access a protected multimedia signal in a way that vi-
olates the specified policies. Applications in a non-competitive environment
such as annotation and enhancing communication performance (Chapter 8)
usually are not subject to attacks, although there may be specific robust-
ness requirements such as surviving certain lossy compression. For appli-
cations in a competitive environment, finding effective attacks and analyzing
them play an important role in identifying the weaknesses and limitations
of watermarking schemes, as well as in suggesting directions for further
improvement. More importantly, it helps us to reach a realistic understand-
ing regarding what data hiding can do and cannot do for the purposes of
ownership protection, tampering detection, and copy/access control.
A number of attacks as well as some countermeasures have been reported
in the literature [29, 152, 157, 159, 166]. Most of the previous attacks and
the attacks presented in this chapter target specific types of watermark-
ing schemes, for which analysts have full knowledge of the watermarking
algorithms. The analysts are able to perform experiments with many non-
watermarked, watermarked, and attacked samples, and to observe the re-
sults in real time. In the next chapter, we will discuss attacks in an emulated
competitive environment in which analysts have no knowledge of the wa-
termarking algorithms [156][167].
Three types of attacks and the corresponding countermeasures are dis-
cussed in this chapter. The block replacement attack in Section 9.1 targets
the removal of robust watermarks embedded locally in an image. Geometric
attacks on robust image watermarks have been considered a major challenge
in the literature. In Section 9.2, we present a countermeasure by embedding and
detecting watermarks in a domain that is resilient to rotation, scale, and
translation. The double capturing attack in Section 9.3 outlines a general
attack for forging fragile watermarks.
9.1 Block Replacement Attack on Robust
Watermark
Claiming to "provide legal protection that the international copyright com-
munity deemed critical to the safe and efficient exploitation of works on dig-
ital networks", the Digital Millennium Copyright Act (DMCA) [27], signed
into law on October 28, 1998, prohibits "circumvention of technological mea-
sures used to protect copyrighted works", and prohibits "tampering with the
integrity of copyright management information".¹ While the technologies
emphasized by the DMCA are cryptographic approaches such as en-
cryption, deliberately designing tools to circumvent data hiding based
copy/access control mechanisms is also potentially illegal behavior. Le-
gitimate tools can be exploited by adversaries to build powerful attacks.
This section will discuss such an attack via block concealment techniques
that were originally designed for error resilient image/video transmission.
Before we discuss the details of this attack, we shall give a brief review of
the attacks on robust watermarks proposed in the literature.
¹The DMCA states a number of exceptions, including good-faith research that is
"necessary to identify and analyze flaws and vulnerabilities" of the encryption technolo-
gies.
9.1.1 Existing Attacks on Robust Watermarks
Several attacks as well as some countermeasures have been reported in the
literature. Forging a fake "original" image for multiple ownership claims [153]
can be thwarted by imposing a non-invertibility requirement on watermarks [153,
154, 155] and/or exploiting more than one detection scheme, such as blind de-
tection (i.e., detection without the use of the original unmarked image) [61].
A collusion attack involves averaging multiple copies of the same orig-
inal that carry different (independent) markings [163]. When the detector
is in the public domain, it is possible for attackers to systematically obtain
knowledge about the watermarks based on the detector's responses to many
manipulated versions of a watermarked image [152]. Watermarks can also
be attacked by geometric distortion, including rotation, scale, translation,
warping, line dropping/adding, or in conjunction with moderate lowpass
filtering and interpolation [159, 162, 165], but these attacks are not always
effective when the original image is available to perform registration.
9.1.2 Attack via Block Replacement
We propose a new method that can remove robust invisible watermarks
that are locally inserted. Unlike existing work in the literature, our attack
does not require multiple copies of marked images or impose restrictions on
detection. The idea is to replace image blocks by an interpolated version
obtained from their neighbors. As we have seen in Chapter 8, the replacement algorithm
was originally proposed for error concealment (i.e., to recover lost or cor-
rupted blocks) [179][184] and for low bit rate coding [47]. Blocks surrounding
the lost block are used to infer edge information and then to give an edge-
directed interpolation of the missing block, as illustrated in Fig. 8.4. In the
proposed attack, we keep all blocks on the border of an image untouched,
but replace all other blocks by their interpolated versions using the method
in [184]. The replacement is performed on a block basis. As illustrated in
Fig. 9.1, we first divide an image into 4 x 4 or 8 x 8 blocks. For each block
except those on the image border, an interpolated version is obtained from
the neighboring blocks. The original block is then replaced by its interpo-
lated version. The replacement can be done selectively; for example, the
original block is retained if the interpolated version has too many arti-
facts. We can see that such block-by-block replacement essentially removes
the watermarks originally embedded in each block.
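A minimal sketch of the attack's outer loop follows; it uses plain averaging of the four neighboring blocks as a stand-in for the edge-directed interpolation of [184], and the selective retention test is omitted, so it only illustrates the block-by-block structure.

import numpy as np

def block_replacement_attack(image, block=8):
    """Replace each interior block of a 2-D image by an estimate obtained
    from its four neighboring blocks; border blocks are left untouched.
    Plain averaging is used here in place of the edge-directed
    interpolation of the original attack."""
    img = image.astype(float)
    out = img.copy()
    h, w = img.shape
    for r in range(block, h - 2 * block + 1, block):
        for c in range(block, w - 2 * block + 1, block):
            top    = img[r - block:r,             c:c + block]
            bottom = img[r + block:r + 2 * block, c:c + block]
            left   = img[r:r + block,             c - block:c]
            right  = img[r:r + block,             c + block:c + 2 * block]
            out[r:r + block, c:c + block] = (top + bottom + left + right) / 4.0
    return out.astype(image.dtype)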
Shown in Fig. 9.2(a) is a Lenna image watermarked using the block-DCT
domain additive spread-spectrum method of [57][58], resulting in a PSNR
of 42.1 dB with reference to the unmarked original. The detection statistic
is 138.5 if the original is used, and 19.3 if the original is not used. The
detection threshold is generally set between 3 and 6 for maximal detection
probability with a false alarm probability of 10^-3 to 10^-10. The marked
image of Fig. 9.2(a) is then compressed using JPEG with quality factor 10%.
FIGURE 9.1. Block diagram of the proposed block replacement attack: the blocks
surrounding the block of interest are used to estimate it via edge-based interpolation,
and each block in turn is replaced by its estimated version.
The result, shown in Fig. 9.2(b), has a PSNR of 29.40 dB. The two detection
statistics are reduced to 34.96 and 12.40, respectively, both still well above
the threshold in spite of the severe visual distortion. This confirms the claim
that the spread-spectrum watermark algorithm survives well under severe
JPEG compression [57][58]. The proposed attack is then applied to the
marked image, and the result, shown in Fig. 9.2(c), has a PSNR of 29.58 dB.
Aside from a softer appearance, few artifacts are observed and the image
is much more pleasing than Fig. 9.2(b). However, the two detection statistics
have been reduced to 6.30 and 4.52, much below those from Fig. 9.2(b) and
comparable to the threshold. The results are summarized in Table 9.1.
FIGURE 9.2. A watermarked Lenna image and distorted versions by JPEG and
proposed block replacement attack, respectively: (a) watermarked Lenna image
by Podilchuk-Zeng scheme; (b) watermarked Lenna after JPEG Q = 10% com-
pression, showing blocky artifacts; (c) watermarked Lenna after proposed attack
(with block size 4 x 4), appearing slightly blurred.
TABLE 9.1. Experimental results of the block-replacement attack on block-DCT
based spread-spectrum watermarking. The normal detection threshold for the pres-
ence of a watermark is set in the range between 3 and 6.

                                   Fig. 9.2(a)       Fig. 9.2(b)   Fig. 9.2(c)
                                   w/o distortion    JPEG 10%      attacked
  PSNR w.r.t. unmarked             42.12 dB          29.40 dB      29.58 dB
  detection statistics
    with orig. avail.              138.51            34.96         6.30
  detection statistics
    without orig. avail.           19.32             12.40         4.52
9.1.3 Analysis and Countermeasures
The proposed attack can be viewed as non-linear low-pass filtering. It
makes use of the fact that images, like other perceptual sources, are
highly correlated in the pixel domain and can tolerate distortions within the
just-noticeable difference. This makes it possible to make good inferences
from surrounding samples when small parts of an image are lost. The infor-
mation embedded in a local region will be lost with the removal of pixels
in that region even if the embedding is not done in the pixel domain (e.g.,
block-DCT domain embedding). The proposed attack can be extended
to multi-resolution based watermarking schemes.
There are several ways to reduce the effectiveness of the proposed attack.
The block replacement attack assumes that local information can be inferred
from neighbors. This is true for smooth regions and for regions with edges
extending across several blocks. However, small features that fall in a single
block cannot be inferred. This suggests the use of larger blocks for local
embedding to combat the proposed attack. Embedding the watermark in 16 x 16
block DCT coefficients, for example, would be much more difficult to attack
than embedding in 8 x 8 block DCT coefficients. Also, the proposed attack
is not very effective for low-resolution images in which many key features
are small. In this connection, it should be noted that the block replacement
algorithms in [179][184] are more effective for edge blocks than textured ones.
Algorithms such as [178] may be used for blocks that are highly textured.
Our experiments have shown that the proposed attack is not effective for
spread-spectrum embedding in the DCT transform domain of the entire
image, such as the one proposed by Cox et al. in [49]. To distinguish it from
the block-DCT based embedding, we shall call this whole-DCT embedding.
Although the detection statistic of an image marked by the whole-DCT ap-
proach decreases after the proposed attack, it is still well above the threshold, as
shown in Table 9.2. We can interpret the ineffectiveness of the proposed at-
tack against whole-DCT embedding in a couple of ways. From the pixel-domain
TABLE 9.2. Effectiveness comparison of the block replacement attack on global
and local spread-spectrum watermarking.

  embedding domain                          block-DCT    whole-DCT
  PSNR w.r.t. unmarked                      29.58 dB     29.36 dB
  detection statistics with orig. avail.    6.30         24.74
point of view, the watermark embedded in the whole-DCT domain has been
spread throughout the image in the pixel domain, implying that the watermark
information in each block is correlated with that in other blocks. Therefore,
a large part of the watermark information can still be recovered from the
neighborhood during interpolation. From a frequency-domain point of view,
while both block-DCT and whole-DCT embedding hide the watermark in
low-frequency DCT coefficients, the equivalent frequency ranges are differ-
ent. A 1-D illustration of this issue is shown in Fig. 9.3. The DC coefficients
after the block-DCT transform correspond not only to the DC but also to the low
frequency bands of the entire image. The frequency bands represented by
the lowest AC coefficients in each block are at higher frequencies than those
represented by the block DCs, implying that the "low band" after the block-DCT
transform is not very low. Cox et al. have pointed out the importance of embedding
the watermark in the perceptually significant spectrum (mostly low and middle fre-
quency bands) to achieve robustness against distortion [49]. Embedding
the watermark at higher frequencies, as in block-DCT embedding, reduces
the robustness against low-pass filtering. In fact, the poorer performance of
block-DCT embedding against linear low-pass filtering compared to whole-
DCT embedding has been reported in [47], and Liang et al. proposed to watermark the
DC image formed by the DC coefficient of each block [54]. The experiments
associated with our proposed attack indicate that block-DCT embed-
ding is also much more vulnerable to non-linear low-pass filtering than
whole-DCT embedding because of the higher embedding spectrum. In sum-
mary, while transform-domain embedding with a small block size such as
8 x 8 has been considered superior to whole-image transform-domain embedding
due to the ease of applying human visual models to finely tune the watermark
locally (see Section 6.2.3), our first countermeasure suggests the opposite,
namely, that the block size should not be too small.
Motivated by the robustness of whole-DCT embedding, another counter-
measure can be devised by adding redundancy to the watermarks. For exam-
ple, the same watermark can be embedded in a group of four blocks. Thus,
the lost watermark could be partially recovered from neighbors, producing
a higher detection statistic. However, the watermark or hidden data embed-
ded via the Type-II approach (deterministic relationship enforcement) may
still not be recoverable.
FIGURE 9.3. Spectrum analysis of a block-based transform: taking the K-point DCT
of each block of {x(n)} is equivalent to a bank of bandpass filters followed by down-
sampling by K, so the sequence of block DC coefficients {X_1,0, X_2,0, ...} approxi-
mately represents the lowest frequency band of the entire signal, while the sequences
of corresponding AC coefficients (e.g., {X_1,1, X_2,1, ...}) approximately represent
higher bands.
More Effective Attacks Various improvements can be made from the at-
tacker's point of view. For example, an attacker can choose not to replace a
block with the concealed version if the block involves key perceptual features
such as eyes or if the concealed version significantly differs from the original
block. In addition, better interpolation methods for textured regions would
also further improve the perceptual quality of the attack.
9.2 Countermeasures Against Geometric Attacks
via RST Resilient Watermarking
In non-coherent detection where the original unwatermarked image is not
available, the spread-spectrum robust image watermarks generally suffer
from geometric distortion such as rotation, scaling, and translation (RST).
This is mainly due to the misalignment introduced by the RST attacks be-
tween the embedded watermark and the reference watermark presented to
a detector. Because spread-spectrum watermarks generally have low auto-
correlation at non-zero shifts and correlation is the popular way to
detect these watermarks, the misalignment is likely to yield low detec-
tion statistics from the popular correlator-type detectors. Among the three
basic geometric distortions, namely, rotation, scaling, and translation,
resilience against translation is the easiest to achieve. The Fourier magnitude is
known to be invariant with respect to shifts in the time or spatial domain, so embedding
in this domain will be resilient against small shifts.² On the other hand,
combating rotation and scaling is less straightforward. To illustrate this, we
implemented a spread-spectrum embedding similar to the one in [49] but
embedded the watermark in the magnitude of the Discrete Fourier Transform
(DFT) coefficients rather than the DCT coefficients. For a 512 x 512 Lenna
image, the PSNR of the watermarked image with respect to the original is
42.88 dB. The detection results on the marked Lenna image are shown in
Table 9.3. We can see that minor rotation and scaling are powerful enough
to render the watermark undetectable, even though the watermark can sur-
vive strong compression and translation with detection statistics well above
the threshold.³
TABLE 9.3. Detection statistics of spread-spectrum watermarking on the 512 x 512
Lenna image in the DFT magnitude domain under various distortions.

  Test condition          Detection statistics    Test condition        Detection statistics
  w/ no distortion        13.54                   w/ no watermark       1.31
  right shift 5 pixels    13.23                   rotate 1 degree       1.09
  JPEG Q=70%              12.35                   scale down by 5%      0.58
  JPEG Q=30%              8.30
In this section, we focus on the non-coherent detection in which the orig-
inal unwatermarked image is not available to a watermark detector to per-
form registration with the test image (possibly rotated, scaled, and/or trans-
lated). Estimating the parameters of the RST distortion and then undoing
the distortion is non-trivial. One way to combat RST attacks is to embed
an invisible registration pattern [170] [175] which is known to detectors and
helps to estimate the RST parameters. The weakness of this solution is that
in a competitive environment, an adversary may estimate this pattern by
²Larger shifts are likely to incur cropping, which may reduce the detection statistics.
We will discuss this in the experimental results section.
³The threshold is usually set between 3 and 6, corresponding to a false alarm proba-
bility of 10^-3 to 10^-10.
averaging multiple watermarked images that have the same registration pat-
tern embedded in them. He or she can then remove this pattern and apply geomet-
ric distortion. Another class of geometric-resilient watermarking techniques
uses implicit synchronization by either adopting salient features of the host sig-
nal as a reference [171], or embedding the watermark in a canonical, normalized
space based on moments [169]. These approaches require that salient fea-
tures be extracted accurately, which is not always possible under distortions
and attacks.
We propose a new alternative by embedding and detecting spread-spec-
trum watermarks in an RST resilient domain. This resilient domain is mo-
tivated by special properties of the Fourier transform and is closely related to
the Fourier-Mellin transform.
9.2.1 Basic Idea of RST Resilient Watermarking
We consider an image i(x, y) and its RST version i'(x, y) with
    i'(x, y) = i(σ(x cos α + y sin α) − x_0, σ(−x sin α + y cos α) − y_0),     (9.1)
where the rotation, scaling, and translation parameters are α, σ, and (x_0, y_0),
respectively. The magnitudes of the Fourier transforms of these two images,
|I(f_x, f_y)| and |I'(f_x, f_y)|, have the following relation:
    |I'(f_x, f_y)| = σ^{-2} |I(σ^{-1}(f_x cos α + f_y sin α), σ^{-1}(−f_x sin α + f_y cos α))|.     (9.2)
This tells us that (1) the Fourier magnitude is invariant to translation, (2)
the Fourier transform of a rotated image is the rotation of the Fourier transform
of the image by the same angle, and (3) scaling in the spatial domain gives the
inverse scaling in the Fourier domain [11]. If we rewrite Eq. 9.2 using log-polar
coordinates, the image scaling results in a translational shift along the log
radius axis and the image rotation results in a cyclical shift along the angle
axis. That is,

    |I'(e^ρ cos θ, e^ρ sin θ)| = σ^{-2} |I(e^{ρ − log σ} cos(θ − α), e^{ρ − log σ} sin(θ − α))|,     (9.3)

or

    |I'(ρ, θ)| = σ^{-2} |I(ρ − log σ, θ − α)|,     (9.4)

where the coordinate transform is

    f_x = e^ρ cos θ,     f_y = e^ρ sin θ.     (9.5)
We define g(θ) to be a 1-D projection of |I(ρ, θ)| such that

    g(θ) = Σ_ρ log(|I(ρ, θ)|),     (9.6)
where a summation is used instead of an integration due to the discrete
nature of ρ for the DFT of a digital image. We find it beneficial to add the
two halves of g(θ) together, obtaining

    g_1(θ') = g(θ') + g(θ' + 90°),     (9.7)

where θ' ∈ [0°, 90°). The reason will be explained in Section 9.2.3.
Clearly, g_1(θ) is invariant to translation and scaling except for a mul-
tiplicative factor that does not affect correlator-type detection. Rotation
results in a circular shift of g_1(θ), which can be handled by an exhaustive
search if the angle θ is quantized to a finite number of values.
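To make the construction concrete, the sketch below computes g(θ) and g_1(θ) from the DFT magnitude by log-polar resampling; the 1-degree angular grid, the radial range, and the nearest-neighbor lookup are simplifications of the zero-padding and bilinear interpolation discussed in Section 9.2.3.

import numpy as np

def rst_signature(image, n_angles=180, n_radii=128):
    """Compute g1(theta): sum of log DFT magnitudes along log-spaced radii
    for each angle in [0, 180), then fold the two halves together (Eq. 9.7)."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(image)))
    cy, cx = np.array(spectrum.shape) / 2.0
    r_max = min(cy, cx) - 1.0
    rhos = np.exp(np.linspace(np.log(2.0), np.log(r_max), n_radii))
    thetas = np.deg2rad(np.arange(n_angles))              # 1-degree spacing
    g = np.zeros(n_angles)
    for j, t in enumerate(thetas):
        fx = np.clip((cx + rhos * np.cos(t)).astype(int), 0, spectrum.shape[1] - 1)
        fy = np.clip((cy + rhos * np.sin(t)).astype(int), 0, spectrum.shape[0] - 1)
        g[j] = np.sum(np.log(spectrum[fy, fx] + 1e-12))   # Eq. 9.6
    return g[:n_angles // 2] + g[n_angles // 2:]          # Eq. 9.7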
9.2.2 Embedding and Detection Algorithms
The basic idea presented above outlines an RST resilient watermark detector.
The detector first extracts the 1-D signal g_1(θ) from a test image, then
performs correlation-type detection between this 1-D signal and an input
watermark.⁴ The basic algorithm for watermark detection is summarized
as follows:
1. Compute a discrete log-polar Fourier transform of the input image.
This can be thought of as an array of K rows by N columns, in which
each row corresponds to a value of ρ, and each column corresponds to
a value of θ.
2. Sum up the logs of all the values in each column, and add the result of
the column j to the result of the column j + N/2 (j = 0, ..., N/2 − 1)
to obtain an invariant descriptor v, in which

    v_j = Σ_k log(|I(ρ_k, θ_j)|) + Σ_k log(|I(ρ_k, θ_(j+N/2))|),     (9.8)

where θ_j is the angle that corresponds to the column j in the discrete
log-polar Fourier transform matrix.
3. Compute the correlation coefficient D between v and the input wa-
termark vector w, as

    D = ((v − mean(v)) · (w − mean(w))) / (||v − mean(v)|| ||w − mean(w)||).     (9.9)

4. If D is greater than a threshold T, it indicates that the watermark is
present. Otherwise, the watermark is absent.
⁴More sophisticated detection corresponding to more realistic modelling of the noise can
be adopted to achieve higher detection performance. Here, as a proof of concept, we use a
correlation detector in our experiments.
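A compact sketch of the detector follows, assuming the extracted descriptor and the reference watermark are equal-length 1-D arrays; the exhaustive circular-shift search corresponds to the rotation search mentioned above, and the threshold is supplied by the caller rather than fixed here.

import numpy as np

def corr_coeff(a, b):
    """Correlation coefficient between two 1-D signals (Eq. 9.9)."""
    a = np.asarray(a, dtype=float); b = np.asarray(b, dtype=float)
    a = a - a.mean(); b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def detect_rst_watermark(descriptor, watermark, threshold):
    """Exhaustively search over circular shifts of the descriptor
    (candidate rotation angles) and compare the best correlation
    against a detection threshold chosen by the system designer."""
    scores = [corr_coeff(np.roll(descriptor, s), watermark)
              for s in range(len(descriptor))]
    best_shift = int(np.argmax(scores))
    return scores[best_shift] > threshold, best_shift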
Corresponding to the watermark detection method, we can construct a
watermark embedding algorithm according to the methodology described
in [51]. In that paper, watermarking is cast as a communication problem
with side information at the transmitter, which is a configuration studied
by Shannon [82]. The side information in watermarking is the original un-
watermarked image that is known to the embedder. In our problem, the
embedder needs to manipulate the Fourier coefficients in Cartesian coordi-
nates to embed a selected watermark into the 1-D signal g_1(θ). In particular,
we change the image coefficients in an iterative way such that the 1-D signal
of the watermarked image will have high correlation with the watermark to
be embedded. The detailed embedding follows the three steps in [51]:
1. Apply the same signal-extraction process to the unwatermarked image
as will be applied by the detector, thus obtaining an extracted vector,
v. In our case, this means computing g_1(θ).
2. Use a mixing function, s = f(v, w), to obtain a mixture between v
and the desired watermark vector, w. At present, our mixing function
simply computes a weighted average of w and v, which is a convenient
but sub-optimal approach. More sophisticated mixing methods, for
example, those examined in [56], may be used.
3. Modify the original image so that when the signal-extraction process
is applied to it, the result will be s instead of v. This process is im-
plemented as follows:
(a) Modify all the values in column j of the log-polar Fourier trans-
form so that their logs sum to s_j instead of v_j. This could be
done, for example, by adding (s_j − v_j)/K to each of the K values
in column j. Care must be taken to preserve the symmetry of
DFT coefficients.
(b) Invert the log-polar resampling of the Fourier magnitudes, thus
obtaining a modified, Cartesian Fourier magnitude.
(c) The complex terms of the original Fourier transform are scaled
to have the new magnitudes found in the modified Fourier trans-
form.
(d) The inverse Fourier transform is applied to obtain the water-
marked image.
Unfortunately, there is inherent instability in inverting the log-polar re-
sampling of the Fourier magnitude (Step 3b). We therefore approximate this
step with an iterative method in which a local inversion of the interpolation
function is used for the resampling [173].
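The mixing step (step 2) can be as simple as the weighted average sketched below; the weight β and the energy matching are illustrative choices, not the values used in the reported experiments.

import numpy as np

def mix_signature(v, w, beta=0.15):
    """Weighted-average mixing function s = f(v, w): move the extracted
    signature v a fraction beta of the way toward the (scaled) watermark w."""
    v = np.asarray(v, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / (np.linalg.norm(w) + 1e-12) * np.linalg.norm(v)   # match energy
    return (1.0 - beta) * v + beta * w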
9.2.3 Implementation Issues
A number of problems arise during the implementation of the watermarking
algorithms proposed in the last section. We summarize the handling of a few
issues below. More detailed discussion can be found in [173].
DFT, Rotation, and Interpolation In Section 9.2.1, we presented the
basic ideas of RST resilient watermarking in terms of the continuous Fourier
transform. In practice, we have to deal with discrete scenarios in both the spa-
tial and frequency domains. We would also like to take advantage
of the fast algorithms for computing the DFT. Conceptually, the DFT of
an image is obtained by taking a 2-D Discrete Time Fourier Transform
(DTFT) of a tiled version of the image, as shown in Fig. 9.4(a). Stone et
al. [183] have noted that the tiling poses an inherent problem for any al-
gorithm that relies on the rotational properties of the Fourier transform.
This is because when the image is rotated, the rectilinear tiling grid is not
rotated along with it. Thus, the DFT of a rotated image is not the rotated
DFT of the image. The problem is illustrated in Fig. 9.4(b) and (c). While
more sophisticated approaches are possible, such as directly computing the
log-polar Fourier transform without using the Cartesian DFT as an intermedi-
ate step, the computational complexity is generally high. In our work, we
approximate the Fourier transform of an image in log-polar coordinates by
resampling the image DFT in Cartesian coordinates using a log-polar grid.
In general, interpolation has to be performed on Fourier coefficients during
both embedding and detection. The interpolation is needed not only when
obtaining log-polar sampling points from the Cartesian sampling points, but
also when obtaining Cartesian sampling points from the log-polar sampling
points in the embedding step 3b. To obtain dense sampling grids that would
allow inexpensive interpolation such as bilinear interpolation, we pad the
image with zeros before performing the DFT. Zero-padding also adds larger
separation between the implicit tiles in the spatial domain, alleviating the
distortions shown in Fig. 9.4.
FIGURE 9.4. Rectilinear tiling and image rotation: the DFT implicitly tiles the
image with a rectilinear grid, and this grid does not rotate when the image is
rotated before tiling.
FIGURE 9.5. An image and the magnitude of its 2-D Discrete Fourier Transform.
Coefficients near the DC and the low frequency bands along the horizontal and vertical
directions exhibit large magnitudes, which are referred to as "cross artifacts".
FIGURE 9.6. A rotated image with zero padding and the magnitude of its 2-D
DFT. The "cross artifacts" also rotate by the same angle in this example.
Cross Artifacts in Spectrum The rectangular boundary of an image
is known to produce cross artifacts in the image's Fourier spectrum (Fig. 9.5).
The coefficients along the cross have much larger magnitudes than the others.
This is partly due to the dominant horizontal and vertical image features,
as well as to the discontinuity at the horizontal and vertical edges of an image
caused by the implicit tiling. When zero padding is used for obtaining fine sam-
pling grids and/or for image rotation without cropping, the cross artifacts,
possibly rotated, are more significant (Fig. 9.6). The artifacts may also be
asymmetric if an image has much larger energy in some directions than
in others. For example, images with significant vertical structures
such as trees and buildings yield more energy in horizontal frequencies,
while images with strong horizontal structures such as a seascape yield more
energy in vertical frequencies (Fig. 9.7). Our current solution to this prob-
lem is simply to ignore the bumps in the extracted 1-D signal by dropping a
neighborhood around each of the two highest-valued elements. We also di-
vide the extracted signal g(θ) into two halves and add them together (i.e., to
use g_1(θ) in Eq. 9.7 instead of g(θ) in Eq. 9.6) to deal with the asymmetry.
Alternative solutions include blurring the image edges [181] or, more gen-
erally, applying a windowing operation. These solutions require modifications
to the watermark embedder to include both forward and inverse operations,
and have been left for future work.
Coefficient Dynamic Range and Extreme Frequencies Because of
the large dynamic range of the magnitude of the DFT coefficients, the low
frequency coefficients can be overwhelming. Furthermore, the lowest and
highest frequencies are usually not reliable for watermarking, because of
the strong host interference at low frequencies and the vulnerability to
distortion and attacks at high frequencies. Our current solution is to neglect
the extreme frequency bands and not to embed the watermark there. A better
solution is to embed the watermark in all frequencies and weight different
bands according to their reliability, as discussed in Section 6.2.1. We also
use the log of the magnitude rather than the magnitude itself to obtain the
1-D function g(θ) (Eq. 9.6).
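The following rough sketch combines the choices just described (log of magnitude, extreme bands neglected) with the folding and peak-dropping of the previous paragraphs; since Eq. 9.6 and 9.7 are not reproduced here, the band limits and the projection by summation over the log-radius are our assumptions.

import numpy as np

def extract_descriptor(lp_mag, low_cut=4, high_cut=4, drop=16):
    # Sum log-magnitudes over the retained log-radius samples to get g(theta),
    # fold the two halves into g1(theta), and mark the samples around the
    # largest peak (the DFT cross artifact) as invalid.
    logmag = np.log(lp_mag + 1e-12)
    band = logmag[low_cut:lp_mag.shape[0] - high_cut, :]   # neglect extreme frequency bands
    g = band.sum(axis=0)                                   # one value per angle
    half = g.size // 2
    g1 = g[:half] + g[half:2 * half]                       # add the two halves together
    peak = int(np.argmax(g1))
    valid = np.ones(g1.size, dtype=bool)
    valid[np.arange(peak - drop // 2, peak + drop // 2) % g1.size] = False
    return g1, valid

With 180 angular samples this yields a 90-sample g1(θ), of which 74 remain after dropping the 16 samples around the cross-artifact peak, matching the setup reported in Section 9.2.4.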
Whitening Before Detection For natural images, g1(θ) is likely to vary
smoothly as a function of θ. This indicates that the noise term in blind
watermark detection is highly correlated, and a simple correlation detector
is not optimal for such noise. Assuming that the noise is colored Gaussian,
the optimal detector according to detection theory should whiten the
noise first and then perform correlation detection. This idea has been discussed
in [52], showing improvement in the watermark detection. Thus, in the de-
tection stage of our work, we apply a whitening filter to both the extracted
signal and the watermark being tested before computing the correlation co-
efficients. The whitening filter was designed to decorrelate the elements of
the 1-D signals extracted from natural images and was derived from 10,000
images in [176]. These training images were not used in the subsequent
experiments.
FIGURE 9.7. Images with dominant structures and their 2-D DFTs. An image
with dominant vertical structure (a) yields strong magnitude along horizontal
frequencies (b). An image with dominant horizontal structure (c) yields strong
magnitude along vertical frequencies (d).
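A compact sketch of this whitening-then-correlation detection is given below; the whitening filter taps are assumed to be available from training, and the circular filtering and the use of the correlation coefficient are our simplifications rather than the exact detector of [173].

import numpy as np

def whiten(x, taps):
    # Apply a pre-designed FIR whitening filter circularly to a 1-D signal.
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(taps, x.size)))

def detect(extracted, watermark, taps):
    # Whiten both the extracted descriptor and the candidate watermark, then
    # take the maximum correlation coefficient over all cyclic shifts.
    xw = whiten(extracted - extracted.mean(), taps)
    ww = whiten(watermark - watermark.mean(), taps)
    d_max = -1.0
    for s in range(xw.size):
        d_max = max(d_max, np.corrcoef(np.roll(xw, s), ww)[0, 1])
    return d_max        # compared against the detection threshold T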
9.2.4 Experimental Results
The following results were obtained by extracting a vector g1(θ) of length
90 from an image and neglecting the 16 samples surrounding the peak that
corresponds to the DFT cross artifacts. This leaves a 1-D descriptor 74
samples long.
samples long. The detection process involves comparing the watermark with
all 90 cyclic shifts of the extracted descriptor. In this section we examine
the fidelity, the false alarm behavior (also called false positive), and the
robustness against RST distortions and JPEG compression.
Fidelity The tradeoff between fidelity and robustness is controlled by
adjusting the relative weighting used in the mixing of the watermark signal
and the signal extracted from the original image (see Section 9.2.2). As the
relative weight assigned to the watermark signal is increased, the strength
of the embedded watermark is increased at the expense of lower fidelity.
We have chosen the weights empirically, yielding an average signal-to-noise
ratio of about 40 dB 5. For simplicity, the same weights are used for different
images in our experiments. Fig. 9.8 shows a histogram of the signal-to-noise
ratio obtained from watermarking 2000 images. Fig. 9.9 shows an original
image, the watermarked version, and their difference.
FIGURE 9.8. SNR histogram of watermarked images
We notice that due to the embedding in a whole image transform domain,
the watermark strength cannot be tuned for each local region. Thus, if the
image contains homogeneous texture, the watermark can be well hidden.
5 Here the "signal" is the image, and the "noise" is the watermark.
However, if an image is highly non-homogeneous, containing widely varying
textures and smooth regions, the mark could become visible in some places
unless the weighting is significantly attenuated. Improving image quality
in non-homogeneous images requires modifying the algorithm to shape the
watermark according to local textures while preserving the RST resilience.
This has been left for future work.
FIGURE 9.9. Image watermarked by the proposed RST resilient watermarking
algorithm: (a) original image, (b) watermarked image, (c) the amplified differ-
ence between original and watermarked, with gray indicating no difference and
white/black indicating large difference.
False Alarm We performed a study of false alarm probabilities under
different thresholds using 10,000 images. A false alarm or false positive oc-
curs when the detector incorrectly concludes that an unwatermarked image
contains a given watermark. The probability of false alarm is defined as
Pfp = P{Dmax > T} = P{(D0 > T) or (D1 > T) or ... or (D89 > T)}     (9.10)
where T is the detection threshold and Dmax is the maximum detection value
over all 90 cyclic shifts examined (D0, ..., D89) when running the detector
on a randomly selected, unwatermarked image. This probability is first es-
timated experimentally by applying the detector to 10,000 unwatermarked
images from [176], each image being tested against 10 different spread spec-
trum watermarks. The experimental false alarm probability is plotted as
solid lines in Fig. 9.10, with each trace corresponding to one of the 10 wa-
termarks. We also apply a theoretical model in [55] to estimate the false
alarm probability, especially for thresholds T greater than 0.55, because
we obtained no detection values above 0.55 in the experiment. This theoretical
estimate is indicated by dotted lines in Fig. 9.10.
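The empirical part of this estimate amounts to simple counting over the collected detection statistics, as in the sketch below (array names are illustrative).

import numpy as np

def estimate_pfp(dmax_values, thresholds):
    # Estimate Pfp = P{Dmax > T} from Dmax values measured on unwatermarked images.
    dmax_values = np.asarray(dmax_values)
    return np.array([(dmax_values > t).mean() for t in thresholds])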
FIGURE 9.10. Detection results for 10 watermarks in 10,000 unwatermarked
images: (a) distribution of detection statistics Dmax; and (b) false alarm proba-
bility. Each solid trace corresponds to the result of one of the 10 watermarks, and
the dotted lines in (b) represent theoretical estimates.
Robustness Robustness tests against geometric distortions as well as
JPEG compression have been performed on 2000 images. Seven geometric
distortions illustrated in Fig. 9.11 are examined: rotation with and without
cropping (f and b), scaling up with and without cropping (g and c), scaling
down (h), and translation with and without cropping (i and d). In order
to isolate the RST distortion from cropping, the images have been padded
with gray (shown in Fig. 9.11(a)) so that none of the RST testing shown
in Fig. 9.11(b)-(d) causes the image data to be cropped. The embedder
has been applied to these expanded images and then the padding in the
watermarked image is replaced with unwatermarked gray padding prior to
the distortion and detection.
For each of the seven geometric attacks, we plot a set of histograms of
the detection statistics and ROC curves 6 for several distortion parameters
in Fig. 9.12-9.15. The detection performance prior to attack (i.e., no distortion)
is shown in dashed lines for comparison. These results demonstrate
that when no cropping is involved, the proposed watermarking algorithm
exhibits very good resilience to rotation, scale, and translation, with only a
small decrease in detection statistics compared with the case without dis-
tortion. These are shown in Fig. 9.12(a)(b), Fig. 9.13(a)(b), Fig. 9.14, and
Fig. 9.15(a)(b). When RST distortion is accompanied by cropping, the
detection performance degrades: the larger the cropping, the more negative
the impact on performance. These are shown in Fig. 9.12(c)(d),
Fig. 9.13(c)(d), and Fig. 9.15(c)(d). The performance difference with respect to
cropping is expected because our algorithm has not been explicitly designed
to withstand cropping. A more detailed description of our experiment setup
can be found in [172] [173].
6 A receiver operating characteristic (ROC) curve describes the probability of correct
detection versus the probability of false alarm [7].
FIGURE 9.11. Geometric attacks tested in our experiments: (e) and (a) are the
original and padded original, respectively; (b)-(d) the rotation, upscale, and trans-
lation without cropping, respectively; and (f)-(i) the rotation, upscale, downscale,
and translation with cropping, respectively.
We also test the robustness of the proposed algorithm against JPEG
compression, as surviving common image processing besides geometric dis-
tortions is important in practical applications. We tested JPEG compres-
sion at quality factors (QF) of 90, 80, and 70, and the results are shown in
Fig. 9.16. We can see that the likelihood of detection decreases with the
amount of compression noise introduced and that the amount of this de-
crease depends on the false alarm probability Pfp. For a relatively high
Pfp = 10^-3, the current method is extremely robust to JPEG compression
at the qualities tested. At more restrictive false alarm probabilities, for ex-
ample, Pfp = 10^-8, moderate JPEG compression at quality factor 70
still yields an acceptable detection probability of 88%.
0"'
I··~ .
1°·
~.1 0.'
...
,~'.:-.,,-=--,,".;;-----,,~.----',~
PrcbabIlllyolfalaedatactlan
.,
r~ j ...
r..,.
~O.5
fo •
.. ----,-':oo·'------,J,,·
,~·,::-·,,=="=--,,:'m·
ProbabIlyoflalsedllec:llon
FIGURE 9.12. Detection results after rotation by 4°, 8°, 30°, and 45°. Shown
here are the histogram of detection statistics (a) and detection ROC (b) for rota-
tion without cropping, and the histogram (c) and the ROC (d) for rotation with
cropping. Dashed lines represent the detection prior to attack.
9.2.5 Concluding Remarks on RST Resilient Watermarking
In summary, we proposed a solution to the common robustness problem
against rotation, scale, and translation. While the solution is related to ear-
lier proposals in the pattern recognition literature regarding the invariants
of the Fourier-Mellin transform, we do not explicitly create a truly RST
invariant signal, which has many implementation difficulties [174]. Instead,
we create a 1-D signal that changes in a trivial manner under rotation,
scale, and translation, and handle a number of important implementation
issues. Our experiments have shown that the proposed watermarking algo-
rithm is robust to rotation, scale, and translation. It also survives moderate
JPEG compression, although the resilience is lower than that of non-RST resilient
watermarking that embeds a watermark directly in Fourier or DCT coef-
ficients. This is partly because the human visual models in the Fourier, DCT,
and block-wise transform domains are much better studied than in our RST
resilient domain, which enables hiding a stronger watermark in those domains
without introducing artifacts. Quite a few simplifications have been made
in our design, and they can be improved in future research.
10-" 10'"
Probability oIla1e8 cleleclion
I'·'
i
i~08
lo.os
,,'
FIGURE 9.13. Detection results after upscaling by 5%, 10%, 15%, and 20%.
Showing here are the histogram of detection statistics (a) and detection ROC
(b) for upscaling without cropping, and the histogram (c) and the ROC (d) for
upscaling with cropping. Dashed lines represent the detection prior to attack.
"
I"~O.5
~
I"
FIGURE 9.14. Detection results after downscaling by 5%, 10%, 15%, and 20%.
Showing here are the histogram of detection statistics (a) and detection ROC (b).
Dashed lines represent the detection prior to attack.
FIGURE 9.15. Detection results after translation by 5%, 10%, and 15% of the
image size. Showing here are the histogram of detection statistics (a) and detection
ROC (b) for translation without cropping, and the histogram (c) and the ROC
(d) for translation with cropping. Dashed lines represent the detection prior to
attack.
FIGURE 9.16. Detection results after JPEG compression with quality factors
90, 80, and 70. Showing here are the histogram of detection statistics (a) and
detection ROC (b). Dashed lines represent the detection prior to attack.
9.3 Double Capturing Attack on Authentication
Watermark
In Chapter 7, we discussed using watermarks, usually fragile or semi-fragile
against distortion, to detect tampering. These authentication watermarks
are inserted either in the pixel domain [139] or in a transform domain [72, 125,
137]. An effective attack on an authentication watermark attempts to build an
altered or unauthorized image that the detector will still regard as an
untampered image. These attacks are quite different from those on robust
watermarks. Holliman et al. studied several forgery attacks in [158], where
the main idea of the attacks is to replace each media component (such as
a block, a pixel, or a coefficient) with one of the candidates collected from
many watermarked images, so that the replacement contains a valid water-
mark indicating authenticity while meaningful semantic changes can be made.
These attacks are effective if the watermark is inde-
pendent of the image content (e.g., the same watermark is embedded in each
image) and/or independent of the locations in which it is inserted (e.g., the
same watermark is embedded in each block). As discussed in Section 7.4,
carefully designing what data to embed and introducing dependency on the
cover image are effective countermeasures against these attacks. In this sec-
tion, we propose a new attack that does not require the use of multiple
marked images.
9.3.1 Proposed Attack
Consider the scenario in which an image containing an authentication water-
mark has been tampered with. The modified marked image is then captured
either by scanning or by a digital camera, and a new fragile watermark is
inserted by the watermarking module in the capturing device. Since the cap-
turing process can destroy almost all original fragile watermarks, the new
image will only bear the new fragile watermark, hence will be regarded as
authentic. The general approach is therefore "fragile watermarking →
editing → fragile watermarking ..." (Fig. 9.17), which in some sense is anal-
ogous to the multiple ownership claims in robust watermarking [153]. We
shall call this type of attack a double capturing attack.
FIGURE 9.17. Attack on authentication watermarking via double capturing
The double capturing attack touches a fundamental aspect of image authen-
tication, i.e., authenticity is always relative with respect to a reference.
More specifically, fragile watermarking can only detect alterations made after the
embedding, but can tell nothing about the authenticity before embedding.
9.3.2 Countermeasures Against Proposed Attack
For specific applications such as the digital camera case, we may use addi-
tional features, such as the distance between the lens and the center target
object, as part of the embedded data to combat the proposed attack. This is
because the focal length when capturing a real scene is different from that
when capturing a tampered image from a printed copy or a computer mon-
itor. The focal length may be readily obtained from a camera's focusing
mechanism. However, with the progress of visualization tools, including the
development of high quality digital projectors and huge display walls [182],
such a solution will eventually have limitations.
(Figure annotation: after the first capture the image carries the detectable watermark set {W_F1, W_R1}; after alteration, W_R1 remains detectable while W_F1 is altered; after the second capture the set {W_R1, W_F2, W_R2} is detectable, where W_Fi and W_Ri denote the i-th fragile and robust watermarks, respectively.)
FIGURE 9.18. Countermeasure against the proposed double capturing attack by
embedding both robust and authentication (fragile) watermarks.
Another countermeasure is to insert both fragile and robust watermarks
in an image, as shown in Fig. 9.18. This double watermarking can not only
protect both the integrity and the ownership of an image [137], but also par-
tially solve the above problem, since the doubly captured image contains
two robust watermarks while the singly captured image contains only one.
In practice, if every watermarking system for authentication purposes (such
as those in digital cameras) also embeds one robust watermark that is
randomly selected from M orthogonal candidates in each captured image,
we can determine whether multiple capturing has occurred by checking how many
robust watermarks are in the test image. The probability of not being able
to find multiple robust watermarks after double capturing (i.e., the two
robust watermarks inserted by the two captures are identical) is 1/M, i.e., in-
versely proportional to the number of candidate watermarks, hence it is
small for large M. This approach has been implemented as double water-
marking in [137] 7, which combines the transform-domain authentication
scheme in Chapter 7 with a robust spread spectrum watermarking scheme
by embedding the robust watermark first, followed by embedding the au-
thentication watermark. This countermeasure can also be implemented via
the multi-level data hiding in Chapter 6.
Acknowledgement Dr. W. Zeng provided the source code of the error
concealment algorithm in [184]. The work of RST watermarking in Sec-
tion 9.2 was jointly performed with Drs. I. Cox, J. Bloom, M. Miller, C-Y.
Lin, and Y-M. Lui while the first author was with NEC Research Institute
and Signafy, Inc.
7The original use of double watermarking in [137] is to protect both the ownership
and the integrity of an image, but the implementation is directly applicable as a coun-
termeasure against the proposed attack.
10
Attacks on Unknown Data Hiding
Algorithms
In the last chapter, we discussed watermark attacks with the embedding and
detection algorithms known to analysts, which is the case for most attacks
studied in the literature. The public challenges organized by the Secure
Digital Music Initiative (SDMI) in Fall 2000 provided an opportunity for
researchers to study attacks under an emulated competitive environment.
We will discuss in this chapter attacks that appeared to be successful
on the watermarking schemes under SDMI's consideration at that time. We
shall point out weaknesses of those watermark schemes and propose some
directions for improvement. We will also discuss a few general approaches
that could be used by an attacker in a real competitive environment, thus
setting a framework for studying the robustness and security of data hiding
systems.
10.1 Introduction
Secure Digital Music Initiative (SDMI) is an international consortium that
has been developing open technology specifications aiming at protecting the
playing, storing, and distributing of digital music [160]. Imperceptible digi-
tal watermarking has been proposed as a key element in these systems. Upon
detection, the watermarks may direct certain actions to be taken, for exam-
ple, to permit or deny recording. A system may incorporate a combination
of robust and fragile watermarks. Robust watermarks can survive common
signal processing and attacks, and are crucial for ensuring the proper func-
tioning of the entire system. The fragile watermarks may be used to indicate
whether the audio has experienced certain processing such as MP3 lossy
compression [161]. The SDMI watermarks are considered as public water-
marks. That is, the detection does not use the original unwatermarked copy
(i.e., blind detection), and a single secret key or a set of secret keys for detecting the
watermarks is encapsulated in all publicly available detection devices.
In early September 2000, SDMI announced a three-week public challenge
for its Phase-II screening, inviting the public to evaluate the attack resis-
tance of four watermark technologies (A, B, C, F) and two other technolo-
gies (D, E). In the following, we summarize the attacks and analysis on the
four watermark technologies.
FIGURE 10.1. Illustration of the SDMI attack problem. For each of the four water-
mark challenges, Samples 1-3 are provided by SDMI. Sample 4 is generated by
participants in the challenge and submitted to the SDMI oracle for testing.
10.1.1 SDMI Attack Setup
In this challenge, the watermark embedding and detection algorithms are
not known to the public. Limited information is available only through
oracle submissions. After each submission, the detection is performed by
the SDMI staff and the result is sent back with a response time of about
4 to 12 hours. For each of the four challenges, SDMI has provided three
audio samples, as illustrated in Fig. 10.1. They are:
• samp1?.wav (original audio with no watermark)
• samp2?.wav (samp1?.wav watermarked by Technology-?)
• samp3?.wav (a different audio watermarked by Technology-?)
where the substitution symbol "?" stands for each of the four challenges: "a",
"b", "c", or "f". All audio samples are 2-minute long, sampled at 44.1 kHz
with 16-bit precision. The audio contents are mostly popular music. Sample-
1 for all four technologies are identical, while sample-3 are all different.
A participant of this challenge generates an attacked audio file sample-4
from sample-3, then uploads it to SDMI's oracle for testing. The detection
response is binary, i.e., either "possibly successful" or "unsuccessful". Ac-
cording to SDMI's emails, a "possibly successful" attack must render the
detector unable to find the watermark, while retaining the auditory quality
comparable to the original one (sample-3). This indicates that a successful
attack should sits in the region IV of Fig. 10.2. Interestingly, in the un-
successful case, there is no indication whether the detector can still find
watermark (region I and II of Fig. 10.2) or the detector can no longer find
watermark but the auditory quality is considered unsatisfactory (region III
of Fig. 10.2). For convenience, we shall denote the four pieces of audio as
8 1 , 8 2 , 8 3 , and 8 4 .
(Figure axes: perceptual quality, from low to high, on the horizontal axis and watermark detectability on the vertical axis, with a detectability threshold separating regions I-IV; the ideal attack region is where the watermark falls below the detectability threshold while the sound quality remains acceptable.)
FIGURE 10.2. Illustration of watermark detectability and perceptual quality
10.1.2 Comments on Attack Setup
The SDMI public challenge presents an emulated competitive environment,
providing attackers with a limited amount of information and restricted ac-
cess to watermark detectors in a very short time frame. The task is more
difficult than what can be found in reality. First, in the real world, a watermark
detector encapsulated in a compliant device will be available to an attacker
for unlimited uses, and the detector's response time will be instantaneous
rather than hours. Second, a user of the real system will be able to distin-
guish whether or not a detector is able to find watermarks, regardless of the
audio quality. These two aspects would enable an attacker to poll the detec-
tor with different inputs and obtain the corresponding outputs, which in
turn provides a large amount of useful information for attacks. Furthermore,
the SDMI business model allows a user to pass a piece of non-SDMI music
that does not have a watermark embedded through an SDMI admission de-
vice to make it SDMI-compliant, with watermarks embedded in it. This implies
that a non-trivial number of original-watermarked audio pairs, rather than
a single pair, are likely to be available to an attacker in the real world. As can
be seen in the next section, these pairs provide valuable information regard-
ing how watermarks are embedded, and the information can be exploited
in attacks. One should also note that the perceptual quality requirements imposed on
embedding and on attacks are different in reality. The quality criterion for
embedding is much higher because part of the commercial value of a piece
of audio is determined by the sound quality, and in many situations it has
to meet the most critical demand among a highly diversified audience (from
easy listening by the general public to professional listening by experts). On
the other hand, the sound quality criterion for attacks only needs to satisfy
a less demanding audience who are willing to tolerate slightly poorer quality
in exchange for free listening.
The setup also suggests that the SDMI challenge emphasized evaluat-
ing the effectiveness of the robust watermark in each technology and did not give
much consideration to the fragile watermark. Referring to SDMI's business
model, to enforce a copy control policy that allows no MP3 compression on
a piece of music prior to its admission to an SDMI compliant device, the
robust watermark embedded in the music would convey this policy to the device,
while the fragile watermark would be used to detect whether the music
has experienced compression. If the bits in the fragile watermark
are designed to be a pre-determined secret pattern and are independent of
the host audio, an attacker can obliterate the above policy by restoring the
fragile watermark after performing MP3 compression. This attack is likely
to introduce less perceptual distortion than removing a robust watermark
and therefore should be given sufficient consideration. The fragile watermark-
ing can be formulated as an authentication problem, for which the attacks
and counter-attacks can be studied similarly to the material in Chapter 7 and
Chapter 9. In the following, we first report our attacks and analysis on the
robust watermarks in the SDMI challenge, then briefly discuss issues related to
the fragile watermark.
10.2 Attacks and Analysis on SDMI Robust
Watermarks
In this section, we first explain a general framework for tackling the attack
problem. We then use two different successful attacks on Watermark C
as examples to demonstrate our attack strategies, address implementation
issues, and present analysis in detail. For completeness, the attacks for the
other three watermark techniques A, B, and F are also briefly explained.
10.2.1 General Approaches to Attacks
An attacker may take one of the following three general approaches to tackle
the problem:
(Type-1) exploiting design weaknesses via blind attacks,
(Type-2) exploring the embedding mechanism from {S1, S2}, the known
original-watermarked pairs, or from the watermarked signal {S3} alone, and
(Type-3) a combination of the two.
Type-1 attacks are said to be blind in the sense that they do not rely on
any understanding of the embedding mechanism or of the special properties held
by watermarked signals. This includes commonly used robustness tests, such
as compression, time-domain jittering, pitch change, resampling at different
rates, D/A-A/D conversion, and noise addition [164]. The counter-attack
strategy for such blind attacks is to find as many weaknesses as possible
and to correct them. A good design, therefore, should have at least covered
the typical robustness tests and their combinations. One of our attacks for
Watermark-C and our attack for Watermark-F are blind attacks.
Type-2 attacks are designed using the knowledge about the embedding
mechanism. Such knowledge, even if not available at the start, can be ob-
tained by studying the input-output response of the embedding system.
For example, if we find that the difference between S1 and S2 is a small signal
around a certain frequency, we may design an attack to distort S3 over the
corresponding frequency range. A few of our attacks belong to this category.
This type of attack is analogous to the plaintext attack or ciphertext attack
in cryptanalysis 1 [25]. The differences are: (1) signal processing analysis
replaces the cryptanalytic tools in creating watermark attacks, and (2) the
goal of watermark attacks is to render the detector unable to detect the wa-
termarks, instead of "cracking codes". The useful signal processing tools
include the time-domain and frequency-domain differences, the frequency
1 Plaintext attack refers to deducing the encryption key or decrypting new cipher-
texts encrypted with the same key, based on the ciphertext of several messages and their
corresponding plaintext. Ciphertext attack only uses the knowledge of the ciphertext of
several messages.
response, the auto- and cross-correlation, and the cepstrum analysis [22].
We also note that the original and watermarked signals are not easily avail-
able simultaneously to the public in some watermarking or data hiding
applications, e.g., watermark-based authentication or DVD video water-
marking systems. Hence, Type-2 attacks may not be a major concern in
those cases. But in SDMI applications, where unwatermarked music may
be "admitted" into the SDMI domain by embedding a watermark, any success-
ful watermarking design has to take Type-2 attacks into consideration. One
possible counter-attack strategy is to intentionally wipe off the otherwise
distinct "signature" of a particular embedding. Some obscuring processes
may reduce the robustness against blind attacks if the obscuring distorts
the embedded watermarks, showing a tradeoff among robustness against
various attacks.
Because it is not always possible to find clear clues about embedding from
a limited number of original-watermarked pairs, especially when the "wipe-
off" is applied, attacks can be designed by combining Type-1 and Type-2
attacks.
10.2.2 Attacks on Watermark-C
We have proposed two different attacks on Watermark-C. Attack-C1 ex-
ploits the weakness of Watermark-C under pitch change. Attack-C2 is
based on observing the difference between the original and watermarked signals
{S1, S2}. Both attacks were confirmed as successful by the SDMI oracle.
Observations from Samples of Watermark-C By taking the differ-
ence of samp1c.wav and samp2c.wav, bursts of a narrow band signal are ob-
served, as shown in Fig. 10.3. These bursts appear to be around 1350 Hz.
Attack-C1 Attack-C1 accelerates the audio samples by a small amount, which
in turn changes the pitch. Blind attacks with a 3% pitch increase have been ap-
plied to all four watermark proposals, and the SDMI detectors indicated that
they are effective against Watermark C. The relation between the input and
output time indexes of this speed-up process is illustrated in Fig. 10.4, along
with several other time-domain jittering/warping functions that we studied dur-
ing the challenge.
One implementation we used is to upsample the audio by M times fol-
lowed by lowpass filtering and downsampling by N times, giving an overall
resampling rate of M/N. The original sampling frequency Fs is changed
to (M/N)·Fs. The resampled audio is then played or stored with the same
sampling rate as before, i.e., Fs. The entire process changes the pitch by
a fraction of (N - M)/M. A precise spectrum interpretation of this can
be obtained based on multi-rate signal processing theory [17]. For sampling
rate conversion with M/N > 1, the spectrum is squeezed along the frequency
axis by a factor of N/M, leaving the frequency band ((N/M)·π, π] empty;
FIGURE 10.3. Technology-C: (a) the waveform of the difference between sam-
ple-1c and sample-2c exhibits tone bursts, and (b) the short-time DFT of one
tone burst. The samples observed here occur around the 0.34th second.
FIGURE 10.4. Relations between the input and output time index of a few
time-domain jittering/warping, including uniform speed-up, uniform slow-down,
sinusoid warping, and triangular warping.
for the case of 0 < M/N < 1, the frequency band [0, (M/N)·π] of the origi-
nal spectrum is stretched to cover the whole new spectrum, dropping the
high frequency band ((M/N)·π, π] of the original spectrum. At the end of this
rate conversion, the magnitude of the new spectrum is scaled by M/N, the
sampling frequency of 2π radians per sample corresponds to (M/N)·Fs Hz, and the
pitch has not changed. When the signal is played at Fs samples per second,
the spectrum with frequency unit of radians per sample is unchanged, but
the frequency of 2π radians per sample is now mapped to Fs Hz, effectively
changing the pitch by a fraction of (N - M)/M. Attack-C1 can also be
implemented using commercial audio editing software. For example, the Ef-
fects → Pitch menu of GoldWave v4.19 [177] was used as an alternative
way to perform pitch-shift attacks (Fig. 10.5).
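A sketch of the resampling-based pitch shift using standard tools is shown below; the values M = 100 and N = 103 give the 3% increase discussed above, and scipy's polyphase resampler is simply our stand-in for the upsample/lowpass/downsample chain.

from scipy.signal import resample_poly

def pitch_shift_attack(audio, M=100, N=103):
    # Attack-C1 sketch: resample by M/N (upsample by M, lowpass, downsample by N).
    # Played back at the original rate, every frequency is scaled by N/M,
    # i.e., the pitch rises by (N - M)/M (3% for M=100, N=103).
    return resample_poly(audio, up=M, down=N)

Playing or storing the returned samples at the original 44.1 kHz rate completes the attack.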
The ability to perceive pitch change varies from individual to individual
and depends on whether a reference is available. While most people can
discriminate pitch differences as low as 0.6% [89], it is nevertheless rather
difficult for a person to identify small pitch changes if he/she has never heard
the original before. The standard pitch itself has also changed significantly over
music history [87][91]. For example, the pitch of the piano's A changed
steadily from as low as 420 Hz in the early 18th century to as high as 457 Hz in
the late 19th century, before settling at the current international standard
of 440 Hz. Our attack with a 3% pitch increase (about a quarter tone) has
passed SDMI's strict 2nd round quality testing performed by "golden ears"
after the challenge.
FIGURE 10.5. Graphical user interface of the GoldWave audio editing shareware
As described previously, we observed that the embedding mechanism adds
a narrow band signal to the audio at around 1350Hz. Pitch change can be
an effective attack because it stretches or squeezes the spectrum, causing
misalignment, which in turn reduces the detector response from the popular
matched-filter-type detection. One way to enhance the robustness against
Attack-C1 is to estimate and undo the stretching, which is likely to be com-
putationally expensive. Another way is to embed and/or detect watermark
in a domain that is resilient to stretching/squeezing [103].
Attack-C2 Our second attack belongs to Type-2, attempting to jam the
frequency band around 1350Hz where it was observed that a narrow band
signal had been added by the embedding mechanism. This narrow band
watermark signal has some randomness, making jamming difficult. We have
seen the excellent anti-jamming capability of spread spectrum watermark
in earlier chapters. This commonly used noise-like watermark has good sta-
tistical properties, so the power of uncorrelated additive noise has to be
large enough to effectively jam the watermark [49]. However, to preserve
auditory quality, the noise power has to be kept low. Our successful attack
is to apply notch filtering to the audio signal at selected frequencies. The
filtering introduces significant changes in magnitude and phase around the
notch (shown in Fig. 10.6) [22], effectively damaging the embedded water-
mark. Specifically, we used the Effects → Filters → Bandpass/stop menu of
the audio editor GoldWave to perform notch filtering, with a stop band of
1250-1450 Hz and a steepness of 5 (i.e., 10th order).
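A stand-in for that band-stop operation, assuming a Butterworth design (the filter type actually used by GoldWave is not known to us), could look as follows.

from scipy.signal import butter, sosfilt

def notch_attack(audio, fs=44100, low=1250.0, high=1450.0, order=10):
    # Attack-C2 sketch: suppress the 1250-1450 Hz band where the narrow band
    # watermark signal was observed; causal filtering keeps the magnitude and
    # phase changes around the notch.
    sos = butter(order, [low, high], btype='bandstop', fs=fs, output='sos')
    return sosfilt(sos, audio)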
FIGURE 10.6. A 2nd order notch filter: (from left to right) zero-pole plot and
frequency response (magnitude and phase).
Attack-C2 has passed SDMI's 2nd round quality testing performed by
"golden ears". For signals with a sufficiently rich spectrum, the magnitude and
phase changes caused by notch filtering may not be detectable by a person
because of frequency masking and other human auditory phenomena. In
the next section, we will see that the embedding process of Watermark-B
has a step of notch filtering, suggesting that Watermark-B is a potential
attack on Watermark-C. It also suggests that the distortion on audio signal
imposed by our Attack-C2 is comparable with the distortion imposed by
the embedding process of Watermark-B.
10.2.3 Attacks on Watermarks A, B & F
Watermark A Our attack on Watermark-A, referred to as a copy attack, is a
Type-2 attack. By analyzing the short-time FFT of the samples, we observed
regular patterns of phase difference. The observation leads to a time-varying
model describing the phase difference between sample-1a and sample-2a.
Based on the model, our attack "copies" the phase change between sample-
1a and sample-2a to sample-3a, aiming at recovering the phase modification
done by the embedding process. We also introduced some randomness in middle
frequency bands during the phase manipulation. A variation of this attack incor-
porating magnitude manipulation was also submitted. Both were confirmed
by the SDMI oracle as successful.
Watermark B Our attack on Watermark-B is also a Type-2 attack. A
spectrum notch is observed around 2800Hz for some parts of the audio
FIGURE 10.7. Technology-A: FFT magnitude of the original and watermarked sig-
nals, and the phase difference between the two signals for a 1000-sample segment. The
two figures are for signals around 3.22 s and 4.33 s, respectively.
and around 3500 Hz for some other parts. In addition, the phase difference
between the original and watermarked audio signals exhibits a unique butterfly
shape, indicating that notch filtering is involved in the embedding. Our attack
fills in those notches with random but bounded coefficient values. We also
submitted a variation of this attack involving different parameters for notch
description. Both were confirmed by SDMI oracle as successful. Interest-
ingly, an embedding technique similar to our observations from Technology-
B was found in US Patent 4,876,617 "Signal Identification" [101] after the
challenge. This once again indicates that relying on the secrecy of the embed-
ding algorithm is not a long-term solution to protecting a public watermark
system.
FIGURE 10.8. Technology-B: FFT magnitudes of sample-1b and sample-2b and
their difference for 1000 samples at 98.67 sec.
Watermark F Our attack on Watermark-F exploits the weakness of this
watermarking approach under time-varying warping in the time domain, and is
thus a Type-1 attack. In particular, we warped the time axis by inserting a
periodically varying delay. The delay function comes from our study on
Watermark-A; therefore, the perceptual quality of our attacked audio is ex-
pected to be better than or comparable to that of the audio watermarked by
Technology-A. We also submitted variations of this attack involving differ-
ent warping parameters and a different delay function. The warping we per-
formed follows a sinusoid or a triangle function, as illustrated in Fig. 10.4.
The attacks were confirmed by the SDMI oracle as successful.
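A minimal sketch of such a warp is given below: samples are read at fractionally delayed positions following a sinusoid. The delay amplitude and period are illustrative, not the values used in our submissions.

import numpy as np

def sinusoid_warp(audio, fs=44100, max_delay_ms=1.0, period_s=1.0):
    # Warping attack sketch: resample the signal at time instants n + d(n),
    # where d(n) is a small, periodically varying (sinusoidal) delay.
    n = np.arange(audio.size)
    delay = (max_delay_ms / 1000.0) * fs * np.sin(2 * np.pi * n / (period_s * fs))
    src = np.clip(n + delay, 0, audio.size - 1)   # warped (fractional) source positions
    return np.interp(src, n, audio)               # linear interpolation at warped positions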
After the challenge, Boeuf and Stern presented their analysis and suc-
cessful attack on Watermark F in [151]. An autocorrelation analysis was
applied to the difference signal between the original and watermarked au-
dio, and a periodicity of 1470 samples (1/30 second) was observed. Further
study in their report suggested that the difference between S1 and S2 is
a periodic spread spectrum signal with a period of 1470 samples, and the
watermark is scaled with a different scaling factor for every 147 samples. The
scaling factor appears to be a function of the average power of the host au-
dio signal in a local window. The watermark can be detected non-coherently
(without using the original audio) by taking a correlation over a long win-
dow to suppress the strong interference from the host signal. This detection
strategy has been analyzed in Chapter 3 of this book. Having found that
the same spread spectrum signal is embedded in both S1 and S3, Boeuf
and Stern designed an algorithm to first search for the initial offset from which
the watermark starts to be put into S3 and then successfully remove the
watermark by subtraction. Their work provides a foundation to explain the
effectiveness of our blind attack: without purposely performing registra-
tion/synchronization or embedding in a resilient domain, spread
spectrum embedding is vulnerable to jittering 2.
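The periodicity reported in [151] can be found with a plain autocorrelation of the difference signal, along the lines of the sketch below (max_lag is an arbitrary search range).

import numpy as np

def find_period(diff_signal, max_lag=4000):
    # Autocorrelate the original-minus-watermarked difference and return the
    # non-zero lag with the largest correlation (reported to be 1470 samples,
    # i.e., 1/30 second at 44.1 kHz).
    x = diff_signal - diff_signal.mean()
    ac = np.array([np.dot(x[:-lag], x[lag:]) for lag in range(1, max_lag)])
    return int(np.argmax(ac)) + 1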
10.2.4 Remarks
We presented a general framework for analyzing the robustness and secu-
rity of audio watermark systems. The framework was demonstrated by our
successful attacks in the SDMI public challenge. We pointed out that (1)
the likelihood of the weaknesses in a watermark system being exploited by
an intelligent adversary is high, prompting the need for thorough testing
by watermark designers; and (2) a large amount of information regarding the
embedding mechanism, derived from pairs of original and watermarked
signals, can be used to build powerful attacks, prompting the need for obscur-
ing distinct traces between the original and watermarked signals. The second
point, though it has not received much attention in the literature, is cru-
cial for SDMI applications and involves a tradeoff with respect to the robustness
against other attacks.
Due to various limitations of the challenge, including the very short time
frame, we adopted practical strategies to increase our chances of finding
successful attacks and of understanding all four watermark technologies.
For example, we did not incorporate sophisticated human auditory system
(HAS) models that could further improve the perceptual quality. Instead,
we focused on finding attacks that cause missed detection by a watermark
detector without significantly degrading perceptual quality. As illustrated
in Fig. 10.2, instead of starting from highly noisy audio around point
A, we look for attacks (such as those around point B) that are as close
to the high perceptual quality region as possible and at the same time as far
away from the detectability threshold as possible. These are crucial starting points
2 The counterpart of the audio (1-D) warping/jittering attacks for images (2-D) is geo-
metric distortion. We have discussed the attacks and countermeasures for image water-
marks under rotation, scale, and translation in Chapter 9.
from which many optimizations, improvements, and fine-tuning are feasible,
allowing us to proceed to the ideal attack region (region IV in Fig. 10.2).
10.3 Attacks and Analysis on SDMI Fragile
Watermarks
We have mentioned earlier that an SDMI system may use both robust and
fragile watermarks. In addition to rendering the robust watermarks un-
detectable, an adversary may forge a fragile watermark to obliterate the
access/copy control mechanism. For example, let us consider a policy pro-
posed in SDMI activities that lossy compression on audio files is not allowed.
Lossy compression, which allows for the easy exchange of audio files over
networks, is likely to destroy the fragile watermark but still retain the robust
watermark. Adversaries may first compress an audio file, then before admit-
ting the audio to an SDMI-compliant device, they can decompress the file
and reconstruct the fragile watermark. Upon examining the existence and
the content of the robust and fragile watermarks in an audio file, an SDMI
device draws a false conclusion that the audio has not been compressed and
that the user has not violated the access control policy.
More abstractly, the fragile watermark in an SDMI system serves the
purpose of tampering detection, which is a major application of fragile wa-
termarks. Issues regarding the designs, the attacks, and the countermeasures
of watermark-based authentication have been discussed in Chapter 7, where
the basic authentication idea is to keep a reference and compare a test signal
with it later. It is desirable to keep the data volume of the reference small
so that the overhead in storage or transmission of the entire data is small.
The location of the reference does not have to be secret, but the reference
must (1) be unambiguous in the sense that two sets of meaningful data are
unlikely to have the same signature, and (2) be difficult to be tampered
without leaving any trace. For perceptual source such as digital audio, the
reference can be combined with the perceptual source in a more seamless
way via watermarking. For example, one can embed a predetermined data
pattern or some features of the host audio signal into the audio, and later
when the authenticity of an audio is in question, one can verify the in-
tegrity of these embedded data to decide on the authenticity of the audio
signal. The watermark-based authentication relies on either the embedded
data, or the fragility and secrecy of the embedding mechanism, or both.
Compared with non-embedding approaches that only make use of the first
element (e.g., attaching a cryptographic digital signature to the audio), the
watermark-based approach may offer additional security if designed prop-
erly. A poorly designed watermark algorithm, however, can leave holes for
adversaries to forge a valid authentication watermark.
One potential flaw regarding fragile watermarks in SDMI-like applications is
relying too much on the secrecy of the embedding mechanism. In [156], Technol-
ogy A is taken as an example to demonstrate how the embedding mechanism
of a fragile watermark can be explored. Weak echoes have been observed in
high frequency bands around 8-16 kHz 3. The polarity and delay of the echoes
vary about every 1/50 second, and they are likely to be used to encode au-
thentication information. The data embedded in such high frequency bands
are likely to be distorted by lossy compression (such as MP3) and low-
pass filtering. If the authentication data (i.e., the data to be embedded)
is not wisely chosen, an adversary can explore the inner workings of the em-
bedding mechanism and use this knowledge to recover the authentication
data after performing disallowed processing/distortion on the audio signal.
A simple choice of embedding the same pattern fragilely in different audio
files could leave holes for adversaries to repair the authentication data by
using the knowledge of the embedding mechanism. Holliman et al. discussed
a few cases of counterfeiting watermarks in images [158] and pointed out the
weaknesses of embedding data that are independent of the host media. If
the fragile watermark in an SDMI system were designed to be independent
of the host media, it would be vulnerable to forgery attacks. The perceptual
quality of the attacked signals can be very good. This is because an attacker
does not need to destroy the robust watermark (which could introduce per-
ceptual distortion); all he/she needs to do is restore the fragile
watermark, which generally has lower energy and is perceptually transparent.
A countermeasure against forging the fragile watermark is to introduce de-
pendency, as has been discussed in Section 9.3. That is, we embed some
data, called "features", that are derived from the host audio signal. De-
noting the features derived from an audio signal S1 as d1 = f(S1), and
those derived from an altered signal S2 as d2 = f(S2) (e.g., S2 could be
an MP3 compressed version of S1), we would like to choose a function f(·)
such that d1 and d2 are sufficiently different. Encryption and a cryptographic
digest/signature can be used in designing f(·), and the keys associated with
f(·) should be kept secret. Readers may notice a potential problem: the
features derived from an audio signal could be different from those derived
from its watermarked version, i.e., f(S1) ≠ f(E(S1, d1)), where E(·,·) is the
embedding function. This problem can be easily fixed by embedding the
data derived from the i-th watermarked segment of the audio into the (i+1)-th
unwatermarked segment to obtain the (i+1)-th watermarked seg-
ment, as sketched below.
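A minimal sketch of this chaining is given below, with a keyed digest standing in for the feature function f(·) and a hypothetical fragile embedder passed in as a callable; none of these names come from the SDMI systems.

import hashlib

def chain_embed(segments, embed, key=b'secret-key'):
    # Dependency countermeasure sketch: the data embedded in segment i+1 is a
    # keyed digest of the already-watermarked segment i, so the authentication
    # data depend on the host audio.  `embed(segment, data)` is any fragile embedder.
    marked = []
    data = b''                          # nothing chained into the first segment
    for seg in segments:
        marked.append(embed(seg, data))
        data = hashlib.sha256(key + bytes(marked[-1])).digest()
    return marked

Restoring the fragile watermark after MP3 compression would then also require regenerating every chained digest, which in turn requires the secret key.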
In summary, the fragile watermarks in SDMI-like systems should be care-
fully designed to eliminate weaknesses against counterfeiting attacks and
other security holes.
3 This scheme was also found in US Patent 5,940,135 [104] awarded to Aris Technolo-
gies, Inc., now Verance Corporation.
Acknowledgement The work on attacking four SDMI robust watermark
technologies was jointly done with Scott Craver at Princeton University. In
particular, the sinusoid warping attack on Technology F was proposed by
Craver.
11
Conclusions and Perspectives
Multimedia data hiding can be used for a wide range of applications, includ-
ing ownership protection, alteration detection, access/copy control, annota-
tion, and conveying other side information. In this book, we have presented
analytic approaches and experimental results of various aspects of data hid-
ing. In addition to the design issues, we discussed attacks on watermarking
algorithms with a goal of identifying weaknesses and limitations of the ex-
isting frameworks and designs, as well as proposing improvements.
While we have discussed many advantages of data hiding and enumer-
ated a number of possible applications, it is necessary in practice to justify,
case by case, the need for data hiding versus such alternatives as putting
the side information in the user data field. We should keep in mind that in
spite of the interesting intellectual challenge and the current popularity of
data hiding in the research community, engineering practice would always
favor simplicity, efficiency, and effectiveness. On the other hand, the currently
identified challenges, weaknesses, and limitations are not yet suffi-
cient for drawing conclusions about the usefulness of digital watermarking and
multimedia data hiding. The field is still young and involves a number of dis-
ciplines such as signal processing, computer security, psychology, economics,
business administration, as well as legal issues. Paradigms and underlying
theories are either just being established or yet to be established. Therefore, objective and
multi-disciplinary approaches will continue to be necessary for studying
multimedia data hiding.
Despite their differences, data hiding (steganography) and cryptography
are tightly connected. Many ideas of cryptography have been found very
useful in such existing data hiding works as tampering detection. As for the
future research, it would be rewarding to undertake a more general investi-
gation regarding what new value can be offered by combining steganography
and cryptography, and how to make use of this combination to complement
the weaknesses and limitations of using each of them individually. Such
a study could lead to a better design of practical media security systems,
and to better solutions of the Digital Rights Management (DRM) for mul-
timedia.
Regarding the gap between the highly simplified channel models and the
real-world scenarios in today's data hiding research, a rigorous analysis of
the capacity versus robustness of data hiding in a realistic setting incorpo-
rating non-trivial perceptual models is worth pursuing. It is expected
that fundamental studies regarding embedding capacity could ultimately
shed light on designing or improving practical data hiding systems for a
variety of applications.
Besides the classic use in ownership protection and copy/access control,
we have demonstrated that data hiding can be a useful tool to send side
information in video communications. This is an emerging use of data hid-
ing and can be further explored for applications other than those discussed
in this book. Along with this pursuit, research needs to be done toward the
integration of error resilience, transcoding, network condition measurement,
dynamic resource allocation, and admission control in a multimedia commu-
nication system [24] [192], aiming at studying the relations and the interplay
of various modules that were generally addressed individually. This inter-
disciplinary study could lead to better understanding and deployment of
multimedia communications.
References
[ References on Mathematics and Algorithmic Foundations]
[1] W. Feller: An Introduction to Probability Theory and Its Applications,
vol. 1, 3rd Ed. revised printing, John Wiley & Sons, 1970.
[2] D.E. Knuth: The Art of Computer Programming, vol. 2, 3rd Ed.,
Addison-Wesley, 1997.
[3] V.F. Kolchin, B.A. Sevastyanov, V.P. Chistyakov: Random Allocations,
V.H. Winston & Sons, 1978.
[4] R. Sedgewick: Algorithms in C, Addison-Wesley, 1990.
[ References on Communications]
[5] R. E. Blahut: Theory and Practice of Data Transmission Codes, 2nd
Edition (draft), 1997.
[6] T.M. Cover, J.A. Thomas: Elements of Information Theory, 2nd Ed.,
John Wiley & Sons, 1991.
[7] H.V. Poor: Introduction to Detection and Estimation, 2nd Ed.,
Springer-Verlag, 1994.
[8] J.G. Proakis: Digital Communications, 3rd Ed., McGraw-Hill, 1995.
[9] A.S. Tanenbaum: Computer Networks, 3rd Edition, Prentice Hall,
1996.
[10] S. Verdu: Multiuser Detection, Cambridge University Press, 1998.
[References on Multimedia Signal Processing and Coding]
[11] R.N. Bracewell: The Fourier Transform and Its Applications,
McGraw-Hill, 1986.
[12] K.R. Castleman: Digital Image Processing, Prentice Hall, 1996.
[13] ITU Telecommunication Standardization Sector (ITU-T): Video Coding
Experts Group (VCEG), http://www.tnt.uni-hannover.de/project/vceg/.
[14] A.K. Jain: Fundamentals of Digital Image Processing, Prentice Hall,
1989.
[15] JBIG Committee: ISO Final Committee Draft (FCD) 14492 for
JBIG2 Standard, July 1999.
[16] Joint Photographic Experts Group (JPEG), http://www.jpeg.org.
[17] J.S. Lim, A.V. Oppenheim (Eds.): Advanced Topics in Signal Process-
ing, Prentice Hall, 1988.
[18] Mathworks, Inc: Documentation of Matlab 5.3, http://www.mathworks.com,
2000.
[19] J.L. Mitchell, W.B. Pennebaker, C.E. Fogg (Eds): MPEG Video: Com-
pression Standard, Digital Multimedia Standards Series, Chapman &
Hall, 1996.
[20] Moving Picture Experts Group (MPEG), http://www.cselt.it/
mpeg/.
[21] MPEG Points and Resources, http://www.mpeg.org.
[22] S.J. Orfanidis: Introduction to Signal Processing, Prentice Hall, 1996.
[23] G.K. Wallace: "The JPEG Still Picture Compression Stan-
dard", IEEE Trans. on Consumer Electronics, vol. 38, no. 1, pp. 18-34,
1992.
[24] Y. Wang, J. Ostermann, Y-Q. Zhang: Digital Video Processing and
Communications, Prentice Hall, 2001.
[References on Cryptography, Security, and Copyright]
[25] B. Schneier: Applied Cryptography: Protocols, Algorithms, and Source
Code in C, 2nd Ed., John Wiley & Sons, 1996.
[26] W. Trappe and L.C. Washington: Introduction to Cryptography with
Coding Theory, Prentice Hall, 2001.
[27] U.S. Copyright Office: "The Digital Millennium Copyright Act of
1998" (DMCA), Summary and Public Law, 1998.
[Tutorials, Surveys, and Special Issues on Data Hiding]
[See also:] [128, 132]
[28] R.J. Anderson, F.A.P. Petitcolas: "Information Hiding: An Annotated
Bibliography", http://www.cl.cam.ac.uk/~fapp2/steganography/
bibliography/, August 1999.
[29] I.J. Cox, M.L. Miller, J.A. Bloom: Digital Watermarking, Morgan
Kaufmann Publishers, 2001.
[30] F. Hartung, M. Kutter: "Multimedia Watermarking Techniques", Pro-
ceedings of IEEE, vol.87, no.7, pp.1079-1107, July, 1999.
[31] F. Mintzer, G. W. Braudaway, M. M. Yeung: "Effective and Ineffective
Digital Watermarks", Proc. of the IEEE International Conference on
Image Processing (ICIP'97), Santa Barbara, Oct. 1997.
[32] F.A.P. Petitcolas, R.J. Anderson, M.G. Kuhn: "Information Hiding
- A Survey", Proceedings of IEEE, vol. 87, no.7, pp.1062-1078, July,
1999.
[33] M.D. Swanson, M. Kobayashi, A.H. Tewfik: "Multimedia data-
embedding and watermarking technologies", Proceedings of IEEE,
vol. 86, pp.1064-1087, June, 1998.
[34] Special Issue on Watermarking, Communications of the ACM, vol. 41,
no.7, July, 1998.
[35] Special Issue on Emerging Applications of Multimedia Data Hiding,
EURASIP Journal on Applied Signal Processing (JASP), vol. 2002,
no.2, Feb. 2002.
[36] Special Issue on Copyright and Privacy Protection, IEEE Journal on
Selected Areas in Communication (JSAC), v.16, n.4, May 1998.
[37] Special Issue on Identification and Protection of Multimedia Informa-
tion, Proceedings of IEEE, vol.87, no.7, July, 1999.
[38] Special Issue on Watermarking, Signal Processing, vol.66, no.3, Else-
vier Science, May, 1998.
[Theses on Data Hiding and Watermarking]
[39] B. Chen: Design and Analysis of Digital Watermarking, Information
Embedding, and Data Hiding Systems, Ph.D. Dissertation, MIT, June
2000.
[40] D. Karakos: Digital Watermarking, Fingerprinting, and Compression:
An Information-Theoretic Perspective, Ph.D. Dissertation, University
of Maryland, College Park, June 2002.
[41] D. Kundur: Multiresolution Digital Watermarking: Algorithms and
Implications for Multimedia Signals, Ph.D. Dissertation, University
of Toronto, 1999.
[42] C-Y. Lin: Watermarking and Digital Signature Techniques for Multi-
media Authentication and Copyright Protection, Ph.D. Dissertation,
Columbia University, Dec. 2000.
[43] L. Qiao: Multimedia Security and Copyright Protection, Ph.D. Disser-
tation, University of Illinois at Urbana-Champaign, 1998.
[44] M. Ramkumar: Data Hiding in Multimedia - Theory and Applications,
Ph.D. Dissertation, 2000.
[45] J. Song: Optimal Rate Allocation and Security Schemes for Image
and Video Transmission over Wireless Channels, Ph.D. Dissertation,
University of Maryland, College Park, 2000.
[46] M. Wu: Multimedia Data Hiding, Ph.D. Dissertation, Princeton Uni-
versity, April 2001, http://www.ee.princeton.edu/~minwu/research/phd_thesis.html.
[47] W. Zeng: Resilient Video Transmission and Multimedia Database Ap-
plication, Ph.D. Dissertation, Princeton University, April 1997.
[Spread Spectrum Additive Watermarking (Type-I Embedding)]
[See also:] [98, 102, 105, 113, 172, 173]
[48] W. Bender, D. Gruhl, N. Morimoto: "Techniques for Data Hiding",
Proc. of SPIE, vol. 2420, pp. 40, 1995.
[49] I. Cox, J. Kilian, T. Leighton, T. Shamoon: "Secure Spread Spectrum
Watermarking for Multimedia", IEEE Trans. on Image Process-
ing, vol. 6, no. 12, pp. 1673-1687, 1997.
[50] I.J. Cox: "Spread Spectrum Watermark for Embedded Signaling",
U.S. Patent 5,848,155, Dec. 8, 1998.
[51] I.J. Cox, M.L. Miller, A. McKellips: "Watermarking as Communica-
tions With Side Information", Proceedings of the IEEE, vol. 87, no. 7,
pp. 1127-1141, 1999.
[52] G. Depovere, T. Kalker, J-P. Linnartz: "Improved Watermark Detec-
tion Using Filtering Before Correlation", IEEE Int. Conf. on Image
Processing, vol. 1, pp. 430-434, Chicago, IL, Oct. 1998.
[53] A. Herrigel, J. Oruanaidh, H. Petersen, S. Pereira, T. Pun: "Secure
Copyright Protection Techniques for Digital Images" , Proc. of Second
Information Hiding Workshop (IHW), Lecture Notes in Computer
Science, Springer-Verlag, vol. 1525, 1998.
[54] J. Liang, P. Xu, T.D. Tran, "A universal robust low frequency wa-
termarking scheme," submitted to IEEE Trans. on Image Processing,
May 2000. A short version appeared in Conf. on Information Sciences
and Systems (CISS'00), Princeton, NJ, March 2000.
[55] M.L. Miller, J.A. Bloom: "Computing the probability of false water-
mark detection", Proceedings of the Third International Workshop on
Information Hiding (IHW), 1999.
[56] M.L. Miller, J.A. Bloom, I.J. Cox: "Exploiting Detector and Im-
age Information in Watermark Embedding", Proc. of the IEEE In-
ternational Conference on Image Processing (ICIP'00), Vancouver,
Canada, Sept. 2000.
[57] C. Podilchuk, W. Zeng: "Perceptual Watermarking of Still Images",
IEEE First Workshop of Multimedia Signal Processing, 1997.
[58] C. Podilchuk, W. Zeng: "Image Adaptive Watermarking Using Visual
Models", IEEE Journal Selected Areas of Communications (JSAC),
vol. 16, no. 4, pp. 525-538, May 1998.
[59] M. D. Swanson, B. Zhu, A. H. Tewfik, "Transparent Robust Image
Watermarking" , Proc. of the IEEE International Conference on Image
Processing (ICIP'96), Lausanne, Switzerland, Sept. 1996.
[60] B. Tao, B. Dickinson: "Adaptive Watermarking in the DCT Domain" ,
Proc. of the IEEE International Conference on Acoustics, Speech, and
Signal Processing (ICASSP), 1997.
[61] W. Zeng, B. Liu, "A Statistical Watermark Detection Technique
Without Using Original Images for Resolving Rightful Ownerships
of Digital Images," IEEE Trans. Image Processing, vol. 8, no. 11, pp.
1534-1548, Nov. 1999.
[Robust Data Hiding Via Enforcement (Type-II Embedding)]
[See also:] [95, 97, 99, 100, 107, 108, 125, 127, 131, 136, 137, 138, 139,
145, 146, 168]
[62] B. Chen, G.W. Wornell: "Digital Watermarking and Information Em-
bedding Using Dither Modulation", Proc. of IEEE Workshop on Mul-
timedia Signal Processing, Dec. 1998.
[63] B. Chen, G.W. Wornell: "Dither Modulation: A New Approach to
Digital Watermarking and Information Embedding", Proc. of SPIE,
Security and Watermarking of Multimedia Contents, vol. 3657, Jan.,
1999.
[64] B. Chen, G.W. Wornell: "Quantization Index Modulation: A Class of
Provably Good Methods for Digital Watermarking and Information
Embedding" , IEEE Trans. on Info. Theory, vol.47, noA, pp1423-1443,
May 2001.
[65] J. Chou, S.S. Pradhan, L.E. Ghaoui, K. Ramchandran: "Watermark-
ing Based on Duality With Distributed Source Coding and Robust
Optimization Principles" , Proc. of the IEEE International Conference
on Image Processing (ICIP'00), Vancouver, Canada, Sept. 2000.
[66] J.J. Eggers, R. Bauml, R. Tzschoppe, B. Girod: "Scalar Costa Scheme
for Information Embedding", submitted to IEEE Trans. on Signal
Processing, 2002, preprint available at http://www-nt.e-technik.uni-
erlangen.de/~eggers/publications.html.
[67] C-T. Hsu, J-L. Wu: "Hidden Signatures in Image" , Proc. of the IEEE
International Conference on Image Processing (ICIP'96), vol. 3, Lau-
sanne, Switzerland, Sept. 1996.
[68] M. Kesal, M.K. Mihcak, R. Koetter, and P. Moulin: "Iteratively De-
codable Codes for Watermarking Applications", Proc. of 2nd Inter.
Symp. on Turbo codes and Related Topics, Brest, France, Sept. 2000.
[69] E. Koch, J. Zhao: "Towards Robust and Hidden Image Copyright
Labeling", Proc. of IEEE Workshop on Nonlinear Signal and Image
Processing, 1995.
[70] G.C. Langelaar and R.L. Lagendijk: "Optimal Differential Energy
Watermarking of DCT Encoded Images and Video", IEEE Trans. on
Image Processing, vol. 10, no. 1, pp. 148-158, Jan. 2001.
[71] M. Ramkumar, A.N. Akansu: "A Robust Scheme for Oblivious De-
tection of Watermarks / Data Hiding in Still Images", Symposium on
Voice, Video, and Data Communication, Proc. of SPIE, 1998.
[72] M.D. Swanson, B. Zhu, A.H. Tewfik: "Robust Data Hiding for Im-
ages", Proc. of IEEE DSP Workshop, 1996.
[Data Hiding Capacity and Related Fundamental Issues]
[See also:] [64]
[73] M. Barni, F. Bartolini, A. De Rosa, A. Piva: "Capacity of Full Frame
DCT Image Watermarks", IEEE Trans. on Image Processing, vol. 9, no. 8,
pp. 1450-1455, Aug. 2000.
[74] M.H.M. Costa: "Writing On Dirty Paper", IEEE Trans. on Info. The-
ory, vol. IT-29, no. 3, May 1983.
[75] L.M. Marvel, C.G. Boncelet: "Capacity of the Additive Stegano-
graphic Channel", preprint, 1999, http://www.eecis.udel.edu/~marvel/.
[76] P. Moulin, J.A. O'Sullivan: "Information-Theoretic Analysis of In-
formation Hiding", preprint, Sept. 1999, revised Dec. 2001, http://
www.ifp.uiuc.edu/~moulin/paper.html.
[77] P. Moulin and J.A. O'Sullivan: "Information-Theoretic Analysis of
Watermarking", Proc. of the IEEE International Conference on
Acoustics, Speech, and Signal Processing (ICASSP), 2000.
[78] P. Moulin, M.K. Mihcak, G-I. Lin: "An Information-theoretic Model
for Image Watermarking and Data Hiding", Proc. of the IEEE In-
ternational Conference on Image Processing (ICIP'00), Vancouver,
Canada, Sept. 2000.
[79] M. Ramkumar, A.N. Akansu: "Information Theoretic Bounds for Data
Hiding in Compressed Images" , Proc. of IEEE 2nd Multimedia Signal
Processing Workshop, 1998.
[80] S. D. Servetto, C. I. Podilchuk, K. Ramchandran: "Capacity Issues in
Digital Image Watermarking", Proc. of the IEEE International Con-
ference on Image Processing (ICIP'98), Chicago, IL, Oct. 1998.
[81] C.E. Shannon: "The Zero-error Capacity of a Noisy Channel", IRE
Trans. on Info. Theory, IT-2, pp. 8-19, 1956.
[82] C.E. Shannon: "Channels With Side Information at the Transmitter" ,
IBM Journal of Research and Development, pp. 289-293, 1958.
[83] J.R. Smith, B.O. Comiskey: "Modulation and Information Hiding in
Images", Proc. of the First Information Hiding Workshop (IHW),
1996.
[84] M. Wu, B. Liu: "Digital Watermarking Using Shuffling", Proc. of
IEEE International Conference on Image Processing (ICIP'99), Kobe,
Japan, 1999.
[85] M. Wu, B. Liu: "Modulation and Multiplexing Techniques for Mul-
timedia Data Hiding", Invited paper, Proc. of SPIE ITcom 2001 -
Multimedia Systems and Applications IV, SPIE vol. 4518, Denver,
CO, Aug. 2001
[86] M. Wu, B. Liu: "Data Hiding in Image and Video: Part-I - Fun-
damental Issues and Solutions", submitted to IEEE Trans. on Im-
age Processing, Jan. 2002, http://www.ece.umd.edu/~minwu/research.html#Journal.
[Perceptual Models: HAS and HVS]
[See also:] [14, 24]
[87] Association of Blind Piano Tuners: "History of Pitch", http://www.
uk-piano.org/history/pitch.html, 2000.
[88] P.R. Cook (eds.): Music, Cognition, and Computerized Sound: An
Introduction to Psychoacoustics, The MIT Press, 1999.
[89] Doug Coulter: Digital Audio Processing, R&D Books, 2000.
[90] H.A. Peterson, A.J. Ahumada, A.B. Watson: "Improved Detection
Model for DCT Coefficient Quantization", Proc. SPIE Conf. Human
Vision, Visual Processing, and Digital Display IV, vol. 1913, pp.191-
201, Feb. 1993.
[91] E.E. Swenson: "The History of Musical Pitch in Tuning the Pi-
anoforte", http://www.mozartpiano.com/pitch.html. 2000.
[92] A.B. Watson: "DCT Quantization Matrices Visually Optimized for
Individual Images", Proc. SPIE Conf. Human Vision, Visual Pro-
cessing, and Digital Display IV, vol. 1913, pp.202-216, Feb. 1993.
[Specialized Embedding: Binary Images]
[93] A.K. Bhattacharjya, H. Ancin: "Data Embedding in Text For a Copier
System", Proc. of the IEEE International Conference on Image Pro-
cessing (ICIP'99), Kobe, Japan, Oct. 1999.
[94] A. Finkelstein: personal communication, 1998.
[95] E. Koch, J. Zhao: "Embedding Robust Labels Into Images for Copy-
right Protection" , Proceedings of the International Congress on Intel-
lectual Property Rights for Specialized Information, Knowledge & New
Technologies, 1995.
[96] Y. Liu, J. Mant, E. Wong, S.H. Low: "Marking and Detection of
Text Documents Using Transform-domain Techniques", Proceedings
of SPIE, vol. 3657, Electronic Imaging (EI'99) Conference on Secu-
rity and Watermarking of Multimedia Contents, San Jose, CA, 1999.
[97] K. Matsui, K. Tanaka: "Video-steganography: How to Secretly Embed
a Signature in a Picture", Proc. of IMA Intellectual Property Project,
vol. 1, no. 1, 1994.
[98] N.F. Maxemchuk, S. Low: "Marking Text Documents", Proc. of the
IEEE International Conference on Image Processing (ICIP'97), Santa
Barbara, Oct. 1997.
[99] M. Wu, E. Tang, B. Liu: "Data Hiding in Digital Binary Image", Proc.
of IEEE International Conference on Multimedia & Expo (ICME'00),
New York City, NY, 2000.
[100] M. Wu, B. Liu: "Data Hiding in Binary Image for Authentication
and Annotation" , revised for publication in IEEE Trans. on Multime-
dia, March 2002, http://www.ece.umd.edu/~minwu/research.html#Journal.
[Specialized Embedding: Audio]
[101] S.J. Best, R.A. Willard: "Signal Identification", US Patent 4,876,617,
Thorn EMI Plc, October 1989.
[102] L. Boney, A.H. Tewfik, K.N. Hamdy: "Digital Watermarking for Audio
Signals" , Proc. of Inter. Conf. on Multimedia Computing and Systems
(ICMCS '96), Hiroshima, Japan, pp.473-480, June, 1996.
[103] X. Li, H. Yu: "Transparent and Robust Audio Data Hiding in Cep-
strum Domain" , Proc. of IEEE International Conference on Multime-
dia & Expo (ICME'OO), New York City, NY, 2000.
[104] R. Petrovic, J.M. Winograd, K. Jemili, E. Metois: "Apparatus and
Method for Encoding and Decoding Information in Analog Signals" ,
US Patent 5,940,135, Aris Technologies, Inc., August 1999.
[Specialized Embedding: Video]
[See also:] [146]
[105] F. Hartung, B. Girod: "Watermarking of Uncompressed and Com-
pressed Video", Signal Processing, vol.66, no. 3, pp. 283-301, May,
1998.
[106] M.L. Miller, I.J. Cox, J.A. Bloom: "Watermarking in the Real World:
An Application to DVD", Proc. of Watermark Workshop at ACM
Multimedia '98, 1998.
[107] D. Mukherjee, J-J. Chae, S.K. Mitra, B.S. Manjunath: "A Source and
Channel Coding Framework for Vector-Based Data Hiding in Video",
IEEE Trans. on Circuits and Systems for Video Technology, v.10, n.4,
pp630-645, June, 2000.
[108] M.D. Swanson, B. Zhu, A.H. Tewfik: "Data Hiding for Video-in-
Video", Proc. of the IEEE International Conference on Image Pro-
cessing (ICIP'97), Santa Barbara, Oct. 1997.
[109] M.D. Swanson, B. Zhu, A.H. Tewfik: "Multiresolution Scene-Based
Video Watermarking Using Perceptual Models", IEEE Journal on
Selected Areas in Communication (JSAC), v.16, n.4, pp540-550, May
1998.
[110] M. Wu, H. Yu, A. Gelman: "Multi-level Data Hiding for Digital Image
and Video", Proceedings of SPIE, vol. 3845, Photonics East Confer-
ence on Multimedia Systems and Applications, Boston, MA, 1999.
[111] M. Wu, H. Yu: "Video Access Control via Multi-level Data Hid-
ing", Proc. of IEEE International Conference on Multimedia (3 Expo
(ICME'OO), New York City, NY, 2000.
[112] M. Wu, H. Yu, B. Liu: "Data Hiding in Image and Video: Part-II - De-
signs and Applications" , submitted to IEEE Trans. on Image Process-
ing, Jan. 2002, http://www.ece.umd.edu/~minwu/research.html#Journal.
[113] W. Zhu, Z. Xiong, Y-Q. Zhang: "Multiresolution Wavelet-Based Wa-
termarking of Images and Video" , IEEE Trans. Circuits and Systems
for Video Tech, vol. 9, pp. 545-550, June 1999.
[Specialized Embedding: 3-D Graphic Data]
[114] O. Benedens: "Geometry-based Watermarking of 3-D Models", IEEE
Computer Graphics and Applications, Jan. 1999, pp46-55.
[115] R. Ohbuchi, H. Masuda, M. Aono: "Watermarking Three-Dimensional
Polygonal Models Through Geometric and Topological Modifications",
IEEE Journal on Selected Areas in Communications (JSAC), vol. 16,
no. 4, May 1998, pp551-559.
[116] E. Praun, H. Hoppe, A. Finkelstein: "Robust Mesh Watermarking",
Proc. of ACM SIGGRAPH, 1999.
[117] M.M. Yeung and B-L. Yeo: "Fragile Watermarking of 3-D Objects",
Proc. of Inter. Conf. on Image Processing (ICIP), Chicago, IL, 1998.
[Specialized Embedding: Visible Watermarking]
[118] G.W. Braudaway, K.A. Magerlein, F. Mintzer: "Protecting Publicly-
Available Images With A Visible Image Watermark", SPIE Conf. on
Optical Security and Counterfeit Deterrence Techniques, vol. 2659, pp.
126-133, Feb. 1996.
[119] J. Meng, S-F. Chang: "Embedding Visible Video Watermarks in the
Compressed Domain" , Proc. of the IEEE International Conference on
Image Processing (ICIP'98), Chicago, IL, Oct. 1998.
[Specialized Embedding: Non-Perceptual Data]
[120] G. Qu, J.L. Wong, and M. Potkonjak: "Optimization-Intensive
Watermarking Techniques for Decision Problems", Proc. of 36th
ACM/IEEE Design Automation Conference Proceedings, pp.33-36,
June 1999.
[121] J. Stern, G. Hachez, F. Koeune, J-J. Quisquater: "Robust Object Wa-
termarking: Application to Code", Proc. of the Third Info. Hiding
Workshop (IHW'99), 1999.
[(Semi-) Fragile Watermarking for Authentication/Annotation]
[122] Epson America, Inc.: http://www.epson.com/cam....scan/cam_extras/
ias/, Image Authentication System, 1999.
[123] J. Fridrich, M. Goljan: "Protection of Digital Images using Self Em-
bedding", Proc. of Symposium on Content Security and Data Hiding
in Digital Media, Newark, NJ, May 1999.
[124] G. L. Friedman: "The Trustworthy Digital Camera: Restoring Credi-
bility to the Photographic Image", IEEE Trans. on Consumer Elec-
tronics, vol. 39, no. 4, pp. 905-910, Nov. 1993.
[125] D. Kundur and D. Hatzinakos: "Digital Watermarking for Telltale
Tamper-Proofing and Authentication," Proceedings of the IEEE, Spe-
cial Issue on Identification and Protection of Multimedia Information,
vol. 87, no. 7, pp.1167-1180, July 1999.
[126] C.Y. Lin, S.F. Chang: "A Robust Image Authentication Method Sur-
viving JPEG Lossy Compression", Proc. of SPIE Storage and Re-
trieval of Image/Video Databases, Jan. 1998.
[127] C-Y. Lin and S-F. Chang: "Semi-Fragile Watermarking for Authenti-
cating JPEG Visual Content", Proc. of SPIE International Conf. on
Security and Watermarking of Multimedia Contents II (EI'00), vol.
3971, 2000.
[128] E.T. Lin, E.J. Delp: "A Review of Fragile Image Watermarks", Proc.
of the Multimedia and Security Workshop (ACM Multimedia '99),
pp25-29, 1999.
[129] C-S. Lu, H-Y.M. Liao: "Multipurpose Watermarking for Image Au-
thentication and Protection" , Technical Report, Institute of Informa-
tion Science, Academia Sinica, Taiwan, 2000.
[130] F. Mintzer, G. Braudaway: "If one watermark is good, are more
better? ," Proceedings of the International Conference on Acoustics,
Speech, and Signal Processing (ICASSP), vol. 4, Phoenix, Arizona,
March 1999.
[131] F.A.P. Petitcolas: Examples of Least-significant-bit Embedding,
http://www.cl.cam.ac.uk/~fapp2/steganography/image_downgrading/,
1998.
[132] C. Rey and J-L. Dugelay: "A Survey of Watermarking Algorithms for
Image Authentication", EURASIP Journal on Applied Signal Pro-
cessing (JASP), vol.2002, no.6, June 2002.
[133] M. Schneider, S-F. Chang: "A Robust Content Based Digital Sig-
nature for Image Authentication", Proc. of the IEEE International
Conference on Image Processing (ICIP'96), Lausanne, Switzerland,
Sept. 1996.
[134] D. Storck: "A New Approach to Integrity of Digital Images", Proc. of
IFIP Conf. on Mobile Communication, 1996.
[135] S. Walton: " Image Authentication for a Slippery New Age",
Dr. Dobb's Journal, pp18-26, April, 1995.
[136] P. W. Wong: "A Watermark for Image Integrity and Ownership Ver-
ification", ISfJT PIC Conf. Proc., Portland, Oregon, 1998.
[137] M. Wu, B. Liu: "Watermarking for Image Authentication", IEEE In-
ternational Conference on Image Processing (ICIP'98), Chicago, IL,
1998.
[138] L. Xie, G. R. Arce: "Joint Wavelet Compression and Authentication
Watermarking" , Proc. of the IEEE International Conference on Image
Processing (ICIP'98), Chicago, IL, Oct. 1998.
[139] M. M. Yeung, F. Mintzer: "An Invisible Watermarking Technique for
Image Verification", Proc. of the IEEE International Conference on
Image Processing (ICIP'97), Santa Barbara, Oct. 1997.
[Digital Fingerprinting]
[140] D. Boneh and J. Shaw: "Collusion-Secure Fingerprinting for Digital
Data," IEEE Trans. on Info. Theory, vol. 44, Sept. 1998, 1897-1905.
[141] J. Dittmann, P. Schmitt, E. Saar, J. Schwenk, and J. Ueberberg:
"Combining digital watermarks and collusion secure fingerprints for
digital images," SPIE Journal of Electronic Imaging, vol. 9, pp. 456-
467, 2000.
[142] W. Trappe, M. Wu, K.J.R. Liu: "Anti-collusion Fingerprinting for
Multimedia", IEEE Trans. on Signal Processing, Special issue on Sig-
nal Processing for Data Hiding in Digital Media & Secure Content
Delivery, to appear in 1st Quarter of 2003.
[143] W. Trappe, M. Wu, K.J.R. Liu, "Collusion-Resistant Fingerprinting
for Multimedia," Proc. of IEEE Intern. Conf. on Acoustics, Speech,
and Signal Processing (ICASSP), Orlando, FL, May 2002.
[Miscellaneous Applications of Data Hiding]
[144] D.A. Silverstein, S.A. Klein: "Precomputing and Encoding Com-
pressed Image Enhancement Instructions", United States Patent
5,822,458, Oct. 1998.
[145] D.A. Silverstein, S.A. Klein: "Precomputing and Encoding Com-
pressed Image Enhancement Instructions", in review for IEEE Trans-
actions on Image Processing, http://www.best.com/~amnon/Homepage/
Research/Papers/EncodingEnhance/EncodingEnhance.html,
2000.
[146] J. Song, K.J.R. Liu: "A Data Embedding Scheme for H.263 Com-
patible Video Coding", Proc. of Inter. Symposium on Circuits and
Systems (ISCAS), vol.4, 1999.
[147] J. Song, R. Poovendran, W. Trappe, K.J.R. Liu: "A Dynamic Key
Distribution Scheme Using Data Embedding for Secure Multimedia
Multicast", Proc. of SPIE Electronic Imaging, 2001.
[148] P. Yin, B. Liu, H. Yu: "Error Concealment Using Information Hid-
ing", Proc. of IEEE Intern. Conf. on Acoustics, Speech, and Signal
Processing (ICASSP), Salt Lake City, UT, May 2001.
[149] P. Yin, M. Wu, B. Liu: "Video Transcoding by Reducing Spatial Reso-
lution", Proc. of IEEE International Conference on Image Processing
(ICIP'00), Vancouver, Canada, 2000.
[150] P. Yin, M. Wu, and B. Liu: "A Robust Error Resilient Approach for
MPEG Video Transmission Over Internet" , Proc. of Inter. Conference
on Visual Comm. & Image Processing (VCIP'02), San Jose, CA, Jan.
2002.
[Attack and Security Issues of Watermarking]
[151] J. Boeuf, J.P. Stern: "An Analysis of One of the SDMI Candidates",
Technical Report, http://www.julienstern.org/sdmi/files/sdmiF/
sdmiF.html, 2001.
[152] I. Cox, J-P. Linnartz: "Some General Methods for Tampering
with Watermarks", IEEE Journal Selected Areas of Communications
(JSAC), vol. 16, no. 4, May 1998.
[153] S. Craver, N. Memon, B-L. Yeo, M. M. Yeung: "Can Invisible Water-
marks Resolve Rightful Ownerships?", IBM Research Report, 1996.
[154] S. Craver, N. Memon, B-L. Yeo, M. M. Yeung: "On the Invertibility of
Invisible Watermarking Techniques" , Proc. of the IEEE International
Conference on Image Processing (ICIP'97), Santa Barbara, Oct. 1997.
[155] S. Craver, N.Memon, B-L. Yeo, M.M. Yeung: "Resolving Rightful
Ownerships with Invisible Watermarking Techniques: Limitations, At-
tacks, and Implications", IEEE Journal on Selected Areas in Commu-
nication (JSAC), v.16, n.4, pp573-586, May 1998.
[156] S.A. Craver, M. Wu, B. Liu, A. Stubblefield, B. Swartzlander,
D.S. Wallach, D. Dean, E.W. Felten: "Reading Between the Lines:
Lessons from the SDMI Challenge", Proc. of 10th Usenix Security
Symposium, Aug. 2001. Also accepted by Proc. of 4th Info. Hiding
Workshop, Apr. 2001.
[157] F. Hartung, J.K. Su, B. Girod: "Spread Spectrum Watermarking: Ma-
licious Attacks and Counterattacks", Proc. of SPIE, Security and Wa-
termarking of Multimedia Contents, vol. 3657, Jan., 1999.
[158] M. Holliman, N. Memon, "Counterfeiting Attacks on Oblivious Block-
wise Independent Invisible Watermarking Schemes", IEEE Trans. on
Image Processing, vol.9, no.3, March 2000.
[159] F. Petitcolas, R. Anderson, M. Kuhn, "Attacks on Copyright Marking
Systems", Proc. of Second Workshop on Info. Hiding, 1998.
[160] Secure Digital Music Initiative (SDMI): http://www. sdmi.org, 2000.
[161] Secure Digital Music Initiative (SDMI): "SDMI Portable Device Spec-
ification", Part 1, ver 1.0, 1999.
[162] StirMark Watermark Testbed, http://www.cl.cam.ac.uk/~fapp2/
watermarking/stirmark/, 1998.
[163] H. Stone: "Analysis of Attacks on Image Watermarks with Random-
ized Coefficients", Technical Report 96-045, NEC Research Institute,
1996.
[164] Test result of International Evaluation Project for Digital Water-
mark Technology for Music: http://www.nrLco.jp/english/news/
2000/001006.html, 2000.
[165] UnZign, http://www.altern.com/watermark/, A Watermark Robust-
ness Testing Software, 1997.
[166] M. Wu, B. Liu: "Attacks on Digital Watermarks", Proc. of 33rd Asilo-
mar Conference on Signals, Systems, and Computers, 1999.
[167] M. Wu, S.A. Craver, E.W. Felten, B. Liu: "Analysis of Attacks on
SDMI Audio Watermarks", Proc. of IEEE International Conference
on Acoustic, Speech, and Signal Processing (ICASSP'01), 2001.
[Geometric-Distortion Resilient Watermarking]
[168] M. Alghoniemy, A.H. Tewfik: "Self-synchronizing Watermarking Tech-
niques", Proc. of Symposium on Content Security and Data Hiding in
Digital Media, NJ Center for Multimedia Research and IEEE, 1999.
[169] M. Alghoniemy and A.H. Tewfik: "Geometric Distortion Correction
Through Image Normalization", Proc. of Inter. Conf. on Multimedia
and Expo (ICME'00), New York, NY, Aug. 2000.
[170] G. Csurka, F. Deguillaume, J.J.K. ORuanaidh, T. Pun: "A Bayesian
Approach to Affine Transformation Resistant Image and Video Wa-
termarking", Proc. of the 3rd Information Hiding Workshop (IHW),
Lecture Notes in Computer Science, pp. 315-330, Springer-Verlag, 1999.
[171] N.F. Johnson, Z. Duric, S. Jajodia: "Recovery of Watermarks from
Distorted Images", Proc. of the 3rd Int. Information Hiding Work-
shop, pp. 361-375, 1999.
[172] C-Y. Lin, M. Wu, J.A. Bloom, M.L. Miller, I.J. Cox, Y-M. Lui: "Ro-
tation, Scale, and Translation Resilient Public Watermarking for Im-
ages", Proceedings of SPIE, vol. 3971, Electronic Imaging (EI'00) Con-
ference on Security and Watermarking of Multimedia Contents, San
Jose, CA, 2000.
[173] C-Y. Lin, M. Wu, Y-M. Lui, J.A. Bloom, M.L. Miller, I.J. Cox: "Ro-
tation, Scale, and Translation Resilient Public Watermarking for Im-
ages", IEEE Transactions on Image Processing, vol. 10, no. 5, pp. 767-
782, May 2001.
[174] J.J.K. ORuanaidh, T. Pun, "Rotation, Translation and Scale Invari-
ant Spread Spectrum Digital Image Watermarking", Signal Process-
ing, vol.66, no.3, 1998.
[175] S. Pereira, T. Pun: "Fast Robust Template Matching for Affine Resis-
tant Image Watermarks", Proc. of the 3rd Information Hiding Work-
shop (IHW), Lecture Notes in Computer Science, Springer-Verlag,
pp207-218, 1999.
[Multimedia Processing Useful to Designers or Adversaries]
[176] Corel Stock Photo Library, Corel Corporation, Canada.
[177] GoldWave: http://www.goldwave.com (audio editing software), 2000.
[178] H. Igehy, L. Pereira, "Image Replacement Through Texture Synthe-
sis" , Proc. of the IEEE International Conference on Image Processing
(ICIP'97), Santa Barbara, Oct. 1997.
[179] K. Jung, J. Chang, C. Lee, "Error Concealment Technique Using Pro-
jection Data for Block-based Image Coding", Proc. of SPIE Conf. on
Visual Communication and Image Processing, vol. 2308, pp. 1466-1476,
1994.
[180] B. Liu, T. Chang, H. Gaggioni: "On the Accuracy of Transformation
Between Color Components in Standard and High Definition Televi-
sion", Proc. of HDTV'92 Workshop, pp57, 1992.
[181] M. McGuire: "An Image Registration Technique for Recovering Ro-
tation, Scale and Translation Parameters", Technical Report 98-018,
NEC Research Institute, 1998.
[182] Scalable Display Wall, http://www.cs.princeton.edu/omnimedia,
Princeton University, 1999.
[183] H.S. Stone, B. Tao, M. McGuire: "Analysis of Image Registration
Noise Due to Rotationally Dependent Aliasing" , Technical Report 99-
057R, NEC Research Institute, 1999.
[184] W. Zeng, B. Liu, "Geometric-structure-based Directional Filtering for
Error Concealment in Image/Video Transmission", SPIE Photonics
East'95, vol.2601, pp.145-156, Oct. 1995.
[Efficient and Secure Multimedia Communications]
[See also:] [24]
[185] B. Liu, K-W. Chow, A. Zaccarin: "Simple Method to Segment Motion
Field for Video Coding," Proceeding of SPIE, vol. 1818, 1992.
[186] N. Merhav, V. Bhaskaran: "A Transform Domain Approach to Spatial
Domain Image Scaling," Proc. of the IEEE International Conference
on Acoustics, Speech, and Signal Processing (ICASSP), 1996.
[187] M.T. Orchard, G.J. Sullivan: "Overlapped Block Motion Compensa-
tion: An Estimation-theoretic Approach," IEEE Transaction on Im-
age Processing, vol. 3, no. 5, 1994.
[188] B. Shen, I.K. Sethi, V. Bhaskaran: "Adaptive Motion Vector Resam-
pling for Compressed Video Down-scaling," Proc. of the IEEE Inter-
national Conference on Image Processing (ICIP'97), Santa Barbara,
Oct. 1997.
[189] Y. Wang, Q. Zhu: "Error Control and Concealment for Video Com-
munication: A Review", Proc. of IEEE, v.86, pp.974-997, May, 1998.
[190] Y. Wang, S. Wenger, J. Wen, A. Katsaggelos: "Error Resilient Video
Coding Techniques", IEEE Signal Processing Magazine, July, 2000.
[191] J. Wen, M. Muttrell, M. Severa: "Access Control of Standard Video
Bitstreams", Proc. of Inter. Conf. on Media Future, May 200l.
[192] M. Wu, R. Joyce, H-S. Wong, L. Guan, S-Y. Kung: "Dynamic Re-
source Allocation Via Video Content and Short-term Traffic Statis-
tics", IEEE Trans. on Multimedia, Special Issue on Multimedia over
IP, vol. 3, no. 2, pp. 186-199, June 2001.
[193] N. Yeadon, F. Garcia, D. Hutchison, D. Shepherd: "Continuous Me-
dia Filters for Heterogeneous Internetworking," Proceedings of SPIE
- Multimedia Computing and Networking (MMCN'96), 1996.
[194] W. Zeng, S. Lei: "Efficient Frequency Domain Video Scrambling for
Content Access Control", Proc. of ACM Multimedia, Nov. 1999.
Index
A conveying side information, 1,
9, 137-146
Additive embedding, 20-21, see also copy control, 5, 93, 96, 175,
Type-I embedding 188
Additive noise device control, 5, 93, 96, 175,
robustness against, 183 188
sources, 27 fingerprinting, 5, 15, 93
statistical model, 27, 39 ownership protection, 1,4, 15,
Adversaries, see also Attacks; Se- 93, 96
curity rights management, 5, 96,146,
applicable applications, 10, 149- 175-189, 192
150 traitor tracing, see fingerprint-
concealing communications ing
against, 2 Attackers, see Adversaries
in SDMI public challenge, 175- Attacks, see also Adversaries; Se-
189 curity
incentive in authentication, 65 averaging multiple copies, 109,
measures against, 81-83 151, 157
Applications of data hiding blind attacks, 179
access control, 1, 5, 15, 93, block replacement attack, 10,
96, 115, 146, 175, 188 150-155
annotation, 1,5,65, 74, 94 ciphertext attack, 179
content authentication, 1, 4, collusion attacks, 109, 151
15, 65, 75-76, 93, 119- concealment attack, 10, 150-
136, 168-173, 178, 188 155
copy attack, 65, 184 embedded digital signature for,
countermeasures against, see 121
Countermeasures (against fragile watermarking for, 75-
attacks) 76, 121
double capturing attack, 10, semi-fragile watermarking for,
171-172 69
filtering attacks, 150-155 AWGN (additive white Gaussian
forgery attacks, 123, 188-189 noise), 23, 26-30
geometric distortion, 187, see
also jittering attacks; warp-
B
ing attack
geometric distortions, 151 Binary images
jittering attacks, 149, see also document images, 75-76
geometric distortion; warp- embedding mechanism for, 69-
ing attack 70
audio, 179, 180, 187 equalizing uneven embedding
images, 151 capacity for, 70-73
video frames, 109, 136 line drawing, 74
known data hiding algorithm, perceptual model, 67-69
against, 149-173 pixel flippability, 68, 84-88
plaintext attack, 82, 179 recovering from printing and
SDMI attacks, see SDMI pub- scanning, 78-80, 88-91
lic challenge robustness for watermarking,
synchronization attack, see ge- 69, 77-80
ometric distortion; jitter- security for watermarking, 81-
ing attack; warping at- 83
tack signature images, 70, 74
unknown data hiding algorithm, watermark applications for, 65,
against, 175-189 74-76
warping attack, 151, 180, 187, Biorthogonal modulation, see Mod-
see also jittering attacks; ulation and multiplexing
geometric distortion Bit error
Audio combat, 70
attacks against audio water- probability, 29
marking, 10, 175-189 recover, in video communica-
golden ear, 183, 184 tions, 145
human auditory system, see Block concealment
Human auditory system attacks by, 10, 150-155
(HAS) edge directed interpolation for,
Authentication, see also Applica- 143, 151
tions of data hiding Block DCT transform, 48, 96, 121-
advantages of watermarks for, 123, 153-154, see also DCT
74, 119-120 Block-based data hiding, 67-73, 96-
attacks against, 171-173 109, 150-155
cryptographic, 121 Bursty errors, 145
c JPEG, 96, 104, 108, 120, 121,
123, 124, 131, 133, 152,
166, 167
Capacity
JPEG 2000, 134
embedding, see Embedding ca-
lossy, 3-5, 9, 88, 120-122,133,
pacity
134, 149, 156, 176, 179,
generic channels, see Chan-
188
nel capacity
MP3, 178, 189
CDM, see Modulation and multi-
MPEG, 104, 115, 136
plexing, CDM
scalable, 94
Channel capacity, 5, 27, 28, 31
transcoding, see Transcoding
Channels, 5
wavelet, 134
additive white Gaussian noise
Content authentication, see Au-
(AWGN) channels, 23, 26-
thentication
30
Control data, 114
binary symmetric channel (BSC), Correlation-based detection, 23, 97,
27,30 158
capacity, see Channel capac- distribution of, 24, 98
ity weighted correlator, 98-99
colored Gaussian noise chan- with normalized variance, 23,
nels, 26, 162 97, 99
continuous input and contin- Countermeasures (against attacks)
uous output (CICO), 27 block replacement attack, against,
continuous-input continuous- 153-154
output (CICO), 29 concealment attack, against,
Costa's code, 19, 32 153-154
discrete-input continuous-output double capturing attack, against,
(DICO),28 10,172-173
discrete-input discrete-output forgery attacks, against, 81-
(DIDO), 27, 30 83, 123, 129-130
erasure channels, 50 geometric attack, against, 155-
with side information, 19, 21, 168
159 jittering attacks, against
Ciphers, see Cryptography, encryp- audio, 183
tion video frames, 8, 109-112,
Code division modulation/ multi- 114
plexing, see Modulation RST (rotation, scale, transla-
and multiplexing, CDM tion), against, 155-168
Compliance Cover media, defined, 15
coding standard compliant de- Cryptography
vice, 9, 33, 138-142 cryptanalysis, 179
rights management compliant encryption, 2, 3, 146, 150, 189
device, 178, 188 hash, 119, 121, 189
Compression public-key cryptography, 81,
H.26x,104 119, 121
signature, see Digital signa- statistics, see also Correlation-
tures, cryptographic based detection
Customized media decoder, 9, 138- whitening, 98, 162-164
142 Digital Millennium Copyright Act
(DMCA),150
Digital rights management (DRM),
D
146, 175-189, 192
Data hiding, 2 Digital signatures
advantages of, 140 content feature based, 121
binary images, 65-83 cryptographic, 119, 121, 189
color images, 96, 134 digitized signature images, 70,
framework, 15-17, 122-123 74
grayscale images, 96-109, 155- handwritten, 65
168 signature in signature, 74
key elements of, 16-17
layered system structure, 16
video, 109-116 E
Data payloads, see Payloads
DCT (discrete cosine transform), ECC, see Error correction coding
see also Block DCT trans- Embedded data, define, 15
form Embedding capacity, 5, 19,26-33
block-DCT embedding, see Block Embedding distortion, defined, 15
DCT transform; Block- Embedding domain
based data hiding block DCT, 96-109,121-123,
DCT-domain visual model, 103- 153-154
107, 125-128 DFT, 155-168
quantized DCT coefficients, 9, Fourier-Mellin, 157, 168
123-125 pixel, 67, 121
whole-DCT embedding, 153- wavelet, 122, 134
154 Embedding mechanisms, 20-22
Detection additive, 20-21, see also Type-
Baysian rule, 24, 98 I embedding
blind detection, see non-coherent enforcement, 21-22, see also
detection Type-II embedding
coherent detection, 23, 151, spread spectrum, 23, 98, 103,
153 157, 187
detection statistics, 23-24, 151- table lookup, 34, 69, 123-129
156, 158, 166-167 Type-I, see Type-I embedding
hypothesis testing formulation, Type-II, see Type-II embed-
23-24, 97-101 ding
Neyman-Pearson rule, 24, 166- Embedding rate
167 CER (constant embedding rate),
non-coherent detection, 5, 22, 44-51, 112-113
23, 32, 97-101, 151, 155, VER (variable embedding rate),
156, 187 51-53, 112-113
Enforcement embedding, 21-22, see Human auditory system (HAS),
also Type-II embedding 182, 184
Error analysis Human visual system (HVS)
detection probability, 167 binary images, 67-69, 84-88
false alarm probability, 24, 165, DCT-domain visual model, 103-
167 107
false negative, see miss detec- grayscale images, 103-107
tion probability Hypothesis testing, 23-24, 97-101
false positive, see false alarm
probability
miss detection probability, 24
I
quantization, 24-26 Imperceptibility, 16, 17, 31, 125,
Receiver operating character- see also Human auditory
istics (ROC) curves, 166- system (HAS); Human vi-
167 sual system (HVS)
ROC curves, see Receiver op- Interleaving, 47,146, see also Shuf-
erating characteristics fling
Error concealment, see also Block
concealment
attacks on data hiding, used J
as, 10, 150-155
data hiding for, 9, 143-146 Just noticeable difference (JND),
Error correction coding (ECC) 24, 36, 103-107, see also
BCH code, 132 Human auditory system
Extracted data, defined, 15 (HAS); Human visual sys-
tem (HVS)
embeddable components, 42,
F 126
unembeddable components, 31,
False negative, see Error analysis, 41,42, 103
miss detection probabil-
ity
False positive, see Error analysis,
K
false alarm probability Key element of data hiding, see
Fingerprinting, see Applications of Data hiding, key elements
data hiding, fingerprint- of
ing Keys, 54-55
Flippability, for binary image pix-
els, 68, 84-88
L
H Layered structure, for data hiding
system, 16
Hearing, see Human auditory sys- Least significant bit (LSB) embed-
tem (HAS) ding, see Type-II embed-
Host media, defined, 15 ding, LSB embedding
Lookup tables Non-coherent detection, see De-
flippable scores, for, 87 tection, non-coherent de-
Type-II embedding, 34, 69, 123- tection
128 Non-compliant media decoder, see
Customized media decoder
M
o
Marked media, defined, 15
Modulation and multiplexing, 33- Original signal, see Host signal
37 Orthogonal modulation, see Mod-
amplitude modulation, 33 ulation and multiplexing,
amplitude modulo modulation, orthogonal modulation
33-34 Ownership protection, see Appli-
biorthogonal modulation, 34, cations, ownership protec-
53 tion
code division modulation/ mul-
tiplexing (CDM), 35-37,
p
50,52
comparison, 35-37
Payloads, 94, see also Embedding
orthogonal modulation, 34, 36-
capacity
37
control data, 94, 114
spatial division, 35, see also
user data, 94, 113
time division modulation/
Perceptual model, see Human au-
multiplexing
ditory system (HAS); Hu-
time division modulation/ mul-
man visual system (HVS)
tiplexing (TDM), 34-37,
Performance
50, 52, 103, 114, 123
ROC curves, see Receiver op-
Moment-based approach, for ana-
erating characteristics
lyzing shuffling, 47, 56
Permutation, see Shuffling
Multilevel embedding
Pseudo-random number, 54, 126
advantages of, 93-94
basic idea of, 94-96
for images, 96-109 Q
for video, 109-116
spectrum partition for, 97-101 Quantization
Multiple-bit embedding, 33-37, see error analysis, 24-26
also Modulation and mul- JPEG quantization table, 101
tiplexing QIM (quantization index mod-
ulation), 25
N
R
Noise
compression, by, 167 Random numbers, see Pseudo-random
defined, 15 number
Receiver operating characteristics Signature in signature, 74
(ROC), 166-167, see also Spread spectrum, see Embedding
Detection, Neyman-Pearson mechanisms, spread spec-
rule trum
Registration, see Images, registra- Steganography, 2, 191
tion Strength of embedding, see also
Human auditory system
s (HAS); Human visual sys-
tem (HVS)
SDMI (Secure Digital Music Ini-
tiative)
T
background, 175
challenge, see SDMI public chal- TDM, see Modulation and multi-
lenge plexing, TDM
SDMI public challenge Test media, defined, 15
attacks Time division modulation/ multi-
against fragile watermarks, plexing, see Modulation
188-189 and multiplexing, TDM
against robust watermarks,
Transcoding, 8, 9, 94, 109, 137-
179-188
142, see also Compres-
background, 176 sion; Video, downsizing
setup, 176-178 Type-I embedding
Security, see also Adversaries; At- define, 20
tacks embedding capacity, 27, 29
authentication, data hiding for, examples, 21, 101, 114, 157-
126-127, 129-130, 171-
159
173
properties, 23-24
binary images, data hiding for, spread spectrum, 23, 98, 103,
81-83
157, 187
SDMI systems, 178-189 Type-II embedding
Seeds, see Keys
define, 21
Shuffling, see also Uneven embed-
embedding capacity, 27-29
ding capacity
examples, 22, 69-70, 101, 123-
analysis, 47-49, 55-62 128
equalizing uneven embedding
LSB embedding, 22
capacity, for, 46-51 properties, 24-26
examples, 70-73
generation algorithm, 54-55
Side information, see also Control u
data
available to sender in commu- Uneven embedding capacity
nications, 19, 21, 159 difficulty introduced by, 41-
convey to receiver, 52-53, 114, 42
140 equalizing, 46-51, see also Shuf-
Signature, see Digital signatures fling
examples, 70-73, 112-113
solutions, 42, see also Embed-
ding rate, CER & VER;
Shuffling
User data, 113
v
Video
authentication, 136
downsizing, 138-142
MPEG compression, 104, 136,
138
Visible watermark, 4
w
Watermark
digital watermark, 3, 15
imperceptible, 3, 4
paper watermark, 2
visible, 4
Watermarking, 2
About the Authors
Min Wu received the B.E. degree in electrical engineering and the B.A.
degree in economics from Tsinghua University, Beijing, China, in 1996 (both
with the highest honors), and the M.A. degree and Ph.D. degree in electrical
engineering from Princeton University in 1998 and 2001, respectively. She
was with NEC Research Institute and Signafy, Inc., Princeton, NJ, in 1998,
and with the Media Security Group, Panasonic Information & Networking
Laboratories, Princeton, NJ, in 1999. Since 2001, she has been with the Fac-
ulty of the Electrical and Computer Engineering Department, the Institute
of Advanced Computer Studies, and the Institute of Systems Research at
the University of Maryland, College Park. Her research interests include in-
formation security, signal processing, and multimedia communications. She
received an NSF CAREER award for her research on information security
and protection in 2002 and holds three U.S. patents on multimedia data
hiding. She is a member of the IEEE and the IEEE Technical Committee
on Multimedia Signal Processing. More information about her research can
be found at http://www.ece.umd.edu/~minwu/research.html.
Bede Liu received the B.S. degree in E.E. from National Taiwan Uni-
versity and the M.E.E. and D.E.E. degrees from the Polytechnic Institute
of Brooklyn. Prior to joining the Princeton University Faculty in 1962, his
industrial associations included Bell Laboratories, Allen B. DuMont Labora-
tory, and Western Electric Company. His current research interests include
signal and image processing, and video coding and analysis. He was President
of the IEEE Circuits and Systems Society and a member of the IEEE
Board of Directors. He received the IEEE Centennial Medal (1984), the
IEEE Acoustic, Speech, and Signal Processing Society Technical Achieve-
ment Award (1985), the IEEE Circuits and Systems Society Education
Award (1988), two Best Paper Awards from IEEE Transactions on Cir-
cuits and Systems for Video Technology (1994 and 1996), the IEEE Mac
Van Valkenburg Award (1997), the IEEE Third Millennium Medal (2000),
the IEEE Circuits and Systems Society Golden Jubilee Award (2000), and
the IEEE Signal Processing Society Award (2000). He is an IEEE Fellow
and a member of the U.S. National Academy of Engineering. More infor-
mation about his research can be found at http://www.ee.princeton.edu/
people/Liu.php.