Visual CAPTCHAs For Document Authentication
Abstract- Visual CAPTCHAs, like Gimpy images, are commonly used to screen human users from automated computer scripts. We propose using them for direct, visual authentication of digital documents. The basic assumption is that if it is hard or impossible for the computer to recognize a document, then it cannot manipulate it. If such a document is still recognizable by a human, he or she can be confident that it is authentic, or at least that no automated process has manipulated it. Such authentication is highly desirable in the context of digital signatures, which, due to their computational complexity, have to be produced by a computer. It can be a specialized, trusted cryptographic computer, e.g. a smartcard, but the path to it is generally not secure. Visual CAPTCHAs can be used for securing the path without the need for additional hardware.

I. INTRODUCTION

Digital signatures [1], made public almost three decades ago, have a number of advantages compared to their classical, hand-written counterparts: they are much harder to forge, can be detached from the document, guarantee the integrity of every bit even in a 100-page document, and (maybe the most practical advantage) leave the document in digital form, allowing for its transmission by electronic means and for further processing. It can be safely assumed that in today's developed world the majority of documents (letters, contracts, school transcripts etc.) is produced entirely or mainly on computers. Still, most documents today are authenticated in the classical way, by hand-signing the hardcopy. The reason is, we believe, that in most cases the whole setup for producing and signing a digital document cannot be trusted.

Being a cryptographic method, digital signatures require complex and cumbersome computations on large numbers. These computations are practically impossible for humans to perform and need to be done by computers. On the other hand, common, general-purpose personal computers are susceptible to various attacks (viruses, Trojan horses, worms, phishing and other hacker attacks) and generally cannot be trusted. Attaching trusted hardware, like a smartcard, to the PC to perform the cryptography does not solve the problem either. As long as the PC is vulnerable, the document can be tampered with inside it, on its way to the smartcard, and the user has no way of noticing it. It is perfectly possible to produce a document on the PC and it is also possible to produce a valid and secure digital signature of a document, once the smartcard (or other secure module) receives it. The problem is to ensure that it is the same document in both cases.

Suppose, for example, that you are using your PC for writing a contract, in which you agree to pay, say, 1000 USD to Bob. You let your personal smartcard, attached to the PC, digitally sign the contract and you send it to Bob. Unfortunately, you have a malicious program in your PC (maybe even planted by Bob, who knows?) which changes the number 1000 into 2000 in the document as it travels towards the smartcard. The smartcard will produce a perfectly valid digital signature. Because a digital signature is simply a large, random-looking binary number, as a human you cannot see that it corresponds to a different document than the one you wrote. As you send the document along with the signature to Bob, the malicious program again changes 1000 into 2000. Now Bob is happy: he receives a contract, digitally signed by you, where you agree to pay double of what he has hoped for.

Three solutions have been proposed: equipping the external hardware with display capabilities, making the whole PC secure, or using visual cryptography for checking that the document arrived unchanged at the module performing the cryptography.

The first solution is typically a smartcard reader, produced by some trusted entity, with a display and possibly a keypad. Such devices are commonly used in electronic payment: the price and possibly other information is shown on the display and the user authorizes the payment by inserting the payment card and typing in his or her PIN. The method could, in principle, be used for digitally signing documents: before signing, the document would be shown on the trusted reader's display and the user would have to authorize the signature by typing in the PIN. In practice this can work only for very simple and short documents, due to the limited display capabilities of common smartcard readers. Equipping the readers with a big display, just for the purpose of secure digital signing, is technically possible, but not economical.

An economically more viable solution is envisioned by the Trusted Computing Group (TCG). The idea is to make the whole PC secure, including its peripherals. On the other hand, it should remain a general-purpose computer, where the user can still install new software and hardware. To remain affordable and compatible with existing computers, the computer must not differ much from today's PCs, otherwise it would not be accepted by the market. The solution includes a minor hardware modification, adding a specialized cryptographic chip called "trusted platform module" (TPM) to the mainboard, and adjusting the system software (BIOS, OS, ...) to use the TPM to check the system integrity. The system
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SURATHKAL. Downloaded on September 13,2022 at 15:21:22 UTC from IEEE Xplore. Restrictions apply.
works through a so-called chain of trust: the first program executed after power-on (typically from the BIOS) has to be unconditionally trusted. It computes hash values of the next programs to be executed (e.g. the operating system) and of measurable properties of the hardware to be used, like hard disk ID etc. It produces their digital signatures and checks them using the TPM. If the checks pass, the control is passed to the next program (e.g. OS loader), which repeats the process for its successors. The chain unfolds up to application programs.

Trusted computing does not guarantee that the installed hard- and software are not malicious, but it notifies the user of any changes to the system. If the computer has been delivered in a trusted state, it will remain in it as long as the user does not install anything new or trusts everything he or she installs. This trust can be established through signatures by trusted entities, which are checked by the TPM. TPMs are already available, but whole systems based on them are not. It is not yet known what the actual implementation will look like, since potential customers have expressed some doubts. They fear, for example, that through TC manufacturers might coerce users into using or not using some software or hardware, simply by declaring it "not trusted". The TCG best practice manual [2] denounces such abuses of the technology, but cannot prevent them. Also, the flow of personal information in such systems is not transparent and it is feared that it might be unintentionally disclosed [3]. It is also not clear how backups would work and what happens in the case of a hardware failure.

The third solution attempts to secure the path between the secure cryptographic module and the human visual perception. It is remarkably low-tech, requiring almost no additional hardware. Visual authentication [4] is a method that was initially introduced for secure electronic payments over non-trusted terminals. The method is based on visual cryptography [5], which is generally a visual implementation of secret sharing [6], but for authentication purposes can be looked at as a symmetric cryptographic method. The "plaintext" is a black-and-white image. This image is first oversampled, typically by a factor of 2 along both axes, and then transformed, so that each 2 x 2 square (corresponding to a pixel in the original image) is made completely black if the corresponding original pixel was black, or, if the original pixel was white, contains two random black pixels and two white ones. Such squares visually appear grey. The actual encryption is performed by splitting the transformed image, square by square, into two half-images ("shares"). Each grey square is split into two identical squares, and each black square into two complementary ones. Each share then looks like a uniform distribution of black and white pixels. However, if the shares were printed on transparencies, their superposition would produce the transformed image, from which the original one is still visually recognizable. One of the shares can be fixed in advance among the communication parties and considered the "key", and the other, the "cyphertext", is computed from the key and the document.

Visual cryptography cannot be used directly for message authentication if the attacker knows the message, as can be assumed in the case of a vulnerable PC. From the message and the cyphertext he or she can deduce the key and completely forge the document. Instead, it was proposed to enrich the cyphertext with information not contained in the message, but known to the user, for example by enlarging the key and the cyphertext and requiring that the decrypted image appears in a predefined area. A more recent approach [7] proposes keeping the physical size unchanged, but incorporating some secret information, a "watermark", into the cyphertext. Such a watermark is unknown to, and thus not noticeable by, the attacker, but is visually recognizable by the user in the decrypted document. The drawbacks of visual authentication are that it is applicable only to black-and-white documents, requires a new "key" (transparency) for every document page and needs the cyphertext and the key to be precisely physically aligned for the "plaintext" to reappear. In practice, this usually requires printing the cyphertext on paper.

The method proposed here is also a type of visual document authentication and secures the path from the cryptographic module to the user's eye. But, instead of encrypting the document and requiring the user to use an auxiliary device (like the key transparency) for decrypting it, the document is transformed in a way that makes it practically impossible to recognize for a computer, but not for a human observer. Unlike the above one, this method is not perfectly (i.e. theoretically proven) secure, but sufficiently secure for all practical purposes, considering the state of the art of available pattern recognition technologies. The method is much easier to use, works directly on the screen and does not need external tools, like transparencies.

II. PROPOSED METHOD

In this section we
1) Review CAPTCHAs as a method for obtaining security,
2) Describe the assumed setup for applying the proposed method, and
3) Describe a typical usage scenario of the method.

The proposed method is envisioned to secure visual documents against tampering on their way from a trusted module (a smartcard, a remote trusted computer, tamper-resistant software [8] etc.) to the human observer. It is based on a visual CAPTCHA [9], which exploits the fact that some problems, which are easily solved by humans, cannot be solved by current algorithms. A CAPTCHA is basically a program that can generate and grade tests based on such problems and thus distinguish between human and computer provers. The idea is widely used on the Internet to block automated access to services, like e-mail accounts, search engines, and similar. A very common such test is EZ-Gimpy, a word displayed in a distorted way, which a human can easily recognize, but computers presumably cannot. A more complicated version, Gimpy, contains several (typically eight) words, from which the prover has to recognize a subset (typically three). Both problems can in the meantime be automatically solved: Gimpy in about 1/3 of the cases and EZ-Gimpy even in 92% [10], but
it still remains an accepted fact that humans by far outperform computers in recognition of visual patterns.

In the proposed method, we obscure visual documents with more complex visual CAPTCHAs, which, to our knowledge, have not yet been broken. The trusted module transforms the document's appearance, so that the human can still recognize it, but not the computer. The transformation can include inclining the document in 3D, dropping shadows, projecting it onto uneven and possibly morphing surfaces, letting moving spotlights and magnifying lenses cover it, displaying patterns in the background or semi-transparent in front of the document, and similar. Automated attackers (computer programs) cannot tamper with the document because of the high pattern recognition complexity involved in reconstructing the document transformation and separating it from the document. Human attackers might be able to roughly recognize the transformation (e.g. estimate the inclination), but not to forge it in the limited time. The primary application of the method is in digital signatures, but it can be used generally for a kind of steganography, hiding information from computers. In digital signatures, the purpose is to ensure that the trusted module, which performs the signing, has received the document unmodified.

The assumed setup is the following: The user (a human) produces a document on an ordinary general-purpose computer. The computer itself is not trusted to safeguard the integrity of the document, but there is a trusted module attached to it or included in it. The module has no input/output capabilities except for electronic communication with the computer. Besides its primary function, like producing digital signatures, the module is capable of transforming the visual appearance of the documents it receives. Actually, the module produces a digitized appearance, e.g. a digital video stream, which is sent back to the computer for displaying on its screen. In case of multi-page documents, each page is processed separately.

For security reasons, each specific transformation is used only for one document, like a one-time pad. The user and the module share an enumerated list, agreed in advance, of transformations which the module will apply to the documents. Each transformation has a short alphanumeric code associated with it, by which the user authorizes the signing. The code, which is also used only once, has roughly the meaning of the Transaction Authorization Number (TAN), commonly used for online banking in Europe. In practice, such a list would be distributed by a trusted authority, e.g. electronically and encrypted for the module and by mail for humans.

To digitally sign a document, the user sends it to the trusted module. The module transforms the visual appearance of the document as it received it (remember that the document could have been tampered with on the way). It sends the transformed appearance back to the computer, which displays it on the screen. The user looks at the transformed document and checks that its content has not been modified and that the transformation is as expected, i.e. the next one from the transformations list. If both conditions are satisfied, the user is confident that the document has arrived unchanged at the module. He or she authorizes the signing of the document by typing in the authorization code associated with the transformation. The module checks if the authorization code corresponds to the applied transformation and, if yes, signs the document and sends the signature back to the user's computer. Deferring the signing until the user authorizes the document prevents an attacker from obtaining a valid signature at all, even one the user would be aware of. This is necessary because if the compromised computer is networked the user might not be able to stop the forged signature from being distributed even if he or she notices the forgery.

III. IMPLEMENTATION AND DISCUSSION

In our prototype, the "trusted module" is implemented as a program running on the user's computer. For real-world application the program can be made tamper-resistant or even moved to external hardware. It receives text documents, renders them as images and visually transforms them before displaying them. The transformations are computationally complex and include coloring, 3D rendering, and animation. Animations are especially practical, because they introduce not only one more degree of freedom, but also the constraint of smoothness. To forge a smooth animation, the attacker must be capable of solving the CAPTCHA on the fly, between the frames, or delay the whole animation, something a cautious user would certainly notice.

Due to the required computational power (our implementation consumes 3/4 of a 3.4 GHz Pentium processor) and the transmission bandwidth for the animations, smartcards are hardly an option as external trusted devices. We therefore assume that such devices will be more sophisticated and will communicate over a high-speed link, like USB, Ethernet, or wireless.

We have implemented several types of transformations for conveying the document. As the most promising we consider the "Deforming Surface". The document, seen as a black-on-transparent image, is first passed through several image-processing filters (shadow-dropping, water ripple, fish-eye lens ...) and projected onto a 3D surface. The surface itself is morphing between different shapes, which contain some simple alphanumeric codes. The whole surface is inclined in 3D (see Figure 1). Another implemented transformation includes making the document semi-transparent, with an oscillating transparency, and superimposing it with an animated background, where occasionally some code appears. Yet another implementation cuts the document itself into pieces and lets them fly and rotate in 3D, so that they only occasionally come together and produce the original document (Figure 2).

The only competing method we know about, which works for larger documents, is Visual Authentication (Trusted Computing is not yet available and certified smartcard readers have too small displays). The advantage of Visual Authentication is that it is perfectly secure. On the other hand, it is clumsy to use, as it requires a stack of pre-printed transparencies and perfect alignment of each transparency with the encrypted document. We argue that perfect security (that is, security that cannot be broken, ever) is not needed for our application.
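To make the usability cost of Visual Authentication concrete, the 2 x 2 share construction summarized in the Introduction can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of [4] or [5]: the function names, the encoding (1 = black, 0 = white) and the use of Python's `secrets` module for randomness are our own choices.

```python
import secrets

# The six 2 x 2 squares with exactly two black subpixels (1 = black),
# flattened row by row. Names and encoding are illustrative assumptions.
PATTERNS = [
    (1, 1, 0, 0), (0, 0, 1, 1),
    (1, 0, 1, 0), (0, 1, 0, 1),
    (1, 0, 0, 1), (0, 1, 1, 0),
]


def make_key(width, height):
    """Key share: one random two-black square per original pixel."""
    return [[secrets.choice(PATTERNS) for _ in range(width)]
            for _ in range(height)]


def make_ciphertext(image, key):
    """Second share, computed from the document image and the key.

    A white pixel gets a square identical to the key square, a black
    pixel gets the complementary square.
    """
    cipher = []
    for image_row, key_row in zip(image, key):
        row = []
        for pixel, square in zip(image_row, key_row):
            if pixel:
                row.append(tuple(1 - s for s in square))
            else:
                row.append(square)
        cipher.append(row)
    return cipher


def superimpose(share_a, share_b):
    """Stack two transparencies: black on either share stays black."""
    return [[tuple(a | b for a, b in zip(sq_a, sq_b))
             for sq_a, sq_b in zip(row_a, row_b)]
            for row_a, row_b in zip(share_a, share_b)]


def black_counts(stacked):
    """Number of black subpixels per square (4 = black, 2 = grey)."""
    return [[sum(square) for square in row] for row in stacked]


if __name__ == "__main__":
    image = [[1, 0],
             [0, 1]]
    key = make_key(2, 2)
    cipher = make_ciphertext(image, key)
    print(black_counts(superimpose(key, cipher)))  # [[4, 2], [2, 4]]
```

Each share taken alone contains two black subpixels in every square, so it reveals nothing about the image. The image reappears only when the two shares are superimposed square by square, which is why every page needs its own fresh key transparency and an exact physical alignment.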
It suffices if the attacker cannot forge the document in the couple of seconds or, in the worst case, minutes that the human user needs to check the document integrity and initiate the signing. We are not aware of a pattern recognition algorithm which is capable of breaking an animated CAPTCHA such as ours, especially not given the time constraint. Even as pattern recognition algorithms advance, we expect new CAPTCHAs to be invented.

Fig. 1. A snapshot of the "Deforming Surface" animated CAPTCHA. The distorted text is projected onto a morphing surface. At this instant, the code "1234" can be recognized on the surface.

Fig. 2. A snapshot of the "Flying Pieces" animated CAPTCHA. The distorted text is projected onto a color-patched surface which is cut into stripes. The stripes rotate in 3D and at this instant are close to coming together, so that the document can be recognized.

IV. CONCLUSIONS AND FUTURE WORK

We have presented a method, which:
1) uses animated visual CAPTCHAs
2) for obfuscating human-recognizable text, so that it is not recognizable by computers,
3) with the purpose of securing the transmission path from a trusted module to the human eye,
4) so that humans can check the integrity of documents received by the module. The method also includes
5) a simple challenge-response protocol to prevent the attacker from posing as the human user to obtain services from the module.

The security of the method is currently not quantifiable. Future work is intended in two directions. One is to explore how complex the transformations can get before they prevent humans from recognizing the document. This question can only be answered empirically, by testing the method using different CAPTCHAs on a large number of human observers. The complexity itself is a subjective value, depending on the human observer. Ideally, a kind of perceptual model would evolve, which would allow for quantifying the CAPTCHA complexity in terms of resolution, distortion, animation speed, and similar, and relating it to the ratio of humans capable of solving it. The complementary research is in the field of pattern recognition, trying to develop algorithms for breaking the CAPTCHAs and quantifying their performance as a function of CAPTCHA complexity. Combined, they should lead to a relationship between human-perceived transformation complexity and the security of the method.

ACKNOWLEDGMENT

The authors would like to thank Jerry Huxtable for his image filters and Sarmad Hussain for help with implementation.

REFERENCES

[1] R. Rivest, A. Shamir, and L. Adleman, "A method for obtaining digital signatures and public-key cryptosystems," Communications of the ACM, vol. 21, pp. 120-126, February 1978.
[2] TCG Best Practices Committee, "Design, implementation, and usage principles for TPM-based platforms," 2005. [Online]. Available: https://www.trustedcomputinggroup.org/downloads/bestpractices-/Best-Practices-Principles-DocumentLvl.0.pdf
[3] F. Chiachiarella, U. Fasting, T. Fey, S. Leppler, G. Lux, P. Lubb, A. Moser, G. Otten, J. Schlattmann, S. Schumann, L. Schweizer, and F.-J. Souren, "Das Risiko Trusted Computing für die deutsche Versicherungswirtschaft," Schriftenreihe des Betriebswirtschaftlichen Institutes des GDV, vol. 13, 2004. [Online]. Available: http://www.gdv-online.de/tcg/pos-tcg.pdf
[4] M. Naor and B. Pinkas, "Visual authentication and identification," in CRYPTO '97: Proceedings of the 17th Annual International Cryptology Conference on Advances in Cryptology. London, UK: Springer-Verlag, 1997, pp. 322-336.
[5] M. Naor and A. Shamir, "Visual cryptography," Lecture Notes in Computer Science, vol. 950, pp. 1-12, 1995. [Online]. Available: citeseer.ist.psu.edu/naor95visual.html
[6] A. Shamir, "How to share a secret," Communications of the ACM, vol. 22, no. 11, pp. 612-613, November 1979.
[7] I. Fischer and T. Herfet, "Visual document authentication using human-recognizable watermarks," in Proceedings of ETRICS 2006, LNCS 3995. Springer-Verlag, June 2006, pp. 509-521.
[8] D. Aucsmith, "Tamper resistant software: An implementation," in Proceedings of the First International Workshop on Information Hiding. London, UK: Springer-Verlag, 1996, pp. 317-333.
[9] L. von Ahn, M. Blum, N. Hopper, and J. Langford, "CAPTCHA: Using hard AI problems for security," in Proceedings of Eurocrypt, 2003, pp. 294-311. [Online]. Available: citeseer.ist.psu.edu/vonahn03captcha.html
[10] G. Mori and J. Malik, "Recognizing objects in adversarial clutter - breaking a visual captcha," in Proc. Conf. Computer Vision and Pattern Recognition, vol. 1. Madison, USA: IEEE Computer Society, June 2003, pp. 134-141. [Online]. Available: citeseer.ist.psu.edu/mori03recognizing.html