
This article has been accepted for publication in IEEE Access.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3376693


DeepTextMark: A Deep Learning-Driven Text Watermarking Approach for Identifying Large Language Model Generated Text

TRAVIS MUNYER, ABDULLAH TANVIR, ARJON DAS, AND XIN ZHONG
Department of Computer Science, University of Nebraska Omaha, Omaha, NE 68182 USA (e-mails: tjmunyer@gmail.com; atanvir@unomaha.edu; arjondas@unomaha.edu; xzhong@unomaha.edu)
Corresponding author: Xin Zhong (e-mail: xzhong@unomaha.edu).

ABSTRACT The rapid advancement of Large Language Models (LLMs) has significantly enhanced the
capabilities of text generators. With the potential for misuse escalating, the importance of discerning whether
texts are human-authored or generated by LLMs has become paramount. Several preceding studies have
ventured to address this challenge by employing binary classifiers to differentiate between human-written
and LLM-generated text. Nevertheless, the reliability of these classifiers has been subject to question. Given
that consequential decisions may hinge on the outcome of such classification, it is imperative that text source
detection is of high caliber. In light of this, the present paper introduces DeepTextMark, a deep learning-
driven text watermarking methodology devised for text source identification. By leveraging Word2Vec
and Sentence Encoding for watermark insertion, alongside a transformer-based classifier for watermark
detection, DeepTextMark epitomizes a blend of blindness, robustness, imperceptibility, and reliability. As
elaborated within the paper, these attributes are crucial for universal text source detection, with a particular
emphasis in this paper on text produced by LLMs. DeepTextMark offers a viable "add-on" solution to
prevailing text generation frameworks, requiring no direct access or alterations to the underlying text
generation mechanism. Experimental evaluations underscore the high imperceptibility, elevated detection
accuracy, augmented robustness, reliability, and swift execution of DeepTextMark.

INDEX TERMS Text Source Detection, Large Language Model Text Detection, Text Watermarking, Deep
Learning

I. INTRODUCTION

Large Language Models (LLMs), such as ChatGPT [1], have recently achieved notable success. The advancements
in LLMs can be advantageous across various domains, yet
there also lies the potential for inappropriate applications.
A prevailing concern regarding publicly accessible LLMs is
the challenge in distinguishing between machine-generated
and human-written text, a difficulty that persists even in
instances of misuse [2]. For instance, students might utilize
automatically generated texts as their own submissions for assignments, evading conventional "plagiarism" detection. The high fidelity of the text generated by LLMs exacerbates the challenge of detection, marking a significant hurdle. Moreover, there exist advanced text augmentation methods capable of effortlessly modifying any given text [3], [4], [5], [6]. Therefore, devising a method to ascertain the origin of text could serve as a valuable approach to curtail similar misapplications of LLMs.

FIGURE 1. Overall idea of DeepTextMark.

Various classifiers have been developed to differentiate between LLM-generated text and human-written text [2], [7]. However, the efficacy of these classifiers remains somewhat constrained at present. Numerous studies have explored the accuracy of these classifiers [8], along with techniques to circumvent classifier detection [9]. A reliable source detection mechanism that is challenging to bypass is crucial, given its

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
potential applications in identifying plagiarism and misuse. Therefore, employing text watermarking for text source detection appears to be a prudent approach, as it is both reliable and challenging to circumvent.

Text watermarking entails the covert embedding of information (i.e., the watermark) into cover texts, such that the watermark is only discernible by authorized detectors. While watermarking is more conventionally applied to images [10], its application to text can enable the identification of text originating from specific sources, such as an LLM (refer to Figure 1 for the proposed source detection mechanism). However, conventional text watermarking techniques often necessitate manual intervention by linguists, exhibit a lack of robustness, and do not possess the blindness property. Specifically, these traditional techniques are prone to minor modifications of the watermarked text (lacking robustness), and necessitate the original text for the extraction or detection of the watermark (lacking blindness). For a watermarking technique to be practically viable in detecting LLM-generated text, the method should be scalable (i.e., automatic). Moreover, since the watermark detector may not have access to the original text at the time of detection, it should not require it (i.e., blind). Additionally, the detection process should be highly reliable, aiming to achieve superior classification accuracy. Ideally, the watermarked text should remain imperceptible, ensuring the natural preservation of the text's meaning. Lastly, the classification mechanism should be resilient to minor alterations of the text (i.e., robust).

A nascent method has been proposed for embedding watermarks into LLMs [11]. However, a notable limitation of this method is its requisite access to the text generation phase of the LLMs, a requirement that may not be practical in real-world applications, particularly when the source models of the LLMs are not open-source. This dependency poses a significant challenge, as many LLMs are proprietary or their internals are not publicly disclosed, thereby restricting the applicability of such watermarking techniques. Moreover, without the requisite access to the text generation phase, implementing watermark-based source detection mechanisms becomes inherently challenging. This highlights the necessity for developing alternative watermarking techniques that are both effective and adaptable to varying levels of access to the LLMs' internal workings.

This paper introduces DeepTextMark, a robust and blind deep learning-based text watermarking method principally aimed at detecting LLM-generated text. DeepTextMark employs word substitution, utilizing a pre-trained amalgam of Universal Sentence Encoder embeddings [12] and Word2Vec [13] to identify semantically congruent substitution words. The inserted watermark is invisible to the naked eye, and the alterations made to the text, such as substituting words with synonyms while keeping the grammatical structure intact, are designed to ensure that the watermark remains undetectable to readers. Therefore, it preserves the imperceptible nature of the watermark within the text. Moreover, we propose a novel classifier, grounded in transformer architecture [14], to discern watermarked text, enhancing detection accuracy and robustness. This classifier can accurately differentiate between marked and unmarked sentences based solely on the content and features extracted from the text, without altering its appearance or readability in any noticeable way. This imperceptibility ensures that the watermark remains covert and undetectable to human observers, thereby preserving its effectiveness for authentication or tracking purposes without alerting potential infringers to its existence. This amalgam of pre-trained models for substitution word selection and the transformer-based watermark detector underscores the novel contributions of this paper. Being deep learning-driven, the watermarking and detection techniques are scalable and fully automatic. The classifier necessitates only the watermarked text for highly accurate classification, epitomizing the technique's blindness. Furthermore, the paper elucidates an extension of this technique to multiple sentences, like essays, accentuating a primary application. Empirical evidence is provided demonstrating near-perfect accuracy as text length increases, enriching the method's reliability, especially with a modest sentence count.

The primary contributions encapsulate: (1) an "add-on" text watermarking method for detecting generated text without necessitating access to the LLMs' generation phase; (2) an automatic and imperceptible watermark insertion method; and (3) a robust, high-accuracy deep learning-based text watermark detection method, rendering DeepTextMark a valuable asset in the realm of text authenticity verification.

The rest of this paper is organized as follows. We discuss related works in section II. The watermark insertion and detection process is discussed in section III. Experiments showing the reliability, imperceptibility, robustness, and empirical runtime are shown in section IV, followed by a conclusion of the work in section V.

II. RELATED WORK
Our contributions are summarized as robust detection of LLM-generated text, a novel method for text watermark insertion, and a novel approach for text watermark detection; the following sections provide a review of related work in these domains. Section II-A offers a concise review of state-of-the-art methods for LLM-generated text detection, while Section II-B delves into classical text watermarking techniques.

A. TEXT SOURCE DETECTION FOR LARGE LANGUAGE MODELS
Recent endeavors have been directed towards developing classifiers aimed at differentiating between LLM-generated text and human-written text. The prevailing approach entails the collection and labeling of LLM-generated and human-written texts, followed by the training of a binary classifier through supervised learning. Although the efficacy of these classifiers has yet to be fully established, some preliminary analyses have been reported [8], [9]. One study [9] elucidated three distinct methods, substantiated with examples, to circumvent the GPTZero [7] classifier detection. Another investigation [8] conducted a direct assessment of GPTZero's accuracy, uncovering inconsistencies in its ability to detect human-written text. Moreover, classifier-based LLM-generated text detectors commonly necessitate a substantial character count to perform detection accurately. For instance, GPTZero [7] required a minimum of 250 characters to initiate detection. Looking ahead, OpenAI is planning a cryptography-based watermarking system for ChatGPT-generated text detection [15], although no definitive work has been disclosed as of yet. Zero-shot learning-based methods have also demonstrated some advancement. For example, Cer et al. [16] reported an increment in AUROC from 1% to 14% compared to other zero-shot detection strategies across various datasets; however, the accuracy might still fall short in real-world applications concerning text generated by models.

A method has been proposed for detecting LLM-generated texts based on text watermarking [11], which involves watermarking the text by modifying the LLMs (sensitive tokens are defined and excluded from the output of the LLMs). In contrast, our proposed DeepTextMark does not necessitate access to or modifications of the LLM. Distinct from model-dependent methods, DeepTextMark exhibits a model-independent feature, enabling its application to any text. Moreover, DeepTextMark employs a substantially more compact architecture with about 50 million parameters, whereas the method in [11] necessitates billions of parameters to implement the watermarking process.

A pertinent topic in text watermarking for identifying generated text is the potential use of paraphrasing attacks to bypass AI-detectors, as elaborated in a study by Sadasivan et al. [17]. This concern is not relevant to our target scenario, as DeepTextMark focuses solely on the detection of text output by an LLM. Should a human writer meticulously rewrite the text generated by an LLM, the resultant paraphrased text may not be subject to "plagiarism" detection in our scenario.

Relative to existing state-of-the-art methods, our proposal exhibits several advantages: (1) Our watermarking method renders detection bypass challenging unless the LLM-generated text is rewritten, as the watermark is embedded in undisclosed locations, necessitating a rewrite for its removal. Once rewritten, the text is deemed as distinct human-written text. (2) The method demonstrates high detection accuracy, nearing 100%, which significantly elevates with an increasing number of sentences, substantiated through binomial distribution analysis. Even on a single sentence, a reliable detection rate of 86.52% is achieved. (3) To our knowledge, this is the inaugural LLM-independent, deep learning-based general text watermarking method. (4) Unlike some methods necessitating access to text generation processes, our approach requires no access to the LLM's original text generation, allowing our watermarker to function as an "add-on" to the LLM system (see Figure 1).

B. TRADITIONAL TEXT WATERMARKING
Common classical text watermarking methods can be categorized into open space, syntactic, semantic, linguistic, and structural techniques. A brief summary of each of these techniques is provided below.

Open Space: The open space method embeds a watermark into text data by adding extra white space characters or spaces at specific locations in the text [18]. For instance, extra white space between words or lines could be encoded as a 1, while normal white space could encode as a 0. The strategy for adding extra white space and its encoding is subject to the implementation. Although the open space method can be simple to implement and automate, it may be susceptible to watermark removal without altering the text's meaning, as an individual could easily eliminate the extra white space.

Syntactic: Certain word orders can be altered without changing the meaning or grammatical correctness of a sentence. The syntactic method watermarks text by modifying the order of words in sentences [19]. For example, "this and that" could encode to 1, and "that and this" could encode to 0. However, this method may not scale well, since many sentences do not have sequences of words that qualify for reordering. Additionally, this method might necessitate manual intervention by a linguist, as developing an automated system to detect reorderable words could be challenging.

Semantic: Semantic text watermarking techniques embed the watermark by substituting words with synonyms [19]. While the semantic method can be automated, as briefly discussed in this paper, classical implementations require the original text to detect the watermark (i.e., classical semantic text watermarking is non-blind). Moreover, determining which word to replace, and selecting an appropriate synonym, presents a non-trivial challenge.

Linguistic: The linguistic category of text watermarking amalgamates semantic and syntactic techniques, embedding watermarks into text through a blend of word rearrangement and synonym replacement [19].

Structural: The structural technique replaces certain symbols with visually similar letters and punctuation, albeit with different Unicode representations [20]. It may be relatively straightforward to detect these symbols, either manually due to minor visual differences, or automatically by identifying characters from uncommon Unicode sets. Reverting the watermarking without altering the text's meaning could also be straightforward. Due to these limitations, structural techniques do not align with our primary objective of watermarking text generated by LLMs.

Contrastingly, we employ Word2Vec [13] and the Universal Sentence Encoder [12] for watermark insertion, and devise a transformer-based model for watermark detection. This approach aligns well with our target application of text source detection, as it facilitates blindness while enhancing imperceptibility, robustness, and reliability. Our watermark insertion and detection methodology is rooted in deep learning, distinguishing our method from traditional text watermarking techniques.
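As a concrete illustration of the open space technique summarized above, the following sketch embeds a bit string by widening selected word gaps. This is our illustrative example, not an implementation from [18]; the function names and encoding convention are placeholders chosen for exposition:

```python
import re

def embed_open_space(text: str, bits: str) -> str:
    """Encode one bit per word gap: '1' -> double space, '0' -> single space."""
    words = text.split(" ")
    if len(bits) > len(words) - 1:
        raise ValueError("not enough word gaps to carry the watermark")
    pieces = []
    for i, word in enumerate(words[:-1]):
        gap = "  " if i < len(bits) and bits[i] == "1" else " "
        pieces.append(word + gap)
    pieces.append(words[-1])
    return "".join(pieces)

def extract_open_space(marked: str, n_bits: int) -> str:
    """Read the watermark back from the gap widths."""
    gaps = re.findall(r" +", marked)
    return "".join("1" if len(g) > 1 else "0" for g in gaps[:n_bits])
```

Because the payload lives entirely in whitespace, normalizing the spacing (for example, collapsing double spaces to single spaces) strips the watermark without changing the text's meaning, which is exactly the fragility noted above.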

FIGURE 2. Watermark insertion details.

III. THE PROPOSED DeepTextMark word replacements within each sentence which is denoted by
This section presents the details of DeepTextMark. The pro- multiple word synonyms substitution. In the terminal phase
posed watermark insertion and detection schemes are respec- of our experimentation, we embraced a flexible approach,
tively discussed in Sections III-A and III-B. This discussion permitting the substitution of any candidate word with an
shows the automatic and blindness traits achieved by Deep- available synonym in a sentence.
TextMark. Section III-C analyzes the application scenario of Sentence Encoding: At this juncture, each sentence proposal
DeepTextMark to multiple sentences. is evaluated solely based on word-level quality. We ascertain
that the quality of the watermarked sentence is enhanced
A. WATERMARK INSERTION when the architecture is allowed to consider sentence-level
In contemporary settings, individuals employ extensive lan- quality. To facilitate this, we employ a pretrained Universal
guage models to produce textual content and subsequently Sentence Encoder [12] to score the quality of each sentence
rephrase it using synonymous words as a strategy to circum- proposal. This encoder transposes a sentence into a high-
vent plagiarism. This serves as the rationale behind our intro- dimensional vector representation. Initially, both the origi-
duced watermark insertion model, aiming to detect alterations nal sentence and each sentence proposal are transposed into
in text even when someone attempts to paraphrase content vector representations using the Universal Sentence Encoder.
generated by large language models in order to evade plagia- Subsequently, we compute the similarity score for each sen-
rism detection. The watermark insertion process is presented tence proposal by measuring the cosine similarity between the
in Figure 2. vector representation of the original sentence and that of the
Word Selection: Given a sentence, we initially segregate sentence proposal. The sentence proposal exhibiting the high-
candidate words from punctuation, stopwords [21], and est similarity score is identified as the potential watermarked
whitespace, preserving these elements to retain the original sentence. Given that the watermarking process necessitates no
sentence structure. Each candidate word is then transposed human intervention, the methodology is rendered automatic.
to an embedding vector utilizing a pre-trained Word2Vec Grammatical Adjustment: In pursuit of mitigating gram-
model [13]. A roster of replacement words is engendered matical inaccuracies, essential measures have been under-
by identifying the n nearest vectors to the candidate word taken. Our methodology encompasses word substitution with
vector in the Word2Vec embedding space, where n is a pre- synonymous counterparts, whilst steadfastly preserving the
defined integer, and reconverting these vectors back into original sentence structure. In this vein, we have eschewed
words. We engender a list of sentence proposals by sub- the elimination of stopwords or the alteration of punctuation,
stituting each candidate word with its list of replacement thereby safeguarding sentence integrity.
words, thereby fabricating unique sentence variations. The The process of synonym selection is meticulously de-
loci of the watermark in each sentence proposal are indirectly signed to favor optimal replacements. Nevertheless, chal-
ascertained by Word2Vec. Each unique variation is deemed lenges emerge in instances where the most apt synonym
a sentence proposal, representing a potential watermarked diverges in grammatical structure or meaning. For instance,
sentence. Empirically, employing a larger corpus of nearest replacing the term ’elections,’ a plural noun, with ’election,’
vectors allows for the consideration of an augmented set its singular counterpart, could engender grammatical incon-
of replacement words and consequently more sentence pro- gruity. To forestall such scenarios, a preliminary determina-
posals, potentially ameliorating imperceptibility albeit at the tion of the grammatical number of the target word is initiated
expense of elevated processing time. We also delved into with a class engine [22] which employs diverse methods to
various word-level watermarking techniques. Initially, a sole facilitate plural and singular inflections, the selection of "a"
word within each sentence was substituted with its synonyms or "an" for English words based on pronunciation, and the
which we denote as single word synonym substitution. This manipulation of numbers represented as words. This module
scope was subsequently broadened to encompass multiple- comprehensively provides plural forms for nouns, most verbs,
4 VOLUME 11, 2023

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3376693

Author et al.: Preparation of Papers for IEEE TRANSACTIONS and JOURNALS

and select adjectives, including "classical" variants like trans- A few examples of sentence candidates with varied syn-
forming "brother" to "brethren" or "dogma" to "dogmata." onyms selections are presented in Table 2. An analysis of the
Singular forms of nouns are also available, allowing the term primary reveals that the closest synonyms are typically
choice of gender for singular pronouns, such as transforming adverbs like first, thereby deviating from the grammatical
"they" to "it," "she," "he," or "they." Pronunciation-based condition, as the original term is a proper noun (NNP). Given
"a" or "an" selection is extended to all English words and the categorical distinction that our original word falls into
most initialisms. It is crucial to note that when using plural the proper noun category (NNP), our focus is exclusively
inflection methods, the word to be inflected should be the first on synonyms that share this grammatical property. This ra-
argument, expecting the singular form; passing a plural form tionale informs our decision to replace the word primary
may yield undefined and likely incorrect results. Similarly, with leading instead of first. We have implemented the type
the singular inflection method anticipates the plural form of of parts of speech by using the POS-tagger provided by the
the word. The plural inflection methods also offer an optional NLTK [23]. Specifically, we employed the Penn Treebank
second argument indicating the grammatical "number" of POS tagger. The tagging process involved tokenization of
the word or another word for agreement. Subsequently, syn- input text, breaking it into individual words or sentences, and
onyms congruent with the grammatical form of the original subsequently assigning part-of-speech tags to each word. The
word are curated. POS tagging was conducted using a Hidden Markov Model
(HMM), trained on a large annotated corpus, such as the Penn
TABLE 1. Example Sentence Candidates of Correct and Incorrect Synonym Treebank corpus, wherein the model learned the probabilities
Selections
of transitions between different POS tags and the probabilities
1. The September-October term jury had been charged by Fulton of observing specific words given a certain POS tag. The
Superior Court judge Durwood Pye to investigate reports of possi-
ble "irregularities" in the hard-fought primary which was won by
Viterbi algorithm was employed during the tagging of new
mayor-nominate Ivan Allen Jr. text to identify the most likely sequence of POS tags given the
2. The September-October terms jury had been charged by Fulton observed words and the learned probabilities. This approach
Superior Court judge Durwood Pye to investigate reports of possi-
ble "irregularities" in the hard-fought primary which was won by
proved effective for obtaining accurate and contextually rel-
mayor-nominate Ivan Allen Jr. evant part-of-speech annotations in diverse textual datasets.
3. The September-October condition jury had been charged by Algorithm 1 outlines the entire operational process.
Fulton Superior Court judge Durwood Pye to investigate reports of
possible "irregularities" in the hard-fought primary which was won
by mayor-nominate Ivan Allen Jr.
Algorithm 1 Watermark Insertion
1: function WatermarkInsertion(input_text)
2: word_embedder ← Word2Vec
A few examples of sentence candidates with correct and
3: sentence_encoder ← SentenceEncoder
incorrect synonym selections are presented in Table 1. It is
4: input_embeddings ← Encode(word_embedder, in-
imperative to note that when we scrutinize the word term, we
put_text)
encounter the closest synonyms, some of which contravene
5: sentence_proposals ← GeneratePropos-
the grammatical criteria due to their distinct grammatical
als(input_text)
numbers, with one being singular and the other plural. Con-
6: proposals_embeddings ← En-
sequently, given that our initial word is in the singular form,
code(sentence_encoder, sentence_proposals)
our consideration is limited exclusively to synonyms in the
7: best_proposal ← ComputeCosineSimilar-
singular form. Consequently, in lieu of employing terms, we
ity(input_embeddings, proposals_embeddings)
opt to substitute it with condition.
8: marked_text ← GrammaticalAdjust-
Analogous complexities arise concerning parts of speech,
ment(best_proposal)
as certain words harbor synonyms across diverse lexical cat-
9: return marked_text
egories. To adeptly navigate this intricacy, integration of the
classic POS (Part of Speech) tagger [23] has been effected.
Post identification of the word’s grammatical number, the
B. WATERMARK DETECTION
endeavor to pinpoint synonyms aligning with its specific part
The watermark detector operates as a binary classifier
of speech is undertaken. This bifurcated approach underpins
categorizing inputs into "watermarked" and "unmarked"
both syntactic and grammatical consistency in our synonym
classes, leveraging network architectures inherent in trans-
substitution process.
formers [14]. We have used the Bidirectional Encoder Rep-
resentations from Transformers (BERT) pre-trained model
TABLE 2. Example Sentence Candidates with Varied Synonyms
which is capable of capturing the contextual meaning of
The primary thing she did was to take off her hat and then as she words in a sentence. Unlike traditional methods that treat
had no other covering she. each word as independent, BERT considers the entire context
The first thing she did was to take off her hat and then as she had
no other covering she. of the sentence, including the relationships between words.
The leading thing she did was to take off her hat and then as she Hence, it will possess the capability to recognize sentence
had no other covering she. modifications and distinguish between marked and unmarked
VOLUME 11, 2023 5

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3376693

Author et al.: Preparation of Papers for IEEE TRANSACTIONS and JOURNALS

FIGURE 3. Watermark detection details.

sentences. Furthermore, BERT serves as a powerful feature extractor, automatically extracting high-dimensional representations of text at various levels of granularity. Its scalability and generalization capabilities enable it to handle diverse datasets and adapt to different domains and languages with minimal additional training. The architecture of this classifier is delineated in Figure 3.

Algorithm 2 Watermark Detection
1: function WatermarkDetection(test_text)
2:   embeddings ← BERT_Encode(test_text)
3:   decoder_output ← TransformerDecoderBlock(embeddings)
4:   pooled_output ← Pooling(decoder_output)
5:   dropout_output ← Dropout(pooled_output)
6:   fc_20_output ← FullyConnected(dropout_output, 20)
7:   dropout_fc ← Dropout(fc_20_output)
8:   watermark_score ← FullyConnected(dropout_fc, 1)
9:   return watermark_score

The watermark detection classifier endeavors to minimize the ensuing binary cross-entropy loss:

L = −[y_i · log(p(y_i)) + (1 − y_i) · log(1 − p(y_i))],  (1)

where y_i denotes the label, and p(y_i) represents the predicted probability. The parameters of the BERT encoder are initially frozen, allowing the loss to converge with only the transformer block being trainable. Upon convergence of the loss with a frozen BERT, the parameters of BERT are unfrozen, the learning rate of Adam is attenuated, and training is recommenced until loss convergence is reattained. This iterative training paradigm can precipitate a notable enhancement in prediction performance compared to training solely the transformer block. The outcomes of the training regimen are elaborated in section IV-B. This architecture, post convergence, embodies the watermark detector. Given that the detector necessitates no access to the original data for prediction execution, the methodology is characterized as blind.

C. WATERMARK DETECTION FOR MULTIPLE SENTENCES
A prominent application scenario for the proposed watermarking technique is its deployment on a collection of sentences. Consequently, the classification outcome is contingent on the majority classification rendered for each individual sentence. Employing the binomial distribution, it can be demonstrated that the likelihood of accurately classifying a sentence collection converges to near perfection as the volume of sentences in the collection escalates, provided the probability of accurately classifying a single sentence is reasonably high (> 85%). Notwithstanding, a superior probability of correct classification for a single sentence implies a reduced sentence count is requisite to attain near-perfect accuracy. Algorithm 2 comprehensively outlines the entire working procedure.

The proof underpinning this claim is articulated as follows: Presume the probability of accurately classifying a sentence as watermarked or not is denoted by p, and remains consistent across all sentences. In a scenario where at least half of the sentences in a text comprising n sentences are accurately classified, the entire text is deemed correctly classified. It can be substantiated that the probability of accurately classifying exactly x sentences can be encapsulated by the binomial probability, denoted as P(x). Hence, the probability P(x ≥ ⌈n/2⌉) can be formulated as the summation in Equation (2):

P(x ≥ ⌈n/2⌉) = Σ_{i=⌈n/2⌉}^{n} C(n, i) · p^i · (1 − p)^(n−i).  (2)

IV. EXPERIMENTS
This section illustrates the effectiveness of DeepTextMark by analyzing its properties with regard to text watermarking. Dataset preparation is explained in section IV-A. The reliability of the watermark detection is shown in section IV-B. Section IV-C explains the ablation study. Section IV-D provides a summary of the imperceptibility, and the imperceptibility and detection accuracy trade-off. Comparisons are made between DeepTextMark and traditional text watermarking methods. Section IV-E provides an analysis of the experiments used to test robustness, which is followed by an evaluation of the empirically observed runtime in section IV-G.
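The convergence claim behind Equation (2) is easy to check numerically. The following sketch (our illustration, not code from the paper) evaluates the binomial summation for a per-sentence accuracy p and a collection of n sentences:

```python
from math import ceil, comb

def collection_accuracy(p: float, n: int) -> float:
    """Probability that at least ceil(n/2) of n sentences are classified
    correctly when each is classified correctly with probability p (Eq. (2))."""
    k = ceil(n / 2)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# With the sentence-level accuracy reported for the Dolly validation set
# (86.52%), five sentences already push collection accuracy above 98%.
print(round(collection_accuracy(0.8652, 5), 4))  # 0.9802
```

The value for n = 5 reproduces the corresponding entry of Table 3, illustrating how quickly the majority-vote accuracy approaches 1 as the collection grows.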
6 VOLUME 11, 2023

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
A. DATASET
Training Data: A dataset comprising 34,489 sentences was assembled from the Dolly ChatGPT Dataset [24]. This approach aims to underscore the generalization capability of the proposed DeepTextMark. Robust performance across diverse textual genres exemplifies the model's aptitude for generalizing to arbitrary text. Evaluations have also been conducted on texts engendered by expansive language models such as ChatGPT, as depicted in an instance in Figure 4. Within the training set, half of the sentences are watermarked employing the methodology delineated in section III-A, whilst the remainder are retained unaltered. This yields a dataset encompassing nearly 17,000 watermarked samples and approximately 17,000 unmarked samples. The corpus of watermarked and unmarked sentence samples is randomly amalgamated, with 75% earmarked for training and the residual 25% allocated for validation—this composition underpins the training of the detector. To facilitate the assessment of imperceptibility in section IV-D, a dataset encapsulating all 34,489 sentences as original and watermarked pairs is retained.

Testing Data: We assessed the performance of our model by subjecting it to testing using the C4 dataset [25] containing multiple sentences. To evaluate its performance, we systematically extracted 100 tokens at a time, aggregating them into a unified dataset featuring numerous sentences. This process yielded a total of 8,800 datasets. Subsequently, we conducted rigorous testing on these datasets, incorporating both single and multiple synonym substitutions to gauge the model's adaptability and effectiveness.

B. WATERMARK DETECTION ACCURACY
The proposed watermark detection classifier is trained using the dataset discussed in Section III-B. We train the architecture with the parameters of the pre-trained BERT encoder frozen for 6 epochs, with an Adam learning rate set to 0.0001. Then, we unfreeze the pre-trained BERT architecture, reduce the learning rate of Adam [26] to 0.000001, and train for 50 more epochs. Our training model uses 148 million parameters. The resulting validation accuracy, which represents the sentence-level detection accuracy on the Dolly validation dataset, is 86.52% for single synonyms and 94.87% for multiple synonyms. For the C4 dataset, it is 76.30% for single synonyms and 95.72% for multiple synonym substitution. Additionally, we conduct this training process on several versions of the dataset, each with an increasing number of sentences. We observe that as we continually increase the size of the dataset, the validation accuracy improves. Training with an increasing number of sentences could further improve the sentence-level prediction accuracy. We find that the current training is balanced, as shown in Table 3, Table 4 and Section IV-D, as this validation yields near-perfect prediction accuracy with only a small collection of sentences.

As elucidated by the binomial distribution in Section III-C, the probability of accurately classifying a collection of sentences markedly increases with the augmentation of the sentence count in the text, attributable to our sentence-level insertion process. Assuming the likelihood of accurately classifying a single sentence aligns closely with the validation accuracy computed during training, and that this likelihood remains consistent across all sentences, we can forecast the probability of accurately classifying a collection of sentences utilizing the summation outlined in Eq. (2). Under this assumption, the probability of correct prediction corresponding to varying sentence counts is tabulated in Table 3 and Table 4. Table 3 and Table 4 underscore the reliability of the method, highlighting an increased likelihood of accurate detection as the number of sentences rises. This trend is observed for both single and multiple synonym substitution, encompassing both the Dolly and C4 datasets.

TABLE 3. Sentence Count on Detection Accuracy (%) (Single Synonym)

Num Sentences   Dolly (%)   C4 (%)
1               86.52       76.30
5               98.02       90.97
10              99.92       98.49
20              99.99       99.75
30              100.00      99.96
50              100.00      99.99
60              100.00      100.00

TABLE 4. Sentence Count on Detection Accuracy (%) (Multiple Synonyms)

Num Sentences   Dolly (%)   C4 (%)
1               94.87       95.72
5               99.88       99.92
10              100.00      100.00

C. ABLATION STUDY
To evaluate the effectiveness of our proposed method, we conducted an ablation study by systematically removing components from our model and observing their impact on performance. Specifically, we conducted four experiments denoted as A, B, C, and D, each representing a variant of our model with varying degrees of complexity. Experiment D, which incorporates all proposed enhancements, achieved the highest accuracy among the tested configurations. This result suggests that the additional components in experiment D contribute positively to the overall performance of the model. Furthermore, by comparing the accuracy of experiment D with those of experiments A, B, and C, as shown in Table 5, we can pinpoint the specific contributions of each component to the model's effectiveness. Our findings underscore the importance of the incorporated enhancements and highlight the significance of their inclusion in our proposed approach.

TABLE 5. Ablation Study

Experiment                        A       B       C       D
Single Synonyms                   ✓       ✓
Multiple Synonyms                                 ✓       ✓
Handling Singular/Plural Number           ✓               ✓
Handling Parts of Speech                  ✓               ✓
Detection Accuracy (%)            88.36   86.52   92.74   94.87
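The collection-level decision underlying Tables 3 and 4 is a majority vote over per-sentence detector outputs. A minimal sketch, where `sentence_scores` stands in for the watermark scores produced by the detection network (the name and the 0.5 threshold are illustrative assumptions):

```python
from math import ceil

def classify_collection(sentence_scores, threshold=0.5):
    """Label a collection as watermarked when at least half of its
    sentences receive a watermark score above `threshold`."""
    votes = sum(score > threshold for score in sentence_scores)
    return votes >= ceil(len(sentence_scores) / 2)

print(classify_collection([0.91, 0.84, 0.12, 0.77]))  # True: 3 of 4 votes
print(classify_collection([0.10, 0.22, 0.95]))        # False: 1 of 3 votes
```

Because each sentence is scored independently, the vote's reliability grows with the collection size exactly as Eq. (2) predicts.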
D. IMPERCEPTIBILITY OF WATERMARK INSERTION
A sentence bearing an imperceptible watermark should maintain grammatical correctness and retain the same meaning as the original sentence. Thus, the imperceptibility of text watermarking should be gauged by sentence meaning similarity. The Universal Sentence Encoder [12] encapsulates the semantic meaning of sentences into an embedding vector, enabling the measurement of sentence meaning similarity through the computation of cosine similarity between two sentence embeddings. Hence, we propose to quantify the imperceptibility of text watermarking using the Sentence Meaning Similarity (SMS):

SMS = S(encode(o), encode(m)),  (3)

where o denotes the original text, m denotes the watermarked text, encode(·) represents a neural network that computes a semantic embedding (e.g., the Universal Sentence Encoder), and S is a function that computes the similarity between the vectors (cosine similarity is utilized in this paper). Computing the mean SMS (mSMS) over a dataset provides an average measure of text watermark imperceptibility. We have performed our experiment for the test dataset discussed in Section III-B, and we are able to achieve 0.9765 mSMS for single synonyms and 0.9892 mSMS for multiple synonyms, while the traditional method provides 0.9794 mSMS. The high mSMS value exemplifies the imperceptible watermarking of texts by DeepTextMark. An illustration of watermarking a text produced by ChatGPT is presented in Figure 4, with additional examples of original and watermarked paragraphs available in the supplementary documents.

FIGURE 4. A watermarked example from ChatGPT with prompt "Give me a short essay about deep learning".

For analytical purposes, we implement traditional text watermarking and test the proposed watermark detection network at a single-sentence level. Specifically, we create an implementation of semantic watermarking using WordNet [27] to select synonyms. Although this traditional method achieved some success, the replacement word occasionally rendered the sentence nonsensical, as this method did not account for sentence structure. Some sentence examples are illustrated in Table 6 (additional sentence examples can be found in the supplementary documents). While the traditional method can be effectively detected by our detection network for a single sentence, as depicted in Table 10, the mSMS on the collected dataset significantly improves with DeepTextMark multiple synonyms and is not far behind with single synonyms.

TABLE 6. A few example sentences: 1. the original text; 2. watermarked text by the traditional method; 3. watermarked text by DeepTextMark with single synonym substitution; and 4. watermarked text by DeepTextMark with multiple synonyms substitution

1. Which episodes of season four of game of thrones did michelle maclaren direct.
2. Which installment of season four of game of thrones did michelle maclaren direct.
3. Which sequence of season four of game of thrones did michelle maclaren direct.
4. Which sequence of season quaternity of game of thrones did michelle maclaren direct.

1. Who saved andromeda from the sea monster?
2. Who saved andromeda from the ocean monster?
3. Who saved andromeda from the ocean monster?
4. Who saved andromeda from the ocean monstrosity?

E. ROBUSTNESS
Robustness in the domain of image watermarking implies that the watermark must remain invariant to malicious attacks or unintentional modifications [28]. Translating this notion of robustness to text watermarking is fairly straightforward. A robust text watermarking method should ensure that removing the watermark is challenging, whether the removal attempts are unintentional, arising from normal processing, or intentional attacks targeting the watermark. For watermark detection to fail, the watermarked text would need to be altered beyond recognition.

Given that this is an emerging area, no standard method exists to measure robustness for text watermarking [19]. Therefore, we propose a metric named Mean Impact of Attack (mIOA) to measure robustness. The IOA is defined as follows:

IOA(x, y) = (1 − |detect(x) − y|) − (1 − |detect(x_a) − y|),  (4)

where x represents the target data (text of one or more sentences in this paper), x_a denotes the attacked data obtained by arbitrarily attacking x, y signifies the label for x (watermarked or unmarked), and detect(·) denotes the utilization of the detection network to output the predicted label of the input. IOA gauges the change in accuracy following an attack on the data. A positive IOA indicates a detrimental effect on prediction performance due to the attack, while a negative IOA indicates improved prediction performance post-attack (which should be rare). An IOA further from 0 (in either direction) signifies a higher impact from the attack. An IOA of 0 indicates the attack did not affect the prediction accuracy. Calculating the mean IOA over a dataset yields the mIOA.
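Equation (4) and its mean are straightforward to compute once a detector is fixed. In the sketch below, `detect` is a toy stand-in for the detection network's 0/1 prediction, used only to exercise the metric (both the marker token and the sample texts are illustrative):

```python
def ioa(detect, x, x_attacked, y):
    """Impact of Attack (Equation (4)) for one sample with 0/1 label y:
    accuracy on the clean input minus accuracy on the attacked input."""
    return (1 - abs(detect(x) - y)) - (1 - abs(detect(x_attacked) - y))

def mioa(detect, samples):
    """Mean IOA over (clean, attacked, label) triples."""
    return sum(ioa(detect, x, xa, y) for x, xa, y in samples) / len(samples)

# Toy stand-in detector: flags text containing a marker token.
detect = lambda text: 1 if "[wm]" in text else 0

samples = [
    ("[wm] alpha", "alpha", 1),       # attack stripped the mark: IOA = +1
    ("[wm] beta", "[wm] beta x", 1),  # attack had no effect:     IOA = 0
]
print(mioa(detect, samples))  # 0.5
```

A mIOA near 0 means the attack barely moved the detector's accuracy, which is the behavior reported for DeepTextMark in the tables that follow.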
Data for Robustness Test. We have prepared two sets of data: one with watermarked text and one with unmarked text. Each set contains 1000 collections, with each collection comprising 20 sentences. These sentences are randomly selected from the testing set described in Section III-B.

We then define several attacks and compute the mIOA for each attack to gauge the robustness of our watermarking technique. These attacks are designed to progressively modify the text, with the severity of each attack increasing the dissimilarity between the modified and original texts. Each attack also represents a common interaction with the text. By attacking both watermarked and unmarked data, we aim to evaluate the detection accuracy for both types of data, which helps ensure that our system is equally effective at detecting watermarks and identifying unmarked data.

Remove Sentences Attack. We remove a selected number of n sentences from the text. This action reduces the watermark presence in the text, thus challenging the robustness of the detection. Table 7 presents the mIOA on both watermarked and unmarked datasets for several values of n. In all cases, the total number of sentences is 20.

TABLE 7. Remove Sentences Attack

Removed Sentences   Watermarked mIOA   Unmarked mIOA
1                   0.000              0.022
3                   0.000              0.033
5                   0.001              0.028
10                  0.029              0.030
15                  0.058              0.169
17                  0.110              0.191
19                  0.206              0.280

The results show that the mIOA increases as the severity of the attack intensifies (i.e., more sentences are removed), yet the performance remains commendable as the mIOA stays close to 0. Interestingly, the mIOA is consistently higher on the unmarked data.

Add Sentences Attack. In this attack, a specified number of sentences (represented by n) with the opposite label are randomly added to the text. For instance, watermarked sentences are added to unmarked text. Increasing the value of n challenges the robustness of the detection, as it dilutes the percentage of text that corresponds to the expected label. Table 8 illustrates the mIOA on the watermarked and unmarked datasets for several values of n.

TABLE 8. Add Sentences Attack

Added Sentences   Watermarked mIOA   Unmarked mIOA
1                 0.000              0.026
3                 0.001              0.061
5                 0.003              0.115
10                0.069              0.269

The data shows that the mIOA increases as n increases. DeepTextMark maintains a high performance, as the mIOA remains close to 0 for a reasonable n.

Replace Sentences Attack. This attack adopts a similar data dilution approach as the add sentences attack. It distorts the text data by replacing n existing sentences in the text with randomly selected sentences of the opposite type, where n is a specified integer. Table 9 presents the watermarked and unmarked mIOA for several values of n.

TABLE 9. Replace Sentences Attack

Replaced Sentences   Watermarked mIOA   Unmarked mIOA
1                    0.001              0.013
3                    0.008              0.058
5                    0.040              0.126
7                    0.103              0.210

The mIOA increases as n increases, and for DeepTextMark it remains close to 0, indicating a minimal impact on detection performance. Since the modified text becomes increasingly dissimilar to the original text post-attack, an escalating performance impact is expected and acceptable as the severity of each attack intensifies. These experiments affirm that DeepTextMark is robust to text modifications stemming from common text interactions.

F. COMPARATIVE ANALYSIS
To start our comparative analysis, we have used three methods and their combinations. The first approach is the Traditional method, involving a simple single-word modification without any components of the proposed DeepTextMark. Following that, we introduce DeepTextMark, which encompasses both single and multiple synonym substitutions while rectifying grammatical errors. We perform the comparison in terms of imperceptibility and detection accuracy, shown in Table 10, where our DeepTextMark with multiple synonyms performs very well in terms of both imperceptibility and detection accuracy.

TABLE 10. Comparative analysis in terms of mSMS and Detection Accuracy

Method                             mSMS     Detection Accuracy
DeepTextMark (Single Synonyms)     0.9765   0.8652
DeepTextMark (Multiple Synonyms)   0.9892   0.9487
Traditional                        0.9794   0.8836

We have also performed a deeper comparative analysis between our approach and the method proposed by Kirchenbauer et al. [11]. For clarity within the context of this paper, we will refer to their method as the Watermark Logit Processor (WLP) method, to prevent any naming confusion. It is important to highlight that the WLP method necessitates access to LLMs, specifically utilizing them as a logit processor to favor the selection of "green" tokens during text generation. On the other hand, our proposed method operates independently and does not require access to LLMs.

To ensure a fair comparison, it is imperative that both methods are evaluated using the same source of text, specifically an LLM. Consequently, for text generation, we have employed the Open Pre-trained Transformer (OPT-2.7B). The primary objective of this experiment is to apply our method, alongside the WLP method, to watermark the content generated by OPT-2.7B and subsequently evaluate the detection accuracy
for comparison purposes. To generate a substantial amount of text content, we utilized a subset of the C4 dataset, comprising 22k text samples, as the source of prompts for the LLM in a seeded environment to yield deterministic outputs, with a set of 500 sequences of length T = 200 tokens, similar to the WLP paper. The authors of the WLP paper proposed two different methods, which we denote WLP-multinomial sampling and WLP-beam search to avoid confusion. With this setup, upon inputting text (prompt) samples from the C4 dataset into the base LLM, we obtain blocks of text, which we term the "Original Generated Content." Subsequently, we apply our proposed method to the "Original Generated Content" to produce the DeepTextMark-based watermarked content. Conversely, when we incorporate the WLP logit processor with the base LLM, the identical input text samples yield the WLP Watermarked Content. Figure 5 illustrates the text generation methodology employed for the comparative evaluation between DeepTextMark and WLP. In this configuration, our model demonstrates a notable detection rate of 90.66%. This outcome, achieved despite training on the distinct Dolly dataset, underlines the robust generalization capability of our approach, affirming its effectiveness across diverse datasets.

FIGURE 5. Sample generation process for testing and comparing watermark detection accuracy of DeepTextMark and WLP.

In our experimental evaluation, we utilized a subset of 500 data points to assess the watermark detection performance of both models. Despite the distinct datasets employed in training our model DeepTextMark, it demonstrates a commendable detection rate, only marginally lower than that reported in the WLP paper. Specifically, DeepTextMark achieved an accuracy of 90.74%, closely approaching the 92.43% accuracy of the WLP model. This proximity in performance is noteworthy, considering the differences in training datasets. Table 11 presents a detailed comparative analysis of the watermark detection accuracies between DeepTextMark and WLP.

TABLE 11. Detection accuracy of DeepTextMark and WLP with smaller datasets

Model          Detection Accuracy (%)
WLP            92.43
DeepTextMark   90.74

We conducted a robustness comparison between the two models, considering three attack types: text insertion, deletion, and substitution. Text insertion attacks add extra tokens post-generation, while text deletion removes tokens from the generated output, potentially diminishing text quality by reducing the effective language model (LM) context width. Text substitution attacks involve replacing one token with another, which can be automated through dictionary or LM techniques but may degrade text quality.

Our comparative analysis, summarized in Table 12, reveals the robustness of DeepTextMark and WLP. The WLP study involved meticulous parameter adjustments to optimize their model's performance. Despite being trained on the Dolly Dataset, our model exhibited superior performance when tested on the C4 dataset produced by the LLM, outperforming in most scenarios for watermark detection accuracy. For robustness evaluation, we introduced new metrics while also using the True Positive Rate (TPR) and False Negative Rate (FNR) metrics from the WLP paper to ensure a fair assessment.

The Area Under the Receiver Operating Characteristic (AUC) curve and True Positive Rate (TPR) are key metrics in binary classification. AUC illustrates the trade-off between sensitivity (TPR) and 1 − specificity (False Positive Rate) across different thresholds, ranging from 0 to 1. A value of 0.5 implies no discriminative ability, whereas 1 indicates perfect classification. Higher AUC values denote superior model performance. TPR, or sensitivity/recall, is the ratio of correctly identified positive instances to all actual positives, defined as TPR = TruePositives / (TruePositives + FalseNegatives). Conversely, the False Negative Rate (FNR) quantifies the proportion of positives incorrectly classified as negatives: FNR = FalseNegatives / (TruePositives + FalseNegatives).

A superior TPR, signifying DeepTextMark's proficiency in correctly identifying positive instances while minimizing false negatives, underscores its efficacy in capturing the majority of actual positive cases. Concurrently, the smaller FNR suggests a reduced probability of overlooking positive instances, highlighting DeepTextMark's competence in averting false negatives and precisely identifying positive cases. In light of our model's outperformance compared to WLP, it can be inferred that DeepTextMark demonstrates a heightened capability in detecting watermarked sentences, surpassing the performance of WLP in this regard. This substantiates the conclusion that our model excels in discerning watermarked content more effectively.

TABLE 12. Robustness comparison of DeepTextMark and WLP

Model                  ε     TPR     FNR
multinomial sampling   0.1   0.819   0.181
multinomial sampling   0.3   0.353   0.647
multinomial sampling   0.5   0.094   0.906
multinomial sampling   0.7   0.039   0.961
beam search            0.1   0.834   0.166
beam search            0.3   0.652   0.348
beam search            0.5   0.464   0.536
beam search            0.7   0.299   0.701
DeepTextMark           -     0.830   0.170

Figure 6 delineates the interplay between TPR and FNR for our proposed method relative to the established method, WLP.
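These two rates follow directly from confusion counts on the positive (watermarked) class. A minimal sketch, using the DeepTextMark row of Table 12 as illustrative rates (the 830/170 absolute counts assume 1000 positive samples and are our hypothetical reading, not figures stated in the paper):

```python
def tpr_fnr(true_positives: int, false_negatives: int):
    """TPR = TP / (TP + FN) and FNR = FN / (TP + FN) over the positive class."""
    positives = true_positives + false_negatives
    return true_positives / positives, false_negatives / positives

# Hypothetical counts consistent with the DeepTextMark row of Table 12.
tpr, fnr = tpr_fnr(830, 170)
print(round(tpr, 2), round(fnr, 2))  # 0.83 0.17
```

Since both rates share the same denominator, they always sum to 1, so reporting either one determines the other.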
FIGURE 6. TPR and FNR Trade-offs

Each data point on the plot encapsulates the performance of a method at distinct decision thresholds. The visual examination of the scatter plot underscores that our method does not lag behind WLP in terms of TPR and FNR characteristics across various operational points. This observation is crucial in establishing the efficacy of our method, aligning it favorably with the performance benchmarks set by WLP.

G. EMPIRICAL RUNNING SPEED
This section evaluates the running speed of DeepTextMark. The experiments concerning running speed are conducted on an Intel i9-13900K CPU. We measure the time taken for watermark insertion across 1000 unmarked sentences and compute the sentence-level average watermark insertion time. Similarly, we time the watermark detection process on 1000 watermarked sentences and compute the average detection time. The average times for watermark insertion and detection, in seconds, are provided in Table 13.

TABLE 13. DeepTextMark Runtime on a single CPU core

Component   Time per Sentence (seconds)
Insertion   0.27931
Detection   0.00188

As demonstrated, both the insertion and detection processes run quickly, serving as efficient "add-on" components for text source detection. The insertion process incurs a higher overhead compared to the detection process. It is important to note that these experiments were conducted using only a single core of the CPU. By parallelizing the implementation, the overhead from the insertion process could be significantly reduced, especially on server-level machines, which are typically employed to implement LLMs in our target application scenario.

V. CONCLUSION
Recently, the use of LLMs has surged significantly in both industry and academia, mainly for text generation tasks. Nevertheless, in certain scenarios, it is crucial to ascertain the source of text—whether it is generated by an LLM or crafted by a human. Addressing this requirement, we introduce a deep learning-based watermarking technique designed for text source identification, which can seamlessly integrate with existing LLM-driven text generators. Our proposed method, DeepTextMark, stands out due to its blind, robust, reliable, automatic, and imperceptible characteristics. Unlike common direct classification techniques [7] for source detection that demand a substantial amount of characters for accurate prediction, our watermarking technique enables both watermark insertion and detection at the sentence level. Our findings demonstrate that with the insertion of watermarks, the accuracy of our detection classifier can approach near-perfection with merely a small set of sentences. Given that the watermark is embedded in each sentence individually, the robustness and reliability of the watermark enhance with an increasing number of sentences. The core advantages of our work include: an "add-on" text watermarking method facilitating the detection of generated text without requiring access to the LLMs' generation phase; an automatic and imperceptible method for watermark insertion; and a robust, high-accuracy, deep learning-based text watermark detection methodology.

While DeepTextMark introduces a significant advancement in text watermarking using deep learning, we recognize a few areas where future enhancements could be beneficial. First, the effectiveness of DeepTextMark is closely tied to the representativeness of the training data. Efforts to diversify this data could further improve its applicability across various text styles and languages. Second, as DeepTextMark functions in a 'plug-in' manner, its utility is contingent on the initial watermarking of the generated text. Without pre-watermarking, detection capabilities are limited, pointing to a dependency that may affect its applicability in certain scenarios. Lastly, while the method currently shows promising results in watermarking texts of standard lengths, we are exploring ways to adapt it more effectively for very short or stylistically diverse texts. These limitations represent opportunities for ongoing research and underscore the potential for continuous improvement in the field of AI-driven text watermarking.

In conclusion, our study has successfully introduced DeepTextMark, a novel deep learning-driven approach for text watermarking, offering a robust solution for distinguishing between human-authored texts and those generated by large language models. As we look toward the future, several promising directions can further enhance and expand the utility of our approach. We envision enhancing the robustness of DeepTextMark against more advanced text manipulation techniques, especially those using AI-based rewriting tools, to maintain its effectiveness in increasingly sophisticated digital environments. Moreover, exploring scalability to manage larger and more diverse datasets will be crucial in adapt-
ing our method for big data applications. Another significant direction involves extending the compatibility of DeepTextMark with various large language models, broadening its applicability across different AI-generated text scenarios. Developing real-time applications, such as content management system plugins, will also be pivotal in dynamically detecting and managing AI-generated content. Lastly, we acknowledge the importance of addressing the ethical and legal implications surrounding text watermarking, particularly in terms of privacy and data security in the age of AI. This aspect is critical to ensuring that our methodologies align with societal norms and legal standards. As we continue to build upon the foundation laid by DeepTextMark, these future endeavors will undoubtedly contribute to the evolving landscape of text watermarking and AI-generated content detection, reinforcing the importance of authenticity and integrity in digital communications.

REFERENCES
[1] "ChatGPT," https://openai.com/blog/chatgpt, 2023, last accessed on Jul 10, 2023.
[2] "New AI classifier for indicating AI-written text," https://openai.com/blog/new-ai-classifier-for-indicating-ai-written-text, 2023, last accessed on May 02, 2023.
[15] "How the ChatGPT Watermark Works and Why it Could be Defeated," https://www.searchenginejournal.com/chatgpt-watermark/475366/#close, 2023, last accessed on Jul 10, 2023.
[16] E. Mitchell, Y. Lee, A. Khazatsky, C. D. Manning, and C. Finn, "DetectGPT: Zero-shot machine-generated text detection using probability curvature," 2023.
[17] V. S. Sadasivan, A. Kumar, S. Balasubramanian, W. Wang, and S. Feizi, "Can AI-generated text be reliably detected?" 2023.
[18] C. Ou, "Text watermarking for text document copyright protection," Computer Science, vol. 725, 2003.
[19] N. S. Kamaruddin, A. Kamsin, L. Y. Por, and H. Rahman, "A review of text watermarking: Theory, methods, and applications," IEEE Access, vol. 6, pp. 8011–8028, 2018.
[20] S. G. Rizzo, F. Bertini, and D. Montesi, "Fine-grain watermarking for intellectual property protection," EURASIP Journal on Information Security, vol. 2019, no. 1, p. 10, Jul 2019. [Online]. Available: https://doi.org/10.1186/s13635-019-0094-2
[21] K. V. Ghag and K. Shah, "Comparative analysis of effect of stopwords removal on sentiment classification," in 2015 International Conference on Computer, Communication and Control (IC4), 2015, pp. 1–6.
[22] P. Dyson, "Inflect," https://pypi.org/project/inflect/, accessed: February 22, 0
[23] S. Bird, E. Klein, and E. Loper, Natural Language Processing with Python. O'Reilly Media, 2009. [Online]. Available: https://www.nltk.org/book/
[24] databrickslab. (2023) Dolly 15k dataset. [Online]. Available: https://github.com/databrickslabs/dolly/tree/master/data
[25] Hugging Face, "Hugging face datasets," https://huggingface.co/datasets/c4, accessed: February 22, 0
[26] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings,
[3] A. Onan, ‘‘Gtr-ga: Harnessing the power of graph-based neural networks Y. Bengio and Y. LeCun, Eds., 2015. [Online]. Available: http://arxiv.org/
and genetic algorithms for text augmentation,’’ Expert Systems with Appli- abs/1412.6980
cations, p. 120908, 2023. [27] G. A. Miller, ‘‘Wordnet: A lexical database for english,’’ Commun.
[4] A. Onan and K. F. Balbal, ‘‘Improving turkish text sentiment classification ACM, vol. 38, no. 11, p. 39–41, nov 1995. [Online]. Available:
through task-specific and universal transformations: an ensemble data https://doi.org/10.1145/219717.219748
augmentation approach,’’ IEEE Access, 2024. [28] W. Wan, J. Wang, Y. Zhang, J. Li, H. Yu, and J. Sun, ‘‘A comprehensive
[5] A. Onan, ‘‘Srl-aco: A text augmentation framework based on semantic role survey on robust image watermarking,’’ Neurocomputing, vol. 488,
labeling and ant colony optimization,’’ Journal of King Saud University- pp. 226–247, 2022. [Online]. Available: https://www.sciencedirect.com/
Computer and Information Sciences, p. 101611, 2023. science/article/pii/S0925231222002533
[6] ——, ‘‘Bidirectional convolutional recurrent neural network architecture
with group-wise enhancement mechanism for text sentiment classifica-
tion,’’ Journal of King Saud University-Computer and Information Sci-
ences, vol. 34, no. 5, pp. 2098–2117, 2022.
[7] ‘‘GPTZero,’’ https://gptzero.me/, 2023, last accessed on Jul 10, 2023.
[8] ‘‘Is gptzero accurate? can it detect chatgpt? here’s what our tests
revealed,’’ https://nerdschalk.com/is-gptzero-accurate-detect-chat-gpt-
detector-tested/, 2023, last accessed on Jul 10, 2023.
[9] ‘‘Testing gptzero: A trending CHATGPT detection tool,’’
https://michaelsheinman.medium.com/testing-gptzero-a-trending-
chatgpt-detection-tool-3ee14a056543, 2023, last accessed on Jul 10, TRAVIS MUNYER received the B.S. degree in
2023. Computer Science and the B.S. degree in Cyberse-
[10] H. Fang, Z. Jia, Z. Ma, E.-C. Chang, and W. Zhang, ‘‘Pimog: An effec- curity from the University of Nebraska at Omaha
tive screen-shooting noise-layer simulation for deep-learning-based water- in May 2023. He is currently pursuing an M.S. de-
marking network,’’ in Proceedings of the 30th ACM International Confer- gree in Computer Science with a specialization in
ence on Multimedia, 2022, pp. 2267–2275. Interactive Intelligence from the Georgia Institute
[11] J. Kirchenbauer, J. Geiping, Y. Wen, J. Katz, I. Miers, and T. Goldstein, of Technology.
‘‘A watermark for large language models,’’ in Proceedings of the 40th From 2020 to 2023, he was an Undergraduate
International Conference on Machine Learning, ser. Proceedings of Ma-
Researcher with the Machine Learning and Com-
chine Learning Research, A. Krause, E. Brunskill, K. Cho, B. Engelhardt,
puter Vision group at the University of Nebraska at
S. Sabato, and J. Scarlett, Eds., vol. 202. PMLR, 23–29 Jul 2023, pp.
17 061–17 084. Omaha. He is currently a Software Engineer at a well-known tech company
[12] D. Cer, Y. Yang, S. yi Kong, N. Hua, N. Limtiaco, R. S. John, N. Con-
headquartered in Olathe, Kansas. His research interests include computer
stant, M. Guajardo-Cespedes, S. Yuan, C. Tar, Y.-H. Sung, B. Strope, and vision, natural language processing, image and text watermarking, and ap-
R. Kurzweil, ‘‘Universal sentence encoder,’’ 2018. plications of machine learning to cybersecurity.
[13] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, ‘‘Dis- Mr. Munyer was a recipient of the Outstanding Cybersecurity Graduate
tributed representations of words and phrases and their compositionality,’’ Award from the University of Nebraska at Omaha.
Advances in neural information processing systems, vol. 26, 2013.
[14] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez,
L. Kaiser, and I. Polosukhin, ‘‘Attention is all you need,’’ in Proceedings
of the 31st International Conference on Neural Information Processing
Systems, ser. NIPS’17. Red Hook, NY, USA: Curran Associates Inc.,
2017, p. 6000–6010.


ABDULLAH ALL TANVIR is currently pursuing a Ph.D. degree at the University of Nebraska at Omaha in the research area of machine learning and computer vision.
Prior to his Ph.D. pursuits, Mr. Tanvir applied his expertise as a Machine Learning Engineer at a renowned IT company. In this role, he actively contributed to the development of innovative solutions, leveraging machine learning techniques to solve complex problems. His practical experience in industry has complemented his academic endeavors, providing him with a holistic perspective on the application of theoretical concepts in real-world scenarios. His research interests lie in artificial intelligence, machine learning, computer vision, and natural language processing.
Mr. Tanvir was awarded the GRACA fund in recognition of his exceptional research achievements at the University of Nebraska at Omaha.

ARJON DAS received a B.S. degree in Computer Science and Engineering from Chittagong University of Engineering and Technology, Bangladesh, in 2018 and an M.S. degree in Computer Science from the University of Nebraska at Omaha, NE, in 2023.
From 2021 to 2023, he served as a Research Assistant in the RNA Lab at the University of Nebraska at Omaha. His research interests encompass computer vision, self-supervised learning, and image and text watermarking.

XIN ZHONG received his Ph.D. from New Jersey Institute of Technology, Newark, New Jersey, U.S.A., in 2018. He is presently an assistant professor in the Department of Computer Science, University of Nebraska Omaha. His research interests include digital image processing and analysis, computer vision, pattern recognition, computational intelligence, machine learning, deep learning, and image watermarking.
