
1 Introduction

Deception detection is a vital research problem studied across fields that involve human interaction. The discrimination between truth and lies has drawn significant attention from fields as diverse as psychology, forensic science, and sociology.

In this paper, we target the detection of deception in interviews from real-world crime interrogations. In cooperation with the National Investigation Bureau, we explore the effect of state-of-the-art natural language processing techniques on deceptive language detection. Specifically, we utilize the transcript files of polygraph tests conducted during interrogation. The polygraph test procedure has three phases:

  1.

    Pre-test phase

    In the pre-test phase, the polygraph is not used. The pre-test starts with the interrogator interviewing the subject. Here, the interrogator asks the subject questions that are directly related to the case (“related questions”) and questions that are not directly related to the case (“control questions”). Examples of a related question and a control question are “In this case, did you attack the defendant first?” and “In your experience, have you ever made any mistake but lied and did not admit it?” (both translated from Chinese). These questions may be open-ended or closed-ended. During the conversation, the interrogator examines the behavior, body language, and speech of the subject. During this phase, the interrogator also decides what questions should be asked when the polygraph is connected. This phase can take from 30 to 90 minutes.

  2.

    Test phase

    The pre-test phase leads to the test phase, during which the polygraph is actually used. The interrogator first explains to the subject how the test will be conducted, introducing how the polygraph works and the closed-ended questions that the subject will be asked later. The subject is then moved to the polygraph room and connected to the polygraph machine. The polygraph machine measures the subject’s respiration, heart rate, blood pressure, and perspiration. Following the hook-up, the interrogator asks the subject a series of closed-ended (i.e., yes/no) questions, for example, “Did you steal the money?”. The subject is expected to answer with either a “yes” or a “no”. The polygraph records the physiological responses during this phase. Once the questioning is over, the subject is disconnected from the polygraph machine.

  3.

    Post-test phase

    The last phase is the post-test phase. During this phase, the interrogator analyzes the results of the pre-test and test phases and makes a decision regarding the truthfulness of the subject.

For methodology, we define our task as a binary classification problem: predicting whether a subject is deceptive or innocent based on his/her interview transcript with the interrogator. With the aid of the CKIP parser [1], fastText pre-trained word vectors, and the BERT pre-trained model [2], we propose four different modeling systems with fully-connected and/or LSTM [3] neural networks to perform prediction based on the encoded transcript data.

CKIP is an open-source library capable of performing natural language processing tasks on Chinese sentences, such as word segmentation and part-of-speech tagging. fastText is a lightweight library for text representation. Its pre-trained model, trained on the Common Crawl and Wikipedia corpora, is able to capture hidden information about a language such as word analogies or semantics. BERT, which stands for Bidirectional Encoder Representations from Transformers, is a state-of-the-art contextual embedding model that can turn a sentence into a corresponding vector representation. LSTM (Long Short-Term Memory) is a special kind of recurrent neural network (RNN) structure that is capable of learning long-term dependencies. It is well suited to processing sequence data such as conversations, since the meaning of a word in a sentence usually depends on the preceding words.

We apply four different methods, namely (1) part-of-speech extraction, (2) one-hot-encoding, (3) mean of word embedding vectors, and (4) the BERT model, to each utterance of the interrogator and the subjects. After that, we use a hierarchical method to aggregate their hidden representations and generate a single prediction label indicating deceptiveness or honesty.

2 Background

Even though the literature indicates that many types of deception can be identified because the liar’s verbal and non-verbal behavior differs considerably from the truth teller’s [4], the reported performance of human lie detectors rarely rises above chance level [5]. Because people find it so difficult to distinguish lies from truths, datasets are usually annotated by the people who produce the statements rather than by those who receive them, which contributes to the lack of real data.

Recent advances in natural language processing motivate attempts to recognize deceptive language automatically. Researchers have explored this possibility in gaming [6, 7], news articles [8], interviews [9], and criminal narratives [10]. However, some of the previous works conducted their experiments in artificial settings or required hand-crafted features, which might introduce human bias. These issues may make the developed models inapplicable to real situations.

In this paper, we use data from real-world crime interrogations together with modern natural language processing techniques to address the task of predicting deceptiveness or honesty from interrogation transcript files.

3 Dataset

In cooperation with the National Investigation Bureau, we have 496 transcript files of polygraph tests conducted during interrogation. Each transcript file consists of a field indicating whether the case was judged as lying or not, together with 220 conversation entries on average. Each conversation entry contains, in addition to the utterance of the interrogator or the subject, two additional numbers. One of them indicates whether the entry is a question or an answer, while the other denotes whether the entry belongs to a related question or a control question. Table 1 illustrates two sample conversation entries in a transcript file.

Table 1. Two sample conversation entries of a transcript file. The original is written in Chinese; we translate it into English for demonstration purposes.

Note that when we say a “transcript file,” it stands for a file that contains “sentences,” each of which consists of “words.” A transcript file corresponds to an interrogation case. The dataset contains 496 transcript files comprising 226 deceptive cases and 270 honest cases. In total, there are about two million characters. To parse the Chinese content, we utilize the CKIP parser to extract the part-of-speech information and segment each sentence into word-level tokens. There are 24,853 unique Chinese and English words, numbers, and punctuation marks in our dataset.
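To make the data layout concrete, the sketch below shows one possible way to represent a conversation entry and a transcript file in Python. The field names are our own illustrative choices and are not taken from the original files.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ConversationEntry:
    """One line of dialogue in a transcript; field names are illustrative."""
    text: str           # the utterance, later segmented into words by the CKIP parser
    is_answer: bool     # True if the entry is an answer, False if it is a question
    is_related: bool    # True for a related question, False for a control question

@dataclass
class Transcript:
    """One interrogation case (about 220 entries on average)."""
    entries: List[ConversationEntry]
    deceptive: bool     # label indicating whether the case was judged as lying
```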

The part-of-speech information can be fed into a neural network classifier (detailed in the following “Part-of-speech Extraction” section). With the word segmentation, we can convert each tokenized sentence into a vector representation. These representation vectors can then be fed into our LSTM model to encode the transcript file. We also utilize the BERT pre-trained embedding model to encode transcript files directly. Finally, we use the encoding of each transcript file to predict whether a subject is deceptive or not. Details of how we convert sentences into vectors are elaborated in the next section.

4 System Overview

4.1 Part-of-Speech Extraction

In this method, we hypothesize that if a person lies in a conversation, he/she will use more words that express contrast, e.g., “however,” “but,” “nevertheless.” Furthermore, people who are deceptive are more likely to describe an event from a third-person point of view to distance themselves from it. In short, we believe that if a person lies, there may be patterns in the words he/she uses. We extract part-of-speech tags with the CKIP parser from each transcript file and count the occurrences of each part-of-speech tag. We then take these counts as the input features and feed them into a fully-connected linear binary classifier. Table 2 shows a sample result of part-of-speech extraction.

Table 2. A sample result of part-of-speech extraction. It shows pairs of a Chinese word and its corresponding part-of-speech tag. The Chinese sentence means: “In this case, did you attack the defendant first?”
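A minimal sketch of this feature extraction, assuming the sentences have already been segmented and tagged (e.g., by the CKIP parser). The tag inventory, the example word/tag pairs, and the single-layer classifier are illustrative assumptions rather than the exact configuration used in the paper.

```python
from collections import Counter

import torch
import torch.nn as nn

# Hypothetical tag inventory; the real CKIP tag set is larger.
POS_TAGS = ["Na", "Nb", "Nh", "VA", "VC", "Cbb", "D", "P"]

def pos_count_features(tagged_transcript):
    """tagged_transcript: list of (word, pos_tag) pairs from one transcript file."""
    counts = Counter(tag for _, tag in tagged_transcript)
    return torch.tensor([float(counts[tag]) for tag in POS_TAGS])

# Fully-connected linear binary classifier over the part-of-speech counts.
classifier = nn.Sequential(nn.Linear(len(POS_TAGS), 1), nn.Sigmoid())

# Toy example: three (word, tag) pairs standing in for a parsed transcript.
features = pos_count_features([("我", "Nh"), ("沒有", "D"), ("攻擊", "VC")])
p_deceptive = classifier(features)  # probability that the subject is deceptive
```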

Apart from the “Part-of-speech Extraction” method mentioned above, we generally take the following steps for each of the structures we propose to address the task:

  1.

    Embed each sentence in transcript files into a vector representation,

  2.

    Encode each transcript file into vector format based on the sentence-level encoded vector obtained from the previous step, and

  3.

    Train the binary classifier by leveraging vectors obtained from step two.

For step one mentioned above, we utilize three different methods (detailed in the following sections) to embed hidden information into representation vectors:

  • One-hot-encoding

  • Mean of embedding vectors

  • BERT model

Although these methods encode a transcript file differently, we use the same hierarchical neural-network structure, as depicted in Fig. 1, to perform classification and prediction. Additionally, we take the same steps to train the neural networks. The following sections give the details of how sentences are embedded.

4.2 One-Hot-Encoding

We apply a one-hot-encoding process to encode each sentence of a transcript file into a one-hot vector, as described below:

  1.

    Extract all the words that appear in the transcript files, including questions from the interrogator and answers from the subject, into a vocabulary set.

  2.

    For each sentence, prepare a vector with as many elements as there are words in the vocabulary set. Each element corresponds to a word in the vocabulary set and is initially assigned 0.

  3.

    Set an element to 1 if its corresponding word appears in the sentence.

  4.

    Finally, we take the vector containing zeros and ones as the representation vector of a sentence.

For example, assume we have a vocabulary set containing the words this, is, an, apple, a, pen, and assume each word corresponds to index 0 to 5 of a vector. Then we can encode the sentence “this is an apple” to a vector containing {1, 1, 1, 1, 0, 0}, while the sentence “this is a pen” will be encoded to a vector with the values {1, 1, 0, 0, 1, 1}.
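A minimal sketch of this encoding, using the toy vocabulary from the example above:

```python
VOCAB = ["this", "is", "an", "apple", "a", "pen"]
WORD_TO_INDEX = {word: index for index, word in enumerate(VOCAB)}

def encode_sentence(words):
    """Return a 0/1 vector marking which vocabulary words appear in the sentence."""
    vector = [0] * len(VOCAB)
    for word in words:
        if word in WORD_TO_INDEX:
            vector[WORD_TO_INDEX[word]] = 1
    return vector

print(encode_sentence("this is an apple".split()))   # [1, 1, 1, 1, 0, 0]
print(encode_sentence("this is a pen".split()))      # [1, 1, 0, 0, 1, 1]
```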

After converting all the sentences in the transcript files with the process mentioned above, we feed the resulting vectors into an LSTM network sequentially, taking the last hidden state as the encoding of a transcript file. As for the binary classifier, we use a fully-connected linear neural network followed by a sigmoid activation function. We input the last hidden state of the LSTM network to the classifier and take the output as the prediction of the system. Figure 1 depicts the process.

Fig. 1.

This figure describes how we process data after obtaining vector representations of sentences. The vector representation of each sentence is fed into an LSTM network. The last hidden state of the LSTM network is then forwarded into the binary classifier, which is made up of a fully-connected neural network with a sigmoid activation function.
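The structure in Fig. 1 could be sketched as follows, assuming PyTorch; the hidden size and the choice of one transcript per batch are our own placeholders, since the paper does not report these details.

```python
import torch
import torch.nn as nn

class TranscriptClassifier(nn.Module):
    """Sentence vectors -> LSTM -> last hidden state -> fully-connected layer -> sigmoid."""

    def __init__(self, sentence_dim, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(sentence_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())

    def forward(self, sentence_vectors):
        # sentence_vectors: (1, num_sentences, sentence_dim), one transcript at a time
        _, (last_hidden, _) = self.lstm(sentence_vectors)
        return self.classifier(last_hidden[-1])   # probability of deception

model = TranscriptClassifier(sentence_dim=300)
transcript = torch.randn(1, 220, 300)   # e.g. 220 sentence vectors of dimension 300
p_deceptive = model(transcript)
```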

4.3 Mean of Embedding Vectors

For each sentence in a transcript, we collect all the words appearing in it and use the fastText pre-trained model to encode them into vectors. Next, we calculate the element-wise mean of these vectors and concatenate the result with two additional numbers, which respectively indicate whether the entry is a question or an answer and whether it belongs to a related question or a control question. We assign 1.0 to the first number if the entry is an answer and 0.0 if it is a question. Likewise, we assign 1.0 to the second number if the entry is a related question and 0.0 if it is a control question. Figure 2 illustrates the concept. After converting each sentence to a vector, we input them into the neural network described in Fig. 1.

Fig. 2.

This figure illustrates how we convert a sentence in a transcript file into vector format, which can then be the input of the following LSTM network, with the “Mean of Embedding Vectors” method. Although the sentences here are in English, the same steps can be applied to any language to obtain the sentence vector, as long as the sentence is parsed into words.
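A rough sketch of this step, assuming the fastText Python package and its pre-trained Chinese Common Crawl/Wikipedia vectors; the file name and the encoding of the two flags follow the description above.

```python
import numpy as np
import fasttext

# Assumes the pre-trained Chinese vectors (cc.zh.300.bin) have been downloaded locally.
ft_model = fasttext.load_model("cc.zh.300.bin")

def sentence_vector(words, is_answer, is_related):
    """Element-wise mean of the word vectors, concatenated with the two entry flags."""
    word_vectors = [ft_model.get_word_vector(word) for word in words]
    mean_vector = np.mean(word_vectors, axis=0)
    flags = np.array([1.0 if is_answer else 0.0, 1.0 if is_related else 0.0])
    return np.concatenate([mean_vector, flags])   # dimension 300 + 2
```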

4.4 BERT Model

BERT is a contextual embedding model. It captures both the meaning of a word and the information of its surrounding context. Unlike the fastText pre-trained model, which addresses the embedding task at the word level, the BERT model can produce sentence-level embeddings. Therefore, we can use the BERT pre-trained model to encode each sentence of a transcript directly.

Next, we follow the same procedure as in the previous structure to concatenate the two additional numbers, obtain the encoding of each transcript, and train the linear classifier. Figure 3 illustrates how we obtain a sentence vector, which is the input of the LSTM network.

Fig. 3.

The figure illustrates how each sentence in a transcript is encoded with the aid of the BERT pre-trained model.
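A minimal sketch of the sentence-encoding step using the Hugging Face transformers library; the bert-base-chinese checkpoint and the use of the [CLS] hidden state as the sentence vector are our assumptions, as the paper does not state which variant or pooling it uses.

```python
import torch
from transformers import BertModel, BertTokenizer

# Assumed checkpoint; the paper does not name the exact pre-trained variant.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def bert_sentence_vector(sentence):
    """Encode one sentence into a fixed-size vector via the [CLS] hidden state."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = bert(**inputs)
    return outputs.last_hidden_state[:, 0, :]   # shape (1, 768), the [CLS] representation
```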

5 Experiments

We use pairs of a transcript representation vector and its corresponding label as the ground truth to train the classifier. There are 496 cases in our dataset, consisting of 226 deceptive cases and 270 honest cases.

We split our dataset into a training set, a validation set, and a testing set. To make our experiment more reliable, we use cross-validation with stratification based on class labels (deceptive and honest). We split our dataset into ten splits, taking one of them as the validation set and another as the testing set, while the remaining splits are aggregated to form the training set. With stratified sampling, the training and validation sets contain approximately the same percentage of deceptive/honest cases.
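A sketch of this split using scikit-learn's StratifiedKFold is shown below; the choice of tool and the rotation scheme for the validation split are our own assumptions, since the paper does not specify them.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

labels = np.array([1] * 226 + [0] * 270)       # 226 deceptive and 270 honest cases
dummy_features = np.zeros((len(labels), 1))    # placeholder; only the labels matter here

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
splits = [test_idx for _, test_idx in skf.split(dummy_features, labels)]

for i in range(len(splits)):
    test_idx = splits[i]
    val_idx = splits[(i + 1) % len(splits)]    # a second split held out for validation
    train_idx = np.concatenate(
        [splits[j] for j in range(len(splits)) if j not in (i, (i + 1) % len(splits))]
    )
    # train on train_idx, tune hyper-parameters on val_idx, evaluate on test_idx
```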

In addition, we train our models with transcripts that contain (1) control questions only, (2) related questions only, and (3) both control and related questions, to compare the impact of the various types of conversation entries. Furthermore, we randomly generate embedding vectors as a baseline to perform a sanity check, ensuring that our embedding vectors actually extract hidden information from the conversations in our dataset.

To train the classifier, we use one of the following optimizers: Adadelta [11], Adam [12], RMSprop [13], or SGD with momentum [14], together with binary cross-entropy loss, and apply dropout [15]. We perform a grid search based on the validation error to pick the best hyper-parameters and optimizer. The results are reported on the testing set.
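A condensed sketch of the training setup, again assuming PyTorch; the learning rate, number of epochs, and the placement of dropout are placeholders, as the actual values were chosen by the grid search described above.

```python
import torch
import torch.nn as nn

def train(model, training_pairs, num_epochs=20, learning_rate=1e-3):
    """training_pairs: list of (transcript_tensor, label_tensor), label 1.0 = deceptive, 0.0 = honest.

    Dropout is assumed to be part of `model` itself, e.g. an nn.Dropout layer
    before the final fully-connected layer.
    """
    criterion = nn.BCELoss()
    # Any of Adadelta, Adam, RMSprop, or SGD with momentum could be swapped in here.
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    for _ in range(num_epochs):
        model.train()
        for transcript_vectors, label in training_pairs:
            optimizer.zero_grad()
            prediction = model(transcript_vectors)
            loss = criterion(prediction.view(-1), label.view(-1))
            loss.backward()
            optimizer.step()
    return model
```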

6 Results

We measure each of the settings described in the experiments section with metrics including precision, recall, and F1 score. The results are shown in Table 3. According to the results, we have the following findings.

  1.

    All of the methods we propose in this paper have a higher F1 score than the randomly initialized vectors setting. This indicates that these methods indeed extract some hidden information from our data and that the classifier has learned some underlying patterns of deceptive language.

  2.

    One-hot-encoding has a higher F1 score

    Much to our surprise, the “One-hot-encoding” method achieves a better F1 score than any other method. In the setting that uses both related and control questions, it is about 33% higher than the average F1 score. We did not expect this result, because we thought that the BERT pre-trained model, which can extract not only the meaning of words but also the contextual information of a sentence, should be more powerful and perform better. In contrast, the one-hot-encoding process can only mark whether a word exists in a transcript file.

  3.

    Using control questions only gets a higher F1 score

    From Table 3, we can see that all methods except “Part-of-speech Extraction” have a higher F1 score in the scenario of using control questions only. On average, the F1 score of using control questions only is about 6% higher than using both types of questions and 23% higher than using related questions only.

Table 3. The metric results of each of the methods we propose, along with their average. The average value is calculated from the results of the four methods listed above, not including the randomly initialized vector setting.

7 Discussion

We are curious about why the one-hot-encoding method has a better F1 score. To further investigate what our model learns in the one-hot-encoding setting, we generate vectors whose elements are all assigned 0 except for one element set to 1 and use them as inputs to the model. Each of these vectors can be thought of as a sentence containing only one word. Among the words probed in this way, our model considers the following to be more indicative of deception: Taipei (a location name), Taichung (a location name), mobile phone, mention, and April. Two of them are location names. On the contrary, the following words are considered more indicative of honesty: monitor, girlfriend, come back home, for example, and touch. However, we cannot conclude that a sentence containing the words above is more likely to be deceptive/honest, due to the complexity of the deep learning model. Computation in a neural network is not linear, and a minor change in the input may lead to a significant change in the output. This analysis only suggests a direction for more in-depth investigation.
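This probing step can be sketched as follows under the one-hot setting, assuming a trained classifier like the one sketched after Fig. 1 whose input dimension equals the vocabulary size; feeding the identity matrix row by row amounts to scoring every one-word “sentence”.

```python
import torch

def probe_single_words(model, vocabulary):
    """Score each vocabulary word as a one-word "sentence" under the one-hot model."""
    scores = {}
    for index, word in enumerate(vocabulary):
        one_hot = torch.zeros(1, 1, len(vocabulary))   # one transcript with one sentence
        one_hot[0, 0, index] = 1.0
        with torch.no_grad():
            scores[word] = model(one_hot).item()        # higher = judged more deceptive
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))
```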

As for the reason why using control questions only yields a higher F1 score, we conjecture that it is because sentences belonging to control questions contain more words, whereas those belonging to related questions, to which the subject simply responds yes or no, contain fewer. The representation vectors of sentences belonging to control questions therefore hold more hidden information than those belonging to related questions. As for the reason why the setting that uses both related and control questions has a lower F1 score than using control questions only, we suspect that the related questions act as noise due to the short answers, which often contain only one word from the subject.

8 Conclusion

In this paper, we utilize four different methods, including part-of-speech extraction, one-hot-encoding, mean of embedding vectors, and the BERT model, to capture the hidden information in real-world transcript files containing conversations between interrogators and subjects. In addition, we use a hierarchical neural network to detect whether the conversation is deceptive or not. Finally, we compare the metrics of each method and discuss the results.

After training, our system can classify deceptive and honest cases. However, we can still make our system more robust and reliable by collecting more training samples and incorporating deep learning techniques such as transfer learning and multitask learning. Although improvements can be made, we believe that our methods can serve as the basis of more complex neural network structures, which may someday provide additional aid in fields such as psychology, forensic science, and sociology. Moreover, the methods and structures mentioned in this paper are not restricted to Chinese transcripts. They can be applied to any other language and even to other scenarios.