Constructing Spatio-Temporal Graphs for Face Forgery Detection

Published: 22 May 2023

Abstract

The rapid advance of facial manipulation techniques threatens web information security, so face forgery detection has attracted considerable attention. Both the spatial and the temporal information of a facial video carry crucial manipulation traces, which are inevitably introduced during the generation process. However, most existing face forgery detectors focus only on spatial artifacts or on temporal incoherence, and they struggle to learn representations that are both discriminative and general for manipulated facial videos. In this work, we propose to construct spatio-temporal graphs for fake videos so as to capture spatial inconsistency and temporal incoherence at the same time. To model the spatio-temporal relationships among the graph nodes, we propose a novel forgery detector named the Spatio-Temporal Graph Network (STGN), which contains two kinds of graph-convolution-based units: the Spatial Relation Graph Unit (SRGU) and the Temporal Attention Graph Unit (TAGU). To exploit spatial information, the SRGU models the inconsistency between each pair of patches within the same frame, rather than relying on low-level local spatial artifacts, which are vulnerable to samples created by unseen manipulation methods. The TAGU models the long-distance temporal relations among patches at the same spatial position in different frames with a graph attention mechanism based on inter-node similarity. Together, the SRGU and the TAGU allow the STGN to combine the discriminative power of spatial inconsistency with the generalization capacity of temporal incoherence for face forgery detection. The STGN achieves state-of-the-art performance on several popular forgery detection datasets. Extensive experiments demonstrate both its superiority in intra-manipulation evaluation and its effectiveness against new types of forged facial videos in cross-manipulation evaluation.
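
For illustration, here is a minimal PyTorch-style sketch of the idea described above: patch features from a frame-level backbone are treated as graph nodes, a spatial unit reasons over all patch pairs within each frame, and a temporal graph attention unit connects patches at the same spatial position across frames using inter-node similarity. The class names, layer choices, and tensor shapes below are assumptions for exposition and do not reproduce the authors' STGN implementation.

```python
# Illustrative sketch only (not the authors' code).
# Nodes are patch features of shape (B, T, P, C):
#   B videos, T frames, P patches per frame, C channels.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialRelationGraphUnit(nn.Module):
    """Hypothetical SRGU: graph convolution over all patch pairs of one frame."""
    def __init__(self, dim):
        super().__init__()
        self.theta = nn.Linear(dim, dim)   # projections used to build the adjacency
        self.phi = nn.Linear(dim, dim)
        self.update = nn.Linear(dim, dim)  # node update after message passing

    def forward(self, x):                  # x: (B, T, P, C)
        b, t, p, c = x.shape
        nodes = x.reshape(b * t, p, c)     # one fully connected graph per frame
        adj = torch.bmm(self.theta(nodes), self.phi(nodes).transpose(1, 2))
        adj = F.softmax(adj, dim=-1)       # pairwise relation weights between patches
        out = self.update(torch.bmm(adj, nodes)) + nodes   # residual graph convolution
        return out.reshape(b, t, p, c)


class TemporalAttentionGraphUnit(nn.Module):
    """Hypothetical TAGU: similarity-based graph attention linking
    same-position patches across frames."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x):                  # x: (B, T, P, C)
        b, t, p, c = x.shape
        nodes = x.permute(0, 2, 1, 3).reshape(b * p, t, c)  # one temporal graph per position
        attn = F.softmax(self.q(nodes) @ self.k(nodes).transpose(1, 2) / c ** 0.5, dim=-1)
        out = attn @ self.v(nodes) + nodes                  # long-range temporal messages
        return out.reshape(b, p, t, c).permute(0, 2, 1, 3)


class STGNSketch(nn.Module):
    """Toy pipeline: patch features -> spatial unit -> temporal unit -> real/fake logit."""
    def __init__(self, dim=256):
        super().__init__()
        self.srgu = SpatialRelationGraphUnit(dim)
        self.tagu = TemporalAttentionGraphUnit(dim)
        self.head = nn.Linear(dim, 1)

    def forward(self, patch_feats):        # (B, T, P, C) from any frame-level backbone
        g = self.tagu(self.srgu(patch_feats))
        return self.head(g.mean(dim=(1, 2)))                # video-level forgery score
```

In this sketch, patch_feats could come from any per-frame encoder applied to aligned face crops; the essential point is that spatial message passing runs within each frame while temporal attention runs across frames at each spatial position, mirroring the SRGU/TAGU split described in the abstract.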

Cited By

  • (2024) Where Deepfakes Gaze at? Spatial–Temporal Gaze Inconsistency Analysis for Video Face Forgery Detection. IEEE Transactions on Information Forensics and Security, 19, 4507–4517. DOI: 10.1109/TIFS.2024.3381823. Online publication date: 25-Mar-2024.
  • (2023) Deepfake Attacks: Generation, Detection, Datasets, Challenges, and Research Directions. Computers, 12(10), 216. DOI: 10.3390/computers12100216. Online publication date: 23-Oct-2023.
  • (2023) RAIRNet: Region-Aware Identity Rectification for Face Forgery Detection. In Proceedings of the 31st ACM International Conference on Multimedia, 1455–1464. DOI: 10.1145/3581783.3612321. Online publication date: 26-Oct-2023.
  • (2023) DeepFake on Face and Expression Swap: A Review. IEEE Access, 11, 117865–117906. DOI: 10.1109/ACCESS.2023.3324403. Online publication date: 2023.

      Published In

      ACM Transactions on the Web, Volume 17, Issue 3
      August 2023, 302 pages
      ISSN: 1559-1131
      EISSN: 1559-114X
      DOI: 10.1145/3597636

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 22 May 2023
      Online AM: 30 January 2023
      Accepted: 20 October 2022
      Revised: 27 June 2022
      Received: 31 January 2022
      Published in TWEB Volume 17, Issue 3

      Author Tags

      1. Spatio-temporal graph
      2. forgery detection
      3. spatial inconsistency
      4. temporal incoherence

      Qualifiers

      • Research-article

      Funding Sources

      • National Natural Science Foundation of China
