Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization

Zhixi Cai, Kalin Stefanov, Abhinav Dhall, Munawar Hayat

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

36 Citations (Scopus)

Abstract

Due to its high societal impact, deepfake detection is receiving active attention in the computer vision community. Most deepfake detection methods rely on identity, facial attributes, and adversarial perturbation-based spatio-temporal modifications applied to the whole video or at random locations, while keeping the meaning of the content intact. However, a sophisticated deepfake may contain only a small segment of video/audio manipulation, through which the meaning of the content can be, for example, completely inverted from a sentiment perspective. We introduce a content-driven audio-visual deepfake dataset, termed Localized Audio Visual DeepFake (LAV-DF), explicitly designed for the task of learning temporal forgery localization. Specifically, the content-driven audio-visual manipulations are performed strategically to change the sentiment polarity of the whole video. Our baseline method for benchmarking the proposed dataset is a 3DCNN model, termed Boundary Aware Temporal Forgery Detection (BA-TFD), which is guided by contrastive, boundary matching, and frame classification loss functions. Our extensive quantitative and qualitative analysis demonstrates the proposed method's strong performance on temporal forgery localization and deepfake detection tasks.

Original language: English
Title of host publication: 2022 International Conference on Digital Image Computing
Subtitle of host publication: Techniques and Applications (DICTA)
Place of Publication: United States
Publisher: Institute of Electrical and Electronics Engineers
Number of pages: 8
ISBN (Electronic): 978-1-6654-5642-5
ISBN (Print): 978-1-6654-5643-2
Publication status: Published - 10 Feb 2023
Externally published: Yes
Event: 2022 International Conference on Digital Image Computing: Techniques and Applications - Sydney, Australia
Duration: 30 Nov 2022 → 2 Dec 2022

Conference

Conference: 2022 International Conference on Digital Image Computing: Techniques and Applications
Abbreviated title: DICTA 2022
Country/Territory: Australia
City: Sydney
Period: 30/11/22 → 2/12/22

Keywords

  • Deepfake
  • Artificial intelligence
  • Deepfake detection
  • Audio-video manipulation
