Not made for each other: Audio-Visual Dissonance-based Deepfake Detection and Localization

Komal Chugh, Parul Gupta, Abhinav Dhall, Ramanathan Subramanian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

186 Citations (Scopus)

Abstract

We propose detection of deepfake videos based on the dissimilarity between the audio and visual modalities, termed the Modality Dissonance Score (MDS). We hypothesize that manipulation of either modality will lead to disharmony between the two modalities, e.g., loss of lip-sync and unnatural facial and lip movements. MDS is computed as the mean aggregate of dissimilarity scores between audio and visual segments in a video. Discriminative features are learnt for the audio and visual channels in a chunk-wise manner, employing the cross-entropy loss for individual modalities, and a contrastive loss that models inter-modality similarity. Extensive experiments on the DFDC and DeepFake-TIMIT datasets show that our approach outperforms the state-of-the-art by up to 7%. We also demonstrate temporal forgery localization, and show how our technique identifies the manipulated video segments.
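The abstract's aggregation and loss can be sketched in code. The following is a minimal, hypothetical illustration (not the authors' implementation): the MDS is taken as the mean Euclidean distance between chunk-wise audio and visual embeddings, and the contrastive loss follows the standard margin-based form, with label 0 for real videos (modalities should agree) and 1 for fakes. The feature shapes, distance metric, and margin value are assumptions for illustration.

```python
import numpy as np

def contrastive_loss(distances, label, margin=1.0):
    """Standard margin-based contrastive loss over per-chunk
    audio-visual distances. label = 0 for real (pull modalities
    together), label = 1 for fake (push apart up to the margin).
    Note: an illustrative form, not the paper's exact objective."""
    d = np.asarray(distances, dtype=float)
    real_term = (1 - label) * d ** 2
    fake_term = label * np.maximum(margin - d, 0.0) ** 2
    return float(np.mean(real_term + fake_term))

def modality_dissonance_score(audio_feats, visual_feats):
    """MDS as described in the abstract: mean aggregate of per-chunk
    dissimilarity scores between audio and visual embeddings.
    A higher score suggests a manipulated video. Inputs are
    (num_chunks, feat_dim) arrays; Euclidean distance is assumed."""
    a = np.asarray(audio_feats, dtype=float)
    v = np.asarray(visual_feats, dtype=float)
    per_chunk = np.linalg.norm(a - v, axis=1)  # one score per chunk
    return float(per_chunk.mean()), per_chunk  # aggregate + per-chunk
```

The per-chunk scores are what enable the temporal localization mentioned in the abstract: chunks with unusually high dissimilarity indicate the manipulated segments.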

Original language: English
Title of host publication: MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 439-447
Number of pages: 9
ISBN (Electronic): 9781450379885
DOIs
Publication status: Published - 12 Oct 2020
Externally published: Yes
Event: 28th ACM International Conference on Multimedia - Virtual, Online, United States
Duration: 12 Oct 2020 - 16 Oct 2020
Conference number: 28

Publication series

Name: MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia

Conference

Conference: 28th ACM International Conference on Multimedia
Abbreviated title: MM '20
Country/Territory: United States
City: Virtual, Online
Period: 12/10/20 - 16/10/20
Other: This year's conference, for the first time in its history, will run as a large-scale online meeting due to COVID-19. Face-to-face gatherings have been replaced by all-hands live sessions, parallel live question-and-answer (Q&A) sessions, and pre-recorded video presentations from the authors. However, the distance needed to prevent the spread of the virus has not cooled the enthusiasm of participants, presenters, chairs, reviewers, and volunteers. Under today's challenging circumstances, we are pleased that we are still able to deliver the most cutting-edge research results, covering the latest findings in the field, for which the ACM Multimedia conference series is widely known.

Keywords

  • contrastive loss
  • deepfake detection and localization
  • modality dissonance
  • neural networks
