Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization

13 Apr 2022 · Zhixi Cai, Kalin Stefanov, Abhinav Dhall, Munawar Hayat

Due to its high societal impact, deepfake detection is receiving active attention in the computer vision community. Most deepfake detection methods target identity, facial-attribute, and adversarial-perturbation-based spatio-temporal modifications that span the whole video or occur at random locations, while keeping the meaning of the content intact. However, a sophisticated deepfake may manipulate only a small segment of the video or audio, through which the meaning of the content can be, for example, completely inverted from a sentiment perspective. We introduce a content-driven audio-visual deepfake dataset, termed Localized Audio Visual DeepFake (LAV-DF), explicitly designed for the task of learning temporal forgery localization. Specifically, the content-driven audio-visual manipulations are performed strategically to change the sentiment polarity of the whole video. Our baseline method for benchmarking the proposed dataset is a 3DCNN model, termed Boundary Aware Temporal Forgery Detection (BA-TFD), which is guided by contrastive, boundary matching, and frame classification loss functions. Extensive quantitative and qualitative analysis demonstrates the proposed method's strong performance on the temporal forgery localization and deepfake detection tasks.
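To make the three-term objective concrete, the sketch below shows one plausible way such a combined loss could be assembled in PyTorch. It is a minimal illustration, not the authors' implementation: the tensor names (`audio_emb`, `frame_logits`, `bm_pred`, etc.), the margin value, and the specific formulations of each term are assumptions for demonstration only.

```python
# Minimal sketch of a contrastive + boundary-matching + frame-classification
# objective in the spirit of BA-TFD. All names and formulations here are
# hypothetical; the paper's actual loss definitions may differ.
import torch
import torch.nn.functional as F

def multitask_loss(audio_emb, video_emb, frame_logits, frame_labels,
                   bm_pred, bm_target, margin=0.99, weights=(1.0, 1.0, 1.0)):
    """Combine the three loss terms named in the abstract."""
    # Contrastive term: pull audio/video embeddings together on real frames,
    # push them apart (up to a margin) on manipulated frames.
    dist = F.pairwise_distance(audio_emb, video_emb)          # shape (N,)
    real = (frame_labels == 0).float()
    contrastive = (real * dist.pow(2)
                   + (1 - real) * F.relu(margin - dist).pow(2)).mean()

    # Boundary-matching term: regress a confidence map over candidate
    # (start, duration) segments toward a ground-truth IoU map.
    boundary = F.mse_loss(bm_pred, bm_target)

    # Frame-classification term: per-frame real/fake prediction.
    frame_cls = F.binary_cross_entropy_with_logits(
        frame_logits, frame_labels.float())

    w_c, w_b, w_f = weights
    return w_c * contrastive + w_b * boundary + w_f * frame_cls
```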


Datasets

Introduced in the Paper: LAV-DF

Used in the Paper: DFDC

Results from the Paper

Task                           Dataset  Model   Metric   Value  Global Rank
DeepFake Detection             LAV-DF   BA-TFD  AUC      0.990  #1
Temporal Forgery Localization  LAV-DF   BA-TFD  AR@100   66.9   #3
Temporal Forgery Localization  LAV-DF   BA-TFD  AR@50    64.08  #3
Temporal Forgery Localization  LAV-DF   BA-TFD  AR@20    60.77  #3
Temporal Forgery Localization  LAV-DF   BA-TFD  AR@10    58.42  #3
Temporal Forgery Localization  LAV-DF   BA-TFD  AP@0.5   76.9   #3
Temporal Forgery Localization  LAV-DF   BA-TFD  AP@0.75  38.5   #3
Temporal Forgery Localization  LAV-DF   BA-TFD  AP@0.95  0.25   #3
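Here AP@τ is average precision, where a predicted segment counts as correct if its temporal IoU with a ground-truth forged segment is at least τ, and AR@K is average recall over the top-K proposals per video. The sketch below illustrates the temporal IoU computation and the greedy matching these metrics are typically built on; it is a simplified illustration with hypothetical function names, not the paper's evaluation code.

```python
# Simplified temporal-IoU matching underlying AP@tau / AR@K style metrics.
# The paper's exact evaluation protocol may differ.
def temporal_iou(a, b):
    """IoU of two segments given as (start, end) in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def match_predictions(preds, gts, tau):
    """Greedily match score-sorted predictions to ground truth at IoU >= tau.

    preds: list of (start, end, score); gts: list of (start, end).
    Returns one True/False flag per prediction (in descending score order),
    from which precision/recall curves, AP, and AR can be computed.
    """
    preds = sorted(preds, key=lambda p: p[2], reverse=True)
    used = [False] * len(gts)
    flags = []
    for start, end, _ in preds:
        best, best_iou = -1, tau
        for i, gt in enumerate(gts):
            iou = temporal_iou((start, end), gt)
            if not used[i] and iou >= best_iou:
                best, best_iou = i, iou
        if best >= 0:
            used[best] = True   # each ground-truth segment matches once
            flags.append(True)
        else:
            flags.append(False)
    return flags
```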
