TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Text Retrieval	Image-Chat	PaCE	R@1	51.9	# 1
Text Retrieval	Image-Chat	PaCE	R@5	76.8	# 1
Text Retrieval	Image-Chat	PaCE	Sum(R@1,5)	128.7	# 1
Response Generation	MMConv	PaCE	Inform	34.5	# 1
Response Generation	MMConv	PaCE	Success	13.9	# 1
Response Generation	MMConv	PaCE	BLEU	22	# 1
Response Generation	MMConv	PaCE	Comb.	44.7	# 1
Dialogue State Tracking	MMConv	PaCE	Categorical Accuracy	92.2	# 1
Dialogue State Tracking	MMConv	PaCE	Non-Categorical Accuracy	43.4	# 1
Dialogue State Tracking	MMConv	PaCE	Overall	39.2	# 1
Multimodal Intent Recognition	MMDialog	PaCE	F1	77.6	# 1
Multimodal Intent Recognition	PhotoChat	PaCE	F1	63.8	# 1
Multimodal Intent Recognition	PhotoChat	PaCE	Precision	63.3	# 1
Multimodal Intent Recognition	PhotoChat	PaCE	Recall	68	# 1
Image Retrieval	PhotoChat	PaCE	R@1	15.2	# 1
Image Retrieval	PhotoChat	PaCE	R@5	36.7	# 1
Image Retrieval	PhotoChat	PaCE	R@10	49.6	# 1
Image Retrieval	PhotoChat	PaCE	Sum(R@1,5,10)	101.5	# 1
Dialogue State Tracking	SIMMC2.0	PaCE	Slot F1	87.0	# 2
Dialogue State Tracking	SIMMC2.0	PaCE	Act F1	97.1	# 1
Response Generation	SIMMC2.0	PaCE	BLEU	34.1	# 1
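
The retrieval rows above report recall at K (R@K) and a sum over several K. As a minimal sketch, assuming R@K is the percentage of queries whose gold item appears among the top K ranked candidates and that the Sum rows are plain sums of the listed R@K values (the reported sums are consistent with this), the metrics could be computed as follows. The function names and toy data are hypothetical and are not taken from the PaCE code base.

```python
from typing import Dict, List, Sequence


def mean_recall_at_k(rankings: Sequence[Sequence[str]],
                     gold_ids: Sequence[str],
                     k: int) -> float:
    """Percentage of queries whose gold id is among the top-k ranked ids."""
    hits = [gold in ranked[:k] for ranked, gold in zip(rankings, gold_ids)]
    return 100.0 * sum(hits) / len(hits)


def recall_sum(recalls: Dict[int, float]) -> float:
    """Sum of R@K values, rounded to one decimal as in the table."""
    return round(sum(recalls.values()), 1)


# Toy example (hypothetical data): four queries, three candidates each.
rankings: List[List[str]] = [
    ["a", "b", "c"],
    ["b", "c", "a"],
    ["c", "a", "b"],
    ["a", "c", "b"],
]
gold_ids = ["a", "c", "a", "b"]

r_at_1 = mean_recall_at_k(rankings, gold_ids, k=1)
r_at_2 = mean_recall_at_k(rankings, gold_ids, k=2)

# Check the Sum rows of the table against the individual R@K rows.
image_chat_sum = recall_sum({1: 51.9, 5: 76.8})
photochat_sum = recall_sum({1: 15.2, 5: 36.7, 10: 49.6})
```

Under these assumptions the Image-Chat and PhotoChat sums reproduce the reported 128.7 and 101.5.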


PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts

24 May 2023 · Yunshui Li, Binyuan Hui, Zhichao Yin, Min Yang, Fei Huang, Yongbin Li

Perceiving multi-modal information and conducting dialogues with humans is a long-term goal of artificial intelligence. Pre-training is commonly regarded as an effective approach for multi-modal dialogue. However, because multi-modal dialogue data is scarce, research on multi-modal dialogue pre-training remains limited. A further challenge arises from the broad scope of multi-modal dialogue, which involves various modalities and tasks. Moreover, new forms of tasks may arise at unpredictable points in the future, so multi-modal dialogue models must be flexible enough to adapt to such scenarios. This paper proposes PaCE, a unified, structured, compositional multi-modal dialogue pre-training framework. It combines several fundamental experts to accommodate multiple dialogue-related tasks and can be pre-trained using limited dialogue data and extensive non-dialogue multi-modal data. Furthermore, we propose a progressive training method in which previously trained experts assist new experts, facilitating the expansion of their capabilities. Experimental results demonstrate that PaCE achieves state-of-the-art results on eight multi-modal dialogue benchmarks.
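
The abstract describes two ideas: tasks are served by combining several experts, and training proceeds in stages where earlier experts support later ones. The toy sketch below illustrates only that general pattern and is not the authors' architecture. The expert names (`text`, `image`, `context`), the averaging of expert outputs, and the freezing of earlier experts at each new stage are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Vector = List[float]


@dataclass
class Expert:
    """A named transformation; `frozen` marks it as no longer trainable."""
    name: str
    fn: Callable[[Vector], Vector]
    frozen: bool = False


@dataclass
class ExpertPool:
    experts: Dict[str, Expert] = field(default_factory=dict)

    def add_stage(self, new_experts: List[Expert]) -> None:
        """Progressive step (assumed): freeze existing experts, add new ones."""
        for expert in self.experts.values():
            expert.frozen = True
        for expert in new_experts:
            self.experts[expert.name] = expert

    def forward(self, x: Vector, active: List[str]) -> Vector:
        """Compositional step (assumed): average the selected experts' outputs."""
        outputs = [self.experts[name].fn(x) for name in active]
        return [sum(vals) / len(outputs) for vals in zip(*outputs)]


# Stage 1: two hypothetical base experts.
pool = ExpertPool()
pool.add_stage([
    Expert("text", lambda x: [2 * v for v in x]),
    Expert("image", lambda x: [v + 1 for v in x]),
])
# Stage 2: a new hypothetical expert is added; earlier experts are frozen.
pool.add_stage([Expert("context", lambda x: [-v for v in x])])

# A task selects the subset of experts it needs.
combined = pool.forward([1.0, 2.0], active=["text", "image"])
```

Each task passes a different `active` list, so one pool of experts can serve several tasks without retraining the experts added in earlier stages.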


Code


AlibabaResearch/DAMO-ConvAI (official), 977 stars

Tasks


Dialogue State Tracking

Image Retrieval

Multimodal Intent Recognition

Response Generation

Text Retrieval

Visual Dialog

Datasets

MS COCO

Image-Chat

PhotoChat

MMDialog

SIMMC2.0

MMConv

Results from the Paper


Ranked #1 on Response Generation on SIMMC2.0


