TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Interactive Evaluation of Dialog	DSTC9 Track 3 - Task 2	PLATO-2	Overall Human Rating	4.15	# 1
Interactive Evaluation of Dialog	DSTC9 Track 3 - Task 2	PLATO-2	Coherent	2.8017	# 1
Interactive Evaluation of Dialog	DSTC9 Track 3 - Task 2	PLATO-2	Error Recovery	2.7518	# 1
Interactive Evaluation of Dialog	DSTC9 Track 3 - Task 2	PLATO-2	Consistent	0.9390	# 1
Interactive Evaluation of Dialog	DSTC9 Track 3 - Task 2	PLATO-2	Diversity	2.7441	# 1
Interactive Evaluation of Dialog	DSTC9 Track 3 - Task 2	PLATO-2	Topic Depth	2.7678	# 1
Interactive Evaluation of Dialog	DSTC9 Track 3 - Task 2	PLATO-2	Likeable	2.7878	# 1
Interactive Evaluation of Dialog	DSTC9 Track 3 - Task 2	PLATO-2	Understanding	2.8285	# 1
Interactive Evaluation of Dialog	DSTC9 Track 3 - Task 2	PLATO-2	Flexible	2.8000	# 1
Interactive Evaluation of Dialog	DSTC9 Track 3 - Task 2	PLATO-2	Informative	2.7881	# 1
Interactive Evaluation of Dialog	DSTC9 Track 3 - Task 2	PLATO-2	Inquisitive	2.7949	# 1

A Unified Pre-training Framework for Conversational AI

6 May 2021 · Siqi Bao, Bingjin Chen, Huang He, Xin Tian, Han Zhou, Fan Wang, Hua Wu, Haifeng Wang, Wenquan Wu, Yingzhan Lin

In this work, we explore the application of PLATO-2 to various dialogue systems, including open-domain conversation, knowledge-grounded dialogue, and task-oriented conversation. PLATO-2 was initially designed as an open-domain chatbot and is trained via two-stage curriculum learning. In the first stage, a coarse-grained response generation model is learned to fit the simplified one-to-one mapping relationship. This model is applied to task-oriented conversation, since the semantic mappings in task completion tend to be deterministic. In the second stage, a fine-grained generation model and an evaluation model are further learned for diverse response generation and coherence estimation, respectively. With their superior capability in capturing one-to-many mappings, these models are well suited to open-domain conversation and knowledge-grounded dialogue. For a comprehensive evaluation of PLATO-2, we participated in multiple DSTC9 tasks: interactive evaluation of open-domain conversation (Track3-task2), static evaluation of knowledge-grounded dialogue (Track3-task1), and end-to-end task-oriented conversation (Track2-task1). PLATO-2 obtained 1st place in all three tasks, verifying its effectiveness as a unified framework for various dialogue systems.
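The second-stage setup, a fine-grained generator producing diverse responses plus an evaluation model estimating coherence, amounts to a generate-then-rank loop at inference time. The Python sketch below illustrates that pattern under stated assumptions: the discrete latent index `z`, the `generate`/`evaluate` callables, and the `num_latents` parameter are hypothetical stand-ins, not the Knover API, and the toy models in the usage block exist only to make the example runnable.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces (assumptions, not the Knover API):
#   Generator(context, z) -> response, where z is a discrete latent index that
#   selects one of several plausible responses (one-to-many mapping).
#   Evaluator(context, response) -> coherence score (higher is better).
Generator = Callable[[str, int], str]
Evaluator = Callable[[str, str], float]


@dataclass
class Candidate:
    latent: int
    response: str
    score: float


def generate_and_rank(context: str, generate: Generator,
                      evaluate: Evaluator, num_latents: int) -> List[Candidate]:
    """Produce one candidate per latent value, score each candidate's
    coherence with the context, and return them best-first."""
    candidates = []
    for z in range(num_latents):
        response = generate(context, z)
        candidates.append(Candidate(z, response, evaluate(context, response)))
    # sorted() is stable even with reverse=True, so equal scores keep latent order.
    return sorted(candidates, key=lambda c: c.score, reverse=True)


if __name__ == "__main__":
    # Toy stand-ins for the trained models, only to make the sketch runnable.
    replies = ["I love hiking too!", "ok", "What trails do you like near the coast?"]

    def toy_generate(ctx: str, z: int) -> str:
        return replies[z % len(replies)]

    def toy_evaluate(ctx: str, resp: str) -> float:
        # Crude proxy "coherence" score: response length in words.
        return float(len(resp.split()))

    ranked = generate_and_rank("I went hiking last weekend.", toy_generate,
                               toy_evaluate, num_latents=3)
    print(ranked[0].response)  # prints "What trails do you like near the coast?"
```

By contrast, the first-stage coarse-grained model would return a single response per context, which the abstract notes suits task-oriented conversation, where mappings are close to deterministic.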

Code

PaddlePaddle/Knover (official implementation)

Tasks

Chatbot

Interactive Evaluation of Dialog

Response Generation

Datasets

MultiWOZ

Topical-Chat

Results from the Paper

Ranked #1 on Interactive Evaluation of Dialog on DSTC9 Track 3 - Task 2

Methods

PLATO-2
