TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Motion Synthesis	HumanML3D	T2M-GPT (τ = 0.5)	FID	0.116	# 11
Motion Synthesis	HumanML3D	T2M-GPT (τ = 0.5)	Diversity	9.761	# 4
Motion Synthesis	HumanML3D	T2M-GPT (τ = 0.5)	Multimodality	1.856	# 12
Motion Synthesis	HumanML3D	T2M-GPT (τ = 0.5)	R Precision Top3	0.775	# 15
Motion Synthesis	HumanML3D	T2M-GPT (τ ∈ U[0, 1])	FID	0.141	# 14
Motion Synthesis	HumanML3D	T2M-GPT (τ ∈ U[0, 1])	Diversity	9.722	# 6
Motion Synthesis	HumanML3D	T2M-GPT (τ ∈ U[0, 1])	Multimodality	1.831	# 13
Motion Synthesis	HumanML3D	T2M-GPT (τ ∈ U[0, 1])	R Precision Top3	0.775	# 15
Motion Synthesis	HumanML3D	T2M-GPT (τ = 0)	FID	0.140	# 13
Motion Synthesis	HumanML3D	T2M-GPT (τ = 0)	Diversity	9.844	# 2
Motion Synthesis	HumanML3D	T2M-GPT (τ = 0)	Multimodality	3.285	# 1
Motion Synthesis	HumanML3D	T2M-GPT (τ = 0)	R Precision Top3	0.685	# 21
Motion Synthesis	KIT Motion-Language	T2M-GPT (τ ∈ U[0, 1])	FID	0.514	# 12
Motion Synthesis	KIT Motion-Language	T2M-GPT (τ ∈ U[0, 1])	R Precision Top3	0.745	# 9
Motion Synthesis	KIT Motion-Language	T2M-GPT (τ ∈ U[0, 1])	Diversity	10.921	# 8
Motion Synthesis	KIT Motion-Language	T2M-GPT (τ ∈ U[0, 1])	Multimodality	1.570	# 12
Motion Synthesis	KIT Motion-Language	T2M-GPT (τ = 0)	FID	0.737	# 15
Motion Synthesis	KIT Motion-Language	T2M-GPT (τ = 0)	R Precision Top3	0.716	# 16
Motion Synthesis	KIT Motion-Language	T2M-GPT (τ = 0)	Diversity	11.198	# 1
Motion Synthesis	KIT Motion-Language	T2M-GPT (τ = 0)	Multimodality	2.309	# 4
Motion Synthesis	KIT Motion-Language	T2M-GPT (τ = 0.5)	FID	0.717	# 14
Motion Synthesis	KIT Motion-Language	T2M-GPT (τ = 0.5)	R Precision Top3	0.737	# 13
Motion Synthesis	KIT Motion-Language	T2M-GPT (τ = 0.5)	Diversity	10.862	# 11
Motion Synthesis	KIT Motion-Language	T2M-GPT (τ = 0.5)	Multimodality	1.912	# 9

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/t2m-gpt-generating-human-motion-from-textual/motion-synthesis-on-humanml3d)](https://paperswithcode.com/sota/motion-synthesis-on-humanml3d?p=t2m-gpt-generating-human-motion-from-textual)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/t2m-gpt-generating-human-motion-from-textual/motion-synthesis-on-kit-motion-language)](https://paperswithcode.com/sota/motion-synthesis-on-kit-motion-language?p=t2m-gpt-generating-human-motion-from-textual)`

T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations

15 Jan 2023 · Jianrong Zhang, Yangsong Zhang, Xiaodong Cun, Shaoli Huang, Yong Zhang, Hongwei Zhao, Hongtao Lu, Xi Shen

In this work, we investigate a simple and must-know conditional generative framework based on Vector Quantised-Variational AutoEncoder (VQ-VAE) and Generative Pre-trained Transformer (GPT) for human motion generation from textual descriptions. We show that a simple CNN-based VQ-VAE with commonly used training recipes (EMA and Code Reset) allows us to obtain high-quality discrete representations. For GPT, we incorporate a simple corruption strategy during training to alleviate the training-testing discrepancy. Despite its simplicity, our T2M-GPT shows better performance than competitive approaches, including recent diffusion-based approaches. For example, on HumanML3D, which is currently the largest dataset, we achieve comparable performance on the consistency between text and generated motion (R-Precision), but with an FID of 0.116, largely outperforming MotionDiffuse at 0.630. Additionally, we conduct analyses on HumanML3D and observe that the dataset size is a limitation of our approach. Our work suggests that VQ-VAE remains a competitive approach for human motion generation.
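The corruption strategy mentioned in the abstract can be illustrated with a minimal sketch. It assumes that τ (the parameter in the model names of the results table) is the probability with which each ground-truth code index is swapped for a random codebook index during training. That reading, the function name `corrupt_codes`, and the codebook size of 512 are illustrative assumptions, not details confirmed by this page.

```python
import random


def corrupt_codes(codes, codebook_size, tau=None, rng=None):
    """Return a copy of `codes` with some indices replaced by random ones.

    Assumption: each index is replaced independently with probability `tau`.
    If `tau` is None, it is drawn uniformly from [0, 1) for this sequence,
    mimicking the "tau in U[0, 1]" setting; tau = 0 leaves the input intact.
    """
    rng = rng or random.Random()
    if tau is None:
        tau = rng.random()
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    return [
        rng.randrange(codebook_size) if rng.random() < tau else code
        for code in codes
    ]


# Hypothetical token sequence produced by a VQ-VAE encoder.
ground_truth = [3, 1, 4, 1, 5, 9, 2, 6]
noisy_input = corrupt_codes(ground_truth, codebook_size=512, tau=0.5)
```

In a teacher-forced setup, one plausible use is to feed `noisy_input` to the transformer while keeping `ground_truth` as the prediction target, so the model sees imperfect prefixes similar to its own samples at test time.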

Code

Mael-zys/T2M-GPT official

Tasks

Motion Synthesis

Datasets

HumanML3D • KIT Motion-Language

Results from the Paper

Ranked #11 on Motion Synthesis on HumanML3D

Methods

Absolute Position Encodings • Adam • Attention Dropout • AutoEncoder • BPE • Cosine Annealing • Dense Connections • Discriminative Fine-Tuning • Dropout • GELU • GPT • Label Smoothing • Layer Normalization • Linear Layer • Linear Warmup With Cosine Annealing • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer • VQ-VAE • Weight Decay
