TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Motion Synthesis	HumanML3D	MMM (predict length)	FID	0.080	# 5
Motion Synthesis	HumanML3D	MMM (predict length)	Diversity	9.411	# 15
Motion Synthesis	HumanML3D	MMM (predict length)	Multimodality	1.164	# 20
Motion Synthesis	HumanML3D	MMM (predict length)	R Precision Top3	0.794	# 6
Motion Synthesis	HumanML3D	MMM (gt length)	FID	0.089	# 6
Motion Synthesis	HumanML3D	MMM (gt length)	Diversity	9.577	# 11
Motion Synthesis	HumanML3D	MMM (gt length)	Multimodality	1.226	# 19
Motion Synthesis	HumanML3D	MMM (gt length)	R Precision Top3	0.804	# 2
Motion Synthesis	KIT Motion-Language	MMM (gt length)	FID	0.316	# 5
Motion Synthesis	KIT Motion-Language	MMM (gt length)	R Precision Top3	0.744	# 11
Motion Synthesis	KIT Motion-Language	MMM (gt length)	Diversity	10.910	# 9
Motion Synthesis	KIT Motion-Language	MMM (gt length)	Multimodality	1.232	# 15
Motion Synthesis	KIT Motion-Language	MMM (predict length)	FID	0.429	# 8
Motion Synthesis	KIT Motion-Language	MMM (predict length)	R Precision Top3	0.718	# 15
Motion Synthesis	KIT Motion-Language	MMM (predict length)	Diversity	10.633	# 18
Motion Synthesis	KIT Motion-Language	MMM (predict length)	Multimodality	1.105	# 17

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mmm-generative-masked-motion-model/motion-synthesis-on-humanml3d)](https://paperswithcode.com/sota/motion-synthesis-on-humanml3d?p=mmm-generative-masked-motion-model)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mmm-generative-masked-motion-model/motion-synthesis-on-kit-motion-language)](https://paperswithcode.com/sota/motion-synthesis-on-kit-motion-language?p=mmm-generative-masked-motion-model)`

MMM: Generative Masked Motion Model

6 Dec 2023 · Ekkasit Pinyoanuntapong, Pu Wang, Minwoo Lee, Chen Chen

Recent advances in text-to-motion generation using diffusion and autoregressive models have shown promising results. However, these models often suffer from a trade-off between real-time performance, high fidelity, and motion editability. To address this gap, we introduce MMM, a novel yet simple motion generation paradigm based on a Masked Motion Model. MMM consists of two key components: (1) a motion tokenizer that transforms 3D human motion into a sequence of discrete tokens in latent space, and (2) a conditional masked motion transformer that learns to predict randomly masked motion tokens, conditioned on the pre-computed text tokens. By attending to motion and text tokens in all directions, MMM explicitly captures the inherent dependencies among motion tokens and the semantic mapping between motion and text tokens. During inference, this allows parallel and iterative decoding of multiple motion tokens that are highly consistent with fine-grained text descriptions, thereby simultaneously achieving high-fidelity and high-speed motion generation. In addition, MMM has innate motion editability. By simply placing mask tokens at the positions that need editing, MMM automatically fills the gaps while guaranteeing smooth transitions between edited and unedited parts. Extensive experiments on the HumanML3D and KIT-ML datasets demonstrate that MMM surpasses current leading methods in generating high-quality motion (evidenced by superior FID scores of 0.08 and 0.429, respectively), while offering advanced editing features such as body-part modification, motion in-betweening, and the synthesis of long motion sequences. Moreover, MMM is two orders of magnitude faster on a single mid-range GPU than editable motion diffusion models. Our project page is available at https://exitudio.github.io/MMM-page.
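
The parallel, iterative decoding and the mask-based editing described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `MASK`, `toy_predictor`, and `iterative_decode` are hypothetical names, the predictor returns random tokens instead of running a transformer, and the confidence-ranked cosine schedule is an assumption borrowed from common masked-token decoders rather than something the abstract specifies.

```python
import math
import random

MASK = -1  # hypothetical sentinel marking a masked motion token


def toy_predictor(tokens, text, rng, codebook_size=8):
    """Stand-in for the conditional masked transformer (hypothetical).

    Returns {position: (token_id, confidence)} for every masked position.
    A real model would attend to all motion and text tokens; this stub
    draws random values so the decoding loop can run.
    """
    return {
        i: (rng.randrange(codebook_size), rng.random())
        for i, t in enumerate(tokens)
        if t == MASK
    }


def iterative_decode(tokens, text, steps=4, seed=0, predictor=toy_predictor):
    """Fill MASK positions over several parallel decoding steps.

    Each step predicts all masked positions at once, commits the most
    confident predictions, and leaves the rest masked for the next step.
    Tokens that were not masked in the input are never changed.
    """
    rng = random.Random(seed)
    tokens = list(tokens)
    total = sum(t == MASK for t in tokens)
    for step in range(1, steps + 1):
        preds = predictor(tokens, text, rng)
        if not preds:
            break
        # Assumed cosine schedule: how many positions stay masked after this step.
        still_masked = int(total * math.cos(math.pi / 2 * step / steps))
        n_commit = max(1, len(preds) - still_masked)
        # Commit the highest-confidence predictions first.
        ranked = sorted(preds, key=lambda i: preds[i][1], reverse=True)
        for i in ranked[:n_commit]:
            tokens[i] = preds[i][0]
    return tokens


# Generation: start from a fully masked sequence.
generated = iterative_decode([MASK] * 10, "a person walks forward")

# Editing (in-betweening): mask only the span to be regenerated.
edited = iterative_decode([3, 1, MASK, MASK, MASK, 5, 2], "a person jumps")
```

In this sketch, generation and editing share one loop: the only difference is which positions start out masked, which mirrors the abstract's point that editing amounts to placing mask tokens where changes are needed.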

Code

exitudio/MMM official

Tasks

Motion Synthesis

Datasets

HumanML3D
KIT Motion-Language

Results from the Paper

Ranked #5 on Motion Synthesis on KIT Motion-Language
