TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Motion Synthesis	HumanML3D	FineMoGen	FID	0.151	# 15
Motion Synthesis	HumanML3D	FineMoGen	Diversity	9.263	# 18
Motion Synthesis	HumanML3D	FineMoGen	Multimodality	2.696	# 4
Motion Synthesis	HumanML3D	FineMoGen	R Precision Top3	0.784	# 11
Motion Synthesis	KIT Motion-Language	FineMoGen	FID	0.178	# 2
Motion Synthesis	KIT Motion-Language	FineMoGen	R Precision Top3	0.772	# 2
Motion Synthesis	KIT Motion-Language	FineMoGen	Diversity	10.85	# 12
Motion Synthesis	KIT Motion-Language	FineMoGen	Multimodality	1.877	# 11

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/finemogen-fine-grained-spatio-temporal-motion-1/motion-synthesis-on-kit-motion-language)](https://paperswithcode.com/sota/motion-synthesis-on-kit-motion-language?p=finemogen-fine-grained-spatio-temporal-motion-1)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/finemogen-fine-grained-spatio-temporal-motion-1/motion-synthesis-on-humanml3d)](https://paperswithcode.com/sota/motion-synthesis-on-humanml3d?p=finemogen-fine-grained-spatio-temporal-motion-1)`

FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing

NeurIPS 2023 · Mingyuan Zhang, Huirong Li, Zhongang Cai, Jiawei Ren, Lei Yang, Ziwei Liu

Text-driven motion generation has achieved substantial progress with the emergence of diffusion models. However, existing methods still struggle to generate complex motion sequences that correspond to fine-grained descriptions, depicting detailed and accurate spatio-temporal actions. This lack of fine-grained controllability limits the use of motion generation by a larger audience. To tackle these challenges, we present FineMoGen, a diffusion-based motion generation and editing framework that can synthesize fine-grained motions with spatio-temporal composition according to the user instructions. Specifically, FineMoGen builds upon a diffusion model with a novel transformer architecture dubbed Spatio-Temporal Mixture Attention (SAMI). SAMI optimizes the generation of the global attention template from two perspectives: 1) explicitly modeling the constraints of spatio-temporal composition; and 2) utilizing sparsely-activated mixture-of-experts to adaptively extract fine-grained features. To facilitate a large-scale study on this new fine-grained motion generation task, we contribute the HuMMan-MoGen dataset, which consists of 2,968 videos and 102,336 fine-grained spatio-temporal descriptions. Extensive experiments validate that FineMoGen exhibits superior motion generation quality over state-of-the-art methods. Notably, FineMoGen further enables zero-shot motion editing with the aid of modern large language models (LLMs), faithfully manipulating motion sequences according to fine-grained instructions. Project Page: https://mingyuan-zhang.github.io/projects/FineMoGen.html

Code

mingyuan-zhang/FineMoGen official

Tasks

Motion Synthesis

Datasets

HumanML3D

BABEL

KIT Motion-Language

Results from the Paper

Ranked #2 on Motion Synthesis on KIT Motion-Language

Methods

Diffusion
