Search Results for author: Ziwei Liu

Found 264 papers, 185 papers with code

MMInA: Benchmarking Multihop Multimodal Internet Agents

no code implementations • 15 Apr 2024 • Ziniu Zhang, Shulin Tian, Liangyu Chen, Ziwei Liu

To answer this question, we present MMInA, a multihop and multimodal benchmark to evaluate the embodied agents for compositional Internet tasks, with several appealing properties: 1) Evolving real-world multimodal websites.

Benchmarking

Move Anything with Layered Scene Diffusion

no code implementations • 10 Apr 2024 • Jiawei Ren, Mengmeng Xu, Jui-Chieh Wu, Ziwei Liu, Tao Xiang, Antoine Toisoul

Diffusion models generate images with an unprecedented level of quality, but how can we freely rearrange image layouts?

Denoising Disentanglement

FashionEngine: Interactive Generation and Editing of 3D Clothed Humans

no code implementations • 2 Apr 2024 • Tao Hu, Fangzhou Hong, Zhaoxi Chen, Ziwei Liu

FashionEngine automates the 3D human production with three key components: 1) A pre-trained 3D human diffusion model that learns to model 3D humans in a semantic UV latent space from 2D image training data, which provides strong priors for diverse generation and editing tasks.

Virtual Try-on

SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering

no code implementations • 1 Apr 2024 • Tao Hu, Fangzhou Hong, Ziwei Liu

2) Physical motion decoding that is designed to encourage physical motion learning by decoding the motion triplane features at timestep t to predict both spatial derivatives and temporal derivatives at the next timestep t+1 in the training stage.

Generalizable Novel View Synthesis Novel View Synthesis

StructLDM: Structured Latent Diffusion for 3D Human Generation

no code implementations • 1 Apr 2024 • Tao Hu, Fangzhou Hong, Ziwei Liu

2) A structured 3D-aware auto-decoder that factorizes the global latent space into several semantic body parts parameterized by a set of conditional structured local NeRFs anchored to the body template, which embeds the properties learned from the 2D training data and can be decoded to render view-consistent humans under different poses and clothing styles.

Virtual Try-on

Large Motion Model for Unified Multi-Modal Motion Generation

no code implementations • 1 Apr 2024 • Mingyuan Zhang, Daisheng Jin, Chenyang Gu, Fangzhou Hong, Zhongang Cai, Jingfang Huang, Chongzhi Zhang, Xinying Guo, Lei Yang, Ying He, Ziwei Liu

In this work, we present Large Motion Model (LMM), a motion-centric, multi-modal framework that unifies mainstream motion generation tasks into a generalist model.

Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models

1 code implementation • 29 Mar 2024 • Atsuyuki Miyai, Jingkang Yang, Jingyang Zhang, Yifei Ming, Qing Yu, Go Irie, Yixuan Li, Hai Li, Ziwei Liu, Kiyoharu Aizawa

This paper introduces a novel and significant challenge for Vision Language Models (VLMs), termed Unsolvable Problem Detection (UPD).

Question Answering Visual Question Answering

Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment

no code implementations • 27 Mar 2024 • Li SiYao, Tianpei Gu, Zhitao Yang, Zhengyu Lin, Ziwei Liu, Henghui Ding, Lei Yang, Chen Change Loy

We introduce a novel task within the field of 3D dance generation, termed dance accompaniment, which necessitates the generation of responsive movements from a dance partner, the "follower", synchronized with the lead dancer's movements and the underlying musical rhythm.

AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation

no code implementations • 26 Mar 2024 • Qingping Sun, Yanjun Wang, Ailing Zeng, Wanqi Yin, Chen Wei, Wenjia Wang, Haiyi Mei, Chi Sing Leung, Ziwei Liu, Lei Yang, Zhongang Cai

Expressive human pose and shape estimation (a.k.a.

Human Detection

TC4D: Trajectory-Conditioned Text-to-4D Generation

no code implementations • 26 Mar 2024 • Sherwin Bahmani, Xian Liu, Yifan Wang, Ivan Skorokhodov, Victor Rong, Ziwei Liu, Xihui Liu, Jeong Joon Park, Sergey Tulyakov, Gordon Wetzstein, Andrea Tagliasacchi, David B. Lindell

We learn local deformations that conform to the global trajectory using supervision from a text-to-video model.

Scene Generation Video Generation

AID: Attention Interpolation of Text-to-Image Diffusion

1 code implementation • 26 Mar 2024 • Qiyuan He, Jinghao Wang, Ziwei Liu, Angela Yao

To that end, we introduce a novel training-free technique named Attention Interpolation via Diffusion (AID).

Spatial Interpolation

Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding

1 code implementation • 25 Mar 2024 • Lingdong Kong, Xiang Xu, Jun Cen, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu

Safety-critical 3D scene understanding tasks necessitate not only accurate but also confident predictions from 3D perception models.

Data Augmentation Scene Understanding

ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars

no code implementations • 22 Mar 2024 • Zhenwei Wang, Tengfei Wang, Gerhard Hancke, Ziwei Liu, Rynson W. H. Lau

To this end, we design a two-stage framework that draws a concept image first, followed by a reference-informed 3D modeling stage.

3D Generation Unity

FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation

2 code implementations • 19 Mar 2024 • Shuai Yang, Yifan Zhou, Ziwei Liu, Chen Change Loy

In this paper, we introduce FRESCO, intra-frame correspondence alongside inter-frame correspondence to establish a more robust spatial-temporal constraint.

Translation valid

588

WHAC: World-grounded Humans and Cameras

1 code implementation • 19 Mar 2024 • Wanqi Yin, Zhongang Cai, Ruisi Wang, Fanzhou Wang, Chen Wei, Haiyi Mei, Weiye Xiao, Zhitao Yang, Qingping Sun, Atsushi Yamashita, Ziwei Liu, Lei Yang

In this study, we aim to recover expressive parametric human models (i.e., SMPL-X) and corresponding camera poses jointly, by leveraging the synergy between three critical players: the world, the human, and the camera.

Pose Estimation

169

ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance

no code implementations • 19 Mar 2024 • Yongwei Chen, Tengfei Wang, Tong Wu, Xingang Pan, Kui Jia, Ziwei Liu

Though promising results have been achieved in single object generation, these methods often struggle to model complex 3D assets that inherently contain multiple objects.

3D Generation Object

InTeX: Interactive Text-to-texture Synthesis via Unified Depth-aware Inpainting

no code implementations • 18 Mar 2024 • Jiaxiang Tang, Ruijie Lu, Xiaokang Chen, Xiang Wen, Gang Zeng, Ziwei Liu

Text-to-texture synthesis has become a new frontier in 3D content creation thanks to the recent advances in text-to-image models.

Texture Synthesis

STAR-RIS Assisted Wireless-Powered and Backscattering Mobile Edge Computing Networks

no code implementations • 5 Mar 2024 • Bin Lyu, Yining Zhang, Pengcheng Chen, Ziwei Liu, Feng Tian

The wireless-powered and backscattering mobile edge computing (WPB-MEC) network is a novel network paradigm that supplies energy and computing resources to wireless sensors (WSs).

Edge-computing

3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors

1 code implementation • 4 Mar 2024 • Fangzhou Hong, Jiaxiang Tang, Ziang Cao, Min Shi, Tong Wu, Zhaoxi Chen, Tengfei Wang, Liang Pan, Dahua Lin, Ziwei Liu

Specifically, it is powered by a text-conditioned tri-plane latent diffusion model, which quickly generates coarse 3D samples for fast prototyping.

3D Generation Text to 3D +1

546

LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

1 code implementation • 7 Feb 2024 • Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, Ziwei Liu

2) 3D Backbone: We present an asymmetric U-Net as a high-throughput backbone operating on multi-view images, which can be produced from text or single-view image input by leveraging multi-view diffusion models.

1,136

A Comprehensive Survey on 3D Content Generation

1 code implementation • 2 Feb 2024 • Jian Liu, Xiaoshui Huang, Tianyu Huang, Lu Chen, Yuenan Hou, Shixiang Tang, Ziwei Liu, Wanli Ouyang, WangMeng Zuo, Junjun Jiang, Xianming Liu

Recent years have witnessed remarkable advances in artificial intelligence generated content (AIGC), with diverse input modalities, e.g., text, image, video, audio and 3D.

358

Towards Language-Driven Video Inpainting via Multimodal Large Language Models

no code implementations • 18 Jan 2024 • Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, Jiangning Zhang, Yining Li, Kai Chen, Yunhai Tong, Ziwei Liu, Chen Change Loy

We introduce a new task, language-driven video inpainting, which uses natural language instructions to guide the inpainting process.

Video Inpainting

Exploiting Hierarchical Interactions for Protein Surface Learning

1 code implementation • 17 Jan 2024 • Yiqun Lin, Liang Pan, Yi Li, Ziwei Liu, Xiaomeng Li

In this paper, we present a principled framework based on deep learning techniques, namely Hierarchical Chemical and Geometric Feature Interaction Network (HCGNet), for protein surface analysis by bridging chemical and geometric features with hierarchical interactions.

Vlogger: Make Your Dream A Vlog

1 code implementation • 17 Jan 2024 • Shaobin Zhuang, Kunchang Li, Xinyuan Chen, Yaohui Wang, Ziwei Liu, Yu Qiao, Yali Wang

More importantly, Vlogger can generate over 5-minute vlogs from open-world descriptions, without loss of video coherence on script and actor.

Language Modelling Large Language Model +1

Multi-scale 2D Temporal Map Diffusion Models for Natural Language Video Localization

no code implementations • 16 Jan 2024 • Chongzhi Zhang, Mingyuan Zhang, Zhiyang Teng, Jiayi Li, Xizhou Zhu, Lewei Lu, Ziwei Liu, Aixin Sun

Our method involves the direct generation of a global 2D temporal map via a conditional denoising diffusion process, based on the input video and language query.

Denoising Video Understanding

URHand: Universal Relightable Hands

no code implementations • 10 Jan 2024 • Zhaoxi Chen, Gyeongsik Moon, Kaiwen Guo, Chen Cao, Stanislav Pidhorskyi, Tomas Simon, Rohan Joshi, Yuan Dong, Yichen Xu, Bernardo Pires, He Wen, Lucas Evans, Bo Peng, Julia Buffalini, Autumn Trimble, Kevyn McPhail, Melissa Schoeller, Shoou-I Yu, Javier Romero, Michael Zollhöfer, Yaser Sheikh, Ziwei Liu, Shunsuke Saito

To simplify the personalization process while retaining photorealism, we build a powerful universal relightable prior based on neural relighting from multi-view images of hands captured in a light stage with hundreds of identities.

GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

1 code implementation • 8 Jan 2024 • Tong Wu, Guandao Yang, Zhibing Li, Kai Zhang, Ziwei Liu, Leonidas Guibas, Dahua Lin, Gordon Wetzstein

These metrics lack the flexibility to generalize to different evaluation criteria and might not align well with human preferences.

3D Generation Text to 3D

173

Latte: Latent Diffusion Transformer for Video Generation

2 code implementations • 5 Jan 2024 • Xin Ma, Yaohui Wang, Gengyun Jia, Xinyuan Chen, Ziwei Liu, Yuan-Fang Li, Cunjian Chen, Yu Qiao

We propose a novel Latent Diffusion Transformer, namely Latte, for video generation.

Text-to-Video Generation Video Generation

135

InsActor: Instruction-driven Physics-based Characters

no code implementations • NeurIPS 2023 • Jiawei Ren, Mingyuan Zhang, Cunjun Yu, Xiao Ma, Liang Pan, Ziwei Liu

Generating animation of physics-based characters with intuitive control has long been a desirable task with numerous applications.

Motion Planning

DreamGaussian4D: Generative 4D Gaussian Splatting

1 code implementation • 28 Dec 2023 • Jiawei Ren, Liang Pan, Jiaxiang Tang, Chi Zhang, Ang Cao, Gang Zeng, Ziwei Liu

Remarkable progress has been made in 4D content generation recently.

393

Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases

1 code implementation • 22 Dec 2023 • Zhangyang Qi, Ye Fang, Mengchen Zhang, Zeyi Sun, Tong Wu, Ziwei Liu, Dahua Lin, Jiaqi Wang, Hengshuang Zhao

We conducted a series of structured experiments to evaluate their performance in various industrial application scenarios, offering a comprehensive perspective on their practical utility.

182

FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing

1 code implementation • NeurIPS 2023 • Mingyuan Zhang, Huirong Li, Zhongang Cai, Jiawei Ren, Lei Yang, Ziwei Liu

Notably, FineMoGen further enables zero-shot motion editing capabilities with the aid of modern large language models (LLM), which faithfully manipulates motion sequences with fine-grained instructions.

Ranked #2 on Motion Synthesis on KIT Motion-Language

Motion Synthesis

InstructVideo: Instructing Video Diffusion Models with Human Feedback

1 code implementation • 19 Dec 2023 • Hangjie Yuan, Shiwei Zhang, Xiang Wang, Yujie Wei, Tao Feng, Yining Pan, Yingya Zhang, Ziwei Liu, Samuel Albanie, Dong Ni

To tackle this problem, we propose InstructVideo to instruct text-to-video diffusion models with human feedback by reward fine-tuning.

Video Generation

Towards Robust and Expressive Whole-body Human Pose and Shape Estimation

1 code implementation • NeurIPS 2023 • Hui En Pang, Zhongang Cai, Lei Yang, Qingyi Tao, Zhonghua Wu, Tianwei Zhang, Ziwei Liu

Whole-body pose and shape estimation aims to jointly predict different behaviors (e.g., pose, hand gesture, facial expression) of the entire human body from a monocular image.

FreeInit: Bridging Initialization Gap in Video Diffusion Models

1 code implementation • 12 Dec 2023 • Tianxing Wu, Chenyang Si, Yuming Jiang, Ziqi Huang, Ziwei Liu

Though diffusion-based video generation has witnessed rapid progress, the inference results of existing models still exhibit unsatisfactory temporal consistency and unnatural dynamics.

Denoising Text-to-Video Generation +1

425

PrimDiffusion: Volumetric Primitives Diffusion for 3D Human Generation

1 code implementation • NeurIPS 2023 • Zhaoxi Chen, Fangzhou Hong, Haiyi Mei, Guangcong Wang, Lei Yang, Ziwei Liu

We present PrimDiffusion, the first diffusion-based framework for 3D human generation.

3D Inpainting Denoising

100

HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image

no code implementations • 7 Dec 2023 • Tong Wu, Zhibing Li, Shuai Yang, Pan Zhang, Xinggang Pan, Jiaqi Wang, Dahua Lin, Ziwei Liu

Extensive experiments demonstrate the effectiveness of HyperDreamer in modeling region-aware materials with high-resolution textures and enabling user-friendly editing.

Semantic Segmentation

Digital Life Project: Autonomous 3D Characters with Social Intelligence

no code implementations • 7 Dec 2023 • Zhongang Cai, Jianping Jiang, Zhongfei Qing, Xinying Guo, Mingyuan Zhang, Zhengyu Lin, Haiyi Mei, Chen Wei, Ruisi Wang, Wanqi Yin, Xiangyu Fan, Han Du, Liang Pan, Peng Gao, Zhitao Yang, Yang Gao, Jiaqi Li, Tianxiang Ren, Yukun Wei, Xiaogang Wang, Chen Change Loy, Lei Yang, Ziwei Liu

In this work, we present Digital Life Project, a framework utilizing language as the universal medium to build autonomous 3D characters, who are capable of engaging in social interactions and expressing with articulated body motions, thereby simulating life in a digital environment.

Ranked #2 on Motion Synthesis on InterHuman

Motion Captioning Motion Synthesis

GauHuman: Articulated Gaussian Splatting from Monocular Human Videos

1 code implementation • 5 Dec 2023 • Shoukang Hu, Ziwei Liu

We present GauHuman, a 3D human model with Gaussian Splatting for both fast training (1-2 minutes) and real-time rendering (up to 189 FPS), compared with existing NeRF-based implicit representation modelling frameworks demanding hours of training and seconds of rendering per frame.

Ranked #1 on Generalizable Novel View Synthesis on ZJU-MoCap

Generalizable Novel View Synthesis Novel View Synthesis

248

VideoBooth: Diffusion-based Video Generation with Image Prompts

no code implementations • 1 Dec 2023 • Yuming Jiang, Tianxing Wu, Shuai Yang, Chenyang Si, Dahua Lin, Yu Qiao, Chen Change Loy, Ziwei Liu

In this paper, we study the task of video generation with image prompts, which provide more accurate and direct content control beyond the text prompts.

Video Generation

VBench: Comprehensive Benchmark Suite for Video Generative Models

1 code implementation • 29 Nov 2023 • Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, LiMin Wang, Dahua Lin, Yu Qiao, Ziwei Liu

We will open-source VBench, including all prompts, evaluation methods, generated videos, and human preference annotations, and also include more video generation models in VBench to drive forward the field of video generation.

Image Generation Video Generation

265

Panoptic Video Scene Graph Generation

3 code implementations • CVPR 2023 • Jingkang Yang, Wenxuan Peng, Xiangtai Li, Zujin Guo, Liangyu Chen, Bo Li, Zheng Ma, Kaiyang Zhou, Wayne Zhang, Chen Change Loy, Ziwei Liu

PVSG relates to the existing video scene graph generation (VidSGG) problem, which focuses on temporal interactions between humans and objects grounded with bounding boxes in videos.

Graph Generation Panoptic Scene Graph Generation +5

HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting

no code implementations • 28 Nov 2023 • Xian Liu, Xiaohang Zhan, Jiaxiang Tang, Ying Shan, Gang Zeng, Dahua Lin, Xihui Liu, Ziwei Liu

In this paper, we propose an efficient yet effective framework, HumanGaussian, that generates high-quality 3D humans with fine-grained geometry and realistic appearance.

SinSR: Diffusion-Based Image Super-Resolution in a Single Step

1 code implementation • 23 Nov 2023 • YuFei Wang, Wenhan Yang, Xinyuan Chen, Yaohui Wang, Lanqing Guo, Lap-Pui Chau, Ziwei Liu, Yu Qiao, Alex C. Kot, Bihan Wen

Extensive experiments conducted on synthetic and real-world datasets demonstrate that the proposed method can achieve comparable or even superior performance compared to both previous SOTA methods and the teacher model, in just one sampling step, resulting in up to a 10× speedup for inference.

Image Super-Resolution

125

OtterHD: A High-Resolution Multi-modality Model

1 code implementation • 7 Nov 2023 • Bo Li, Peiyuan Zhang, Jingkang Yang, Yuanhan Zhang, Fanyi Pu, Ziwei Liu

In this paper, we present OtterHD-8B, an innovative multimodal model evolved from Fuyu-8B, specifically engineered to interpret high-resolution visual inputs with granular precision.

Ranked #79 on Visual Question Answering on MM-Vet

Visual Question Answering

3,436

A Two-Stage Generative Model with CycleGAN and Joint Diffusion for MRI-based Brain Tumor Detection

1 code implementation • 6 Nov 2023 • Wenxin Wang, Zhuo-Xu Cui, Guanxun Cheng, Chentao Cao, Xi Xu, Ziwei Liu, Haifeng Wang, Yulong Qi, Dong Liang, Yanjie Zhu

However, current supervised learning methods require extensively annotated images and the state-of-the-art generative models used in unsupervised methods often have limitations in covering the whole data distribution.

Anomaly Detection Generative Adversarial Network +2

Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images

no code implementations • 2 Nov 2023 • Zalan Fabian, Zhongqi Miao, Chunyuan Li, Yuanhan Zhang, Ziwei Liu, Andrés Hernández, Andrés Montes-Rojas, Rafael Escucha, Laura Siabatto, Andrés Link, Pablo Arbeláez, Rahul Dodhia, Juan Lavista Ferres

In particular, we instruction tune vision-language models to generate detailed visual descriptions of camera trap images using similar terminology to experts.

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

no code implementations • 31 Oct 2023 • Xinyuan Chen, Yaohui Wang, Lingjun Zhang, Shaobin Zhuang, Xin Ma, Jiashuo Yu, Yali Wang, Dahua Lin, Yu Qiao, Ziwei Liu

The goal is to generate high-quality long videos with smooth and creative transitions between scenes and varying lengths of shot-level videos.

PERF: Panoramic Neural Radiance Field from a Single Panorama

1 code implementation • 25 Oct 2023 • Guangcong Wang, Peng Wang, Zhaoxi Chen, Wenping Wang, Chen Change Loy, Ziwei Liu

In this paper, we present PERF, a 360-degree novel view synthesis framework that trains a panoramic neural radiance field from a single panorama.

Novel View Synthesis Text to 3D

158

FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling

3 code implementations • 23 Oct 2023 • Haonan Qiu, Menghan Xia, Yong Zhang, Yingqing He, Xintao Wang, Ying Shan, Ziwei Liu

With the availability of large-scale video datasets and the advances of diffusion models, text-driven video generation has achieved substantial progress.

Video Generation

313

HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion

no code implementations • 12 Oct 2023 • Xian Liu, Jian Ren, Aliaksandr Siarohin, Ivan Skorokhodov, Yanyu Li, Dahua Lin, Xihui Liu, Ziwei Liu, Sergey Tulyakov

Our model enforces the joint learning of image appearance, spatial relationship, and geometry in a unified network, where the branches complement each other with both structural awareness and textural richness.

Image Generation

Octopus: Embodied Vision-Language Programmer from Environmental Feedback

1 code implementation • 12 Oct 2023 • Jingkang Yang, Yuhao Dong, Shuai Liu, Bo Li, Ziyue Wang, Chencheng Jiang, Haoran Tan, Jiamu Kang, Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu

Large vision-language models (VLMs) have achieved substantial progress in multimodal perception and reasoning.

Decision Making

225

Deep Geometrized Cartoon Line Inbetweening

1 code implementation • ICCV 2023 • Li SiYao, Tianpei Gu, Weiye Xiao, Henghui Ding, Ziwei Liu, Chen Change Loy

To preserve the precision and detail of the line drawings, we propose a new approach, AnimeInbet, which geometrizes raster line drawings into graphs of endpoints and reframes the inbetweening task as a graph fusion problem with vertex repositioning.

280

Cloth2Body: Generating 3D Human Body Mesh from 2D Clothing

no code implementations • ICCV 2023 • Lu Dai, Liqian Ma, Shenhan Qian, Hao Liu, Ziwei Liu, Hui Xiong

Finally, how to generate diverse and plausible results from a 2D clothing image.

Human Mesh Recovery Pose Estimation

DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

1 code implementation • 28 Sep 2023 • Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, Gang Zeng

In contrast to the occupancy pruning used in Neural Radiance Fields, we demonstrate that the progressive densification of 3D Gaussians converges significantly faster for 3D generative tasks.

3D Generation

3,598

Robust Sequential DeepFake Detection

1 code implementation • 26 Sep 2023 • Rui Shao, Tianxing Wu, Ziwei Liu

However, existing methods only focus on detecting one-step facial manipulation.

DeepFake Detection Face Swapping +1

118

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models

2 code implementations • 26 Sep 2023 • Yaohui Wang, Xinyuan Chen, Xin Ma, Shangchen Zhou, Ziqi Huang, Yi Wang, Ceyuan Yang, Yinan He, Jiashuo Yu, Peiqing Yang, Yuwei Guo, Tianxing Wu, Chenyang Si, Yuming Jiang, Cunjian Chen, Chen Change Loy, Bo Dai, Dahua Lin, Yu Qiao, Ziwei Liu

To this end, we propose LaVie, an integrated video generation framework that operates on cascaded video latent diffusion models, comprising a base T2V model, a temporal interpolation model, and a video super-resolution model.

Ranked #4 on Text-to-Video Generation on EvalCrafter Text-to-Video (ECTV) Dataset (using extra training data)

Text-to-Video Generation Video Generation +1

719

Detecting and Grounding Multi-Modal Media Manipulation and Beyond

1 code implementation • 25 Sep 2023 • Rui Shao, Tianxing Wu, Jianlong Wu, Liqiang Nie, Ziwei Liu

HAMMER performs 1) manipulation-aware contrastive learning between two uni-modal encoders as shallow manipulation reasoning, and 2) modality-aware cross-attention by multi-modal aggregator as deep manipulation reasoning.

Binary Classification Contrastive Learning +4

268

UnitedHuman: Harnessing Multi-Source Data for High-Resolution Human Generation

1 code implementation • ICCV 2023 • Jianglin Fu, Shikai Li, Yuming Jiang, Kwan-Yee Lin, Wayne Wu, Ziwei Liu

A holistic human dataset inevitably has insufficient and low-resolution information on local parts.

MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation

1 code implementation • 22 Sep 2023 • Jiahao Xie, Wei Li, Xiangtai Li, Ziwei Liu, Yew Soon Ong, Chen Change Loy

We present MosaicFusion, a simple yet effective diffusion-based data augmentation approach for large vocabulary instance segmentation.

Data Augmentation Instance Segmentation +1

104

FreeU: Free Lunch in Diffusion U-Net

1 code implementation • 20 Sep 2023 • Chenyang Si, Ziqi Huang, Yuming Jiang, Ziwei Liu

In this paper, we uncover the untapped potential of diffusion U-Net, which serves as a "free lunch" that substantially improves the generation quality on the fly.

Denoising Video Generation

1,390

Large-Vocabulary 3D Diffusion Model with Transformer

no code implementations • 14 Sep 2023 • Ziang Cao, Fangzhou Hong, Tong Wu, Liang Pan, Ziwei Liu

To this end, we propose a novel triplane-based 3D-aware Diffusion model with TransFormer, DiffTF, for handling challenges via three aspects.

3D Generation

DeformToon3D: Deformable 3D Toonification from Neural Radiance Fields

1 code implementation • 8 Sep 2023 • Junzhe Zhang, Yushi Lan, Shuai Yang, Fangzhou Hong, Quan Wang, Chai Kiat Yeo, Ziwei Liu, Chen Change Loy

In this paper, we address the challenging problem of 3D toonification, which involves transferring the style of an artistic domain onto a target 3D face with stylized geometry and texture.

ReliTalk: Relightable Talking Portrait Generation from a Single Video

1 code implementation • 5 Sep 2023 • Haonan Qiu, Zhaoxi Chen, Yuming Jiang, Hang Zhou, Xiangyu Fan, Lei Yang, Wayne Wu, Ziwei Liu

Our key insight is to decompose the portrait's reflectance from implicitly learned audio-driven facial normals and images.

Single-Image Portrait Relighting

CityDreamer: Compositional Generative Model of Unbounded 3D Cities

1 code implementation • 1 Sep 2023 • Haozhe Xie, Zhaoxi Chen, Fangzhou Hong, Ziwei Liu

3D city generation is a desirable yet challenging task, since humans are more sensitive to structural distortions in urban environments.

Ranked #1 on Scene Generation on OSM

Scene Generation

478

PointHPS: Cascaded 3D Human Pose and Shape Estimation from Point Clouds

no code implementations • 28 Aug 2023 • Zhongang Cai, Liang Pan, Chen Wei, Wanqi Yin, Fangzhou Hong, Mingyuan Zhang, Chen Change Loy, Lei Yang, Ziwei Liu

To tackle these challenges, we propose a principled framework, PointHPS, for accurate 3D HPS from point clouds captured in real-world settings, which iteratively refines point features through a cascaded architecture.

3D human pose and shape estimation

Towards Real-World Visual Tracking with Temporal Contexts

1 code implementation • 20 Aug 2023 • Ziang Cao, Ziyuan Huang, Liang Pan, Shiwei Zhang, Ziwei Liu, Changhong Fu

To handle those problems, we propose a two-level framework (TCTrack) that can exploit temporal contexts efficiently.

Visual Tracking

152

HumanLiff: Layer-wise 3D Human Generation with Diffusion Model

no code implementations • 18 Aug 2023 • Shoukang Hu, Fangzhou Hong, Tao Hu, Liang Pan, Haiyi Mei, Weiye Xiao, Lei Yang, Ziwei Liu

In this work, we propose HumanLiff, the first layer-wise 3D human generative model with a unified diffusion process.

3D Generation Neural Rendering

Link-Context Learning for Multimodal LLMs

1 code implementation • 15 Aug 2023 • Yan Tai, Weichen Fan, Zhao Zhang, Feng Zhu, Rui Zhao, Ziwei Liu

The ability to learn from context with novel concepts and deliver appropriate responses is essential in human conversations.

Few-Shot Learning In-Context Learning +1

Hierarchy Flow For High-Fidelity Image-to-Image Translation

1 code implementation • 14 Aug 2023 • Weichen Fan, Jinghuan Chen, Ziwei Liu

In this work, we propose Hierarchy Flow, a novel flow-based model to achieve better content preservation during translation.

Image-to-Image Translation Style Transfer +1

Temporally-Adaptive Models for Efficient Video Understanding

1 code implementation • 10 Aug 2023 • Ziyuan Huang, Shiwei Zhang, Liang Pan, Zhiwu Qing, Yingya Zhang, Ziwei Liu, Marcelo H. Ang Jr

Spatial convolutions are extensively used in numerous deep video models.

Ranked #3 on Action Recognition on EPIC-KITCHENS-100 (using extra training data)

Action Classification Action Recognition +1

215

Benchmarking and Analyzing Generative Data for Visual Recognition

no code implementations • 25 Jul 2023 • Bo Li, Haotian Liu, Liangyu Chen, Yong Jae Lee, Chunyuan Li, Ziwei Liu

Advancements in large pre-trained generative models have expanded their potential as effective data generators in visual recognition.

Benchmarking Retrieval

DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering

1 code implementation • ICCV 2023 • Wei Cheng, Ruixiang Chen, Wanqi Yin, Siming Fan, Keyu Chen, Honglin He, Huiwen Luo, Zhongang Cai, Jingbo Wang, Yang Gao, Zhengming Yu, Zhengyu Lin, Daxuan Ren, Lei Yang, Ziwei Liu, Chen Change Loy, Chen Qian, Wayne Wu, Dahua Lin, Bo Dai, Kwan-Yee Lin

Realistic human-centric rendering plays a key role in both computer vision and computer graphics.

Camera Calibration Novel View Synthesis

199

Make-A-Volume: Leveraging Latent Diffusion Models for Cross-Modality 3D Brain MRI Synthesis

no code implementations • 19 Jul 2023 • Lingting Zhu, Zeyue Xue, Zhenchao Jin, Xian Liu, Jingzhen He, Ziwei Liu, Lequan Yu

This paradigm extends the 2D image diffusion model to a volumetric version with only a slight increase in parameters and computation, offering a principled solution for generic cross-modality 3D medical image synthesis.

Computational Efficiency Image Generation

Pair then Relation: Pair-Net for Panoptic Scene Graph Generation

1 code implementation • 17 Jul 2023 • Jinghao Wang, Zhengyu Wen, Xiangtai Li, Zujin Guo, Jingkang Yang, Ziwei Liu

Panoptic Scene Graph (PSG) is a challenging task in Scene Graph Generation (SGG) that aims to create a more comprehensive scene graph representation using panoptic segmentation instead of boxes.

Graph Generation Panoptic Scene Graph Generation +2

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

1 code implementation • 13 Jul 2023 • Yi Wang, Yinan He, Yizhuo Li, Kunchang Li, Jiashuo Yu, Xin Ma, Xinhao Li, Guo Chen, Xinyuan Chen, Yaohui Wang, Conghui He, Ping Luo, Ziwei Liu, Yali Wang, LiMin Wang, Yu Qiao

Specifically, we utilize a multi-scale approach to generate video-related descriptions.

Action Recognition Contrastive Learning +7

897

Paper
Code

MMBench: Is Your Multi-modal Model an All-around Player?

2 code implementations • 12 Jul 2023 • YuAn Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, Kai Chen, Dahua Lin

In response to these challenges, we propose MMBench, a novel multi-modality benchmark.

Ranked #1 on Visual Question Answering on MMBench

Visual Question Answering

2,430

Paper
Code

FunQA: Towards Surprising Video Comprehension

1 code implementation • 26 Jun 2023 • Binzhu Xie, Sicheng Zhang, Zitang Zhou, Bo Li, Yuanhan Zhang, Jack Hessel, Jingkang Yang, Ziwei Liu

Surprising videos, such as funny clips, creative performances, or visual illusions, attract significant attention.

Question Answering Text Generation +3

Paper
Code

Segment Any Point Cloud Sequences by Distilling Vision Foundation Models

2 code implementations • NeurIPS 2023 • Youquan Liu, Lingdong Kong, Jun Cen, Runnan Chen, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu

Recent advancements in vision foundation models (VFMs) have opened up new possibilities for versatile and efficient visual perception.

Representation Learning Transfer Learning

493

Paper
Code

OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection

1 code implementation • 15 Jun 2023 • Jingyang Zhang, Jingkang Yang, Pengyun Wang, Haoqi Wang, Yueqian Lin, Haoran Zhang, Yiyou Sun, Xuefeng Du, Kaiyang Zhou, Wayne Zhang, Yixuan Li, Ziwei Liu, Yiran Chen, Hai Li

Out-of-Distribution (OOD) detection is critical for the reliable operation of open-world intelligent systems.

Out-of-Distribution Detection Out of Distribution (OOD) Detection

741

Paper
Code

Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation

no code implementations • 13 Jun 2023 • Shuai Yang, Yifan Zhou, Ziwei Liu, Chen Change Loy

The framework includes two parts: key frame translation and full video translation.

Patch Matching Translation

Paper
Add Code

MIMIC-IT: Multi-Modal In-Context Instruction Tuning

2 code implementations • 8 Jun 2023 • Bo Li, Yuanhan Zhang, Liangyu Chen, Jinghao Wang, Fanyi Pu, Jingkang Yang, Chunyuan Li, Ziwei Liu

We release the MIMIC-IT dataset, instruction-response collection pipeline, benchmarks, and the Otter model.

Ranked #81 on Visual Question Answering on MM-Vet

In-Context Learning Visual Question Answering

3,436

Paper
Code

GP-UNIT: Generative Prior for Versatile Unsupervised Image-to-Image Translation

1 code implementation • 7 Jun 2023 • Shuai Yang, Liming Jiang, Ziwei Liu, Chen Change Loy

In this paper, we introduce a novel versatile framework, Generative Prior-guided UNsupervised Image-to-image Translation (GP-UNIT), that improves the quality, applicability and controllability of the existing translation models.

Translation Unsupervised Image-To-Image Translation +1

181

Paper
Code

DeepFake-Adapter: Dual-Level Adapter for DeepFake Detection

1 code implementation • 1 Jun 2023 • Rui Shao, Tianxing Wu, Liqiang Nie, Ziwei Liu

Unlike existing deepfake detection methods merely focusing on low-level forgery patterns, the forgery detection process of our model can be regularized by generalizable high-level semantics from a pre-trained ViT and adapted by global and local low-level forgeries of deepfake data.

DeepFake Detection Face Swapping

Paper
Code

Learning without Forgetting for Vision-Language Models

no code implementations • 30 May 2023 • Da-Wei Zhou, Yuanhan Zhang, Jingyi Ning, Han-Jia Ye, De-Chuan Zhan, Ziwei Liu

While traditional CIL methods focus on visual information to grasp core features, recent advances in Vision-Language Models (VLM) have shown promising capabilities in learning generalizable representations with the aid of textual information.

Class Incremental Learning Incremental Learning

Paper
Add Code

SAD: Segment Any RGBD

1 code implementation • 23 May 2023 • Jun Cen, Yizheng Wu, Kewei Wang, Xingyi Li, Jingkang Yang, Yixuan Pei, Lingdong Kong, Ziwei Liu, Qifeng Chen

The Segment Anything Model (SAM) has demonstrated its effectiveness in segmenting any part of 2D RGB images.

Open Vocabulary Semantic Segmentation Panoptic Segmentation +1

720

Paper
Code

RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars

1 code implementation • NeurIPS 2023 • Dongwei Pan, Long Zhuo, Jingtan Piao, Huiwen Luo, Wei Cheng, Yuxin Wang, Siming Fan, Shengqi Liu, Lei Yang, Bo Dai, Ziwei Liu, Chen Change Loy, Chen Qian, Wayne Wu, Dahua Lin, Kwan-Yee Lin

It is a large-scale digital library for head avatars with three key attributes: 1) High Fidelity: all subjects are captured by 60 synchronized, high-resolution 2K cameras in 360 degrees.

2k Image Matting +2

213

Paper
Code

ConsistentNeRF: Enhancing Neural Radiance Fields with 3D Consistency for Sparse View Synthesis

1 code implementation • 18 May 2023 • Shoukang Hu, Kaichen Zhou, Kaiyu Li, Longhui Yu, Lanqing Hong, Tianyang Hu, Zhenguo Li, Gim Hee Lee, Ziwei Liu

In this paper, we propose ConsistentNeRF, a method that leverages depth information to regularize both multi-view and single-view 3D consistency among pixels.

3D Reconstruction SSIM

Paper
Code

StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator

no code implementations • CVPR 2023 • Jiazhi Guan, Zhanwang Zhang, Hang Zhou, Tianshu Hu, Kaisiyuan Wang, Dongliang He, Haocheng Feng, Jingtuo Liu, Errui Ding, Ziwei Liu, Jingdong Wang

Despite recent advances in syncing lip movements with any audio waves, current methods still struggle to balance generation quality and the model's generalization ability.

Paper
Add Code

Otter: A Multi-Modal Model with In-Context Instruction Tuning

1 code implementation • 5 May 2023 • Bo Li, Yuanhan Zhang, Liangyu Chen, Jinghao Wang, Jingkang Yang, Ziwei Liu

Large language models (LLMs) have demonstrated significant universal capabilities as few/zero-shot learners in various tasks due to their pre-training on vast amounts of text data, as exemplified by GPT-3, which was boosted to InstructGPT and ChatGPT, effectively following natural language instructions to accomplish real-world tasks.

Ranked #8 on Visual Question Answering on BenchLMM

In-Context Learning Instruction Following +2

3,436

Paper
Code

Transmissive Reconfigurable Intelligent Surface Transmitter Empowered Cognitive RSMA Networks

no code implementations • 4 May 2023 • Ziwei Liu, Wen Chen, Zhendong Li, Jinhong Yuan, Qingqing Wu, Kunlun Wang

In this paper, we investigated the downlink transmission problem of a cognitive radio network (CRN) equipped with a novel transmissive reconfigurable intelligent surface (TRIS) transmitter.

Paper
Add Code

Collaborative Diffusion for Multi-Modal Face Generation and Editing

1 code implementation • CVPR 2023 • Ziqi Huang, Kelvin C. K. Chan, Yuming Jiang, Ziwei Liu

In this work, we present Collaborative Diffusion, where pre-trained uni-modal diffusion models collaborate to achieve multi-modal face generation and editing without re-training.

Denoising Face Generation

372

Paper
Code

Transformer-Based Visual Segmentation: A Survey

2 code implementations • 19 Apr 2023 • Xiangtai Li, Henghui Ding, Haobo Yuan, Wenwei Zhang, Jiangmiao Pang, Guangliang Cheng, Kai Chen, Ziwei Liu, Chen Change Loy

Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks.

Autonomous Driving Point Cloud Segmentation +1

567

Paper
Code

Variational Relational Point Completion Network for Robust 3D Classification

no code implementations • 18 Apr 2023 • Liang Pan, Xinyi Chen, Zhongang Cai, Junzhe Zhang, Haiyu Zhao, Shuai Yi, Ziwei Liu

Existing point cloud completion methods tend to generate global shape skeletons and hence lack fine local details.

3D Classification Classification +1

Paper
Add Code

Text2Performer: Text-Driven Human Video Generation

1 code implementation • ICCV 2023 • Yuming Jiang, Shuai Yang, Tong Liang Koh, Wayne Wu, Chen Change Loy, Ziwei Liu

In this work, we present Text2Performer to generate vivid human videos with articulated motions from texts.

Video Generation

308

Paper
Code

RoboBEV: Towards Robust Bird's Eye View Perception under Corruptions

1 code implementation • 13 Apr 2023 • Shaoyuan Xie, Lingdong Kong, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, Ziwei Liu

Our experiments further demonstrate that pre-training and depth-free BEV transformation have the potential to enhance out-of-distribution robustness.

Robust Camera Only 3D Object Detection

284

Paper
Code

DiffMimic: Efficient Motion Mimicking with Differentiable Physics

2 code implementations • 6 Apr 2023 • Jiawei Ren, Cunjun Yu, Siwei Chen, Xiao Ma, Liang Pan, Ziwei Liu

Motion mimicking is a foundational task in physics-based character animation.

reinforcement-learning Reinforcement Learning (RL)

257

Paper
Code

Detecting and Grounding Multi-Modal Media Manipulation

1 code implementation • CVPR 2023 • Rui Shao, Tianxing Wu, Ziwei Liu

In this paper, we highlight a new research problem for multi-modal fake media, namely Detecting and Grounding Multi-Modal Media Manipulation (DGM$^{4}$).

Binary Classification Contrastive Learning +4

268

Paper
Code

ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

1 code implementation • ICCV 2023 • Mingyuan Zhang, Xinying Guo, Liang Pan, Zhongang Cai, Fangzhou Hong, Huirong Li, Lei Yang, Ziwei Liu

However, the performance on more diverse motions remains unsatisfactory.

Ranked #1 on Motion Synthesis on KIT Motion-Language

Denoising Motion Synthesis +1

291

Paper
Code

Robo3D: Towards Robust and Reliable 3D Perception against Corruptions

1 code implementation • ICCV 2023 • Lingdong Kong, Youquan Liu, Xin Li, Runnan Chen, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, Ziwei Liu

The robustness of 3D perception systems under natural corruptions from environments and sensors is pivotal for safety-critical applications.

Robust 3D Object Detection Robust 3D Semantic Segmentation

270

Paper
Code

SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling

1 code implementation • ICCV 2023 • Zhitao Yang, Zhongang Cai, Haiyi Mei, Shuai Liu, Zhaoxi Chen, Weiye Xiao, Yukun Wei, Zhongfei Qing, Chen Wei, Bo Dai, Wayne Wu, Chen Qian, Dahua Lin, Ziwei Liu, Lei Yang

Synthetic data has emerged as a promising source for 3D human research as it offers low-cost access to large-scale human datasets.

Human Mesh Recovery Neural Rendering

169

Paper
Code

F$^{2}$-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories

1 code implementation • 28 Mar 2023 • Peng Wang, YuAn Liu, Zhaoxi Chen, Lingjie Liu, Ziwei Liu, Taku Komura, Christian Theobalt, Wenping Wang

Based on our analysis, we further propose a novel space-warping method called perspective warping, which allows us to handle arbitrary trajectories in the grid-based NeRF framework.

Novel View Synthesis

893

Paper
Code

SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis

no code implementations • ICCV 2023 • Guangcong Wang, Zhaoxi Chen, Chen Change Loy, Ziwei Liu

Since coarse depth maps are not strictly scaled to the ground-truth depth maps, we propose a simple yet effective constraint, a local depth ranking method, on NeRFs such that the expected depth ranking of the NeRF is consistent with that of the coarse depth maps in local patches.

Novel View Synthesis

Paper
Add Code

A Simple and Generic Framework for Feature Distillation via Channel-wise Transformation

no code implementations • 23 Mar 2023 • Ziwei Liu, Yongtao Wang, Xiaojie Chu

Specifically, we propose a learnable nonlinear channel-wise transformation to align the features of the student and the teacher model.

Image Classification Instance Segmentation +5

Paper
Add Code

ReVersion: Diffusion-Based Relation Inversion from Images

2 code implementations • 23 Mar 2023 • Ziqi Huang, Tianxing Wu, Yuming Jiang, Kelvin C. K. Chan, Ziwei Liu

Specifically, we propose a novel relation-steering contrastive learning scheme to impose two critical properties of the relation prompt: 1) The relation prompt should capture the interaction between objects, enforced by the preposition prior.

Contrastive Learning Relation

426

Paper
Code

SHERF: Generalizable Human NeRF from a Single Image

1 code implementation • ICCV 2023 • Shoukang Hu, Fangzhou Hong, Liang Pan, Haiyi Mei, Lei Yang, Ziwei Liu

To this end, we propose a bank of 3D-aware hierarchical features, including global, point-level, and pixel-aligned features, to facilitate informative encoding.

3D Human Reconstruction

285

Paper
Code

Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation

1 code implementation • CVPR 2023 • Lingting Zhu, Xian Liu, Xuanyu Liu, Rui Qian, Ziwei Liu, Lequan Yu

In this work, we propose a novel diffusion-based framework, named Diffusion Co-Speech Gesture (DiffGesture), to effectively capture the cross-modal audio-to-gesture associations and preserve temporal coherence for high-fidelity audio-driven co-speech gesture generation.

Gesture Generation

210

Paper
Code

Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need

2 code implementations • 13 Mar 2023 • Da-Wei Zhou, Han-Jia Ye, De-Chuan Zhan, Ziwei Liu

ADAM is a general framework that can be orthogonally combined with any parameter-efficient tuning method, which holds the advantages of PTM's generalizability and adapted model's adaptivity.

Class Incremental Learning Incremental Learning +1

681

Paper
Code

StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces

1 code implementation • ICCV 2023 • Shuai Yang, Liming Jiang, Ziwei Liu, Chen Change Loy

Recent advances in face manipulation using StyleGAN have produced impressive results.

Attribute Super-Resolution

465

Paper
Code

Rethinking Range View Representation for LiDAR Segmentation

no code implementations • ICCV 2023 • Lingdong Kong, Youquan Liu, Runnan Chen, Yuexin Ma, Xinge Zhu, Yikang Li, Yuenan Hou, Yu Qiao, Ziwei Liu

We show that, for the first time, a range view method is able to surpass the point, voxel, and multi-view fusion counterparts in the competing LiDAR semantic and panoptic segmentation benchmarks, i.e., SemanticKITTI, nuScenes, and ScribbleKITTI.

Ranked #4 on 3D Semantic Segmentation on SemanticKITTI

3D Semantic Segmentation Autonomous Driving +4

Paper
Add Code

A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT

no code implementations • 18 Feb 2023 • Ce Zhou, Qian Li, Chen Li, Jun Yu, Yixin Liu, Guangjing Wang, Kai Zhang, Cheng Ji, Qiben Yan, Lifang He, Hao Peng, JianXin Li, Jia Wu, Ziwei Liu, Pengtao Xie, Caiming Xiong, Jian Pei, Philip S. Yu, Lichao Sun

This study provides a comprehensive review of recent research advancements, challenges, and opportunities for PFMs in text, image, graph, as well as other data modalities.

Graph Learning Language Modelling +1

Paper
Add Code

Make Your Brief Stroke Real and Stereoscopic: 3D-Aware Simplified Sketch to Portrait Generation

no code implementations • 14 Feb 2023 • Yasheng Sun, Qianyi Wu, Hang Zhou, Kaisiyuan Wang, Tianshu Hu, Chen-Chieh Liao, Shio Miyafuji, Ziwei Liu, Hideki Koike

Creating photo-realistic versions of people's sketched portraits is useful for various entertainment purposes.

Paper
Add Code

Deep Class-Incremental Learning: A Survey

2 code implementations • 7 Feb 2023 • Da-Wei Zhou, Qi-Wei Wang, Zhi-Hong Qi, Han-Jia Ye, De-Chuan Zhan, Ziwei Liu

Deep models, e.g., CNNs and Vision Transformers, have achieved impressive results in many vision tasks in the closed world.

Class Incremental Learning Image Classification +1

681

Paper
Code

SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections

1 code implementation • 2 Feb 2023 • Zhaoxi Chen, Guangcong Wang, Ziwei Liu

Our approach begins with an efficient bird's-eye-view (BEV) representation generated from simplex noise, which includes a height field for surface elevation and a semantic field for detailed scene semantics.

Ranked #3 on Scene Generation on GoogleEarth

Scene Generation

566

Paper
Code

What Makes Good Examples for Visual In-Context Learning?

1 code implementation • NeurIPS 2023 • Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu

To overcome the problem, we propose a prompt retrieval framework to automate the selection of in-context examples.

In-Context Learning Retrieval

155

Paper
Code

BiBench: Benchmarking and Analyzing Network Binarization

1 code implementation • 26 Jan 2023 • Haotong Qin, Mingyuan Zhang, Yifu Ding, Aoyu Li, Zhongang Cai, Ziwei Liu, Fisher Yu, Xianglong Liu

Network binarization emerges as one of the most promising compression approaches offering extraordinary computation and memory savings by minimizing the bit-width.

Benchmarking Binarization

Paper
Code

OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation

1 code implementation • CVPR 2023 • Tong Wu, Jiarui Zhang, Xiao Fu, Yuxin Wang, Jiawei Ren, Liang Pan, Wayne Wu, Lei Yang, Jiaqi Wang, Chen Qian, Dahua Lin, Ziwei Liu

Recent advances in modeling 3D objects mostly rely on synthetic datasets due to the lack of large-scale real-scanned 3D databases.

Novel View Synthesis Object +1

416

Paper
Code

DeformToon3D: Deformable Neural Radiance Fields for 3D Toonification

no code implementations • ICCV 2023 • Junzhe Zhang, Yushi Lan, Shuai Yang, Fangzhou Hong, Quan Wang, Chai Kiat Yeo, Ziwei Liu, Chen Change Loy

In this paper, we address the challenging problem of 3D toonification, which involves transferring the style of an artistic domain onto a target 3D face with stylized geometry and texture.

Paper
Add Code

F2-NeRF: Fast Neural Radiance Field Training With Free Camera Trajectories

no code implementations • CVPR 2023 • Peng Wang, YuAn Liu, Zhaoxi Chen, Lingjie Liu, Ziwei Liu, Taku Komura, Christian Theobalt, Wenping Wang

Existing fast grid-based NeRF training frameworks, like Instant-NGP, Plenoxels, DVGO, or TensoRF, are mainly designed for bounded scenes and rely on space warping to handle unbounded scenes.

Novel View Synthesis

Paper
Add Code

Reference-based Image and Video Super-Resolution via C2-Matching

1 code implementation • 19 Dec 2022 • Yuming Jiang, Kelvin C. K. Chan, Xintao Wang, Chen Change Loy, Ziwei Liu

To tackle these challenges, we propose C2-Matching in this work, which performs explicit robust matching crossing transformation and resolution.

Image Super-Resolution Reference-based Super-Resolution +2

191

Paper
Code

Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers

no code implementations • 9 Dec 2022 • Yasheng Sun, Hang Zhou, Kaisiyuan Wang, Qianyi Wu, Zhibin Hong, Jingtuo Liu, Errui Ding, Jingdong Wang, Ziwei Liu, Hideki Koike

This requires masking a large percentage of the original image and seamlessly inpainting it with the aid of audio and reference frames.

Paper
Add Code

Audio-Driven Co-Speech Gesture Video Generation

no code implementations • 5 Dec 2022 • Xian Liu, Qianyi Wu, Hang Zhou, Yuanqi Du, Wayne Wu, Dahua Lin, Ziwei Liu

Our key insight is that the co-speech gestures can be decomposed into common motion patterns and subtle rhythmic dynamics.

Video Generation

Paper
Add Code

AnimeRun: 2D Animation Visual Correspondence from Open Source 3D Movies

1 code implementation • 10 Nov 2022 • Li SiYao, Yuhang Li, Bo Li, Chao Dong, Ziwei Liu, Chen Change Loy

Existing correspondence datasets for two-dimensional (2D) cartoon suffer from simple frame composition and monotonic movements, making them insufficient to simulate real animations.

Optical Flow Estimation

Paper
Code

Joint Communication and Computation Design in Transmissive RMS Transceiver Enabled Multi-Tier Computing Networks

no code implementations • 27 Oct 2022 • Zhendong Li, Wen Chen, Ziwei Liu, Hongying Tang, Jianmin Lu

We formulate a total energy consumption minimization problem by a joint optimization of subcarrier allocation, task input bits, time slot allocation, transmit power allocation and RMS transmissive coefficient while taking into account the constraints of communication resources and computing resources.

Total Energy

Paper
Add Code

OpenOOD: Benchmarking Generalized Out-of-Distribution Detection

3 code implementations • 13 Oct 2022 • Jingkang Yang, Pengyun Wang, Dejian Zou, Zitang Zhou, Kunyuan Ding, Wenxuan Peng, Haoqi Wang, Guangyao Chen, Bo Li, Yiyou Sun, Xuefeng Du, Kaiyang Zhou, Wayne Zhang, Dan Hendrycks, Yixuan Li, Ziwei Liu

Out-of-distribution (OOD) detection is vital to safety-critical machine learning applications and has thus been extensively studied, with a plethora of methods developed in the literature.

Anomaly Detection Benchmarking +3

741

Paper
Code

EVA3D: Compositional 3D Human Generation from 2D Image Collections

1 code implementation • 10 Oct 2022 • Fangzhou Hong, Zhaoxi Chen, Yushi Lan, Liang Pan, Ziwei Liu

At the core of EVA3D is a compositional human NeRF representation, which divides the human body into local parts.

566

Paper
Code

TripleE: Easy Domain Generalization via Episodic Replay

1 code implementation • 4 Oct 2022 • Xiaomeng Li, Hongyu Ren, Huifeng Yao, Ziwei Liu

In this paper, we propose TripleE, and the main idea is to encourage the network to focus on training on subsets (learning with replay) and enlarge the data space in learning on subsets.

Domain Generalization

Paper
Code

StyleSwap: Style-Based Generator Empowers Robust Face Swapping

no code implementations • 27 Sep 2022 • Zhiliang Xu, Hang Zhou, Zhibin Hong, Ziwei Liu, Jiaming Liu, Zhizhi Guo, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Our core idea is to leverage a style-based generator to empower high-fidelity and robust face swapping, thus the generator's advantage can be adopted for optimizing identity similarity.

Face Swapping

Paper
Add Code

VToonify: Controllable High-Resolution Portrait Video Style Transfer

1 code implementation • 22 Sep 2022 • Shuai Yang, Liming Jiang, Ziwei Liu, Chen Change Loy

Although a series of successful portrait image toonification models built upon the powerful StyleGAN have been proposed, these image-oriented methods have obvious limitations when applied to videos, such as the fixed frame size, the requirement of face alignment, missing non-facial details and temporal inconsistency.

Face Alignment Style Transfer +2

3,461

Paper
Code

Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms

1 code implementation • 21 Sep 2022 • Hui En Pang, Zhongang Cai, Lei Yang, Tianwei Zhang, Ziwei Liu

Experiments with 10 backbones, ranging from CNNs to transformers, show the knowledge learnt from a proximity task is readily transferable to human mesh recovery.

3D human pose and shape estimation Benchmarking +1

112

Paper
Code

Text2Light: Zero-Shot Text-Driven HDR Panorama Generation

1 code implementation • 20 Sep 2022 • Zhaoxi Chen, Guangcong Wang, Ziwei Liu

To achieve super-resolution inverse tone mapping, we derive a continuous representation of 360-degree imaging from the LDR panorama as a set of structured latent codes anchored to the sphere.

4k inverse tone mapping +3

540

Paper
Code

On-Device Domain Generalization

2 code implementations • 15 Sep 2022 • Kaiyang Zhou, Yuanhan Zhang, Yuhang Zang, Jingkang Yang, Chen Change Loy, Ziwei Liu

Another interesting observation is that the teacher-student gap on out-of-distribution data is bigger than that on in-distribution data, which highlights the capacity mismatch issue as well as the shortcoming of KD.

Data Augmentation Domain Generalization +2

255

Paper
Code

MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model

2 code implementations • 31 Aug 2022 • Mingyuan Zhang, Zhongang Cai, Liang Pan, Fangzhou Hong, Xinying Guo, Lei Yang, Ziwei Liu

Instead of a deterministic language-motion mapping, MotionDiffuse generates motions through a series of denoising steps in which variations are injected.

Ranked #17 on Motion Synthesis on KIT Motion-Language

Denoising Motion Synthesis

769

Paper
Code

Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction

1 code implementation • 26 Aug 2022 • Tong Wu, Jiaqi Wang, Xingang Pan, Xudong Xu, Christian Theobalt, Ziwei Liu, Dahua Lin

Previous methods based on neural volume rendering mostly train a fully implicit model with MLPs, which typically require hours of training for a single scene.

Surface Reconstruction

399

Paper
Code

Mind the Gap in Distilling StyleGANs

1 code implementation • 18 Aug 2022 • Guodong Xu, Yuenan Hou, Ziwei Liu, Chen Change Loy

To further enhance the semantic consistency between the teacher and student model, we present a latent-direction-based distillation loss that preserves the semantic relations in latent space.

Knowledge Distillation

Paper
Code

Open Long-Tailed Recognition in a Dynamic World

no code implementations • 17 Aug 2022 • Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong, Stella X. Yu

A practical recognition system must balance between majority (head) and minority (tail) classes, generalize across the distribution, and acknowledge novelty upon the instances of unseen classes (open classes).

Active Learning Classification +4

Paper
Add Code

StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3

1 code implementation • 16 Aug 2022 • Haonan Qiu, Yuming Jiang, Hang Zhou, Wayne Wu, Ziwei Liu

Notably, StyleFaceV is capable of generating realistic $1024\times1024$ face videos even without high-resolution training videos.

Image Generation Video Generation

131

Paper
Code

Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with Transformer

1 code implementation • 10 Aug 2022 • Zhipeng Luo, Changqing Zhou, Liang Pan, Gongjie Zhang, Tianrui Liu, Yueru Luo, Haiyu Zhao, Ziwei Liu, Shijian Lu

In a point cloud sequence, 3D object tracking aims to predict the location and orientation of an object in consecutive frames given an object template.

3D Object Tracking Autonomous Driving +3

115

Paper
Code

StyleLight: HDR Panorama Generation for Lighting Estimation and Editing

1 code implementation • 29 Jul 2022 • Guangcong Wang, Yinuo Yang, Chen Change Loy, Ziwei Liu

To tackle this problem, we propose a coupled dual-StyleGAN panorama synthesis network (StyleLight) that integrates LDR and HDR panorama synthesis into a unified framework.

Lighting Estimation

112

Paper
Code

Multi-Forgery Detection Challenge 2022: Push the Frontier of Unconstrained and Diverse Forgery Detection

no code implementations • 27 Jul 2022 • Jianshu Li, Man Luo, Jian Liu, Tao Chen, Chengjie Wang, Ziwei Liu, Shuo Liu, Kewei Yang, Xuning Shao, Kang Chen, Boyuan Liu, Mingyu Guo, Ying Guo, Yingying Ao, Pengfei Gao

In this paper, we present the solutions from the Top 3 teams, in order to boost the research work in the field of image forgery detection.

Image Forgery Detection Image Generation +1

Paper
Add Code

CelebV-HQ: A Large-Scale Video Facial Attributes Dataset

1 code implementation • 25 Jul 2022 • Hao Zhu, Wayne Wu, Wentao Zhu, Liming Jiang, Siwei Tang, Li Zhang, Ziwei Liu, Chen Change Loy

Large-scale datasets have played indispensable roles in the recent success of face generation/editing and significantly facilitated the advances of emerging research fields.

Ranked #1 on Unconditional Video Generation on CelebV-HQ

Attribute Face Generation +1

349

Paper
Code

Panoptic Scene Graph Generation

1 code implementation • 22 Jul 2022 • Jingkang Yang, Yi Zhe Ang, Zujin Guo, Kaiyang Zhou, Wayne Zhang, Ziwei Liu

Existing research addresses scene graph generation (SGG) -- a critical technology for scene understanding in images -- from a detection perspective, i.e., objects are detected using bounding boxes followed by prediction of their pairwise relationships.

Ranked #5 on Panoptic Scene Graph Generation on PSG Dataset

Benchmarking Panoptic Scene Graph Generation +1

384

Paper
Code

UNIF: United Neural Implicit Functions for Clothed Human Reconstruction and Animation

1 code implementation • 20 Jul 2022 • Shenhan Qian, Jiale Xu, Ziwei Liu, Liqian Ma, Shenghua Gao

We propose united implicit functions (UNIF), a part-based method for clothed human reconstruction and animation with raw scans and skeletons as the input.

Position

Paper
Code

Benchmarking Omni-Vision Representation through the Lens of Visual Realms

1 code implementation • 14 Jul 2022 • Yuanhan Zhang, Zhenfei Yin, Jing Shao, Ziwei Liu

We benchmark ReCo and other advances in omni-vision representation studies that are different in architectures (from CNNs to transformers) and in learning paradigms (from supervised learning to self-supervised learning) on OmniBenchmark.

Benchmarking Contrastive Learning +2

105

Paper
Code

Relighting4D: Neural Relightable Human from Videos

1 code implementation • 14 Jul 2022 • Zhaoxi Chen, Ziwei Liu

Our key insight is that the space-time varying geometry and reflectance of the human body can be decomposed as a set of neural fields of normal, occlusion, diffuse, and specular maps.

254

Paper
Code

Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis

1 code implementation • 11 Jul 2022 • Long Zhuo, Guangcong Wang, Shikai Li, Wayne Wu, Ziwei Liu

In this paper, we present a spatial-temporal compression framework, Fast-Vid2Vid, which focuses on data aspects of generative models.

Knowledge Distillation Motion Compensation +1

150

Paper
Code

Detecting and Recovering Sequential DeepFake Manipulation

1 code implementation • 5 Jul 2022 • Rui Shao, Tianxing Wu, Ziwei Liu

Moreover, we build a comprehensive benchmark and set up rigorous evaluation protocols and metrics for this new research problem.

DeepFake Detection Face Swapping +2

118

Paper
Code

LaserMix for Semi-Supervised LiDAR Semantic Segmentation

2 code implementations • CVPR 2023 • Lingdong Kong, Jiawei Ren, Liang Pan, Ziwei Liu

Densely annotating LiDAR point clouds is costly, which restrains the scalability of fully-supervised learning methods.

Ranked #1 on Semi-Supervised Semantic Segmentation on ScribbleKITTI

LIDAR Semantic Segmentation Segmentation +1

256

Paper
Code

Masked Frequency Modeling for Self-Supervised Visual Pre-Training

3 code implementations • 15 Jun 2022 • Jiahao Xie, Wei Li, Xiaohang Zhan, Ziwei Liu, Yew Soon Ong, Chen Change Loy

We present Masked Frequency Modeling (MFM), a unified frequency-domain-based approach for self-supervised pre-training of visual models.

Image Classification Image Restoration +2

Paper
Code

Neural Prompt Search

1 code implementation • 9 Jun 2022 • Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu

The size of vision models has grown exponentially over the last few years, especially after the emergence of Vision Transformer.

Ranked #1 on Image Classification on OmniBenchmark (using extra training data)

Few-Shot Learning Image Classification +3

202

Paper
Code

Sparse Mixture-of-Experts are Domain Generalizable Learners

1 code implementation • 8 Jun 2022 • Bo Li, Yifei Shen, Jingkang Yang, Yezhen Wang, Jiawei Ren, Tong Che, Jun Zhang, Ziwei Liu

It is motivated by an empirical finding that transformer-based models trained with empirical risk minimization (ERM) outperform CNN-based models employing state-of-the-art (SOTA) DG algorithms on multiple DG datasets.

Ranked #11 on Domain Generalization on DomainNet (using extra training data)

Domain Generalization Object Recognition

279

Paper
Code

Text2Human: Text-Driven Controllable Human Image Generation

2 code implementations • 31 May 2022 • Yuming Jiang, Shuai Yang, Haonan Qiu, Wayne Wu, Chen Change Loy, Ziwei Liu

In this work, we present a text-driven controllable framework, Text2Human, for high-quality and diverse human generation.

Human Parsing Image Generation

803

Paper
Code

Free Lunch for Surgical Video Understanding by Distilling Self-Supervisions

1 code implementation • 19 May 2022 • Xinpeng Ding, Ziwei Liu, Xiaomeng Li

Our key insight is to distill knowledge from publicly available models trained on large generic datasets to facilitate the self-supervised learning of surgical videos.

Contrastive Learning Self-Supervised Learning +2

Paper
Code

AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

1 code implementation • 17 May 2022 • Fangzhou Hong, Mingyuan Zhang, Liang Pan, Zhongang Cai, Lei Yang, Ziwei Liu

Our key insight is to take advantage of the powerful vision-language model CLIP for supervising neural human generation, in terms of 3D geometry, texture and animation.

Language Modelling Motion Synthesis +1

1,038

Paper
Code

HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling

no code implementations • 28 Apr 2022 • Zhongang Cai, Daxuan Ren, Ailing Zeng, Zhengyu Lin, Tao Yu, Wenjia Wang, Xiangyu Fan, Yang Gao, Yifan Yu, Liang Pan, Fangzhou Hong, Mingyuan Zhang, Chen Change Loy, Lei Yang, Ziwei Liu

4D human sensing and modeling are fundamental tasks in vision and graphics with numerous applications.

Fine-grained Action Recognition Pose Estimation

Paper
Add Code

Robust Face Anti-Spoofing with Dual Probabilistic Modeling

no code implementations • 27 Apr 2022 • Yuanhan Zhang, Yichao Wu, Zhenfei Yin, Jing Shao, Ziwei Liu

In this work, we attempt to fill this gap by automatically addressing the noise problem from both label and data perspectives in a probabilistic manner.

Face Anti-Spoofing

Paper
Add Code

StyleGAN-Human: A Data-Centric Odyssey of Human Generation

4 code implementations • 25 Apr 2022 • Jianglin Fu, Shikai Li, Yuming Jiang, Kwan-Yee Lin, Chen Qian, Chen Change Loy, Wayne Wu, Ziwei Liu

In addition, a model zoo and human editing applications are demonstrated to facilitate future research in the community.

Image Generation

1,093

Paper
Code

Few-shot Forgery Detection via Guided Adversarial Interpolation

no code implementations • 12 Apr 2022 • Haonan Qiu, Siyu Chen, Bei Gan, Kun Wang, Huafeng Shi, Jing Shao, Ziwei Liu

Notably, our method is also validated to be robust to choices of majority and minority forgery approaches.

Paper
Add Code

Full-Spectrum Out-of-Distribution Detection

1 code implementation • 11 Apr 2022 • Jingkang Yang, Kaiyang Zhou, Ziwei Liu

In this paper, we take into account both shift types and introduce full-spectrum OOD (FS-OOD) detection, a more realistic problem setting that considers both detecting semantic shift and being tolerant to covariate shift, and we design three benchmarks.

Out-of-Distribution Detection Out of Distribution (OOD) Detection

741

Paper
Code

Unsupervised Image-to-Image Translation with Generative Prior

1 code implementation • CVPR 2022 • Shuai Yang, Liming Jiang, Ziwei Liu, Chen Change Loy

In this work, we present a novel framework, Generative Prior-guided UNsupervised Image-to-image Translation (GP-UNIT), to improve the overall quality and applicability of the translation algorithm.

Translation Unsupervised Image-To-Image Translation

181

Paper
Code

Balanced MSE for Imbalanced Visual Regression

1 code implementation • CVPR 2022 • Jiawei Ren, Mingyuan Zhang, Cunjun Yu, Ziwei Liu

Data imbalance exists ubiquitously in real-world visual regressions, e.g., age estimation and pose estimation, hurting the model's generalizability and fairness.

Age Estimation Fairness +3

350

Paper
Code

Versatile Multi-Modal Pre-Training for Human-Centric Perception

1 code implementation • CVPR 2022 • Fangzhou Hong, Liang Pan, Zhongang Cai, Ziwei Liu

To tackle the challenges, we design the novel Dense Intra-sample Contrastive Learning and Sparse Structure-aware Contrastive Learning targets by hierarchically learning a modal-invariant latent space featured with continuous and ordinal feature distribution and structure-aware semantic consistency.

Contrastive Learning Human Parsing +1

115

Paper
Code

SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance

no code implementations • 25 Mar 2022 • Xinchi Zhou, Dongzhan Zhou, Wanli Ouyang, Hang Zhou, Ziwei Liu, Di Hu

Recent years have witnessed the success of deep learning on the visual sound separation task.

Paper
Add Code

Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory

1 code implementation • CVPR 2022 • Li SiYao, Weijiang Yu, Tianpei Gu, Chunze Lin, Quan Wang, Chen Qian, Chen Change Loy, Ziwei Liu

With the learned choreographic memory, dance generation is realized on the quantized units that meet high choreography standards, such that the generated dancing sequences are confined within the spatial constraints.

Ranked #1 on Motion Synthesis on AIST++

Motion Synthesis

362

Paper
Code

Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer

1 code implementation • CVPR 2022 • Shuai Yang, Liming Jiang, Ziwei Liu, Chen Change Loy

Recent studies on StyleGAN show high performance on artistic portrait generation by transfer learning with limited data.

Style Transfer Transfer Learning +1

1,571

Paper
Code

X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation

no code implementations • 16 Mar 2022 • Yinan He, Gengshi Huang, Siyu Chen, Jianing Teng, Wang Kun, Zhenfei Yin, Lu Sheng, Ziwei Liu, Yu Qiao, Jing Shao

2) Squeeze Stage: X-Learner condenses the model to a reasonable size and learns a universal and generalizable representation for transfer to various tasks.

object-detection Object Detection +3

Paper
Add Code

Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy

2 code implementations • 15 Mar 2022 • Yuanhan Zhang, Qinghong Sun, Yichun Zhou, Zexin He, Zhenfei Yin, Kun Wang, Lu Sheng, Yu Qiao, Jing Shao, Ziwei Liu

This work thus proposes a novel active learning framework for realistic dataset annotation.

Ranked #1 on Image Classification on Food-101 (using extra training data)

Active Learning Classification +3

161

Paper
Code

LiDAR-based 4D Panoptic Segmentation via Dynamic Shifting Network

1 code implementation • 14 Mar 2022 • Fangzhou Hong, Hui Zhou, Xinge Zhu, Hongsheng Li, Ziwei Liu

In this work, we address the task of LiDAR-based panoptic segmentation, which aims to parse both objects and scenes in a unified manner.

4D Panoptic Segmentation Autonomous Driving +3

230

Paper
Code

BiBERT: Accurate Fully Binarized BERT

1 code implementation • ICLR 2022 • Haotong Qin, Yifu Ding, Mingyuan Zhang, Qinghua Yan, Aishan Liu, Qingqing Dang, Ziwei Liu, Xianglong Liu

The large pre-trained BERT has achieved remarkable performance on Natural Language Processing (NLP) tasks but is also computation and memory expensive.

Binarization

Paper
Code

Conditional Prompt Learning for Vision-Language Models

9 code implementations • CVPR 2022 • Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu

With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential to investigate ways to adapt these models to downstream datasets.

Ranked #3 on Prompt Engineering on ImageNet V2

Domain Generalization Prompt Engineering

1,465

Paper
Code

TCTrack: Temporal Contexts for Aerial Tracking

1 code implementation • CVPR 2022 • Ziang Cao, Ziyuan Huang, Liang Pan, Shiwei Zhang, Ziwei Liu, Changhong Fu

Temporal contexts among consecutive frames are far from being fully utilized in existing visual trackers.

152

Paper
Code

Visual Sound Localization in the Wild by Cross-Modal Interference Erasing

1 code implementation • 13 Feb 2022 • Xian Liu, Rui Qian, Hang Zhou, Di Hu, Weiyao Lin, Ziwei Liu, Bolei Zhou, Xiaowei Zhou

Specifically, we observe that the previous practice of learning only a single audio representation is insufficient due to the additive nature of audio signals.

Paper
Code

Benchmarking and Analyzing Point Cloud Classification under Corruptions

4 code implementations • 7 Feb 2022 • Jiawei Ren, Liang Pan, Ziwei Liu

3D perception, especially point cloud classification, has achieved substantial progress.

Ranked #7 on Point Cloud Classification on PointCloud-C

Benchmarking Classification +1

162

Paper
Code

Full-Range Virtual Try-On With Recurrent Tri-Level Transform

no code implementations • CVPR 2022 • Han Yang, Xinrui Yu, Ziwei Liu

Virtual try-on aims to transfer a target clothing image onto a reference person.

Ranked #4 on Virtual Try-on on VITON

Virtual Try-on

Paper
Add Code

Multi-View Partial (MVP) Point Cloud Challenge 2021 on Completion and Registration: Methods and Results

2 code implementations • 22 Dec 2021 • Liang Pan, Tong Wu, Zhongang Cai, Ziwei Liu, Xumin Yu, Yongming Rao, Jiwen Lu, Jie Zhou, Mingye Xu, Xiaoyuan Luo, Kexue Fu, Peng Gao, Manning Wang, Yali Wang, Yu Qiao, Junsheng Zhou, Xin Wen, Peng Xiang, Yu-Shen Liu, Zhizhong Han, Yuanjie Yan, Junyi An, Lifa Zhu, Changwei Lin, Dongrui Liu, Xin Li, Francisco Gómez-Fernández, Qinlong Wang, Yang Yang

Based on the MVP dataset, this paper reports methods and results in the Multi-View Partial Point Cloud Challenge 2021 on Completion and Registration.

3D Reconstruction Point Cloud Completion +2

150

Paper
Code

ForgeryNet -- Face Forgery Analysis Challenge 2021: Methods and Results

no code implementations • 15 Dec 2021 • Yinan He, Lu Sheng, Jing Shao, Ziwei Liu, Zhaofan Zou, Zhizhi Guo, Shan Jiang, Curitis Sun, Guosheng Zhang, Keyao Wang, Haixiao Yue, Zhibin Hong, Wanguo Wang, Zhenyu Li, Qi Wang, Zhenli Wang, Ronghao Xu, Mingwen Zhang, Zhiheng Wang, Zhenhang Huang, Tianming Zhang, Ningning Zhao

The rapid progress of photorealistic synthesis techniques has reached a critical point where the boundary between real and manipulated images starts to blur.

valid

Paper
Add Code

Garment4D: Garment Reconstruction from Point Cloud Sequences

1 code implementation • NeurIPS 2021 • Fangzhou Hong, Liang Pan, Zhongang Cai, Ziwei Liu

The main challenges are two-fold: 1) effective 3D feature learning for fine details, and 2) capture of garment dynamics caused by the interaction between garments and the human body, especially for loose garments like skirts.

Garment Reconstruction

128

Paper
Code

Balanced Chamfer Distance as a Comprehensive Metric for Point Cloud Completion

1 code implementation • NeurIPS 2021 • Tong Wu, Liang Pan, Junzhe Zhang, Tai Wang, Ziwei Liu, Dahua Lin

We adopt DCD to evaluate the point cloud completion task, where experimental results show that DCD pays attention to both the overall structure and local geometric details and provides a more reliable evaluation even when CD and EMD contradict each other.

Point Cloud Completion

133

Paper
Code

Robust Partial-to-Partial Point Cloud Registration in a Full Range

1 code implementation • 30 Nov 2021 • Liang Pan, Zhongang Cai, Ziwei Liu

3) Based on a synergy of hierarchical graph networks and graphical modeling, we propose the Hierarchical Graphical Modeling (HGM) architecture to encode robust descriptors consisting of i) a unary term learned from RI features; and ii) multiple smoothness terms encoded from neighboring point relations at different scales through our TPT modules.

Graph Matching Point Cloud Registration

Paper
Code

Density-aware Chamfer Distance as a Comprehensive Metric for Point Cloud Completion

1 code implementation • 24 Nov 2021 • Tong Wu, Liang Pan, Junzhe Zhang, Tai Wang, Ziwei Liu, Dahua Lin

Point Cloud Completion

133

Paper
Code

Few-Shot Object Detection via Association and DIscrimination

1 code implementation • NeurIPS 2021 • Yuhang Cao, Jiaqi Wang, Ying Jin, Tong Wu, Kai Chen, Ziwei Liu, Dahua Lin

1) In the association step, in contrast to implicitly leveraging multiple base classes, we construct a compact novel class feature space via explicitly imitating a specific base class feature space.

Few-Shot Object Detection Object +3

Paper
Code

Lifting 2D Human Pose to 3D with Domain Adapted 3D Body Concept

no code implementations • 23 Nov 2021 • Qiang Nie, Ziwei Liu, Yunhui Liu

Inspired by this, we propose a new framework that leverages the labeled 3D human poses to learn a 3D concept of the human body to reduce the ambiguity.

3D Pose Estimation Domain Adaptation

Paper
Add Code

Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements

no code implementations • 1 Nov 2021 • Yu Rong, Jingbo Wang, Ziwei Liu, Chen Change Loy

In this paper, we make the first attempt to reconstruct 3D interacting hands from monocular single RGB images.

3D Reconstruction

Paper
Add Code

Generalized Out-of-Distribution Detection: A Survey

3 code implementations • 21 Oct 2021 • Jingkang Yang, Kaiyang Zhou, Yixuan Li, Ziwei Liu

In this survey, we first present a unified framework called generalized OOD detection, which encompasses the five aforementioned problems, i.e., AD, ND, OSR, OOD detection, and OD.

Anomaly Detection Autonomous Driving +5

741

Paper
Code

Playing for 3D Human Recovery

no code implementations • 14 Oct 2021 • Zhongang Cai, Mingyuan Zhang, Jiawei Ren, Chen Wei, Daxuan Ren, Zhengyu Lin, Haiyu Zhao, Lei Yang, Chen Change Loy, Ziwei Liu

Specifically, we contribute GTA-Human, a large-scale 3D human dataset generated with the GTA-V game engine, featuring a highly diverse set of subjects, actions, and scenarios.

Paper
Add Code

TAda! Temporally-Adaptive Convolutions for Video Understanding

2 code implementations • ICLR 2022 • Ziyuan Huang, Shiwei Zhang, Liang Pan, Zhiwu Qing, Mingqian Tang, Ziwei Liu, Marcelo H. Ang Jr

This work presents Temporally-Adaptive Convolutions (TAdaConv) for video understanding, which shows that adaptive weight calibration along the temporal dimension is an efficient way to facilitate modelling complex temporal dynamics in videos.

Ranked #67 on Action Recognition on Something-Something V2 (using extra training data)

Action Classification Action Recognition +2

215

Paper
Code

Bayesian Imbalanced Regression Debiasing

no code implementations • 29 Sep 2021 • Jiawei Ren, Mingyuan Zhang, Cunjun Yu, Ziwei Liu

Compared to imbalanced and long-tailed classification, imbalanced regression has its unique challenges as the regression label space can be continuous, boundless, and high-dimensional.

Age Estimation imbalanced classification +2

Paper
Add Code

A Comprehensive Overhaul of Distilling Unconditional GANs

no code implementations • 29 Sep 2021 • Guodong Xu, Yuenan Hou, Ziwei Liu, Chen Change Loy

To further enhance the semantic consistency between the teacher and student model, we present another latent-direction-based distillation loss that preserves the semantic relations in latent space.

Knowledge Distillation

Paper
Add Code

Talk-to-Edit: Fine-Grained Facial Editing via Dialog

1 code implementation • ICCV 2021 • Yuming Jiang, Ziqi Huang, Xingang Pan, Chen Change Loy, Ziwei Liu

In this work, we propose Talk-to-Edit, an interactive facial editing framework that performs fine-grained attribute manipulation through dialog between the user and the system.

Ranked #1 on Fine-Grained Facial Editing on CelebA-Dialog

Attribute Facial Editing +1

302

Paper
Code

Learning to Prompt for Vision-Language Models

13 code implementations • 2 Sep 2021 • Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu

Large pre-trained vision-language models like CLIP have shown great potential in learning representations that are transferable across a wide range of downstream tasks.

Ranked #2 on Few-shot Age Estimation on MORPH Album2

Domain Generalization Few-shot Age Estimation +2

1,465

Paper
Code

Semantically Coherent Out-of-Distribution Detection

2 code implementations • ICCV 2021 • Jingkang Yang, Haoqi Wang, Litong Feng, Xiaopeng Yan, Huabin Zheng, Wayne Zhang, Ziwei Liu

The proposed UDG can not only enrich the semantic knowledge of the model by exploiting unlabeled data in an unsupervised manner, but also distinguish ID/OOD samples to enhance ID classification and OOD detection tasks simultaneously.

Out-of-Distribution Detection Out of Distribution (OOD) Detection

Paper
Code

Energy-Based Open-World Uncertainty Modeling for Confidence Calibration

no code implementations • ICCV 2021 • Yezhen Wang, Bo Li, Tong Che, Kaiyang Zhou, Ziwei Liu, Dongsheng Li

Confidence calibration is of great importance to the reliability of decisions made by machine learning systems.

Paper
Add Code

Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency

1 code implementation • ICCV 2021 • Zhipeng Luo, Zhongang Cai, Changqing Zhou, Gongjie Zhang, Haiyu Zhao, Shuai Yi, Shijian Lu, Hongsheng Li, Shanghang Zhang, Ziwei Liu

In addition, existing 3D domain adaptive detection methods often assume prior access to the target domain annotations, which is rarely feasible in the real world.

3D Object Detection Autonomous Driving +1

Paper
Code

Unsupervised Object-Level Representation Learning from Scene Images

1 code implementation • NeurIPS 2021 • Jiahao Xie, Xiaohang Zhan, Ziwei Liu, Yew Soon Ong, Chen Change Loy

Extensive experiments on COCO show that ORL significantly improves the performance of self-supervised learning on scene images, even surpassing supervised ImageNet pre-training on several downstream tasks.

Object Representation Learning +2

Paper
Code

Delving Deep into the Generalization of Vision Transformers under Distribution Shifts

1 code implementation • CVPR 2022 • Chongzhi Zhang, Mingyuan Zhang, Shanghang Zhang, Daisheng Jin, Qiang Zhou, Zhongang Cai, Haiyu Zhao, Xianglong Liu, Ziwei Liu

By comprehensively investigating these GE-ViTs and comparing with their corresponding CNN models, we observe: 1) For the enhanced model, larger ViTs still benefit more for the OOD generalization.

Out-of-Distribution Generalization Self-Supervised Learning

Paper
Code

Robust Reference-based Super-Resolution via C2-Matching

1 code implementation • CVPR 2021 • Yuming Jiang, Kelvin C. K. Chan, Xintao Wang, Chen Change Loy, Ziwei Liu

However, performing local transfer is difficult because of two gaps between input and reference images: the transformation gap (e.g., scale and rotation) and the resolution gap (e.g., HR and LR).

Reference-based Super-Resolution

191

Paper
Code

Semi-Supervised Domain Generalization with Stochastic StyleMatch

2 code implementations • 1 Jun 2021 • Kaiyang Zhou, Chen Change Loy, Ziwei Liu

We find that the DG methods, which by design are unable to handle unlabeled data, perform poorly with limited labels in SSDG; the SSL methods, especially FixMatch, obtain much better results but are still far away from the basic vanilla model trained using full labels.

Domain Generalization Semi-Supervised Domain Generalization

1,078

Paper
Code

Iterative Human and Automated Identification of Wildlife Images

1 code implementation • 5 May 2021 • Zhongqi Miao, Ziwei Liu, Kaitlyn M. Gaynor, Meredith S. Palmer, Stella X. Yu, Wayne M. Getz

Camera trapping is increasingly used to monitor wildlife, but this technology typically requires extensive data annotation.

Paper
Code

Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation

1 code implementation • CVPR 2021 • Hang Zhou, Yasheng Sun, Wayne Wu, Chen Change Loy, Xiaogang Wang, Ziwei Liu

While speech content information can be defined by learning the intrinsic synchronization between audio-visual modalities, we identify that a pose code will be complementarily learned in a modulated convolution-based reconstruction framework.

Talking Face Generation

902

Paper
Code

Variational Relational Point Completion Network

1 code implementation • CVPR 2021 • Liang Pan, Xinyi Chen, Zhongang Cai, Junzhe Zhang, Haiyu Zhao, Shuai Yi, Ziwei Liu

In particular, we propose a dual-path architecture to enable principled probabilistic modeling across partial and complete clouds.

Ranked #2 on Point Cloud Completion on Completion3D

Point Cloud Completion

150

Paper
Code
