Search Results for author: Ziwei Liu

Found 264 papers, 185 papers with code

MMInA: Benchmarking Multihop Multimodal Internet Agents

no code implementations 15 Apr 2024 Ziniu Zhang, Shulin Tian, Liangyu Chen, Ziwei Liu

To answer this question, we present MMInA, a multihop and multimodal benchmark to evaluate the embodied agents for compositional Internet tasks, with several appealing properties: 1) Evolving real-world multimodal websites.

Benchmarking

Move Anything with Layered Scene Diffusion

no code implementations 10 Apr 2024 Jiawei Ren, Mengmeng Xu, Jui-Chieh Wu, Ziwei Liu, Tao Xiang, Antoine Toisoul

Diffusion models generate images with an unprecedented level of quality, but how can we freely rearrange image layouts?

Denoising Disentanglement

FashionEngine: Interactive Generation and Editing of 3D Clothed Humans

no code implementations 2 Apr 2024 Tao Hu, Fangzhou Hong, Zhaoxi Chen, Ziwei Liu

FashionEngine automates the 3D human production with three key components: 1) A pre-trained 3D human diffusion model that learns to model 3D humans in a semantic UV latent space from 2D image training data, which provides strong priors for diverse generation and editing tasks.

Virtual Try-on

SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering

no code implementations 1 Apr 2024 Tao Hu, Fangzhou Hong, Ziwei Liu

2) Physical motion decoding that is designed to encourage physical motion learning by decoding the motion triplane features at timestep t to predict both spatial derivatives and temporal derivatives at the next timestep t+1 in the training stage.

Generalizable Novel View Synthesis Novel View Synthesis

StructLDM: Structured Latent Diffusion for 3D Human Generation

no code implementations 1 Apr 2024 Tao Hu, Fangzhou Hong, Ziwei Liu

2) A structured 3D-aware auto-decoder that factorizes the global latent space into several semantic body parts parameterized by a set of conditional structured local NeRFs anchored to the body template, which embeds the properties learned from the 2D training data and can be decoded to render view-consistent humans under different poses and clothing styles.

Virtual Try-on

Large Motion Model for Unified Multi-Modal Motion Generation

no code implementations 1 Apr 2024 Mingyuan Zhang, Daisheng Jin, Chenyang Gu, Fangzhou Hong, Zhongang Cai, Jingfang Huang, Chongzhi Zhang, Xinying Guo, Lei Yang, Ying He, Ziwei Liu

In this work, we present Large Motion Model (LMM), a motion-centric, multi-modal framework that unifies mainstream motion generation tasks into a generalist model.

Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment

no code implementations 27 Mar 2024 Li SiYao, Tianpei Gu, Zhitao Yang, Zhengyu Lin, Ziwei Liu, Henghui Ding, Lei Yang, Chen Change Loy

We introduce a novel task within the field of 3D dance generation, termed dance accompaniment, which necessitates the generation of responsive movements from a dance partner, the "follower", synchronized with the lead dancer's movements and the underlying musical rhythm.

AID: Attention Interpolation of Text-to-Image Diffusion

1 code implementation 26 Mar 2024 Qiyuan He, Jinghao Wang, Ziwei Liu, Angela Yao

To that end, we introduce a novel training-free technique named Attention Interpolation via Diffusion (AID).

Spatial Interpolation

Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding

1 code implementation 25 Mar 2024 Lingdong Kong, Xiang Xu, Jun Cen, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu

Safety-critical 3D scene understanding tasks necessitate not only accurate but also confident predictions from 3D perception models.

Data Augmentation Scene Understanding

ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars

no code implementations 22 Mar 2024 Zhenwei Wang, Tengfei Wang, Gerhard Hancke, Ziwei Liu, Rynson W. H. Lau

To this end, we design a two-stage framework that draws a concept image first, followed by a reference-informed 3D modeling stage.

3D Generation Unity

FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation

2 code implementations 19 Mar 2024 Shuai Yang, Yifan Zhou, Ziwei Liu, Chen Change Loy

In this paper, we introduce FRESCO, which combines intra-frame correspondence with inter-frame correspondence to establish a more robust spatial-temporal constraint.

Translation valid

WHAC: World-grounded Humans and Cameras

1 code implementation 19 Mar 2024 Wanqi Yin, Zhongang Cai, Ruisi Wang, Fanzhou Wang, Chen Wei, Haiyi Mei, Weiye Xiao, Zhitao Yang, Qingping Sun, Atsushi Yamashita, Ziwei Liu, Lei Yang

In this study, we aim to recover expressive parametric human models (i.e., SMPL-X) and corresponding camera poses jointly, by leveraging the synergy between three critical players: the world, the human, and the camera.

Pose Estimation

ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance

no code implementations 19 Mar 2024 Yongwei Chen, Tengfei Wang, Tong Wu, Xingang Pan, Kui Jia, Ziwei Liu

Though promising results have been achieved in single object generation, these methods often struggle to model complex 3D assets that inherently contain multiple objects.

3D Generation Object

InTeX: Interactive Text-to-texture Synthesis via Unified Depth-aware Inpainting

no code implementations 18 Mar 2024 Jiaxiang Tang, Ruijie Lu, Xiaokang Chen, Xiang Wen, Gang Zeng, Ziwei Liu

Text-to-texture synthesis has become a new frontier in 3D content creation thanks to the recent advances in text-to-image models.

Texture Synthesis

STAR-RIS Assisted Wireless-Powered and Backscattering Mobile Edge Computing Networks

no code implementations 5 Mar 2024 Bin Lyu, Yining Zhang, Pengcheng Chen, Ziwei Liu, Feng Tian

The wireless-powered and backscattering mobile edge computing (WPB-MEC) network is a novel network paradigm that supplies energy and computing resources to wireless sensors (WSs).

Edge-computing

3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors

1 code implementation 4 Mar 2024 Fangzhou Hong, Jiaxiang Tang, Ziang Cao, Min Shi, Tong Wu, Zhaoxi Chen, Tengfei Wang, Liang Pan, Dahua Lin, Ziwei Liu

Specifically, it is powered by a text-conditioned tri-plane latent diffusion model, which quickly generates coarse 3D samples for fast prototyping.

3D Generation Text to 3D +1

LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

1 code implementation 7 Feb 2024 Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, Ziwei Liu

2) 3D Backbone: We present an asymmetric U-Net as a high-throughput backbone operating on multi-view images, which can be produced from text or single-view image input by leveraging multi-view diffusion models.

A Comprehensive Survey on 3D Content Generation

1 code implementation 2 Feb 2024 Jian Liu, Xiaoshui Huang, Tianyu Huang, Lu Chen, Yuenan Hou, Shixiang Tang, Ziwei Liu, Wanli Ouyang, WangMeng Zuo, Junjun Jiang, Xianming Liu

Recent years have witnessed remarkable advances in artificial intelligence generated content (AIGC), with diverse input modalities, e.g., text, image, video, audio and 3D.

Towards Language-Driven Video Inpainting via Multimodal Large Language Models

no code implementations 18 Jan 2024 Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, Jiangning Zhang, Yining Li, Kai Chen, Yunhai Tong, Ziwei Liu, Chen Change Loy

We introduce a new task -- language-driven video inpainting, which uses natural language instructions to guide the inpainting process.

Video Inpainting

Exploiting Hierarchical Interactions for Protein Surface Learning

1 code implementation 17 Jan 2024 Yiqun Lin, Liang Pan, Yi Li, Ziwei Liu, Xiaomeng Li

In this paper, we present a principled framework based on deep learning techniques, namely Hierarchical Chemical and Geometric Feature Interaction Network (HCGNet), for protein surface analysis by bridging chemical and geometric features with hierarchical interactions.

Vlogger: Make Your Dream A Vlog

1 code implementation 17 Jan 2024 Shaobin Zhuang, Kunchang Li, Xinyuan Chen, Yaohui Wang, Ziwei Liu, Yu Qiao, Yali Wang

More importantly, Vlogger can generate over 5-minute vlogs from open-world descriptions, without loss of video coherence on script and actor.

Language Modelling Large Language Model +1

Multi-scale 2D Temporal Map Diffusion Models for Natural Language Video Localization

no code implementations 16 Jan 2024 Chongzhi Zhang, Mingyuan Zhang, Zhiyang Teng, Jiayi Li, Xizhou Zhu, Lewei Lu, Ziwei Liu, Aixin Sun

Our method involves the direct generation of a global 2D temporal map via a conditional denoising diffusion process, based on the input video and language query.

Denoising Video Understanding

URHand: Universal Relightable Hands

no code implementations 10 Jan 2024 Zhaoxi Chen, Gyeongsik Moon, Kaiwen Guo, Chen Cao, Stanislav Pidhorskyi, Tomas Simon, Rohan Joshi, Yuan Dong, Yichen Xu, Bernardo Pires, He Wen, Lucas Evans, Bo Peng, Julia Buffalini, Autumn Trimble, Kevyn McPhail, Melissa Schoeller, Shoou-I Yu, Javier Romero, Michael Zollhöfer, Yaser Sheikh, Ziwei Liu, Shunsuke Saito

To simplify the personalization process while retaining photorealism, we build a powerful universal relightable prior based on neural relighting from multi-view images of hands captured in a light stage with hundreds of identities.

GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

1 code implementation 8 Jan 2024 Tong Wu, Guandao Yang, Zhibing Li, Kai Zhang, Ziwei Liu, Leonidas Guibas, Dahua Lin, Gordon Wetzstein

These metrics lack the flexibility to generalize to different evaluation criteria and might not align well with human preferences.

3D Generation Text to 3D

InsActor: Instruction-driven Physics-based Characters

no code implementations NeurIPS 2023 Jiawei Ren, Mingyuan Zhang, Cunjun Yu, Xiao Ma, Liang Pan, Ziwei Liu

Generating animation of physics-based characters with intuitive control has long been a desirable task with numerous applications.

Motion Planning

DreamGaussian4D: Generative 4D Gaussian Splatting

1 code implementation 28 Dec 2023 Jiawei Ren, Liang Pan, Jiaxiang Tang, Chi Zhang, Ang Cao, Gang Zeng, Ziwei Liu

Remarkable progress has been made in 4D content generation recently.

Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases

1 code implementation 22 Dec 2023 Zhangyang Qi, Ye Fang, Mengchen Zhang, Zeyi Sun, Tong Wu, Ziwei Liu, Dahua Lin, Jiaqi Wang, Hengshuang Zhao

We conducted a series of structured experiments to evaluate their performance in various industrial application scenarios, offering a comprehensive perspective on their practical utility.

FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing

1 code implementation NeurIPS 2023 Mingyuan Zhang, Huirong Li, Zhongang Cai, Jiawei Ren, Lei Yang, Ziwei Liu

Notably, FineMoGen further enables zero-shot motion editing with the aid of modern large language models (LLMs), faithfully manipulating motion sequences according to fine-grained instructions.

Motion Synthesis

InstructVideo: Instructing Video Diffusion Models with Human Feedback

1 code implementation 19 Dec 2023 Hangjie Yuan, Shiwei Zhang, Xiang Wang, Yujie Wei, Tao Feng, Yining Pan, Yingya Zhang, Ziwei Liu, Samuel Albanie, Dong Ni

To tackle this problem, we propose InstructVideo to instruct text-to-video diffusion models with human feedback by reward fine-tuning.

Video Generation

Towards Robust and Expressive Whole-body Human Pose and Shape Estimation

1 code implementation NeurIPS 2023 Hui En Pang, Zhongang Cai, Lei Yang, Qingyi Tao, Zhonghua Wu, Tianwei Zhang, Ziwei Liu

Whole-body pose and shape estimation aims to jointly predict different behaviors (e.g., pose, hand gesture, facial expression) of the entire human body from a monocular image.

FreeInit: Bridging Initialization Gap in Video Diffusion Models

1 code implementation 12 Dec 2023 Tianxing Wu, Chenyang Si, Yuming Jiang, Ziqi Huang, Ziwei Liu

Though diffusion-based video generation has witnessed rapid progress, the inference results of existing models still exhibit unsatisfactory temporal consistency and unnatural dynamics.

Denoising Text-to-Video Generation +1

HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image

no code implementations 7 Dec 2023 Tong Wu, Zhibing Li, Shuai Yang, Pan Zhang, Xinggang Pan, Jiaqi Wang, Dahua Lin, Ziwei Liu

Extensive experiments demonstrate the effectiveness of HyperDreamer in modeling region-aware materials with high-resolution textures and enabling user-friendly editing.

Semantic Segmentation

Digital Life Project: Autonomous 3D Characters with Social Intelligence

no code implementations 7 Dec 2023 Zhongang Cai, Jianping Jiang, Zhongfei Qing, Xinying Guo, Mingyuan Zhang, Zhengyu Lin, Haiyi Mei, Chen Wei, Ruisi Wang, Wanqi Yin, Xiangyu Fan, Han Du, Liang Pan, Peng Gao, Zhitao Yang, Yang Gao, Jiaqi Li, Tianxiang Ren, Yukun Wei, Xiaogang Wang, Chen Change Loy, Lei Yang, Ziwei Liu

In this work, we present Digital Life Project, a framework utilizing language as the universal medium to build autonomous 3D characters, who are capable of engaging in social interactions and expressing with articulated body motions, thereby simulating life in a digital environment.

Motion Captioning Motion Synthesis

GauHuman: Articulated Gaussian Splatting from Monocular Human Videos

1 code implementation 5 Dec 2023 Shoukang Hu, Ziwei Liu

We present GauHuman, a 3D human model with Gaussian Splatting for both fast training (1 to 2 minutes) and real-time rendering (up to 189 FPS), compared with existing NeRF-based implicit representation modelling frameworks that demand hours of training and seconds of rendering per frame.

Generalizable Novel View Synthesis Novel View Synthesis

VideoBooth: Diffusion-based Video Generation with Image Prompts

no code implementations 1 Dec 2023 Yuming Jiang, Tianxing Wu, Shuai Yang, Chenyang Si, Dahua Lin, Yu Qiao, Chen Change Loy, Ziwei Liu

In this paper, we study the task of video generation with image prompts, which provide more accurate and direct content control beyond the text prompts.

Video Generation

VBench: Comprehensive Benchmark Suite for Video Generative Models

1 code implementation 29 Nov 2023 Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, LiMin Wang, Dahua Lin, Yu Qiao, Ziwei Liu

We will open-source VBench, including all prompts, evaluation methods, generated videos, and human preference annotations, and also include more video generation models in VBench to drive forward the field of video generation.

Image Generation Video Generation

Panoptic Video Scene Graph Generation

3 code implementations CVPR 2023 Jingkang Yang, Wenxuan Peng, Xiangtai Li, Zujin Guo, Liangyu Chen, Bo Li, Zheng Ma, Kaiyang Zhou, Wayne Zhang, Chen Change Loy, Ziwei Liu

PVSG relates to the existing video scene graph generation (VidSGG) problem, which focuses on temporal interactions between humans and objects grounded with bounding boxes in videos.

Graph Generation Panoptic Scene Graph Generation +5

HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting

no code implementations 28 Nov 2023 Xian Liu, Xiaohang Zhan, Jiaxiang Tang, Ying Shan, Gang Zeng, Dahua Lin, Xihui Liu, Ziwei Liu

In this paper, we propose an efficient yet effective framework, HumanGaussian, that generates high-quality 3D humans with fine-grained geometry and realistic appearance.

SinSR: Diffusion-Based Image Super-Resolution in a Single Step

1 code implementation 23 Nov 2023 YuFei Wang, Wenhan Yang, Xinyuan Chen, Yaohui Wang, Lanqing Guo, Lap-Pui Chau, Ziwei Liu, Yu Qiao, Alex C. Kot, Bihan Wen

Extensive experiments conducted on synthetic and real-world datasets demonstrate that the proposed method can achieve comparable or even superior performance compared to both previous SOTA methods and the teacher model, in just one sampling step, resulting in up to a 10x speedup for inference.

Image Super-Resolution

OtterHD: A High-Resolution Multi-modality Model

1 code implementation 7 Nov 2023 Bo Li, Peiyuan Zhang, Jingkang Yang, Yuanhan Zhang, Fanyi Pu, Ziwei Liu

In this paper, we present OtterHD-8B, an innovative multimodal model evolved from Fuyu-8B, specifically engineered to interpret high-resolution visual inputs with granular precision.

Visual Question Answering

A Two-Stage Generative Model with CycleGAN and Joint Diffusion for MRI-based Brain Tumor Detection

1 code implementation 6 Nov 2023 Wenxin Wang, Zhuo-Xu Cui, Guanxun Cheng, Chentao Cao, Xi Xu, Ziwei Liu, Haifeng Wang, Yulong Qi, Dong Liang, Yanjie Zhu

However, current supervised learning methods require extensively annotated images and the state-of-the-art generative models used in unsupervised methods often have limitations in covering the whole data distribution.

Anomaly Detection Generative Adversarial Network +2

PERF: Panoramic Neural Radiance Field from a Single Panorama

1 code implementation 25 Oct 2023 Guangcong Wang, Peng Wang, Zhaoxi Chen, Wenping Wang, Chen Change Loy, Ziwei Liu

In this paper, we present PERF, a 360-degree novel view synthesis framework that trains a panoramic neural radiance field from a single panorama.

Novel View Synthesis Text to 3D

FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling

3 code implementations 23 Oct 2023 Haonan Qiu, Menghan Xia, Yong Zhang, Yingqing He, Xintao Wang, Ying Shan, Ziwei Liu

With the availability of large-scale video datasets and the advances of diffusion models, text-driven video generation has achieved substantial progress.

Video Generation

HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion

no code implementations 12 Oct 2023 Xian Liu, Jian Ren, Aliaksandr Siarohin, Ivan Skorokhodov, Yanyu Li, Dahua Lin, Xihui Liu, Ziwei Liu, Sergey Tulyakov

Our model enforces the joint learning of image appearance, spatial relationship, and geometry in a unified network, where each branch in the model complements the others with both structural awareness and textural richness.

Image Generation

Deep Geometrized Cartoon Line Inbetweening

1 code implementation ICCV 2023 Li SiYao, Tianpei Gu, Weiye Xiao, Henghui Ding, Ziwei Liu, Chen Change Loy

To preserve the precision and detail of the line drawings, we propose a new approach, AnimeInbet, which geometrizes raster line drawings into graphs of endpoints and reframes the inbetweening task as a graph fusion problem with vertex repositioning.

DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

1 code implementation 28 Sep 2023 Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, Gang Zeng

In contrast to the occupancy pruning used in Neural Radiance Fields, we demonstrate that the progressive densification of 3D Gaussians converges significantly faster for 3D generative tasks.

3D Generation

Robust Sequential DeepFake Detection

1 code implementation 26 Sep 2023 Rui Shao, Tianxing Wu, Ziwei Liu

However, existing methods only focus on detecting one-step facial manipulation.

DeepFake Detection Face Swapping +1

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models

2 code implementations 26 Sep 2023 Yaohui Wang, Xinyuan Chen, Xin Ma, Shangchen Zhou, Ziqi Huang, Yi Wang, Ceyuan Yang, Yinan He, Jiashuo Yu, Peiqing Yang, Yuwei Guo, Tianxing Wu, Chenyang Si, Yuming Jiang, Cunjian Chen, Chen Change Loy, Bo Dai, Dahua Lin, Yu Qiao, Ziwei Liu

To this end, we propose LaVie, an integrated video generation framework that operates on cascaded video latent diffusion models, comprising a base T2V model, a temporal interpolation model, and a video super-resolution model.

Text-to-Video Generation Video Generation +1

Detecting and Grounding Multi-Modal Media Manipulation and Beyond

1 code implementation 25 Sep 2023 Rui Shao, Tianxing Wu, Jianlong Wu, Liqiang Nie, Ziwei Liu

HAMMER performs 1) manipulation-aware contrastive learning between two uni-modal encoders as shallow manipulation reasoning, and 2) modality-aware cross-attention by a multi-modal aggregator as deep manipulation reasoning.

Binary Classification Contrastive Learning +4

MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation

1 code implementation 22 Sep 2023 Jiahao Xie, Wei Li, Xiangtai Li, Ziwei Liu, Yew Soon Ong, Chen Change Loy

We present MosaicFusion, a simple yet effective diffusion-based data augmentation approach for large vocabulary instance segmentation.

Data Augmentation Instance Segmentation +1

FreeU: Free Lunch in Diffusion U-Net

1 code implementation 20 Sep 2023 Chenyang Si, Ziqi Huang, Yuming Jiang, Ziwei Liu

In this paper, we uncover the untapped potential of diffusion U-Net, which serves as a "free lunch" that substantially improves the generation quality on the fly.

Denoising Video Generation

Large-Vocabulary 3D Diffusion Model with Transformer

no code implementations 14 Sep 2023 Ziang Cao, Fangzhou Hong, Tong Wu, Liang Pan, Ziwei Liu

To this end, we propose a novel triplane-based 3D-aware Diffusion model with TransFormer, DiffTF, for handling challenges via three aspects.

3D Generation

DeformToon3D: Deformable 3D Toonification from Neural Radiance Fields

1 code implementation 8 Sep 2023 Junzhe Zhang, Yushi Lan, Shuai Yang, Fangzhou Hong, Quan Wang, Chai Kiat Yeo, Ziwei Liu, Chen Change Loy

In this paper, we address the challenging problem of 3D toonification, which involves transferring the style of an artistic domain onto a target 3D face with stylized geometry and texture.

ReliTalk: Relightable Talking Portrait Generation from a Single Video

1 code implementation 5 Sep 2023 Haonan Qiu, Zhaoxi Chen, Yuming Jiang, Hang Zhou, Xiangyu Fan, Lei Yang, Wayne Wu, Ziwei Liu

Our key insight is to decompose the portrait's reflectance from implicitly learned audio-driven facial normals and images.

Single-Image Portrait Relighting

CityDreamer: Compositional Generative Model of Unbounded 3D Cities

1 code implementation 1 Sep 2023 Haozhe Xie, Zhaoxi Chen, Fangzhou Hong, Ziwei Liu

3D city generation is a desirable yet challenging task, since humans are more sensitive to structural distortions in urban environments.

Scene Generation

PointHPS: Cascaded 3D Human Pose and Shape Estimation from Point Clouds

no code implementations 28 Aug 2023 Zhongang Cai, Liang Pan, Chen Wei, Wanqi Yin, Fangzhou Hong, Mingyuan Zhang, Chen Change Loy, Lei Yang, Ziwei Liu

To tackle these challenges, we propose a principled framework, PointHPS, for accurate 3D HPS from point clouds captured in real-world settings, which iteratively refines point features through a cascaded architecture.

3D human pose and shape estimation

Towards Real-World Visual Tracking with Temporal Contexts

1 code implementation 20 Aug 2023 Ziang Cao, Ziyuan Huang, Liang Pan, Shiwei Zhang, Ziwei Liu, Changhong Fu

To handle those problems, we propose a two-level framework (TCTrack) that can exploit temporal contexts efficiently.

Visual Tracking

HumanLiff: Layer-wise 3D Human Generation with Diffusion Model

no code implementations 18 Aug 2023 Shoukang Hu, Fangzhou Hong, Tao Hu, Liang Pan, Haiyi Mei, Weiye Xiao, Lei Yang, Ziwei Liu

In this work, we propose HumanLiff, the first layer-wise 3D human generative model with a unified diffusion process.

3D Generation Neural Rendering

Link-Context Learning for Multimodal LLMs

1 code implementation 15 Aug 2023 Yan Tai, Weichen Fan, Zhao Zhang, Feng Zhu, Rui Zhao, Ziwei Liu

The abilities to learn novel concepts from context and to deliver appropriate responses are essential in human conversations.

Few-Shot Learning In-Context Learning +1

Hierarchy Flow For High-Fidelity Image-to-Image Translation

1 code implementation 14 Aug 2023 Weichen Fan, Jinghuan Chen, Ziwei Liu

In this work, we propose Hierarchy Flow, a novel flow-based model to achieve better content preservation during translation.

Image-to-Image Translation Style Transfer +1

Benchmarking and Analyzing Generative Data for Visual Recognition

no code implementations 25 Jul 2023 Bo Li, Haotian Liu, Liangyu Chen, Yong Jae Lee, Chunyuan Li, Ziwei Liu

Advancements in large pre-trained generative models have expanded their potential as effective data generators in visual recognition.

Benchmarking Retrieval

Make-A-Volume: Leveraging Latent Diffusion Models for Cross-Modality 3D Brain MRI Synthesis

no code implementations 19 Jul 2023 Lingting Zhu, Zeyue Xue, Zhenchao Jin, Xian Liu, Jingzhen He, Ziwei Liu, Lequan Yu

This paradigm extends the 2D image diffusion model to a volumetric version with only a slight increase in parameters and computation, offering a principled solution for generic cross-modality 3D medical image synthesis.

Computational Efficiency Image Generation

Pair then Relation: Pair-Net for Panoptic Scene Graph Generation

1 code implementation 17 Jul 2023 Jinghao Wang, Zhengyu Wen, Xiangtai Li, Zujin Guo, Jingkang Yang, Ziwei Liu

Panoptic Scene Graph (PSG) is a challenging task in Scene Graph Generation (SGG) that aims to create a more comprehensive scene graph representation using panoptic segmentation instead of boxes.

Graph Generation Panoptic Scene Graph Generation +2

FunQA: Towards Surprising Video Comprehension

1 code implementation 26 Jun 2023 Binzhu Xie, Sicheng Zhang, Zitang Zhou, Bo Li, Yuanhan Zhang, Jack Hessel, Jingkang Yang, Ziwei Liu

Surprising videos, such as funny clips, creative performances, or visual illusions, attract significant attention.

Question Answering Text Generation +3

GP-UNIT: Generative Prior for Versatile Unsupervised Image-to-Image Translation

1 code implementation 7 Jun 2023 Shuai Yang, Liming Jiang, Ziwei Liu, Chen Change Loy

In this paper, we introduce a novel versatile framework, Generative Prior-guided UNsupervised Image-to-image Translation (GP-UNIT), that improves the quality, applicability and controllability of the existing translation models.

Translation Unsupervised Image-To-Image Translation +1

DeepFake-Adapter: Dual-Level Adapter for DeepFake Detection

1 code implementation 1 Jun 2023 Rui Shao, Tianxing Wu, Liqiang Nie, Ziwei Liu

Unlike existing deepfake detection methods merely focusing on low-level forgery patterns, the forgery detection process of our model can be regularized by generalizable high-level semantics from a pre-trained ViT and adapted by global and local low-level forgeries of deepfake data.

DeepFake Detection Face Swapping

Learning without Forgetting for Vision-Language Models

no code implementations 30 May 2023 Da-Wei Zhou, Yuanhan Zhang, Jingyi Ning, Han-Jia Ye, De-Chuan Zhan, Ziwei Liu

While traditional CIL methods focus on visual information to grasp core features, recent advances in Vision-Language Models (VLM) have shown promising capabilities in learning generalizable representations with the aid of textual information.

Class Incremental Learning Incremental Learning

SAD: Segment Any RGBD

1 code implementation 23 May 2023 Jun Cen, Yizheng Wu, Kewei Wang, Xingyi Li, Jingkang Yang, Yixuan Pei, Lingdong Kong, Ziwei Liu, Qifeng Chen

The Segment Anything Model (SAM) has demonstrated its effectiveness in segmenting any part of 2D RGB images.

Open Vocabulary Semantic Segmentation Panoptic Segmentation +1

RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars

1 code implementation NeurIPS 2023 Dongwei Pan, Long Zhuo, Jingtan Piao, Huiwen Luo, Wei Cheng, Yuxin Wang, Siming Fan, Shengqi Liu, Lei Yang, Bo Dai, Ziwei Liu, Chen Change Loy, Chen Qian, Wayne Wu, Dahua Lin, Kwan-Yee Lin

It is a large-scale digital library for head avatars with three key attributes: 1) High Fidelity: all subjects are captured by 60 synchronized, high-resolution 2K cameras in 360 degrees.

2k Image Matting +2

ConsistentNeRF: Enhancing Neural Radiance Fields with 3D Consistency for Sparse View Synthesis

1 code implementation 18 May 2023 Shoukang Hu, Kaichen Zhou, Kaiyu Li, Longhui Yu, Lanqing Hong, Tianyang Hu, Zhenguo Li, Gim Hee Lee, Ziwei Liu

In this paper, we propose ConsistentNeRF, a method that leverages depth information to regularize both multi-view and single-view 3D consistency among pixels.

3D Reconstruction SSIM

StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator

no code implementations CVPR 2023 Jiazhi Guan, Zhanwang Zhang, Hang Zhou, Tianshu Hu, Kaisiyuan Wang, Dongliang He, Haocheng Feng, Jingtuo Liu, Errui Ding, Ziwei Liu, Jingdong Wang

Despite recent advances in syncing lip movements with any audio waves, current methods still struggle to balance generation quality and the model's generalization ability.

Otter: A Multi-Modal Model with In-Context Instruction Tuning

1 code implementation 5 May 2023 Bo Li, Yuanhan Zhang, Liangyu Chen, Jinghao Wang, Jingkang Yang, Ziwei Liu

Large language models (LLMs) have demonstrated significant universal capabilities as few/zero-shot learners in various tasks due to their pre-training on vast amounts of text data, as exemplified by GPT-3, which was boosted to InstructGPT and ChatGPT, effectively following natural language instructions to accomplish real-world tasks.

In-Context Learning Instruction Following +2

Transmissive Reconfigurable Intelligent Surface Transmitter Empowered Cognitive RSMA Networks

no code implementations 4 May 2023 Ziwei Liu, Wen Chen, Zhendong Li, Jinhong Yuan, Qingqing Wu, Kunlun Wang

In this paper, we investigated the downlink transmission problem of a cognitive radio network (CRN) equipped with a novel transmissive reconfigurable intelligent surface (TRIS) transmitter.

Collaborative Diffusion for Multi-Modal Face Generation and Editing

1 code implementation CVPR 2023 Ziqi Huang, Kelvin C. K. Chan, Yuming Jiang, Ziwei Liu

In this work, we present Collaborative Diffusion, where pre-trained uni-modal diffusion models collaborate to achieve multi-modal face generation and editing without re-training.

Denoising Face Generation

Transformer-Based Visual Segmentation: A Survey

2 code implementations 19 Apr 2023 Xiangtai Li, Henghui Ding, Haobo Yuan, Wenwei Zhang, Jiangmiao Pang, Guangliang Cheng, Kai Chen, Ziwei Liu, Chen Change Loy

Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks.

Autonomous Driving Point Cloud Segmentation +1

Text2Performer: Text-Driven Human Video Generation

1 code implementation ICCV 2023 Yuming Jiang, Shuai Yang, Tong Liang Koh, Wayne Wu, Chen Change Loy, Ziwei Liu

In this work, we present Text2Performer to generate vivid human videos with articulated motions from texts.

Video Generation

RoboBEV: Towards Robust Bird's Eye View Perception under Corruptions

1 code implementation 13 Apr 2023 Shaoyuan Xie, Lingdong Kong, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, Ziwei Liu

Our experiments further demonstrate that pre-training and depth-free BEV transformation have the potential to enhance out-of-distribution robustness.

Robust Camera Only 3D Object Detection

Detecting and Grounding Multi-Modal Media Manipulation

1 code implementation CVPR 2023 Rui Shao, Tianxing Wu, Ziwei Liu

In this paper, we highlight a new research problem for multi-modal fake media, namely Detecting and Grounding Multi-Modal Media Manipulation (DGM^4).

Binary Classification Contrastive Learning +4

F$^{2}$-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories

1 code implementation 28 Mar 2023 Peng Wang, YuAn Liu, Zhaoxi Chen, Lingjie Liu, Ziwei Liu, Taku Komura, Christian Theobalt, Wenping Wang

Based on our analysis, we further propose a novel space-warping method called perspective warping, which allows us to handle arbitrary trajectories in the grid-based NeRF framework.

Novel View Synthesis

SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis

no code implementations ICCV 2023 Guangcong Wang, Zhaoxi Chen, Chen Change Loy, Ziwei Liu

Since coarse depth maps are not strictly scaled to the ground-truth depth maps, we propose a simple yet effective constraint, a local depth ranking method, on NeRFs such that the expected depth ranking of the NeRF is consistent with that of the coarse depth maps in local patches.

Novel View Synthesis
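
The local depth ranking constraint described in the SparseNeRF entry above can be illustrated with a small sketch. This is not the authors' implementation: the function name `depth_ranking_loss`, the hinge form, the `margin` value, and the random pair sampling are all assumptions chosen for illustration. The idea shown is only that the predicted depth should preserve the ordering given by the coarse depth map inside a patch, regardless of scale.

```python
import numpy as np

def depth_ranking_loss(pred_depth, coarse_depth, margin=1e-4, num_pairs=256, seed=0):
    """Hypothetical hinge penalty on pixel pairs in one patch whose predicted
    depth order disagrees with the order given by the coarse depth map."""
    rng = np.random.default_rng(seed)
    pred = np.asarray(pred_depth, dtype=np.float64).ravel()
    coarse = np.asarray(coarse_depth, dtype=np.float64).ravel()
    # Sample random pixel pairs (i, j) from the patch.
    i = rng.integers(0, pred.size, size=num_pairs)
    j = rng.integers(0, pred.size, size=num_pairs)
    # +1 where the coarse map says pixel i is farther than j, -1 where nearer, 0 for ties.
    sign = np.sign(coarse[i] - coarse[j])
    # Penalise pairs whose predicted order does not match by at least `margin`.
    penalty = np.maximum(0.0, margin - sign * (pred[i] - pred[j]))
    # Ties in the coarse map carry no ordering information, so they are masked out.
    return float(np.mean(penalty * (sign != 0)))
```

Because only the ordering is compared, a prediction that is a rescaled and shifted copy of the coarse map incurs no penalty, which matches the motivation that coarse depth maps are not strictly scaled to the ground truth.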

A Simple and Generic Framework for Feature Distillation via Channel-wise Transformation

no code implementations23 Mar 2023 Ziwei Liu, Yongtao Wang, Xiaojie Chu

Specifically, we propose a learnable nonlinear channel-wise transformation to align the features of the student and the teacher model.

Image Classification Instance Segmentation +5

ReVersion: Diffusion-Based Relation Inversion from Images

2 code implementations23 Mar 2023 Ziqi Huang, Tianxing Wu, Yuming Jiang, Kelvin C. K. Chan, Ziwei Liu

Specifically, we propose a novel relation-steering contrastive learning scheme to impose two critical properties of the relation prompt: 1) The relation prompt should capture the interaction between objects, enforced by the preposition prior.

Contrastive Learning Relation

SHERF: Generalizable Human NeRF from a Single Image

1 code implementation ICCV 2023 Shoukang Hu, Fangzhou Hong, Liang Pan, Haiyi Mei, Lei Yang, Ziwei Liu

To this end, we propose a bank of 3D-aware hierarchical features, including global, point-level, and pixel-aligned features, to facilitate informative encoding.

3D Human Reconstruction

Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation

1 code implementation CVPR 2023 Lingting Zhu, Xian Liu, Xuanyu Liu, Rui Qian, Ziwei Liu, Lequan Yu

In this work, we propose a novel diffusion-based framework, named Diffusion Co-Speech Gesture (DiffGesture), to effectively capture the cross-modal audio-to-gesture associations and preserve temporal coherence for high-fidelity audio-driven co-speech gesture generation.

Gesture Generation

Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need

2 code implementations13 Mar 2023 Da-Wei Zhou, Han-Jia Ye, De-Chuan Zhan, Ziwei Liu

ADAM is a general framework that can be orthogonally combined with any parameter-efficient tuning method, which holds the advantages of PTM's generalizability and adapted model's adaptivity.

Class Incremental Learning Incremental Learning +1

Rethinking Range View Representation for LiDAR Segmentation

no code implementations ICCV 2023 Lingdong Kong, Youquan Liu, Runnan Chen, Yuexin Ma, Xinge Zhu, Yikang Li, Yuenan Hou, Yu Qiao, Ziwei Liu

We show that, for the first time, a range view method is able to surpass the point, voxel, and multi-view fusion counterparts in the competing LiDAR semantic and panoptic segmentation benchmarks, i.e., SemanticKITTI, nuScenes, and ScribbleKITTI.

3D Semantic Segmentation Autonomous Driving +4

A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT

no code implementations18 Feb 2023 Ce Zhou, Qian Li, Chen Li, Jun Yu, Yixin Liu, Guangjing Wang, Kai Zhang, Cheng Ji, Qiben Yan, Lifang He, Hao Peng, JianXin Li, Jia Wu, Ziwei Liu, Pengtao Xie, Caiming Xiong, Jian Pei, Philip S. Yu, Lichao Sun

This study provides a comprehensive review of recent research advancements, challenges, and opportunities for PFMs in text, image, graph, as well as other data modalities.

Graph Learning Language Modelling +1

Deep Class-Incremental Learning: A Survey

2 code implementations7 Feb 2023 Da-Wei Zhou, Qi-Wei Wang, Zhi-Hong Qi, Han-Jia Ye, De-Chuan Zhan, Ziwei Liu

Deep models, e.g., CNNs and Vision Transformers, have achieved impressive results in many vision tasks in the closed world.

Class Incremental Learning Image Classification +1

SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections

1 code implementation2 Feb 2023 Zhaoxi Chen, Guangcong Wang, Ziwei Liu

Our approach begins with an efficient bird's-eye-view (BEV) representation generated from simplex noise, which includes a height field for surface elevation and a semantic field for detailed scene semantics.

Scene Generation

What Makes Good Examples for Visual In-Context Learning?

1 code implementation NeurIPS 2023 Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu

To overcome the problem, we propose a prompt retrieval framework to automate the selection of in-context examples.

In-Context Learning Retrieval

BiBench: Benchmarking and Analyzing Network Binarization

1 code implementation26 Jan 2023 Haotong Qin, Mingyuan Zhang, Yifu Ding, Aoyu Li, Zhongang Cai, Ziwei Liu, Fisher Yu, Xianglong Liu

Network binarization emerges as one of the most promising compression approaches offering extraordinary computation and memory savings by minimizing the bit-width.

Benchmarking Binarization

DeformToon3D: Deformable Neural Radiance Fields for 3D Toonification

no code implementations ICCV 2023 Junzhe Zhang, Yushi Lan, Shuai Yang, Fangzhou Hong, Quan Wang, Chai Kiat Yeo, Ziwei Liu, Chen Change Loy

In this paper, we address the challenging problem of 3D toonification, which involves transferring the style of an artistic domain onto a target 3D face with stylized geometry and texture.

F2-NeRF: Fast Neural Radiance Field Training With Free Camera Trajectories

no code implementations CVPR 2023 Peng Wang, YuAn Liu, Zhaoxi Chen, Lingjie Liu, Ziwei Liu, Taku Komura, Christian Theobalt, Wenping Wang

Existing fast grid-based NeRF training frameworks, like Instant-NGP, Plenoxels, DVGO, or TensoRF, are mainly designed for bounded scenes and rely on space warping to handle unbounded scenes.

Novel View Synthesis

Reference-based Image and Video Super-Resolution via C2-Matching

1 code implementation19 Dec 2022 Yuming Jiang, Kelvin C. K. Chan, Xintao Wang, Chen Change Loy, Ziwei Liu

To tackle these challenges, we propose C2-Matching in this work, which performs explicit robust matching across transformation and resolution.

Image Super-Resolution Reference-based Super-Resolution +2

Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers

no code implementations9 Dec 2022 Yasheng Sun, Hang Zhou, Kaisiyuan Wang, Qianyi Wu, Zhibin Hong, Jingtuo Liu, Errui Ding, Jingdong Wang, Ziwei Liu, Hideki Koike

This requires masking a large percentage of the original image and seamlessly inpainting it with the aid of audio and reference frames.

Audio-Driven Co-Speech Gesture Video Generation

no code implementations5 Dec 2022 Xian Liu, Qianyi Wu, Hang Zhou, Yuanqi Du, Wayne Wu, Dahua Lin, Ziwei Liu

Our key insight is that the co-speech gestures can be decomposed into common motion patterns and subtle rhythmic dynamics.

Video Generation

AnimeRun: 2D Animation Visual Correspondence from Open Source 3D Movies

1 code implementation10 Nov 2022 Li SiYao, Yuhang Li, Bo Li, Chao Dong, Ziwei Liu, Chen Change Loy

Existing correspondence datasets for two-dimensional (2D) cartoon suffer from simple frame composition and monotonic movements, making them insufficient to simulate real animations.

Optical Flow Estimation

Joint Communication and Computation Design in Transmissive RMS Transceiver Enabled Multi-Tier Computing Networks

no code implementations27 Oct 2022 Zhendong Li, Wen Chen, Ziwei Liu, Hongying Tang, Jianmin Lu

We formulate a total energy consumption minimization problem by a joint optimization of subcarrier allocation, task input bits, time slot allocation, transmit power allocation and RMS transmissive coefficient while taking into account the constraints of communication resources and computing resources.

Total Energy

OpenOOD: Benchmarking Generalized Out-of-Distribution Detection

3 code implementations13 Oct 2022 Jingkang Yang, Pengyun Wang, Dejian Zou, Zitang Zhou, Kunyuan Ding, Wenxuan Peng, Haoqi Wang, Guangyao Chen, Bo Li, Yiyou Sun, Xuefeng Du, Kaiyang Zhou, Wayne Zhang, Dan Hendrycks, Yixuan Li, Ziwei Liu

Out-of-distribution (OOD) detection is vital to safety-critical machine learning applications and has thus been extensively studied, with a plethora of methods developed in the literature.

Anomaly Detection Benchmarking +3

EVA3D: Compositional 3D Human Generation from 2D Image Collections

1 code implementation10 Oct 2022 Fangzhou Hong, Zhaoxi Chen, Yushi Lan, Liang Pan, Ziwei Liu

At the core of EVA3D is a compositional human NeRF representation, which divides the human body into local parts.

TripleE: Easy Domain Generalization via Episodic Replay

1 code implementation4 Oct 2022 Xiaomeng Li, Hongyu Ren, Huifeng Yao, Ziwei Liu

In this paper, we propose TripleE, and the main idea is to encourage the network to focus on training on subsets (learning with replay) and enlarge the data space in learning on subsets.

Domain Generalization

StyleSwap: Style-Based Generator Empowers Robust Face Swapping

no code implementations27 Sep 2022 Zhiliang Xu, Hang Zhou, Zhibin Hong, Ziwei Liu, Jiaming Liu, Zhizhi Guo, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Our core idea is to leverage a style-based generator to empower high-fidelity and robust face swapping, so that the generator's advantage can be adopted for optimizing identity similarity.

Face Swapping

VToonify: Controllable High-Resolution Portrait Video Style Transfer

1 code implementation22 Sep 2022 Shuai Yang, Liming Jiang, Ziwei Liu, Chen Change Loy

Although a series of successful portrait image toonification models built upon the powerful StyleGAN have been proposed, these image-oriented methods have obvious limitations when applied to videos, such as the fixed frame size, the requirement of face alignment, missing non-facial details and temporal inconsistency.

Face Alignment Style Transfer +2

Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms

1 code implementation21 Sep 2022 Hui En Pang, Zhongang Cai, Lei Yang, Tianwei Zhang, Ziwei Liu

Experiments with 10 backbones, ranging from CNNs to transformers, show the knowledge learnt from a proximity task is readily transferable to human mesh recovery.

3D human pose and shape estimation Benchmarking +1

Text2Light: Zero-Shot Text-Driven HDR Panorama Generation

1 code implementation20 Sep 2022 Zhaoxi Chen, Guangcong Wang, Ziwei Liu

To achieve super-resolution inverse tone mapping, we derive a continuous representation of 360-degree imaging from the LDR panorama as a set of structured latent codes anchored to the sphere.

4k inverse tone mapping +3

On-Device Domain Generalization

2 code implementations15 Sep 2022 Kaiyang Zhou, Yuanhan Zhang, Yuhang Zang, Jingkang Yang, Chen Change Loy, Ziwei Liu

Another interesting observation is that the teacher-student gap on out-of-distribution data is bigger than that on in-distribution data, which highlights the capacity mismatch issue as well as the shortcoming of KD.

Data Augmentation Domain Generalization +2

MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model

2 code implementations31 Aug 2022 Mingyuan Zhang, Zhongang Cai, Liang Pan, Fangzhou Hong, Xinying Guo, Lei Yang, Ziwei Liu

Instead of a deterministic language-motion mapping, MotionDiffuse generates motions through a series of denoising steps in which variations are injected.

Denoising Motion Synthesis

Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction

1 code implementation26 Aug 2022 Tong Wu, Jiaqi Wang, Xingang Pan, Xudong Xu, Christian Theobalt, Ziwei Liu, Dahua Lin

Previous methods based on neural volume rendering mostly train a fully implicit model with MLPs, which typically require hours of training for a single scene.

Surface Reconstruction

Mind the Gap in Distilling StyleGANs

1 code implementation18 Aug 2022 Guodong Xu, Yuenan Hou, Ziwei Liu, Chen Change Loy

To further enhance the semantic consistency between the teacher and student model, we present a latent-direction-based distillation loss that preserves the semantic relations in latent space.

Knowledge Distillation

Open Long-Tailed Recognition in a Dynamic World

no code implementations17 Aug 2022 Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong, Stella X. Yu

A practical recognition system must balance between majority (head) and minority (tail) classes, generalize across the distribution, and acknowledge novelty upon the instances of unseen classes (open classes).

Active Learning Classification +4

StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3

1 code implementation16 Aug 2022 Haonan Qiu, Yuming Jiang, Hang Zhou, Wayne Wu, Ziwei Liu

Notably, StyleFaceV is capable of generating realistic $1024\times1024$ face videos even without high-resolution training videos.

Image Generation Video Generation

Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with Transformer

1 code implementation10 Aug 2022 Zhipeng Luo, Changqing Zhou, Liang Pan, Gongjie Zhang, Tianrui Liu, Yueru Luo, Haiyu Zhao, Ziwei Liu, Shijian Lu

In a point cloud sequence, 3D object tracking aims to predict the location and orientation of an object in consecutive frames given an object template.

3D Object Tracking Autonomous Driving +3

StyleLight: HDR Panorama Generation for Lighting Estimation and Editing

1 code implementation29 Jul 2022 Guangcong Wang, Yinuo Yang, Chen Change Loy, Ziwei Liu

To tackle this problem, we propose a coupled dual-StyleGAN panorama synthesis network (StyleLight) that integrates LDR and HDR panorama synthesis into a unified framework.

Lighting Estimation

CelebV-HQ: A Large-Scale Video Facial Attributes Dataset

1 code implementation25 Jul 2022 Hao Zhu, Wayne Wu, Wentao Zhu, Liming Jiang, Siwei Tang, Li Zhang, Ziwei Liu, Chen Change Loy

Large-scale datasets have played indispensable roles in the recent success of face generation/editing and significantly facilitated the advances of emerging research fields.

Attribute Face Generation +1

Panoptic Scene Graph Generation

1 code implementation22 Jul 2022 Jingkang Yang, Yi Zhe Ang, Zujin Guo, Kaiyang Zhou, Wayne Zhang, Ziwei Liu

Existing research addresses scene graph generation (SGG) -- a critical technology for scene understanding in images -- from a detection perspective, i.e., objects are detected using bounding boxes followed by prediction of their pairwise relationships.

Benchmarking Panoptic Scene Graph Generation +1

UNIF: United Neural Implicit Functions for Clothed Human Reconstruction and Animation

1 code implementation20 Jul 2022 Shenhan Qian, Jiale Xu, Ziwei Liu, Liqian Ma, Shenghua Gao

We propose united implicit functions (UNIF), a part-based method for clothed human reconstruction and animation with raw scans and skeletons as the input.

Position

Benchmarking Omni-Vision Representation through the Lens of Visual Realms

1 code implementation14 Jul 2022 Yuanhan Zhang, Zhenfei Yin, Jing Shao, Ziwei Liu

We benchmark ReCo and other advances in omni-vision representation studies that are different in architectures (from CNNs to transformers) and in learning paradigms (from supervised learning to self-supervised learning) on OmniBenchmark.

Benchmarking Contrastive Learning +2

Relighting4D: Neural Relightable Human from Videos

1 code implementation14 Jul 2022 Zhaoxi Chen, Ziwei Liu

Our key insight is that the space-time varying geometry and reflectance of the human body can be decomposed as a set of neural fields of normal, occlusion, diffuse, and specular maps.

Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis

1 code implementation11 Jul 2022 Long Zhuo, Guangcong Wang, Shikai Li, Wayne Wu, Ziwei Liu

In this paper, we present a spatial-temporal compression framework, Fast-Vid2Vid, which focuses on data aspects of generative models.

Knowledge Distillation Motion Compensation +1

Detecting and Recovering Sequential DeepFake Manipulation

1 code implementation5 Jul 2022 Rui Shao, Tianxing Wu, Ziwei Liu

Moreover, we build a comprehensive benchmark and set up rigorous evaluation protocols and metrics for this new research problem.

DeepFake Detection Face Swapping +2

Masked Frequency Modeling for Self-Supervised Visual Pre-Training

3 code implementations15 Jun 2022 Jiahao Xie, Wei Li, Xiaohang Zhan, Ziwei Liu, Yew Soon Ong, Chen Change Loy

We present Masked Frequency Modeling (MFM), a unified frequency-domain-based approach for self-supervised pre-training of visual models.

Image Classification Image Restoration +2

Neural Prompt Search

1 code implementation9 Jun 2022 Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu

The size of vision models has grown exponentially over the last few years, especially after the emergence of Vision Transformer.

Ranked #1 on Image Classification on OmniBenchmark (using extra training data)

Few-Shot Learning Image Classification +3

Sparse Mixture-of-Experts are Domain Generalizable Learners

1 code implementation8 Jun 2022 Bo Li, Yifei Shen, Jingkang Yang, Yezhen Wang, Jiawei Ren, Tong Che, Jun Zhang, Ziwei Liu

It is motivated by an empirical finding that transformer-based models trained with empirical risk minimization (ERM) outperform CNN-based models employing state-of-the-art (SOTA) DG algorithms on multiple DG datasets.

Ranked #11 on Domain Generalization on DomainNet (using extra training data)

Domain Generalization Object Recognition

Text2Human: Text-Driven Controllable Human Image Generation

2 code implementations31 May 2022 Yuming Jiang, Shuai Yang, Haonan Qiu, Wayne Wu, Chen Change Loy, Ziwei Liu

In this work, we present a text-driven controllable framework, Text2Human, for high-quality and diverse human generation.

Human Parsing Image Generation

Free Lunch for Surgical Video Understanding by Distilling Self-Supervisions

1 code implementation19 May 2022 Xinpeng Ding, Ziwei Liu, Xiaomeng Li

Our key insight is to distill knowledge from publicly available models trained on large generic datasets to facilitate the self-supervised learning of surgical videos.

Contrastive Learning Self-Supervised Learning +2

AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

1 code implementation17 May 2022 Fangzhou Hong, Mingyuan Zhang, Liang Pan, Zhongang Cai, Lei Yang, Ziwei Liu

Our key insight is to take advantage of the powerful vision-language model CLIP for supervising neural human generation, in terms of 3D geometry, texture and animation.

Language Modelling Motion Synthesis +1

Robust Face Anti-Spoofing with Dual Probabilistic Modeling

no code implementations27 Apr 2022 Yuanhan Zhang, Yichao Wu, Zhenfei Yin, Jing Shao, Ziwei Liu

In this work, we attempt to fill this gap by automatically addressing the noise problem from both label and data perspectives in a probabilistic manner.

Face Anti-Spoofing

StyleGAN-Human: A Data-Centric Odyssey of Human Generation

4 code implementations25 Apr 2022 Jianglin Fu, Shikai Li, Yuming Jiang, Kwan-Yee Lin, Chen Qian, Chen Change Loy, Wayne Wu, Ziwei Liu

In addition, a model zoo and human editing applications are demonstrated to facilitate future research in the community.

Image Generation

Few-shot Forgery Detection via Guided Adversarial Interpolation

no code implementations12 Apr 2022 Haonan Qiu, Siyu Chen, Bei Gan, Kun Wang, Huafeng Shi, Jing Shao, Ziwei Liu

Notably, our method is also validated to be robust to choices of majority and minority forgery approaches.

Full-Spectrum Out-of-Distribution Detection

1 code implementation11 Apr 2022 Jingkang Yang, Kaiyang Zhou, Ziwei Liu

In this paper, we take into account both shift types and introduce full-spectrum OOD (FS-OOD) detection, a more realistic problem setting that considers both detecting semantic shift and being tolerant to covariate shift, and design three benchmarks.

Out-of-Distribution Detection Out of Distribution (OOD) Detection

Unsupervised Image-to-Image Translation with Generative Prior

1 code implementation CVPR 2022 Shuai Yang, Liming Jiang, Ziwei Liu, Chen Change Loy

In this work, we present a novel framework, Generative Prior-guided UNsupervised Image-to-image Translation (GP-UNIT), to improve the overall quality and applicability of the translation algorithm.

Translation Unsupervised Image-To-Image Translation

Balanced MSE for Imbalanced Visual Regression

1 code implementation CVPR 2022 Jiawei Ren, Mingyuan Zhang, Cunjun Yu, Ziwei Liu

Data imbalance exists ubiquitously in real-world visual regressions, e.g., age estimation and pose estimation, hurting the model's generalizability and fairness.

Age Estimation Fairness +3

Versatile Multi-Modal Pre-Training for Human-Centric Perception

1 code implementation CVPR 2022 Fangzhou Hong, Liang Pan, Zhongang Cai, Ziwei Liu

To tackle the challenges, we design the novel Dense Intra-sample Contrastive Learning and Sparse Structure-aware Contrastive Learning targets by hierarchically learning a modal-invariant latent space featured with continuous and ordinal feature distribution and structure-aware semantic consistency.

Contrastive Learning Human Parsing +1

SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance

no code implementations25 Mar 2022 Xinchi Zhou, Dongzhan Zhou, Wanli Ouyang, Hang Zhou, Ziwei Liu, Di Hu

Recent years have witnessed the success of deep learning on the visual sound separation task.

Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory

1 code implementation CVPR 2022 Li SiYao, Weijiang Yu, Tianpei Gu, Chunze Lin, Quan Wang, Chen Qian, Chen Change Loy, Ziwei Liu

With the learned choreographic memory, dance generation is realized on the quantized units that meet high choreography standards, such that the generated dancing sequences are confined within the spatial constraints.

Motion Synthesis

Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer

1 code implementation CVPR 2022 Shuai Yang, Liming Jiang, Ziwei Liu, Chen Change Loy

Recent studies on StyleGAN show high performance on artistic portrait generation by transfer learning with limited data.

Style Transfer Transfer Learning +1

X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation

no code implementations16 Mar 2022 Yinan He, Gengshi Huang, Siyu Chen, Jianing Teng, Wang Kun, Zhenfei Yin, Lu Sheng, Ziwei Liu, Yu Qiao, Jing Shao

2) Squeeze Stage: X-Learner condenses the model to a reasonable size and learns a universal and generalizable representation for transfer to various tasks.

object-detection Object Detection +3

LiDAR-based 4D Panoptic Segmentation via Dynamic Shifting Network

1 code implementation14 Mar 2022 Fangzhou Hong, Hui Zhou, Xinge Zhu, Hongsheng Li, Ziwei Liu

In this work, we address the task of LiDAR-based panoptic segmentation, which aims to parse both objects and scenes in a unified manner.

4D Panoptic Segmentation Autonomous Driving +3

BiBERT: Accurate Fully Binarized BERT

1 code implementation ICLR 2022 Haotong Qin, Yifu Ding, Mingyuan Zhang, Qinghua Yan, Aishan Liu, Qingqing Dang, Ziwei Liu, Xianglong Liu

The large pre-trained BERT has achieved remarkable performance on Natural Language Processing (NLP) tasks but is also computation and memory expensive.

Binarization

Conditional Prompt Learning for Vision-Language Models

9 code implementations CVPR 2022 Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu

With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential to investigate ways to adapt these models to downstream datasets.

Domain Generalization Prompt Engineering

TCTrack: Temporal Contexts for Aerial Tracking

1 code implementation CVPR 2022 Ziang Cao, Ziyuan Huang, Liang Pan, Shiwei Zhang, Ziwei Liu, Changhong Fu

Temporal contexts among consecutive frames are far from being fully utilized in existing visual trackers.

Visual Sound Localization in the Wild by Cross-Modal Interference Erasing

1 code implementation13 Feb 2022 Xian Liu, Rui Qian, Hang Zhou, Di Hu, Weiyao Lin, Ziwei Liu, Bolei Zhou, Xiaowei Zhou

Specifically, we observe that the previous practice of learning only a single audio representation is insufficient due to the additive nature of audio signals.

Garment4D: Garment Reconstruction from Point Cloud Sequences

1 code implementation NeurIPS 2021 Fangzhou Hong, Liang Pan, Zhongang Cai, Ziwei Liu

The main challenges are two-fold: 1) effective 3D feature learning for fine details, and 2) capture of garment dynamics caused by the interaction between garments and the human body, especially for loose garments like skirts.

Garment Reconstruction

Balanced Chamfer Distance as a Comprehensive Metric for Point Cloud Completion

1 code implementation NeurIPS 2021 Tong Wu, Liang Pan, Junzhe Zhang, Tai Wang, Ziwei Liu, Dahua Lin

We adopt DCD to evaluate the point cloud completion task, where experimental results show that DCD pays attention to both the overall structure and local geometric details and provides a more reliable evaluation even when CD and EMD contradict each other.

Point Cloud Completion

Robust Partial-to-Partial Point Cloud Registration in a Full Range

1 code implementation30 Nov 2021 Liang Pan, Zhongang Cai, Ziwei Liu

3) Based on a synergy of hierarchical graph networks and graphical modeling, we propose the Hierarchical Graphical Modeling (HGM) architecture to encode robust descriptors consisting of i) a unary term learned from RI features; and ii) multiple smoothness terms encoded from neighboring point relations at different scales through our TPT modules.

Graph Matching Point Cloud Registration

Density-aware Chamfer Distance as a Comprehensive Metric for Point Cloud Completion

1 code implementation24 Nov 2021 Tong Wu, Liang Pan, Junzhe Zhang, Tai Wang, Ziwei Liu, Dahua Lin

We adopt DCD to evaluate the point cloud completion task, where experimental results show that DCD pays attention to both the overall structure and local geometric details and provides a more reliable evaluation even when CD and EMD contradict each other.

Point Cloud Completion

Few-Shot Object Detection via Association and DIscrimination

1 code implementation NeurIPS 2021 Yuhang Cao, Jiaqi Wang, Ying Jin, Tong Wu, Kai Chen, Ziwei Liu, Dahua Lin

1) In the association step, in contrast to implicitly leveraging multiple base classes, we construct a compact novel class feature space via explicitly imitating a specific base class feature space.

Few-Shot Object Detection Object +3

Lifting 2D Human Pose to 3D with Domain Adapted 3D Body Concept

no code implementations23 Nov 2021 Qiang Nie, Ziwei Liu, Yunhui Liu

Inspired by this, we propose a new framework that leverages the labeled 3D human poses to learn a 3D concept of the human body to reduce the ambiguity.

3D Pose Estimation Domain Adaptation

Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements

no code implementations1 Nov 2021 Yu Rong, Jingbo Wang, Ziwei Liu, Chen Change Loy

In this paper, we make the first attempt to reconstruct 3D interacting hands from monocular single RGB images.

3D Reconstruction

Generalized Out-of-Distribution Detection: A Survey

3 code implementations21 Oct 2021 Jingkang Yang, Kaiyang Zhou, Yixuan Li, Ziwei Liu

In this survey, we first present a unified framework called generalized OOD detection, which encompasses the five aforementioned problems, i.e., AD, ND, OSR, OOD detection, and OD.

Anomaly Detection Autonomous Driving +5

Playing for 3D Human Recovery

no code implementations14 Oct 2021 Zhongang Cai, Mingyuan Zhang, Jiawei Ren, Chen Wei, Daxuan Ren, Zhengyu Lin, Haiyu Zhao, Lei Yang, Chen Change Loy, Ziwei Liu

Specifically, we contribute GTA-Human, a large-scale 3D human dataset generated with the GTA-V game engine, featuring a highly diverse set of subjects, actions, and scenarios.

TAda! Temporally-Adaptive Convolutions for Video Understanding

2 code implementations ICLR 2022 Ziyuan Huang, Shiwei Zhang, Liang Pan, Zhiwu Qing, Mingqian Tang, Ziwei Liu, Marcelo H. Ang Jr

This work presents Temporally-Adaptive Convolutions (TAdaConv) for video understanding, which shows that adaptive weight calibration along the temporal dimension is an efficient way to facilitate modelling complex temporal dynamics in videos.

Ranked #67 on Action Recognition on Something-Something V2 (using extra training data)

Action Classification Action Recognition +2

Bayesian Imbalanced Regression Debiasing

no code implementations29 Sep 2021 Jiawei Ren, Mingyuan Zhang, Cunjun Yu, Ziwei Liu

Compared to imbalanced and long-tailed classification, imbalanced regression has its unique challenges as the regression label space can be continuous, boundless, and high-dimensional.

Age Estimation imbalanced classification +2

A Comprehensive Overhaul of Distilling Unconditional GANs

no code implementations29 Sep 2021 Guodong Xu, Yuenan Hou, Ziwei Liu, Chen Change Loy

To further enhance the semantic consistency between the teacher and student model, we present another latent-direction-based distillation loss that preserves the semantic relations in latent space.

Knowledge Distillation

Talk-to-Edit: Fine-Grained Facial Editing via Dialog

1 code implementation ICCV 2021 Yuming Jiang, Ziqi Huang, Xingang Pan, Chen Change Loy, Ziwei Liu

In this work, we propose Talk-to-Edit, an interactive facial editing framework that performs fine-grained attribute manipulation through dialog between the user and the system.

Attribute Facial Editing +1

Learning to Prompt for Vision-Language Models

13 code implementations2 Sep 2021 Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu

Large pre-trained vision-language models like CLIP have shown great potential in learning representations that are transferable across a wide range of downstream tasks.

Domain Generalization Few-shot Age Estimation +2

Semantically Coherent Out-of-Distribution Detection

2 code implementations ICCV 2021 Jingkang Yang, Haoqi Wang, Litong Feng, Xiaopeng Yan, Huabin Zheng, Wayne Zhang, Ziwei Liu

The proposed UDG can not only enrich the semantic knowledge of the model by exploiting unlabeled data in an unsupervised manner, but also distinguish ID/OOD samples to enhance ID classification and OOD detection tasks simultaneously.

Out-of-Distribution Detection Out of Distribution (OOD) Detection

Energy-Based Open-World Uncertainty Modeling for Confidence Calibration

no code implementations ICCV 2021 Yezhen Wang, Bo Li, Tong Che, Kaiyang Zhou, Ziwei Liu, Dongsheng Li

Confidence calibration is of great importance to the reliability of decisions made by machine learning systems.

Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency

1 code implementation ICCV 2021 Zhipeng Luo, Zhongang Cai, Changqing Zhou, Gongjie Zhang, Haiyu Zhao, Shuai Yi, Shijian Lu, Hongsheng Li, Shanghang Zhang, Ziwei Liu

In addition, existing 3D domain adaptive detection methods often assume prior access to the target domain annotations, which is rarely feasible in the real world.

3D Object Detection Autonomous Driving +1

Unsupervised Object-Level Representation Learning from Scene Images

1 code implementation NeurIPS 2021 Jiahao Xie, Xiaohang Zhan, Ziwei Liu, Yew Soon Ong, Chen Change Loy

Extensive experiments on COCO show that ORL significantly improves the performance of self-supervised learning on scene images, even surpassing supervised ImageNet pre-training on several downstream tasks.

Object Representation Learning +2

Delving Deep into the Generalization of Vision Transformers under Distribution Shifts

1 code implementation CVPR 2022 Chongzhi Zhang, Mingyuan Zhang, Shanghang Zhang, Daisheng Jin, Qiang Zhou, Zhongang Cai, Haiyu Zhao, Xianglong Liu, Ziwei Liu

By comprehensively investigating these GE-ViTs and comparing with their corresponding CNN models, we observe: 1) For the enhanced model, larger ViTs still benefit more for the OOD generalization.

Out-of-Distribution Generalization Self-Supervised Learning

Robust Reference-based Super-Resolution via C2-Matching

1 code implementation CVPR 2021 Yuming Jiang, Kelvin C. K. Chan, Xintao Wang, Chen Change Loy, Ziwei Liu

However, performing local transfer is difficult because of two gaps between input and reference images: the transformation gap (e.g., scale and rotation) and the resolution gap (e.g., HR and LR).

Reference-based Super-Resolution

Semi-Supervised Domain Generalization with Stochastic StyleMatch

2 code implementations1 Jun 2021 Kaiyang Zhou, Chen Change Loy, Ziwei Liu

We find that the DG methods, which by design are unable to handle unlabeled data, perform poorly with limited labels in SSDG; the SSL methods, especially FixMatch, obtain much better results but are still far away from the basic vanilla model trained using full labels.

Domain Generalization Semi-Supervised Domain Generalization

Iterative Human and Automated Identification of Wildlife Images

1 code implementation5 May 2021 Zhongqi Miao, Ziwei Liu, Kaitlyn M. Gaynor, Meredith S. Palmer, Stella X. Yu, Wayne M. Getz

Camera trapping is increasingly used to monitor wildlife, but this technology typically requires extensive data annotation.

Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation

1 code implementation CVPR 2021 Hang Zhou, Yasheng Sun, Wayne Wu, Chen Change Loy, Xiaogang Wang, Ziwei Liu

While speech content information can be defined by learning the intrinsic synchronization between audio-visual modalities, we identify that a pose code will be complementarily learned in a modulated convolution-based reconstruction framework.

Talking Face Generation

Variational Relational Point Completion Network

1 code implementation CVPR 2021 Liang Pan, Xinyi Chen, Zhongang Cai, Junzhe Zhang, Haiyu Zhao, Shuai Yi, Ziwei Liu

In particular, we propose a dual-path architecture to enable principled probabilistic modeling across partial and complete clouds.

Point Cloud Completion
