Search Results for author: Wenliang Zhao

Found 15 papers, 13 papers with code

DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery

1 code implementation • 1 Apr 2024 • Yixuan Zhu, Ao Li, Yansong Tang, Wenliang Zhao, Jie zhou, Jiwen Lu

The recovery of occluded human meshes presents challenges for current methods due to the difficulty in extracting effective image features under severe occlusion.

Denoising Human Mesh Recovery

Paper
Code

MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers

no code implementations • 19 Dec 2023 • Haoyu Ma, Shahin Mahdizadehaghdam, Bichen Wu, Zhipeng Fan, YuChao Gu, Wenliang Zhao, Lior Shapira, Xiaohui Xie

Recent advances in generative AI have significantly enhanced image and video editing, particularly in the context of text prompt control.

Video Editing

Paper
Add Code

OpenVoice: Versatile Instant Voice Cloning

1 code implementation • 3 Dec 2023 • Zengyi Qin, Wenliang Zhao, Xumin Yu, Xin Sun

The voice styles are not directly copied from and constrained by the style of the reference speaker.

Voice Cloning

17,308

Paper
Code

Unleashing Text-to-Image Diffusion Models for Visual Perception

2 code implementations • ICCV 2023 • Wenliang Zhao, Yongming Rao, Zuyan Liu, Benlin Liu, Jie zhou, Jiwen Lu

In this paper, we propose VPD (Visual Perception with a pre-trained Diffusion model), a new framework that exploits the semantic information of a pre-trained text-to-image diffusion model in visual perception tasks.

Ranked #7 on Referring Expression Segmentation on RefCoCo val

Denoising Image Segmentation +4

7,390

Paper
Code

DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation

1 code implementation • CVPR 2023 • Shuai Shen, Wenliang Zhao, Zibin Meng, Wanhua Li, Zheng Zhu, Jie zhou, Jiwen Lu

In this way, the proposed DiffTalk is capable of producing high-quality talking head videos in synchronization with the source audio, and more importantly, it can be naturally generalized across different identities without any further fine-tuning.

Denoising Talking Head Generation

400

Paper
Code

DiffSwap: High-Fidelity and Controllable Face Swapping via 3D-Aware Masked Diffusion

1 code implementation • CVPR 2023 • Wenliang Zhao, Yongming Rao, Weikang Shi, Zuyan Liu, Jie zhou, Jiwen Lu

Unlike previous work that relies on carefully designed network architectures and loss functions to fuse the information from the source and target faces, we reformulate the face swapping as a conditional inpainting task, performed by a powerful diffusion model guided by the desired face attributes (e. g., identity and landmarks).

Face Swapping

Paper
Code

HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions

7 code implementations • 28 Jul 2022 • Yongming Rao, Wenliang Zhao, Yansong Tang, Jie zhou, Ser-Nam Lim, Jiwen Lu

In this paper, we show that the key ingredients behind the vision Transformers, namely input-adaptive, long-range and high-order spatial interactions, can also be efficiently implemented with a convolution-based framework.

Ranked #20 on Semantic Segmentation on ADE20K

Image Classification Object Detection +2

3,154

Paper
Code

Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks

1 code implementation • 4 Jul 2022 • Yongming Rao, Zuyan Liu, Wenliang Zhao, Jie zhou, Jiwen Lu

We extend our method to hierarchical models including CNNs and hierarchical vision Transformers as well as more complex dense prediction tasks that require structured feature maps by formulating a more generic dynamic spatial sparsification framework with progressive sparsification and asymmetric computation for different spatial locations.

532

Paper
Code

A Roadmap for Big Model

no code implementations • 26 Mar 2022 • Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han, Zhenghao Liu, Ning Ding, Yongming Rao, Yizhao Gao, Liang Zhang, Ming Ding, Cong Fang, Yisen Wang, Mingsheng Long, Jing Zhang, Yinpeng Dong, Tianyu Pang, Peng Cui, Lingxiao Huang, Zheng Liang, HuaWei Shen, HUI ZHANG, Quanshi Zhang, Qingxiu Dong, Zhixing Tan, Mingxuan Wang, Shuo Wang, Long Zhou, Haoran Li, Junwei Bao, Yingwei Pan, Weinan Zhang, Zhou Yu, Rui Yan, Chence Shi, Minghao Xu, Zuobai Zhang, Guoqiang Wang, Xiang Pan, Mengjie Li, Xiaoyu Chu, Zijun Yao, Fangwei Zhu, Shulin Cao, Weicheng Xue, Zixuan Ma, Zhengyan Zhang, Shengding Hu, Yujia Qin, Chaojun Xiao, Zheni Zeng, Ganqu Cui, Weize Chen, Weilin Zhao, Yuan YAO, Peng Li, Wenzhao Zheng, Wenliang Zhao, Ziyi Wang, Borui Zhang, Nanyi Fei, Anwen Hu, Zenan Ling, Haoyang Li, Boxi Cao, Xianpei Han, Weidong Zhan, Baobao Chang, Hao Sun, Jiawen Deng, Chujie Zheng, Juanzi Li, Lei Hou, Xigang Cao, Jidong Zhai, Zhiyuan Liu, Maosong Sun, Jiwen Lu, Zhiwu Lu, Qin Jin, Ruihua Song, Ji-Rong Wen, Zhouchen Lin, LiWei Wang, Hang Su, Jun Zhu, Zhifang Sui, Jiajun Zhang, Yang Liu, Xiaodong He, Minlie Huang, Jian Tang, Jie Tang

With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm.

Language Modelling Machine Translation +1

Paper
Add Code

DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

1 code implementation • CVPR 2022 • Yongming Rao, Wenliang Zhao, Guangyi Chen, Yansong Tang, Zheng Zhu, Guan Huang, Jie zhou, Jiwen Lu

In this work, we present a new framework for dense prediction by implicitly and explicitly leveraging the pre-trained knowledge from CLIP.

Image-text matching Instance Segmentation +6

489

Paper
Code

Group-aware Contrastive Regression for Action Quality Assessment

1 code implementation • ICCV 2021 • Xumin Yu, Yongming Rao, Wenliang Zhao, Jiwen Lu, Jie zhou

Assessing action quality is challenging due to the subtle differences between videos and large variations in scores.

Ranked #2 on Action Quality Assessment on MTL-AQA

Action Quality Assessment regression

Paper
Code

Towards Interpretable Deep Metric Learning with Structural Matching

1 code implementation • ICCV 2021 • Wenliang Zhao, Yongming Rao, Ziyi Wang, Jiwen Lu, Jie zhou

Our method is model-agnostic, which can be applied to off-the-shelf backbone networks and metric learning methods.

Ranked #16 on Metric Learning on CUB-200-2011

Metric Learning

Paper
Code

Global Filter Networks for Image Classification

4 code implementations • NeurIPS 2021 • Yongming Rao, Wenliang Zhao, Zheng Zhu, Jiwen Lu, Jie zhou

Recent advances in self-attention and pure multi-layer perceptrons (MLP) models for vision have shown great potential in achieving promising performance with fewer inductive biases.

Ranked #9 on Image Classification on Stanford Cars (using extra training data)

Classification Domain Generalization +1

388

Paper
Code

DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification

1 code implementation • NeurIPS 2021 • Yongming Rao, Wenliang Zhao, Benlin Liu, Jiwen Lu, Jie zhou, Cho-Jui Hsieh

Based on this observation, we propose a dynamic token sparsification framework to prune redundant tokens progressively and dynamically based on the input.

Ranked #3 on Efficient ViTs on ImageNet-1K (With LV-ViT-S)

Blocking Efficient ViTs

532

Paper
Code

Field-weighted Factorization Machines for Click-Through Rate Prediction in Display Advertising

5 code implementations • 9 Jun 2018 • Junwei Pan, Jian Xu, Alfonso Lobos Ruiz, Wenliang Zhao, Shengjun Pan, Yu Sun, Quan Lu

The data involved in CTR prediction are typically multi-field categorical data, i. e., every feature is categorical and belongs to one and only one field.

Click-Through Rate Prediction

7,345

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.