Search Results for author: Zihang Jiang

Found 16 papers, 13 papers with code

PostoMETRO: Pose Token Enhanced Mesh Transformer for Robust 3D Human Mesh Recovery

no code implementations · 19 Mar 2024 · Wendi Yang, Zihang Jiang, Shang Zhao, S. Kevin Zhou

With the recent advancements in single-image-based human mesh recovery, there is a growing interest in enhancing its performance in certain extreme scenarios, such as occlusion, while maintaining overall model accuracy.

3D Human Pose Estimation · 3D Reconstruction · +2

CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification

1 code implementation · 27 Feb 2024 · Haoran Lai, Qingsong Yao, Zihang Jiang, Rongsheng Wang, ZhiYang He, Xiaodong Tao, S. Kevin Zhou

Advances in zero-shot learning in the medical domain have been driven by models pre-trained on large-scale image-text pairs, with a focus on image-text alignment.

Classification · Language Modelling · +3

ECAMP: Entity-centered Context-aware Medical Vision Language Pre-training

1 code implementation · 20 Dec 2023 · Rongsheng Wang, Qingsong Yao, Haoran Lai, ZhiYang He, Xiaodong Tao, Zihang Jiang, S. Kevin Zhou

Despite significant advancements in medical vision-language pre-training, existing methods have largely overlooked the inherent entity-specific context within radiology reports and the complex cross-modality contextual relationships between text and images.

Language Modelling · Large Language Model · +2

OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis

no code implementations · CVPR 2023 · Hongyi Xu, Guoxian Song, Zihang Jiang, Jianfeng Zhang, Yichun Shi, Jing Liu, WanChun Ma, Jiashi Feng, Linjie Luo

We present OmniAvatar, a novel geometry-guided 3D head synthesis model trained on in-the-wild unstructured images. It can synthesize diverse, identity-preserved 3D heads with compelling dynamic details under fully disentangled control over camera poses, facial expressions, head shapes, and articulated neck and jaw poses.

AgileGAN3D: Few-Shot 3D Portrait Stylization by Augmented Transfer Learning

no code implementations · 24 Mar 2023 · Guoxian Song, Hongyi Xu, Jing Liu, Tiancheng Zhi, Yichun Shi, Jianfeng Zhang, Zihang Jiang, Jiashi Feng, Shen Sang, Linjie Luo

Capitalizing on the recent advancement of 3D-aware GAN models, we perform guided transfer learning on a pretrained 3D GAN generator to produce multi-view-consistent stylized renderings.

Transfer Learning

AvatarGen: A 3D Generative Model for Animatable Human Avatars

1 code implementation · 26 Nov 2022 · Jianfeng Zhang, Zihang Jiang, Dingdong Yang, Hongyi Xu, Yichun Shi, Guoxian Song, Zhongcong Xu, Xinchao Wang, Jiashi Feng

Specifically, we decompose generative 3D human synthesis into a pose-guided mapping and a canonical representation with predefined human pose and shape, such that the canonical representation can be explicitly driven to different poses and shapes with the guidance of the 3D parametric human model SMPL.
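The pose-guided mapping described above lends itself to a small sketch. One plausible realization (an assumption here, not a confirmed detail of AvatarGen's code) is inverse linear blend skinning: borrow blend weights from the nearest SMPL vertex and invert the blended bone transforms to pull posed-space points back into canonical space.

```python
import torch

def inverse_lbs(points, smpl_verts, skin_weights, bone_transforms):
    """Map posed-space points back to canonical space by borrowing
    skinning weights from the nearest SMPL vertex (illustrative only).

    points:          (N, 3) query points in posed space
    smpl_verts:      (V, 3) posed SMPL vertices
    skin_weights:    (V, J) per-vertex blend weights over J joints
    bone_transforms: (J, 4, 4) canonical-to-posed joint transforms
    """
    # Nearest SMPL vertex gives each query point its blend weights.
    dists = torch.cdist(points, smpl_verts)                     # (N, V)
    nearest = dists.argmin(dim=1)                               # (N,)
    w = skin_weights[nearest]                                   # (N, J)

    # Blend the per-joint transforms and invert to undo the pose.
    blended = torch.einsum('nj,jab->nab', w, bone_transforms)   # (N, 4, 4)
    inv = torch.inverse(blended)

    # Apply the inverse transform in homogeneous coordinates.
    homo = torch.cat([points, torch.ones_like(points[:, :1])], dim=1)
    return torch.einsum('nab,nb->na', inv, homo)[:, :3]
```

With a mapping like this, the generator only ever models the canonical representation; swapping the SMPL pose or shape parameters that produce `bone_transforms` re-poses the avatar.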

AvatarGen: a 3D Generative Model for Animatable Human Avatars

1 code implementation · 1 Aug 2022 · Jianfeng Zhang, Zihang Jiang, Dingdong Yang, Hongyi Xu, Yichun Shi, Guoxian Song, Zhongcong Xu, Xinchao Wang, Jiashi Feng

Unsupervised generation of clothed virtual humans with varied appearances and animatable poses is important for creating 3D human avatars and for other AR/VR applications.

3D Human Reconstruction

Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning

1 code implementation · CVPR 2022 · Yujun Shi, Kuangqi Zhou, Jian Liang, Zihang Jiang, Jiashi Feng, Philip Torr, Song Bai, Vincent Y. F. Tan

Specifically, we experimentally show that directly encouraging the CIL learner at the initial phase to output representations similar to those of a model jointly trained on all classes can greatly boost CIL performance. (A hedged sketch of such a regularizer follows below.)

Class Incremental Learning · Incremental Learning
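A minimal sketch of the oracle-mimicking regularizer from the motivating experiment above, assuming cosine distance between pooled features; the function and variable names are hypothetical, and the paper's actual contribution is an initial-phase decorrelation objective that removes the need for an oracle.

```python
import torch
import torch.nn.functional as F

def oracle_mimic_loss(student_feats, oracle_feats):
    """Pull initial-phase features toward those of a jointly-trained
    oracle; cosine distance is one plausible choice of distance."""
    s = F.normalize(student_feats, dim=-1)
    t = F.normalize(oracle_feats.detach(), dim=-1)   # oracle is frozen
    return (1 - (s * t).sum(dim=-1)).mean()

# Inside a hypothetical phase-0 training step:
#   feats = learner.backbone(x)
#   loss = ce_loss(learner.head(feats), y) \
#        + lam * oracle_mimic_loss(feats, oracle.backbone(x))
```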

VOLO: Vision Outlooker for Visual Recognition

7 code implementations · 24 Jun 2021 · Li Yuan, Qibin Hou, Zihang Jiang, Jiashi Feng, Shuicheng Yan

Though the prevailing vision transformers (ViTs) have recently shown the great potential of self-attention-based models in ImageNet classification, their performance is still inferior to that of the latest SOTA CNNs if no extra data are provided.

Domain Generalization · Image Classification · +1

Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition

4 code implementations · 23 Jun 2021 · Qibin Hou, Zihang Jiang, Li Yuan, Ming-Ming Cheng, Shuicheng Yan, Jiashi Feng

Recognizing the importance of the positional information carried by 2D feature representations, Vision Permutator separately encodes the feature representations along the height and width dimensions with linear projections, unlike recent MLP-like models that encode spatial information along the flattened spatial dimensions.
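A minimal sketch of that height/width/channel encoding. Summing the three branches and using full H-by-H and W-by-W linear layers are simplifications; the paper's Permute-MLP instead permutes channel segments, so treat the module below as illustrative only.

```python
import torch
import torch.nn as nn

class SimplePermutator(nn.Module):
    """Mixes information separately along height, width, and channel
    with plain linear layers; a simplified stand-in for Permute-MLP."""
    def __init__(self, h, w, c):
        super().__init__()
        self.mix_h = nn.Linear(h, h)   # acts along the height axis
        self.mix_w = nn.Linear(w, w)   # acts along the width axis
        self.mix_c = nn.Linear(c, c)   # acts along the channel axis

    def forward(self, x):              # x: (B, H, W, C)
        xh = self.mix_h(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)  # mix H
        xw = self.mix_w(x.permute(0, 1, 3, 2)).permute(0, 1, 3, 2)  # mix W
        xc = self.mix_c(x)                                          # mix C
        return xh + xw + xc

tokens = torch.randn(2, 14, 14, 384)
out = SimplePermutator(14, 14, 384)(tokens)   # same shape as input
```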

LV-BERT: Exploiting Layer Variety for BERT

1 code implementation · Findings (ACL) 2021 · Weihao Yu, Zihang Jiang, Fei Chen, Qibin Hou, Jiashi Feng

In this paper, going beyond the stereotyped layer pattern of existing pre-trained models, we aim to improve them by exploiting layer variety from two aspects: the layer type set and the layer order.
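The two aspects, a layer type set and a layer order, suggest a config-driven sketch like the one below. The specific type set {attention, convolution, feed-forward} and the hand-picked order are assumptions for illustration; LV-BERT searches over these choices rather than fixing them.

```python
import torch.nn as nn

# Hypothetical layer-type set; LV-BERT searches over type and order
# rather than hard-coding a pattern like this.
def make_layer(kind, dim, heads=8, kernel=3):
    if kind == "attn":
        return nn.MultiheadAttention(dim, heads, batch_first=True)
    if kind == "conv":
        return nn.Conv1d(dim, dim, kernel, padding=kernel // 2)
    if kind == "ffn":
        return nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                             nn.Linear(4 * dim, dim))
    raise ValueError(kind)

# One possible (non-searched) layer order, contrasted with the
# stereotyped attn -> ffn repetition of vanilla BERT.
order = ["conv", "attn", "ffn", "attn", "conv", "ffn"]
layers = nn.ModuleList(make_layer(k, 768) for k in order)
```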

Refiner: Refining Self-attention for Vision Transformers

1 code implementation · 7 Jun 2021 · Daquan Zhou, Yujun Shi, Bingyi Kang, Weihao Yu, Zihang Jiang, Yuan Li, Xiaojie Jin, Qibin Hou, Jiashi Feng

Vision Transformers (ViTs) have shown competitive accuracy in image classification tasks compared with CNNs.

Image Classification

DeepViT: Towards Deeper Vision Transformer

5 code implementations · 22 Mar 2021 · Daquan Zhou, Bingyi Kang, Xiaojie Jin, Linjie Yang, Xiaochen Lian, Zihang Jiang, Qibin Hou, Jiashi Feng

In this paper, we show that, unlike convolutional neural networks (CNNs), which can be improved by stacking more convolutional layers, the performance of ViTs saturates quickly when they are scaled deeper. (A sketch of the paper's Re-attention remedy follows below.)

Image Classification · Representation Learning
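The snippet stops at the saturation finding; the paper's proposed remedy is Re-attention, which mixes the per-head attention maps with a learnable head-to-head matrix so deeper blocks stop producing near-identical maps. The sketch below reconstructs that idea from the paper's stated approach; the normalization choice and shapes are assumptions.

```python
import torch
import torch.nn as nn

class ReAttention(nn.Module):
    """Re-attention sketch: mix the H per-head attention maps with a
    learnable H x H matrix before applying them to the values."""
    def __init__(self, heads):
        super().__init__()
        self.theta = nn.Parameter(torch.eye(heads))  # init as identity
        self.norm = nn.BatchNorm2d(heads)            # one norm choice

    def forward(self, attn, v):
        # attn: (B, H, N, N) softmaxed maps; v: (B, H, N, d) values
        mixed = torch.einsum('hg,bgnm->bhnm', self.theta, attn)
        return self.norm(mixed) @ v                  # (B, H, N, d)

attn = torch.softmax(torch.randn(2, 8, 49, 49), dim=-1)
v = torch.randn(2, 8, 49, 64)
out = ReAttention(8)(attn, v)                        # (2, 8, 49, 64)
```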

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

13 code implementations · ICCV 2021 · Li Yuan, Yunpeng Chen, Tao Wang, Weihao Yu, Yujun Shi, Zihang Jiang, Francis EH Tay, Jiashi Feng, Shuicheng Yan

To overcome such limitations, we propose a new Tokens-To-Token Vision Transformer (T2T-ViT), which incorporates 1) a layer-wise Tokens-to-Token (T2T) transformation that progressively structurizes the image into tokens by recursively aggregating neighboring tokens into one token, so that the local structure represented by surrounding tokens can be modeled and token length can be reduced; and 2) an efficient backbone with a deep-narrow structure for vision transformers, motivated by CNN architecture design after empirical study. (A sketch of one T2T merge step follows below.)

Image Classification · Language Modelling
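The Tokens-to-Token step described above (merging each token with its neighbors so local structure is kept while token count shrinks) maps naturally onto `nn.functional.unfold`. The kernel/stride/padding values below are assumptions for illustration; the real model interleaves transformer layers between successive merges.

```python
import torch
import torch.nn as nn

def t2t_step(tokens, h, w, k=3, s=2):
    """One Tokens-to-Token merge: re-assemble tokens into an image
    grid, then fuse each k x k neighborhood into a single token.
    Token count drops by roughly s*s; channels grow by k*k."""
    b, n, c = tokens.shape                    # n == h * w
    img = tokens.transpose(1, 2).reshape(b, c, h, w)
    patches = nn.functional.unfold(img, kernel_size=k, stride=s, padding=1)
    return patches.transpose(1, 2)            # (B, N', C * k * k)

tokens = torch.randn(2, 56 * 56, 64)
merged = t2t_step(tokens, 56, 56)             # (2, 28*28, 64*9)
```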
