Search Results for author: Zhengzhuo Xu

Found 13 papers, 9 papers with code

ChartBench: A Benchmark for Complex Visual Reasoning in Charts

no code implementations • 26 Dec 2023 • Zhengzhuo Xu, Sinan Du, Yiyan Qi, Chengjin Xu, Chun Yuan, Jian Guo

Multimodal Large Language Models (MLLMs) demonstrate impressive image understanding and generating capabilities.

Visual Reasoning

Towards Effective Collaborative Learning in Long-Tailed Recognition

no code implementations • 5 May 2023 • Zhengzhuo Xu, Zenghao Chai, Chengyin Xu, Chun Yuan, Haiqin Yang

In this paper, we observe that the knowledge transfer between experts is imbalanced in terms of class distribution, which results in limited performance improvement of the minority classes.

Transfer Learning

Rethink Long-tailed Recognition with Vision Transformers

no code implementations • 28 Feb 2023 • Zhengzhuo Xu, Shuo Yang, Xingjun Wang, Chun Yuan

Hence, we propose to adopt unsupervised learning to utilize long-tailed data.

Accurate 3D Face Reconstruction with Facial Component Tokens

no code implementations • ICCV 2023 • Tianke Zhang, Xuangeng Chu, Yunfei Liu, Lijian Lin, Zhendong Yang, Zhengzhuo Xu, Chengkun Cao, Fei Yu, Changyin Zhou, Chun Yuan, Yu Li

However, the current deep learning-based methods face significant challenges in achieving accurate reconstruction with disentangled facial parameters and ensuring temporal stability in single-frame methods for 3D face tracking on video data.

3D Face Reconstruction

Learning Imbalanced Data with Vision Transformers

1 code implementation • CVPR 2023 • Zhengzhuo Xu, Ruikang Liu, Shuo Yang, Zenghao Chai, Chun Yuan

In this paper, we systematically investigate the performance of ViTs in long-tailed recognition (LTR) and propose LiVT to train ViTs from scratch using only long-tailed (LT) data.

Long-tail Learning

HyP$^2$ Loss: Beyond Hypersphere Metric Space for Multi-label Image Retrieval

1 code implementation • 14 Aug 2022 • Chengyin Xu, Zenghao Chai, Zhengzhuo Xu, Chun Yuan, Yanbo Fan, Jue Wang

Image retrieval has become an increasingly appealing technique with broad multimedia application prospects, where deep hashing serves as the dominant branch towards low storage and efficient retrieval.

Deep Hashing, Metric Learning
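The snippet above credits deep hashing with low storage and efficient retrieval. The reason is that each image is reduced to a short binary code, and ranking becomes a Hamming-distance computation. A minimal sketch, assuming the real-valued embeddings already come from some trained hashing network (the toy arrays below are made up for illustration); it does not implement HyP² Loss, and the function names are hypothetical.

```python
import numpy as np

def binarize(features):
    """Map real-valued embeddings to {-1, +1} codes via the sign (0 maps to +1)."""
    return np.where(features >= 0, 1, -1).astype(np.int32)

def hamming_distances(query_code, db_codes):
    """Hamming distance between one {-1, +1} code and each row of db_codes.

    For codes of length K, <q, b> = K - 2 * d_H(q, b), so d_H = (K - <q, b>) / 2.
    One integer dot product per item is what makes hash-based retrieval cheap.
    """
    k = query_code.shape[0]
    return (k - db_codes @ query_code) // 2

def retrieve(query_code, db_codes, top_k=3):
    """Indices of the top_k nearest database codes (ties keep database order)."""
    d = hamming_distances(query_code, db_codes)
    return np.argsort(d, kind="stable")[:top_k]

if __name__ == "__main__":
    # Toy 4-bit example with hypothetical embeddings.
    db = binarize(np.array([[0.9, -0.2, 0.4, -0.7],
                            [-0.5, 0.3, -0.1, 0.8],
                            [0.6, -0.4, 0.2, 0.1]]))
    q = binarize(np.array([0.7, -0.1, 0.5, -0.3]))
    print(hamming_distances(q, db))  # distances [0 4 1]
    print(retrieve(q, db, top_k=2))  # nearest items [0 2]
```

In practice the ±1 codes would be packed into bits and compared with XOR plus popcount; the dot-product form is used here only because it is easy to verify.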

REALY: Rethinking the Evaluation of 3D Face Reconstruction

1 code implementation • 18 Mar 2022 • Zenghao Chai, Haoxian Zhang, Jing Ren, Di Kang, Zhengzhuo Xu, Xuefei Zhe, Chun Yuan, Linchao Bao

The evaluation of 3D face reconstruction results typically relies on a rigid shape alignment between the estimated 3D model and the ground-truth scan.

3D Face Reconstruction
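The snippet refers to the conventional practice of rigidly aligning the estimated 3D model to the ground-truth scan before measuring error. A minimal sketch of that baseline step, assuming known point-to-point correspondences and using the Kabsch (orthogonal Procrustes) solution. This is the standard procedure the paper rethinks, not REALY's own evaluation protocol, and the function names are hypothetical.

```python
import numpy as np

def kabsch_align(source, target):
    """Rigidly align source (N, 3) to target (N, 3) with known correspondences.

    Returns rotation R (3, 3) and translation t (3,) minimizing
    sum_i ||R @ source[i] + t - target[i]||^2 (rotation + translation, no scaling).
    """
    mu_s = source.mean(axis=0)
    mu_t = target.mean(axis=0)
    h = (source - mu_s).T @ (target - mu_t)  # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so R is a proper rotation (det(R) = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = mu_t - r @ mu_s
    return r, t

def rmse_after_alignment(source, target):
    """Root-mean-square point distance after the best rigid alignment."""
    r, t = kabsch_align(source, target)
    aligned = source @ r.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - target) ** 2, axis=1))))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pred = rng.normal(size=(100, 3))  # hypothetical predicted vertices
    theta = np.deg2rad(30.0)
    rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                      [np.sin(theta), np.cos(theta), 0.0],
                      [0.0, 0.0, 1.0]])
    scan = pred @ rot_z.T + np.array([1.0, 2.0, 3.0])  # rigidly moved copy
    print(rmse_after_alignment(pred, scan))  # close to 0 for a noiseless rigid copy
```

Because this alignment minimizes error over the whole shape, it can hide local discrepancies (for example in the nose or mouth region), which is the kind of weakness a region-aware evaluation would target.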

Semantic-Sparse Colorization Network for Deep Exemplar-based Colorization

1 code implementation • 2 Dec 2021 • Yunpeng Bai, Chao Dong, Zenghao Chai, Andong Wang, Zhengzhuo Xu, Chun Yuan

To address these two problems, we propose Semantic-Sparse Colorization Network (SSCN) to transfer both the global image style and detailed semantic-related colors to the gray-scale image in a coarse-to-fine manner.

Colorization

Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective

1 code implementation • NeurIPS 2021 • Zhengzhuo Xu, Zenghao Chai, Chun Yuan

Real-world data universally confronts a severe class-imbalance problem and exhibits a long-tailed distribution, i.e., most labels are associated with only a few instances.

Data Augmentation, Long-tail Learning
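To make "long-tailed distribution" concrete, the sketch below builds per-class counts with the exponential decay profile commonly used for CIFAR-LT style benchmarks, n_i = n_max · IF^(-i/(C-1)), where IF is the imbalance factor (head count divided by tail count), and derives the empirical class priors. This is an illustrative assumption about how such data is usually constructed, not the calibration method of the paper above; the function names are hypothetical.

```python
def long_tailed_counts(n_max, num_classes, imbalance_factor):
    """Per-class sample counts decaying exponentially from n_max to n_max / imbalance_factor.

    n_i = n_max * imbalance_factor ** (-i / (num_classes - 1)), rounded,
    with at least one sample kept per class.
    """
    return [max(1, round(n_max * imbalance_factor ** (-i / (num_classes - 1))))
            for i in range(num_classes)]

def class_priors(counts):
    """Empirical label prior p(y) estimated from class frequencies."""
    total = sum(counts)
    return [c / total for c in counts]

if __name__ == "__main__":
    counts = long_tailed_counts(n_max=500, num_classes=5, imbalance_factor=100)
    print(counts)                  # [500, 158, 50, 16, 5]
    print(counts[0] / counts[-1])  # 100.0, the imbalance factor
    print([f"{p:.3f}" for p in class_priors(counts)])
```

With these numbers the head class holds about 69% of all samples, while the tail class has only 5 instances, which is the situation the abstract describes.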

MoDeRNN: Towards Fine-grained Motion Details for Spatiotemporal Predictive Learning

1 code implementation • 25 Oct 2021 • Zenghao Chai, Zhengzhuo Xu, Chun Yuan

We carefully design Detail Context Block (DCB) to extract fine-grained details and improve the isolated correlation between upper context state and current input state.

CMS-LSTM: Context Embedding and Multi-Scale Spatiotemporal Expression LSTM for Predictive Learning

1 code implementation • 6 Feb 2021 • Zenghao Chai, Zhengzhuo Xu, Yunpeng Bai, Zhihui Lin, Chun Yuan

To tackle the increasing ambiguity during forecasting, we design CMS-LSTM to focus on context correlations and multi-scale spatiotemporal flow with fine-grained local details, containing two elaborately designed blocks: Context Embedding (CE) and Spatiotemporal Expression (SE) blocks.

Video Prediction
