Search Results for author: Yanbin Hao

Found 19 papers, 11 papers with code

Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model

1 code implementation • 28 Mar 2024 • Zhicai Wang, Longhui Wei, Tan Wang, Heyu Chen, Yanbin Hao, Xiang Wang, Xiangnan He, Qi Tian

Text-to-image (T2I) generative models have recently emerged as a powerful tool, enabling the creation of photo-realistic images and giving rise to a multitude of applications.

Data Augmentation, Image Classification

Boosting Few-Shot Learning via Attentive Feature Regularization

no code implementations • 23 Mar 2024 • Xingyu Zhu, Shuo Wang, Jinda Lu, Yanbin Hao, Haifeng Liu, Xiangnan He

Few-shot learning (FSL) based on manifold regularization aims to improve the recognition capacity of novel objects with limited training samples by mixing two samples from different categories with a blending factor.

Few-Shot Learning
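
The abstract above mentions mixing two samples from different categories with a blending factor. A minimal sketch of that generic mixing step follows, in plain Python. It is not the paper's attentive feature regularization method; the names `mixup` and `sample_blending_factor`, the one-hot label format, and the Beta(alpha, alpha) sampling of the factor are all illustrative assumptions.

```python
import random


def mixup(x_a, x_b, y_a, y_b, lam):
    """Blend two feature vectors and their one-hot labels with factor lam.

    Hypothetical helper for illustration: lam = 1 returns sample A unchanged,
    lam = 0 returns sample B unchanged.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(x_a) != len(x_b) or len(y_a) != len(y_b):
        raise ValueError("samples and labels must have matching lengths")
    # Convex combination of the inputs and of the labels.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix


def sample_blending_factor(alpha=1.0, rng=random):
    """Draw a blending factor from Beta(alpha, alpha) (an assumed, common choice)."""
    return rng.betavariate(alpha, alpha)


# Two toy samples from different classes, blended with a fixed factor.
x_mix, y_mix = mixup([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], lam=0.25)
```

With `lam=0.25` the mixed sample sits three quarters of the way toward the second sample, and the soft label carries the same proportions.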

CAR: Consolidation, Augmentation and Regulation for Recipe Retrieval

no code implementations • 8 Dec 2023 • Fangzhou Song, Bin Zhu, Yanbin Hao, Shuo Wang, Xiangnan He

Learning recipe and food image representation in common embedding space is non-trivial but crucial for cross-modal recipe retrieval.

Retrieval

Selective Volume Mixup for Video Action Recognition

no code implementations • 18 Sep 2023 • Yi Tan, Zhaofan Qiu, Yanbin Hao, Ting Yao, Xiangnan He, Tao Mei

In this paper, we propose a novel video augmentation strategy named Selective Volume Mixup (SV-Mix) to improve the generalization ability of deep models with limited training videos.

Action Recognition, Image Augmentation, +1 more

CgT-GAN: CLIP-guided Text GAN for Image Captioning

1 code implementation • 23 Aug 2023 • Jiarui Yu, Haoran Li, Yanbin Hao, Bin Zhu, Tong Xu, Xiangnan He

Particularly, we use adversarial training to teach CgT-GAN to mimic the phrases of an external text corpus and CLIP-based reward to provide semantic guidance.

Image Captioning

Masked Collaborative Contrast for Weakly Supervised Semantic Segmentation

1 code implementation • 15 May 2023 • Fangwen Wu, Jingxuan He, Yufei Yin, Yanbin Hao, Gang Huang, Lechao Cheng

This study introduces an efficacious approach, Masked Collaborative Contrast (MCC), to highlight semantic regions in weakly supervised semantic segmentation.

Contrastive Learning, Weakly supervised Semantic Segmentation, +1 more

Bi-directional Distribution Alignment for Transductive Zero-Shot Learning

1 code implementation • CVPR 2023 • Zhicai Wang, Yanbin Hao, Tingting Mu, Ouxiang Li, Shuo Wang, Xiangnan He

It is well-known that zero-shot learning (ZSL) can suffer severely from the problem of domain shift, where the true and learned data distributions for the unseen classes do not match.

Zero-Shot Learning

3D Human Pose Estimation With Spatio-Temporal Criss-Cross Attention

1 code implementation • CVPR 2023 • Zhenhua Tang, Zhaofan Qiu, Yanbin Hao, Richang Hong, Ting Yao

On this basis, we devise STCFormer by stacking multiple STC blocks and further integrate a new Structure-enhanced Positional Embedding (SPE) into STCFormer to take the structure of human body into consideration.

3D Human Pose Estimation

Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP

1 code implementation • 15 Jul 2022 • Zhicai Wang, Yanbin Hao, Xingyu Gao, Hao Zhang, Shuo Wang, Tingting Mu, Xiangnan He

They use token-mixing layers to capture cross-token interactions, as opposed to the multi-head self-attention mechanism used by Transformers.

Long-term Leap Attention, Short-term Periodic Shift for Video Classification

1 code implementation • 12 Jul 2022 • Hao Zhang, Lechao Cheng, Yanbin Hao, Chong-Wah Ngo

By replacing a vanilla 2D attention with the LAPS, we could adapt a static transformer into a video one, with zero extra parameters and neglectable computation overhead ($\sim$2.6\%).

Video Classification

Attention in Attention: Modeling Context Correlation for Efficient Video Classification

1 code implementation • 20 Apr 2022 • Yanbin Hao, Shuo Wang, Pei Cao, Xinjian Gao, Tong Xu, Jinmeng Wu, Xiangnan He

Attention mechanisms have significantly boosted the performance of video classification neural networks thanks to the utilization of perspective contexts.

Video Classification

Group Contextualization for Video Recognition

1 code implementation • CVPR 2022 • Yanbin Hao, Hao Zhang, Chong-Wah Ngo, Xiangnan He

By utilizing calibrators to embed feature with four different kinds of contexts in parallel, the learnt representation is expected to be more resilient to diverse types of activities.

Action Recognition, Egocentric Activity Recognition, +1 more

Token Shift Transformer for Video Classification

3 code implementations • 5 Aug 2021 • Hao Zhang, Yanbin Hao, Chong-Wah Ngo

It is worth noticing that our TokShift transformer is a pure convolutional-free video transformer pilot with computational efficiency for video understanding.

Classification, Computational Efficiency, +2 more

Aggregated Multi-GANs for Controlled 3D Human Motion Prediction

no code implementations • 17 Mar 2021 • Zhenguang Liu, Kedi Lyu, Shuang Wu, Haipeng Chen, Yanbin Hao, Shouling Ji

Our method is compelling in that it enables manipulable motion prediction across activity types and allows customization of the human movement in a variety of fine-grained ways.

Human motion prediction, motion prediction

Motion Prediction Using Trajectory Cues

1 code implementation • ICCV 2021 • Zhenguang Liu, Pengxiang Su, Shuang Wu, Xuanjing Shen, Haipeng Chen, Yanbin Hao, Meng Wang

Predicting human motion from a historical pose sequence is at the core of many applications in computer vision.

motion prediction

Cross-sentence Pre-trained Model for Interactive QA matching

no code implementations • LREC 2020 • Jinmeng Wu, Yanbin Hao

In addition to the context information captured at each word position, we incorporate a new quantity of context information jump to facilitate the attention weight formulation.

Language Modelling, Sentence
