Search Results for author: Yanbin Hao

Found 19 papers, 11 papers with code

Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model

1 code implementation • 28 Mar 2024 • Zhicai Wang, Longhui Wei, Tan Wang, Heyu Chen, Yanbin Hao, Xiang Wang, Xiangnan He, Qi Tian

Text-to-image (T2I) generative models have recently emerged as a powerful tool, enabling the creation of photo-realistic images and giving rise to a multitude of applications.

Data Augmentation, Image Classification

Boosting Few-Shot Learning via Attentive Feature Regularization

no code implementations • 23 Mar 2024 • Xingyu Zhu, Shuo Wang, Jinda Lu, Yanbin Hao, Haifeng Liu, Xiangnan He

Few-shot learning (FSL) based on manifold regularization aims to improve the recognition capacity of novel objects with limited training samples by mixing two samples from different categories with a blending factor.

Few-Shot Learning
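
The blending step mentioned in this abstract (mixing two samples from different categories with a blending factor) can be illustrated with a generic mixup-style interpolation. This is a minimal sketch of the general idea only, not the method proposed in the paper; the function name `mixup`, the parameter `lam`, and the toy vectors are assumptions made for illustration.

```python
def mixup(x_a, x_b, y_a, y_b, lam):
    """Blend two samples and their one-hot labels with factor lam in [0, 1].

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # Convex combination of the features, element by element.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    # The labels are blended with the same factor, giving a soft label.
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix


# Two toy feature vectors from different classes (illustrative values).
x_first, y_first = [1.0, 0.0, 2.0], [1.0, 0.0]
x_second, y_second = [0.0, 4.0, 2.0], [0.0, 1.0]

x_mix, y_mix = mixup(x_first, x_second, y_first, y_second, lam=0.25)
print(x_mix)  # [0.25, 3.0, 2.0]
print(y_mix)  # [0.25, 0.75]
```

With `lam=0.25` the mixed sample sits three quarters of the way toward the second sample, and the soft label reflects the same proportion.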

Noise-NeRF: Hide Information in Neural Radiance Fields using Trainable Noise

no code implementations • 2 Jan 2024 • Qinglong Huang, Yong Liao, Yanbin Hao, Pengyuan Zhou

Neural radiance fields (NeRF) have been proposed as an innovative 3D representation method.

Image Steganography, Super-Resolution

CAR: Consolidation, Augmentation and Regulation for Recipe Retrieval

no code implementations • 8 Dec 2023 • Fangzhou Song, Bin Zhu, Yanbin Hao, Shuo Wang, Xiangnan He

Learning recipe and food image representation in common embedding space is non-trivial but crucial for cross-modal recipe retrieval.

Retrieval

3D-GOI: 3D GAN Omni-Inversion for Multifaceted and Multi-object Editing

no code implementations • 18 Nov 2023 • Haoran Li, Long Ma, Yong Liao, Lechao Cheng, Yanbin Hao, Pengyuan Zhou

First, we segment the objects and the background in a multi-object image.

Attribute, Object +1

Selective Volume Mixup for Video Action Recognition

no code implementations • 18 Sep 2023 • Yi Tan, Zhaofan Qiu, Yanbin Hao, Ting Yao, Xiangnan He, Tao Mei

In this paper, we propose a novel video augmentation strategy named Selective Volume Mixup (SV-Mix) to improve the generalization ability of deep models with limited training videos.

Action Recognition, Image Augmentation +1

CgT-GAN: CLIP-guided Text GAN for Image Captioning

1 code implementation • 23 Aug 2023 • Jiarui Yu, Haoran Li, Yanbin Hao, Bin Zhu, Tong Xu, Xiangnan He

Particularly, we use adversarial training to teach CgT-GAN to mimic the phrases of an external text corpus and CLIP-based reward to provide semantic guidance.

Image Captioning

Masked Collaborative Contrast for Weakly Supervised Semantic Segmentation

1 code implementation • 15 May 2023 • Fangwen Wu, Jingxuan He, Yufei Yin, Yanbin Hao, Gang Huang, Lechao Cheng

This study introduces an efficacious approach, Masked Collaborative Contrast (MCC), to highlight semantic regions in weakly supervised semantic segmentation.

Contrastive Learning, Weakly supervised Semantic Segmentation +1

TKN: Transformer-based Keypoint Prediction Network For Real-time Video Prediction

no code implementations • 17 Mar 2023 • Haoran Li, Pengyuan Zhou, Yihang Lin, Yanbin Hao, Haiyong Xie, Yong Liao

Video prediction is a complex time-series forecasting task with great potential in many use cases.

Time Series, Time Series Forecasting +1

Bi-directional Distribution Alignment for Transductive Zero-Shot Learning

1 code implementation • CVPR 2023 • Zhicai Wang, Yanbin Hao, Tingting Mu, Ouxiang Li, Shuo Wang, Xiangnan He

It is well-known that zero-shot learning (ZSL) can suffer severely from the problem of domain shift, where the true and learned data distributions for the unseen classes do not match.

Zero-Shot Learning

3D Human Pose Estimation With Spatio-Temporal Criss-Cross Attention

1 code implementation • CVPR 2023 • Zhenhua Tang, Zhaofan Qiu, Yanbin Hao, Richang Hong, Ting Yao

On this basis, we devise STCFormer by stacking multiple STC blocks and further integrate a new Structure-enhanced Positional Embedding (SPE) into STCFormer to take the structure of human body into consideration.

Ranked #6 on 3D Human Pose Estimation on MPI-INF-3DHP

3D Human Pose Estimation

Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP

1 code implementation • 15 Jul 2022 • Zhicai Wang, Yanbin Hao, Xingyu Gao, Hao Zhang, Shuo Wang, Tingting Mu, Xiangnan He

They use token-mixing layers to capture cross-token interactions, as opposed to the multi-head self-attention mechanism used by Transformers.

Long-term Leap Attention, Short-term Periodic Shift for Video Classification

1 code implementation • 12 Jul 2022 • Hao Zhang, Lechao Cheng, Yanbin Hao, Chong-Wah Ngo

By replacing a vanilla 2D attention with the LAPS, we could adapt a static transformer into a video one, with zero extra parameters and neglectable computation overhead ($\sim$2.6\%).

Video Classification

Attention in Attention: Modeling Context Correlation for Efficient Video Classification

1 code implementation • 20 Apr 2022 • Yanbin Hao, Shuo Wang, Pei Cao, Xinjian Gao, Tong Xu, Jinmeng Wu, Xiangnan He

Attention mechanisms have significantly boosted the performance of video classification neural networks thanks to the utilization of perspective contexts.

Video Classification

Group Contextualization for Video Recognition

1 code implementation • CVPR 2022 • Yanbin Hao, Hao Zhang, Chong-Wah Ngo, Xiangnan He

By utilizing calibrators to embed feature with four different kinds of contexts in parallel, the learnt representation is expected to be more resilient to diverse types of activities.

Ranked #3 on Egocentric Activity Recognition on EGTEA

Action Recognition, Egocentric Activity Recognition +1

Token Shift Transformer for Video Classification

3 code implementations • 5 Aug 2021 • Hao Zhang, Yanbin Hao, Chong-Wah Ngo

It is worth noticing that our TokShift transformer is a pure convolutional-free video transformer pilot with computational efficiency for video understanding.

Classification, Computational Efficiency +2

Aggregated Multi-GANs for Controlled 3D Human Motion Prediction

no code implementations • 17 Mar 2021 • Zhenguang Liu, Kedi Lyu, Shuang Wu, Haipeng Chen, Yanbin Hao, Shouling Ji

Our method is compelling in that it enables manipulable motion prediction across activity types and allows customization of the human movement in a variety of fine-grained ways.

Human motion prediction, motion prediction

Motion Prediction Using Trajectory Cues

1 code implementation • ICCV 2021 • Zhenguang Liu, Pengxiang Su, Shuang Wu, Xuanjing Shen, Haipeng Chen, Yanbin Hao, Meng Wang

Predicting human motion from a historical pose sequence is at the core of many applications in computer vision.

motion prediction

Cross-sentence Pre-trained Model for Interactive QA matching

no code implementations • LREC 2020 • Jinmeng Wu, Yanbin Hao

In addition to the context information captured at each word position, we incorporate a new quantity of context information jump to facilitate the attention weight formulation.

Language Modelling, Sentence
