Search Results for author: Hehe Fan

Found 42 papers, 19 papers with code

Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition

1 code implementation19 Dec 2024 Kun Li, Dan Guo, Guoliang Chen, Chunxiao Fan, Jingyuan Xu, Zhiliang Wu, Hehe Fan, Meng Wang

In addition, we propose a new prototypical diversity amplification loss to strengthen the model's capacity by amplifying the differences between prototypes.
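The snippet does not give the loss's exact form; as a rough illustration of the idea only, here is a minimal PyTorch sketch of a diversity term that pushes class prototypes apart (the function name and the mean-pairwise-cosine form are assumptions, not the paper's definition):

```python
import torch
import torch.nn.functional as F

def prototype_diversity_loss(prototypes: torch.Tensor) -> torch.Tensor:
    """Push class prototypes apart by penalizing their mean pairwise
    cosine similarity. `prototypes` is a (C, D) tensor with one
    D-dimensional prototype per class."""
    p = F.normalize(prototypes, dim=1)                  # unit-norm prototypes
    sim = p @ p.t()                                     # (C, C) cosine similarities
    c = p.size(0)
    off_diag = ~torch.eye(c, dtype=torch.bool, device=p.device)
    return sim[off_diag].mean()                         # lower = more diverse
```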

Emotion Recognition Micro-Action Recognition

EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space

no code implementations19 Dec 2024 Jianrong Zhang, Hehe Fan, Yi Yang

To address this issue, we propose EnergyMoGen, which includes two spectrums of Energy-Based Models: (1) We interpret the diffusion model as a latent-aware energy-based model that generates motions by composing a set of diffusion models in latent space; (2) We introduce a semantic-aware energy model based on cross-attention, which enables semantic composition and adaptive gradient descent for text embeddings.
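A hedged sketch of the latent-space composition idea, in the style of composable diffusion models where each condition's noise prediction is treated as an energy gradient and the guidance directions are summed; `denoiser`, `uncond_emb`, and the weighting scheme are placeholders, not EnergyMoGen's actual implementation:

```python
import torch

def composed_noise_prediction(denoiser, z_t, t, cond_embs, weights, uncond_emb):
    """Compose several text conditions in one denoising step by summing
    their guidance directions (a conjunction of concepts), following the
    classifier-free-guidance form of energy composition."""
    eps_uncond = denoiser(z_t, t, uncond_emb)
    eps = eps_uncond
    for emb, w in zip(cond_embs, weights):
        eps = eps + w * (denoiser(z_t, t, emb) - eps_uncond)
    return eps
```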

Motion Generation Semantic Composition

InfiniDreamer: Arbitrarily Long Human Motion Generation via Segment Score Distillation

no code implementations27 Nov 2024 Wenjie Zhuo, Fan Ma, Hehe Fan

InfiniDreamer addresses the limitations of current motion generation methods, which are typically restricted to short sequences due to the lack of long motion training data.

Motion Generation

CktGen: Specification-Conditioned Analog Circuit Generation

no code implementations1 Oct 2024 Yuxuan Hou, Jianrong Zhang, Hua Chen, Min Zhou, Faxin Yu, Hehe Fan, Yi Yang

To address this limitation, we introduce a task that directly generates analog circuits from given specifications, termed specification-conditioned analog circuit generation.

Contrastive Learning

ZeroMamba: Exploring Visual State Space Model for Zero-Shot Learning

no code implementations27 Aug 2024 Wenjin Hou, Dingjie Fu, Kun Li, Shiming Chen, Hehe Fan, Yi Yang

However, due to the limited receptive fields of CNNs and the quadratic complexity of ViTs, these visual backbones achieve suboptimal visual-semantic interactions.

Mamba Representation Learning +1

Prototype Learning for Micro-gesture Classification

no code implementations6 Aug 2024 Guoliang Chen, Fei Wang, Kun Li, Zhiliang Wu, Hehe Fan, Yi Yang, Meng Wang, Dan Guo

In this paper, we briefly introduce the solution developed by our team, HFUT-VUT, for the track of Micro-gesture Classification in the MiGA challenge at IJCAI 2024.

Action Recognition Classification +2

VividDreamer: Invariant Score Distillation For Hyper-Realistic Text-to-3D Generation

no code implementations13 Jul 2024 Wenjie Zhuo, Fan Ma, Hehe Fan, Yi Yang

In this paper, SDS is decoupled into a weighted sum of two components: the reconstruction term and the classifier-free guidance term.
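Written out in the standard SDS notation, the decomposition the snippet refers to plausibly takes the commonly used form below (a reconstruction on my part, with guidance scale \(\omega\); the paper's exact weighting may differ):

```latex
% SDS gradient with the CFG-augmented noise prediction split into the
% reconstruction (denoising) term and the classifier-free guidance term:
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\Big(
      \underbrace{\epsilon_\phi(x_t; t) - \epsilon}_{\text{reconstruction}}
      + \omega \underbrace{\big(\epsilon_\phi(x_t; y, t) - \epsilon_\phi(x_t; t)\big)}_{\text{classifier-free guidance}}
    \Big) \frac{\partial x}{\partial \theta} \right]
```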

3D Generation Text to 3D

Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models

no code implementations24 May 2024 Yue Zhang, Hehe Fan, Yi Yang

To bridge the gap between vision and language modalities, Multimodal Large Language Models (MLLMs) usually learn an adapter that converts visual inputs to understandable tokens for Large Language Models (LLMs).
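As a generic illustration of such an adapter (a plain linear projection; the paper's prompt-aware version additionally conditions on the prompt, which this sketch omits):

```python
import torch.nn as nn

class VisualAdapter(nn.Module):
    """Minimal adapter: map frozen vision-encoder features of shape
    (batch, num_tokens, d_vision) into the LLM embedding space
    (batch, num_tokens, d_llm) so they can be consumed as tokens."""
    def __init__(self, d_vision: int, d_llm: int):
        super().__init__()
        self.proj = nn.Linear(d_vision, d_llm)

    def forward(self, visual_features):
        return self.proj(visual_features)
```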

Question Answering Visual Question Answering

Clustering for Protein Representation Learning

1 code implementation CVPR 2024 Ruijie Quan, Wenguan Wang, Fan Ma, Hehe Fan, Yi Yang

We select the highest-scoring clusters and use their medoid nodes for the next iteration of clustering, until we obtain a hierarchical and informative representation of the protein.
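For reference, the medoid of a cluster is its most central member; a small sketch of that notion (not the authors' full pipeline):

```python
import torch

def medoid(points: torch.Tensor) -> torch.Tensor:
    """Return the member of `points` (N, D) with the smallest summed
    distance to all other members, i.e. the cluster medoid."""
    dists = torch.cdist(points, points)     # (N, N) pairwise distances
    return points[dists.sum(dim=1).argmin()]
```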

Clustering Protein Folding +1

EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing

no code implementations24 Mar 2024 Xiangpeng Yang, Linchao Zhu, Hehe Fan, Yi Yang

We find that the crux of the issue stems from the imprecise distribution of attention weights across designated regions, including inaccurate text-to-attribute control and attention leakage.

Attribute Video Editing

ProtChatGPT: Towards Understanding Proteins with Large Language Models

no code implementations15 Feb 2024 Chao Wang, Hehe Fan, Ruijie Quan, Yi Yang

The protein is first passed through protein encoders and the PLP-former to produce protein embeddings, which are then projected by the adapter to conform with the LLM.
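Read as a pipeline, the flow is roughly the following (the function names are placeholders for the snippet's stages, not ProtChatGPT's API):

```python
def protein_to_llm_inputs(protein, protein_encoder, plp_former, adapter):
    """Stage the protein through the pipeline described above: encode,
    refine with the PLP-former, then project into the LLM's
    embedding space."""
    embeddings = protein_encoder(protein)
    embeddings = plp_former(embeddings)
    return adapter(embeddings)
```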

HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting

1 code implementation9 Feb 2024 Zhenglin Zhou, Fan Ma, Hehe Fan, Zongxin Yang, Yi Yang

Extensive experiments demonstrate the efficacy of HeadStudio in generating animatable avatars from textual prompts, exhibiting appealing appearances.

Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly

1 code implementation CVPR 2024 Hang Du, Sicheng Zhang, Binzhu Xie, Guoshun Nan, Jiayang Zhang, Junrui Xu, Hangyu Liu, Sicong Leng, Jiangming Liu, Hehe Fan, Dajiu Huang, Jing Feng, Linli Chen, Can Zhang, Xuhuan Li, Hao Zhang, Jianhang Chen, Qimei Cui, Xiaofeng Tao

In addition, we introduce MMEval, a novel evaluation metric designed to better align with human preferences for CUVA, facilitating the measurement of how well existing LLMs comprehend the underlying cause and corresponding effect of video anomalies.

Anomaly Detection

DocMSU: A Comprehensive Benchmark for Document-level Multimodal Sarcasm Understanding

1 code implementation26 Dec 2023 Hang Du, Guoshun Nan, Sicheng Zhang, Binzhu Xie, Junrui Xu, Hehe Fan, Qimei Cui, Xiaofeng Tao, Xudong Jiang

Multimodal Sarcasm Understanding (MSU) has a wide range of applications in the news field such as public opinion analysis and forgery detection.

Object Detection Sarcasm Detection +1

Building Category Graphs Representation with Spatial and Temporal Attention for Visual Navigation

no code implementations6 Dec 2023 Xiaobo Hu, Youfang Lin, Hehe Fan, Shuo Wang, Zhihao Wu, Kai Lv

To this end, an agent needs to 1) learn knowledge about the relations of object categories in the world during training and 2) look for the target object based on the pre-learned object category relations and its moving trajectory in the current unseen environment.

Object Visual Navigation

A Reliable Representation with Bidirectional Transition Model for Visual Reinforcement Learning Generalization

no code implementations4 Dec 2023 Xiaobo Hu, Youfang Lin, Yue Liu, Jinwen Wang, Shuo Wang, Hehe Fan, Kai Lv

Visual reinforcement learning has proven effective in solving control tasks with high-dimensional observations.

FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax

no code implementations27 Nov 2023 Yu Lu, Linchao Zhu, Hehe Fan, Yi Yang

Text-to-video (T2V) generation is a rapidly growing research area that aims to translate the scenes, objects, and actions within complex video text into a sequence of coherent visual frames.

Video Generation

Prior-Free Continual Learning with Unlabeled Data in the Wild

1 code implementation16 Oct 2023 Tao Zhuo, Zhiyong Cheng, Hehe Fan, Mohan Kankanhalli

Existing CL methods usually reduce forgetting with task priors, i.e., using task identity or a subset of previously seen samples for model training.

Continual Learning Image Classification

Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos

no code implementations ICCV 2023 Xiaoxiao Sheng, Zhiqiang Shen, Gang Xiao, Longguang Wang, Yulan Guo, Hehe Fan

Instead of contrasting the representations of clips or frames, in this paper, we propose a unified self-supervised framework by conducting contrastive learning at the point level.
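A minimal sketch of point-level contrastive learning in the InfoNCE style, assuming per-point features from two augmented views with known point correspondence (a common setup, not necessarily the paper's exact objective):

```python
import torch
import torch.nn.functional as F

def point_info_nce(feats_a, feats_b, temperature=0.07):
    """InfoNCE between per-point features of two augmented views.
    feats_a, feats_b: (N, D) features for the same N points; point i
    in view A should match point i in view B (the positive pair),
    with all other points serving as negatives."""
    a = F.normalize(feats_a, dim=1)
    b = F.normalize(feats_b, dim=1)
    logits = a @ b.t() / temperature          # (N, N) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)
```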

Contrastive Learning Representation Learning +1

DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation

no code implementations31 Jul 2023 Yue Zhang, Hehe Fan, Yi Yang, Mohan Kankanhalli

The proposed method, named Mixture of Depth and Point cloud video experts (DPMix), achieved the first place in the 4D Action Segmentation Track of the HOI4D Challenge 2023.

Action Segmentation Human-Object Interaction Detection +2

Keyword-Aware Relative Spatio-Temporal Graph Networks for Video Question Answering

no code implementations25 Jul 2023 Yi Cheng, Hehe Fan, Dongyun Lin, Ying Sun, Mohan Kankanhalli, Joo-Hwee Lim

The main challenge in video question answering (VideoQA) is to capture and understand the complex spatial and temporal relations between objects based on given questions.

graph construction Question Answering +2

A Study on Differentiable Logic and LLMs for EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition 2023

no code implementations13 Jul 2023 Yi Cheng, Ziwei Xu, Fen Fang, Dongyun Lin, Hehe Fan, Yongkang Wong, Ying Sun, Mohan Kankanhalli

Our research focuses on applying a differentiable logic loss during training to leverage the co-occurrence relations between verbs and nouns, and on using pre-trained Large Language Models (LLMs) to generate the logic rules for adapting to unseen action labels.
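One way such a logic loss could look, assuming a binary verb-noun co-occurrence matrix derived from the rules (a sketch; the challenge entry's actual rule form is not given in this snippet):

```python
import torch

def cooccurrence_logic_loss(p_verb, p_noun, valid_pairs):
    """Soft constraint that the predicted (verb, noun) pair should be
    an admissible combination. `p_verb` (V,) and `p_noun` (N,) are
    predicted class distributions; `valid_pairs` is a (V, N) {0,1}
    matrix of pairs allowed by the logic rules."""
    joint = p_verb.unsqueeze(1) * p_noun.unsqueeze(0)   # (V, N) joint probs
    mass_on_valid = (joint * valid_pairs).sum()
    return -torch.log(mass_on_valid + 1e-8)             # maximize valid mass
```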

Action Recognition Unsupervised Domain Adaptation

Continual Learning with Strong Experience Replay

1 code implementation23 May 2023 Tao Zhuo, Zhiyong Cheng, Zan Gao, Hehe Fan, Mohan Kankanhalli

Experience Replay (ER) is a simple and effective rehearsal-based strategy, which optimizes the model with current training data and a subset of old samples stored in a memory buffer.
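A minimal sketch of one such ER update step (the buffer policy and batch shapes are assumptions; the paper's "strong" variant adds more than this baseline):

```python
import random
import torch

def experience_replay_step(model, loss_fn, optimizer, batch, buffer,
                           replay_size=32):
    """One ER update: optimize on the current batch plus a random
    subset of old (x, y) samples drawn from the memory buffer."""
    x, y = batch
    if len(buffer) >= replay_size:
        old_x, old_y = zip(*random.sample(buffer, replay_size))
        x = torch.cat([x, torch.stack(old_x)])
        y = torch.cat([y, torch.stack(old_y)])
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    buffer.extend(zip(batch[0], batch[1]))  # buffer policy (e.g. reservoir) omitted
    return loss.item()
```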

Continual Learning Image Classification

Text to Point Cloud Localization with Relation-Enhanced Transformer

no code implementations13 Jan 2023 Guangzhi Wang, Hehe Fan, Mohan Kankanhalli

To overcome these two challenges, we propose a unified Relation-Enhanced Transformer (RET) to improve representation discriminability for both point cloud and natural language queries.

Natural Language Queries Relation

STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition

no code implementations ICCV 2023 Ming Li, Xiangyu Xu, Hehe Fan, Pan Zhou, Jun Liu, Jia-Wei Liu, Jiahe Li, Jussi Keppo, Mike Zheng Shou, Shuicheng Yan

For the first time, we introduce vision Transformers into PPAR by treating a video as a tubelet sequence, and accordingly design two complementary mechanisms, i.e., sparsification and anonymization, to remove privacy from a spatio-temporal perspective.

Action Recognition Facial Expression Recognition (FER) +2

PointListNet: Deep Learning on 3D Point Lists

no code implementations CVPR 2023 Hehe Fan, Linchao Zhu, Yi Yang, Mohan Kankanhalli

Deep neural networks on regular 1D lists (e.g., natural languages) and irregular 3D sets (e.g., point clouds) have made tremendous achievements.

Deep Learning

Can We Solve 3D Vision Tasks Starting from A 2D Vision Transformer?

2 code implementations15 Sep 2022 Yi Wang, Zhiwen Fan, Tianlong Chen, Hehe Fan, Zhangyang Wang

Vision Transformers (ViTs) have proven effective in solving 2D image understanding tasks by training on large-scale image datasets; meanwhile, as a somewhat separate track, they have also been used to model the 3D visual world, such as voxels or point clouds.

Point Cloud Segmentation

SEFormer: Structure Embedding Transformer for 3D Object Detection

no code implementations5 Sep 2022 Xiaoyu Feng, Heming Du, Yueqi Duan, Yongpan Liu, Hehe Fan

Effectively preserving and encoding structure features from objects in irregular and sparse LiDAR points is a key challenge for 3D object detection on point clouds.

3D Object Detection Autonomous Driving +2

PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences

1 code implementation ICLR 2021 Hehe Fan, Xin Yu, Yuhang Ding, Yi Yang, Mohan Kankanhalli

Then, a spatial convolution is employed to capture the local structure of points in the 3D space, and a temporal convolution is used to model the dynamics of the spatial regions along the time dimension.
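PSTNet itself operates on irregular point sets, but the space/time factorization it describes can be illustrated on a dense grid; the sketch below is only that analogy, not the paper's point convolution:

```python
import torch.nn as nn

class FactorizedSTBlock(nn.Module):
    """Decoupled spatio-temporal modeling on a regular (B, C, T, H, W)
    tensor: a spatial conv applied within each frame, followed by a
    temporal conv applied across frames at each spatial location."""
    def __init__(self, c_in, c_out, t_kernel=3):
        super().__init__()
        self.spatial = nn.Conv3d(c_in, c_out, (1, 3, 3), padding=(0, 1, 1))
        self.temporal = nn.Conv3d(c_out, c_out, (t_kernel, 1, 1),
                                  padding=(t_kernel // 2, 0, 0))

    def forward(self, x):          # x: (B, C, T, H, W)
        return self.temporal(self.spatial(x))
```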

3D Action Recognition Semantic Segmentation

PointRNN: Point Recurrent Neural Network for Moving Point Cloud Processing

2 code implementations18 Oct 2019 Hehe Fan, Yi Yang

We apply PointRNN, PointGRU and PointLSTM to moving point cloud prediction, which aims to predict the future trajectories of points in a set given their history movements.

Moving Point Cloud Processing

Cascaded Revision Network for Novel Object Captioning

1 code implementation6 Aug 2019 Qianyu Feng, Yu Wu, Hehe Fan, Chenggang Yan, Yi Yang

By this novel cascaded captioning-revising mechanism, CRN can accurately describe images with unseen objects.

Image Captioning Object +3

Attract or Distract: Exploit the Margin of Open Set

1 code implementation ICCV 2019 Qianyu Feng, Guoliang Kang, Hehe Fan, Yi Yang

In this paper, we exploit the semantic structure of open set data from two aspects: 1) Semantic Categorical Alignment, which aims to achieve good separability of target known classes by categorically aligning the centroid of target with the source.
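A minimal sketch of the centroid-alignment idea, assuming target pseudo-labels and precomputed source class centroids (not the paper's full objective):

```python
import torch

def categorical_alignment_loss(target_feats, pseudo_labels, source_centroids):
    """Pull each target feature toward the source centroid of its
    (pseudo-)class. `target_feats` is (B, D), `pseudo_labels` is (B,)
    with class indices, `source_centroids` is (C, D)."""
    matched = source_centroids[pseudo_labels]   # (B, D) matching centroids
    return ((target_feats - matched) ** 2).sum(dim=1).mean()
```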

Domain Adaptation

Adaptive Exploration for Unsupervised Person Re-Identification

1 code implementation9 Jul 2019 Yuhang Ding, Hehe Fan, Mingliang Xu, Yi Yang

However, a problem with the adaptive selection is that, when an image has too many neighbors, it is more likely to attract other images as its neighbors.

Unsupervised Person Re-Identification

Cubic LSTMs for Video Prediction

no code implementations20 Apr 2019 Hehe Fan, Linchao Zhu, Yi Yang

Predicting future frames in videos has become a promising direction of research for both computer vision and robot learning communities.

motion prediction Video Prediction

Complex Event Detection by Identifying Reliable Shots From Untrimmed Videos

no code implementations ICCV 2017 Hehe Fan, Xiaojun Chang, De Cheng, Yi Yang, Dong Xu, Alexander G. Hauptmann

Observing that only a few shots in an untrimmed video are relevant to the given event class, we formulate this task as a multi-instance learning (MIL) problem by taking each video as a bag and the video shots in each video as instances.
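Under this bag/instance view, a standard MIL aggregator scores a video by its most relevant shot; a one-line sketch (max-pooling is one common choice, not necessarily the paper's):

```python
import torch

def video_event_score(shot_scores: torch.Tensor) -> torch.Tensor:
    """Score a video (bag) by its highest-scoring shot (instance),
    the classic max-pooling MIL aggregator."""
    return shot_scores.max()
```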

Event Detection
