1 code implementation • 19 Dec 2024 • Kun Li, Dan Guo, Guoliang Chen, Chunxiao Fan, Jingyuan Xu, Zhiliang Wu, Hehe Fan, Meng Wang
In addition, we propose a new prototypical diversity amplification loss that strengthens the model's capacity by amplifying the differences among prototypes.
Ranked #1 on Micro-Action Recognition on MA-52
no code implementations • 19 Dec 2024 • Jianrong Zhang, Hehe Fan, Yi Yang
To address this issue, we propose EnergyMoGen, which includes two spectrums of Energy-Based Models: (1) We interpret the diffusion model as a latent-aware energy-based model that generates motions by composing a set of diffusion models in latent space; (2) We introduce a semantic-aware energy model based on cross-attention, which enables semantic composition and adaptive gradient descent for text embeddings.
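For intuition, the latent-aware composition can be illustrated with the standard relation between denoisers and energies: a diffusion denoiser is proportional to the gradient of an energy, so composing several concept-specific models amounts to summing their energies. The weighted form below is a generic illustration of such composition, not necessarily the paper's exact formulation:

```latex
\epsilon_k(x_t, t) \propto \nabla_{x_t} E_k(x_t, t), \qquad
E_{\text{comp}}(x_t, t) = \sum_{k=1}^{K} w_k \, E_k(x_t, t)
\;\;\Longrightarrow\;\;
\hat{\epsilon}_{\text{comp}}(x_t, t) = \sum_{k=1}^{K} w_k \, \epsilon_k(x_t, t)
```

Sampling with the composed denoiser then generates motions that jointly satisfy the composed concepts.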
no code implementations • 27 Nov 2024 • Wenjie Zhuo, Fan Ma, Hehe Fan
InfiniDreamer addresses the limitations of current motion generation methods, which are typically restricted to short sequences due to the lack of long motion training data.
no code implementations • 1 Oct 2024 • Yuxuan Hou, Jianrong Zhang, Hua Chen, Min Zhou, Faxin Yu, Hehe Fan, Yi Yang
To address this limitation, we introduce a task, termed specification-conditioned analog circuit generation, in which analog circuits are generated directly from given design specifications.
no code implementations • 27 Aug 2024 • Wenjin Hou, Dingjie Fu, Kun Li, Shiming Chen, Hehe Fan, Yi Yang
However, due to the limited receptive fields of CNNs and the quadratic complexity of ViTs, these visual backbones achieve suboptimal visual-semantic interactions.
no code implementations • 6 Aug 2024 • Guoliang Chen, Fei Wang, Kun Li, Zhiliang Wu, Hehe Fan, Yi Yang, Meng Wang, Dan Guo
In this paper, we briefly introduce the solution developed by our team, HFUT-VUT, for the track of Micro-gesture Classification in the MiGA challenge at IJCAI 2024.
no code implementations • 13 Jul 2024 • Wenjie Zhuo, Fan Ma, Hehe Fan, Yi Yang
In this paper, SDS is decoupled into a weighted sum of two components: the reconstruction term and the classifier-free guidance term.
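As context for this decoupling, the Score Distillation Sampling (SDS) gradient with classifier-free guidance can be split into the two terms named above. The notation below is the standard formulation; the paper's exact weighting may differ:

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
= \mathbb{E}_{t,\epsilon}\!\Big[ w(t)\,
  \underbrace{\big(\epsilon_\phi(x_t; t, \varnothing) - \epsilon\big)}_{\text{reconstruction term}}
  \frac{\partial x}{\partial \theta} \Big]
+ \mathbb{E}_{t,\epsilon}\!\Big[ w(t)\,\omega\,
  \underbrace{\big(\epsilon_\phi(x_t; t, y) - \epsilon_\phi(x_t; t, \varnothing)\big)}_{\text{classifier-free guidance term}}
  \frac{\partial x}{\partial \theta} \Big]
```

Here $\omega$ is the guidance scale, $y$ the text prompt, and $\varnothing$ the unconditional (null) input.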
no code implementations • 24 May 2024 • Yue Zhang, Hehe Fan, Yi Yang
To bridge the gap between vision and language modalities, Multimodal Large Language Models (MLLMs) usually learn an adapter that converts visual inputs to understandable tokens for Large Language Models (LLMs).
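As a rough illustration of such an adapter (the class name, layer choice, and dimensions below are hypothetical assumptions, not the paper's architecture), a minimal version simply projects frozen vision-encoder features into the LLM's token-embedding space:

```python
import torch.nn as nn

class VisualAdapter(nn.Module):
    """Hypothetical minimal adapter: maps frozen vision-encoder patch features
    into the LLM token-embedding space so they can be consumed as visual tokens."""
    def __init__(self, vision_dim=1024, llm_dim=4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features):        # (B, N_patches, vision_dim)
        return self.proj(patch_features)      # (B, N_patches, llm_dim) visual tokens
```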
1 code implementation • 22 May 2024 • Wei Li, Hehe Fan, Yongkang Wong, Mohan Kankanhalli, Yi Yang
Recent advancements in image understanding have benefited from the extensive use of web image-text pairs.
2 code implementations • 30 Apr 2024 • Hang Du, Sicheng Zhang, Binzhu Xie, Guoshun Nan, Jiayang Zhang, Junrui Xu, Hangyu Liu, Sicong Leng, Jiangming Liu, Hehe Fan, Dajiu Huang, Jing Feng, Linli Chen, Can Zhang, Xuhuan Li, Hao Zhang, Jianhang Chen, Qimei Cui, Xiaofeng Tao
In pursuit of these answers, we present a comprehensive benchmark for Causation Understanding of Video Anomaly (CUVA).
1 code implementation • CVPR 2024 • Ruijie Quan, Wenguan Wang, Fan Ma, Hehe Fan, Yi Yang
We select the highest-scoring clusters and use their medoid nodes for the next iteration of clustering, until we obtain a hierarchical and informative representation of the protein.
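A minimal sketch of this iterative cluster-and-keep-medoids procedure is given below; `cluster_fn` and `score_fn` are hypothetical placeholders for the paper's clustering and scoring components, and the distance-based medoid rule is an illustrative assumption:

```python
import numpy as np

def hierarchical_medoid_clustering(node_feats, cluster_fn, score_fn,
                                   keep_ratio=0.5, n_levels=3):
    """Sketch of iterative cluster-and-keep-medoids (not the paper's exact algorithm).

    node_feats : (N, D) array of node features (e.g. residue embeddings).
    cluster_fn : callable, features -> integer labels in {0, ..., K-1}.
    score_fn   : callable, (features, labels) -> one score per cluster, length K.
    """
    current = node_feats
    for _ in range(n_levels):
        labels = cluster_fn(current)
        scores = score_fn(current, labels)
        # keep only the highest-scoring clusters
        kept = np.argsort(scores)[::-1][: max(1, int(len(scores) * keep_ratio))]
        medoids = []
        for c in kept:
            members = current[labels == c]
            # medoid = member with the smallest total distance to the other members
            dists = np.linalg.norm(members[:, None] - members[None, :], axis=-1).sum(axis=1)
            medoids.append(members[dists.argmin()])
        current = np.stack(medoids)  # medoids feed the next level of clustering
    return current
```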
no code implementations • 24 Mar 2024 • Xiangpeng Yang, Linchao Zhu, Hehe Fan, Yi Yang
We find that the crux of the issue stems from the imprecise distribution of attention weights across designated regions, including inaccurate text-to-attribute control and attention leakage.
no code implementations • 15 Feb 2024 • Chao Wang, Hehe Fan, Ruijie Quan, Yi Yang
The protein is first processed by protein encoders and the PLP-former to produce protein embeddings, which the adapter then projects to conform with the LLM.
1 code implementation • 9 Feb 2024 • Zhenglin Zhou, Fan Ma, Hehe Fan, Zongxin Yang, Yi Yang
Extensive experiments demonstrate the efficacy of HeadStudio in generating animatable avatars from textual prompts, exhibiting appealing appearances.
1 code implementation • 29 Jan 2024 • Yuze Hao, Jianrong Zhang, Tao Zhuo, Fuan Wen, Hehe Fan
To address this problem, we propose a data-driven method for coarse motion refinement.
1 code implementation • CVPR 2024 • Hang Du, Sicheng Zhang, Binzhu Xie, Guoshun Nan, Jiayang Zhang, Junrui Xu, Hangyu Liu, Sicong Leng, Jiangming Liu, Hehe Fan, Dajiu Huang, Jing Feng, Linli Chen, Can Zhang, Xuhuan Li, Hao Zhang, Jianhang Chen, Qimei Cui, Xiaofeng Tao
In addition, we introduce MMEval, a novel evaluation metric designed to better align with human preferences for CUVA, facilitating the assessment of how well existing LLMs comprehend the underlying cause and corresponding effect of video anomalies.
1 code implementation • 26 Dec 2023 • Hang Du, Guoshun Nan, Sicheng Zhang, Binzhu Xie, Junrui Xu, Hehe Fan, Qimei Cui, Xiaofeng Tao, Xudong Jiang
Multimodal Sarcasm Understanding (MSU) has a wide range of applications in the news field such as public opinion analysis and forgery detection.
no code implementations • 6 Dec 2023 • Xiaobo Hu, Youfang Lin, Hehe Fan, Shuo Wang, Zhihao Wu, Kai Lv
To this end, an agent needs to 1) learn certain knowledge about the relations between object categories in the world during training, and 2) look for the target object based on the pre-learned object-category relations and its moving trajectory in the current unseen environment.
no code implementations • 4 Dec 2023 • Xiaobo Hu, Youfang Lin, Yue Liu, Jinwen Wang, Shuo Wang, Hehe Fan, Kai Lv
Visual reinforcement learning has proven effective in solving control tasks with high-dimensional observations.
no code implementations • 27 Nov 2023 • Yu Lu, Linchao Zhu, Hehe Fan, Yi Yang
Text-to-video (T2V) generation is a rapidly growing research area that aims to translate the scenes, objects, and actions within complex video text into a sequence of coherent visual frames.
1 code implementation • 16 Oct 2023 • Tao Zhuo, Zhiyong Cheng, Hehe Fan, Mohan Kankanhalli
Existing CL methods usually reduce forgetting with task priors, i.e., using task identity or a subset of previously seen samples for model training.
no code implementations • ICCV 2023 • Xiaoxiao Sheng, Zhiqiang Shen, Gang Xiao, Longguang Wang, Yulan Guo, Hehe Fan
Instead of contrasting the representations of clips or frames, in this paper, we propose a unified self-supervised framework by conducting contrastive learning at the point level.
1 code implementation • ICCV 2023 • Zhiqiang Shen, Xiaoxiao Sheng, Hehe Fan, Longguang Wang, Yulan Guo, Qiong Liu, Hao Wen, Xi Zhou
In this paper, we propose a Masked Spatio-Temporal Structure Prediction (MaST-Pre) method to capture the structure of point cloud videos without human annotations.
no code implementations • 31 Jul 2023 • Yue Zhang, Hehe Fan, Yi Yang, Mohan Kankanhalli
The proposed method, named Mixture of Depth and Point cloud video experts (DPMix), achieved first place in the 4D Action Segmentation Track of the HOI4D Challenge 2023.
no code implementations • 25 Jul 2023 • Yi Cheng, Hehe Fan, Dongyun Lin, Ying Sun, Mohan Kankanhalli, Joo-Hwee Lim
The main challenge in video question answering (VideoQA) is to capture and understand the complex spatial and temporal relations between objects based on given questions.
no code implementations • 13 Jul 2023 • Yi Cheng, Ziwei Xu, Fen Fang, Dongyun Lin, Hehe Fan, Yongkang Wong, Ying Sun, Mohan Kankanhalli
Our research focuses on the innovative application of a differentiable logic loss during training to leverage the co-occurrence relations between verbs and nouns, as well as on using pre-trained Large Language Models (LLMs) to generate the logic rules for adaptation to unseen action labels.
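One plausible instantiation of such a differentiable logic loss (an assumption for illustration, not necessarily the paper's exact formulation) penalizes the probability mass a model places on verb-noun pairs that the logic rules mark as invalid:

```python
import torch

def cooccurrence_logic_loss(verb_logits, noun_logits, valid_pairs):
    """Soft logic penalty on impossible verb-noun combinations (illustrative).

    verb_logits : (B, V) raw scores over verbs.
    noun_logits : (B, N) raw scores over nouns.
    valid_pairs : (V, N) binary matrix, 1 where the verb-noun pair may co-occur.
    """
    p_verb = verb_logits.softmax(dim=-1)                    # (B, V)
    p_noun = noun_logits.softmax(dim=-1)                    # (B, N)
    joint = p_verb.unsqueeze(2) * p_noun.unsqueeze(1)       # (B, V, N), independence assumption
    invalid_mass = (joint * (1.0 - valid_pairs.float())).sum(dim=(1, 2))
    return invalid_mass.mean()                              # push mass off invalid pairs
```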
1 code implementation • 23 May 2023 • Tao Zhuo, Zhiyong Cheng, Zan Gao, Hehe Fan, Mohan Kankanhalli
Experience Replay (ER) is a simple and effective rehearsal-based strategy, which optimizes the model with current training data and a subset of old samples stored in a memory buffer.
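For reference, a minimal sketch of Experience Replay with a reservoir-style memory buffer is shown below; the buffer size, sampling scheme, and helper names are illustrative assumptions rather than the paper's exact recipe:

```python
import random
import torch

class ReplayBuffer:
    """Reservoir-style memory buffer for Experience Replay (illustrative sketch)."""
    def __init__(self, capacity=2000):
        self.capacity, self.data, self.seen = capacity, [], 0

    def add(self, x, y):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append((x, y))
        else:
            idx = random.randrange(self.seen)  # reservoir sampling keeps a uniform sample
            if idx < self.capacity:
                self.data[idx] = (x, y)

    def sample(self, batch_size):
        batch = random.sample(self.data, min(batch_size, len(self.data)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)

def er_step(model, optimizer, criterion, x_cur, y_cur, buffer, replay_bs=32):
    """One ER update: optimize on the current batch jointly with a batch of old samples."""
    x_train, y_train = x_cur, y_cur
    if buffer.data:
        x_old, y_old = buffer.sample(replay_bs)
        x_train = torch.cat([x_cur, x_old])
        y_train = torch.cat([y_cur, y_old])
    optimizer.zero_grad()
    loss = criterion(model(x_train), y_train)
    loss.backward()
    optimizer.step()
    for xi, yi in zip(x_cur, y_cur):  # only current-task samples enter the buffer
        buffer.add(xi.detach(), yi.detach())
    return loss.item()
```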
no code implementations • 13 Jan 2023 • Guangzhi Wang, Hehe Fan, Mohan Kankanhalli
To overcome these two challenges, we propose a unified Relation-Enhanced Transformer (RET) to improve representation discriminability for both point cloud and natural language queries.
no code implementations • ICCV 2023 • Ming Li, Xiangyu Xu, Hehe Fan, Pan Zhou, Jun Liu, Jia-Wei Liu, Jiahe Li, Jussi Keppo, Mike Zheng Shou, Shuicheng Yan
For the first time, we introduce vision Transformers into PPAR by treating a video as a tubelet sequence, and accordingly design two complementary mechanisms, i.e., sparsification and anonymization, to remove privacy from a spatio-temporal perspective.
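A minimal sketch of turning a video into a tubelet token sequence is given below; the dimensions are illustrative assumptions, and the paper's sparsification and anonymization modules are not shown:

```python
import torch.nn as nn

class TubeletEmbedding(nn.Module):
    """Illustrative tubelet tokenizer: a video (B, C, T, H, W) is split into
    non-overlapping spatio-temporal tubelets, each linearly projected to a token."""
    def __init__(self, in_channels=3, embed_dim=768, tubelet=(2, 16, 16)):
        super().__init__()
        self.proj = nn.Conv3d(in_channels, embed_dim, kernel_size=tubelet, stride=tubelet)

    def forward(self, video):                 # (B, C, T, H, W)
        x = self.proj(video)                  # (B, D, T', H', W')
        return x.flatten(2).transpose(1, 2)   # (B, T'*H'*W', D) token sequence
```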
no code implementations • CVPR 2023 • Hehe Fan, Linchao Zhu, Yi Yang, Mohan Kankanhalli
Deep neural networks on regular 1D lists (e.g., natural languages) and irregular 3D sets (e.g., point clouds) have made tremendous achievements.
2 code implementations • 15 Sep 2022 • Yi Wang, Zhiwen Fan, Tianlong Chen, Hehe Fan, Zhangyang Wang
Vision Transformers (ViTs) have proven effective in solving 2D image understanding tasks when trained on large-scale image datasets and, as a somewhat separate track, in modeling the 3D visual world, such as voxels or point clouds.
no code implementations • 5 Sep 2022 • Xiaoyu Feng, Heming Du, Yueqi Duan, Yongpan Liu, Hehe Fan
Effectively preserving and encoding the structural features of objects in irregular and sparse LiDAR points is a key challenge for 3D object detection on point clouds.
1 code implementation • ICLR 2021 • Hehe Fan, Xin Yu, Yuhang Ding, Yi Yang, Mohan Kankanhalli
Then, a spatial convolution is employed to capture the local structure of points in the 3D space, and a temporal convolution is used to model the dynamics of the spatial regions along the time dimension.
Ranked #3 on 3D Action Recognition on NTU RGB+D
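To make the spatial-then-temporal factorization described in this entry concrete, here is a heavily simplified sketch. For readability it assumes a fixed point correspondence across frames, which the actual method does not require, and the neighborhood size and layer shapes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SimplePointSTConv(nn.Module):
    """Simplified decoupled spatial-then-temporal point convolution (illustrative only).
    Input: points (B, T, N, 3) and per-point features (B, T, N, C_in)."""
    def __init__(self, c_in, c_out, k=16, t_kernel=3):
        super().__init__()
        self.k = k
        self.spatial_mlp = nn.Sequential(nn.Linear(c_in + 3, c_out), nn.ReLU())
        self.temporal_conv = nn.Conv1d(c_out, c_out, t_kernel, padding=t_kernel // 2)

    def forward(self, points, feats):
        B, T, N, _ = points.shape
        p = points.reshape(B * T, N, 3)
        f = feats.reshape(B * T, N, -1)
        # --- spatial convolution within each frame ---
        dists = torch.cdist(p, p)                                  # (B*T, N, N)
        knn = dists.topk(self.k, largest=False).indices            # (B*T, N, k)
        nb_p = torch.gather(p.unsqueeze(1).expand(-1, N, -1, -1), 2,
                            knn.unsqueeze(-1).expand(-1, -1, -1, 3))
        nb_f = torch.gather(f.unsqueeze(1).expand(-1, N, -1, -1), 2,
                            knn.unsqueeze(-1).expand(-1, -1, -1, f.size(-1)))
        rel = nb_p - p.unsqueeze(2)                                # relative offsets to neighbors
        spatial = self.spatial_mlp(torch.cat([rel, nb_f], dim=-1)).max(dim=2).values
        # --- temporal convolution across frames, per point ---
        spatial = spatial.reshape(B, T, N, -1).permute(0, 2, 3, 1).reshape(B * N, -1, T)
        out = self.temporal_conv(spatial)                          # (B*N, C_out, T)
        return out.reshape(B, N, -1, T).permute(0, 3, 1, 2)        # (B, T, N, C_out)
```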
no code implementations • CVPR 2022 • Hehe Fan, Xiaojun Chang, Wanyue Zhang, Yi Cheng, Ying Sun, Mohan Kankanhalli
In this paper, we propose an unsupervised domain adaptation method for deep point cloud representation learning.
1 code implementation • CVPR 2021 • Hehe Fan, Yi Yang, Mohan Kankanhalli
To capture the dynamics in point cloud videos, point tracking is usually employed.
Ranked #4 on 3D Action Recognition on NTU RGB+D
2 code implementations • 18 Oct 2019 • Hehe Fan, Yi Yang
We apply PointRNN, PointGRU and PointLSTM to moving point cloud prediction, which aims to predict the future trajectories of points in a set given their historical movements.
1 code implementation • 6 Aug 2019 • Qianyu Feng, Yu Wu, Hehe Fan, Chenggang Yan, Yi Yang
Through this novel cascaded captioning-revising mechanism, CRN can accurately describe images with unseen objects.
1 code implementation • ICCV 2019 • Qianyu Feng, Guoliang Kang, Hehe Fan, Yi Yang
In this paper, we exploit the semantic structure of open set data from two aspects: 1) Semantic Categorical Alignment, which aims to achieve good separability of target known classes by categorically aligning the centroid of target with the source.
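A simplified reading of the centroid-alignment idea can be sketched as follows; the use of pseudo-labels for the target domain and the squared-distance form are assumptions for illustration, not the paper's exact loss:

```python
import torch

def categorical_alignment_loss(src_feats, src_labels, tgt_feats, tgt_pseudo_labels, num_classes):
    """Pull per-class target centroids toward the corresponding source centroids
    (illustrative sketch of categorical alignment for the shared, known classes)."""
    loss, used = src_feats.new_zeros(()), 0
    for c in range(num_classes):
        s_mask = src_labels == c
        t_mask = tgt_pseudo_labels == c
        if s_mask.any() and t_mask.any():
            diff = src_feats[s_mask].mean(dim=0) - tgt_feats[t_mask].mean(dim=0)
            loss = loss + diff.pow(2).sum()
            used += 1
    return loss / max(used, 1)
```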
1 code implementation • 9 Jul 2019 • Yuhang Ding, Hehe Fan, Mingliang Xu, Yi Yang
However, a problem with the adaptive selection is that, when an image has too many neighbors, it is more likely to attract other images as its neighbors.
no code implementations • 20 Apr 2019 • Hehe Fan, Linchao Zhu, Yi Yang
Predicting future frames in videos has become a promising direction of research for both computer vision and robot learning communities.
no code implementations • ICCV 2017 • Hehe Fan, Xiaojun Chang, De Cheng, Yi Yang, Dong Xu, Alexander G. Hauptmann
Since only a few shots in a video are relevant (or highly relevant) to the given event class, we formulate this task as a multi-instance learning (MIL) problem by taking each video as a bag and the video shots in each video as instances.
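A minimal max-pooling MIL objective of this kind is sketched below; it is illustrative only, and the paper's instance selection and scoring may differ:

```python
import torch.nn.functional as F

def mil_video_loss(shot_scores, bag_label):
    """Minimal MIL objective: the bag (video) prediction comes from its most confident
    instance (shot), so a positive video needs at least one high-scoring shot while
    every shot of a negative video is pushed down.

    shot_scores : (num_shots,) raw logits for the event class.
    bag_label   : scalar tensor, 1.0 if the video belongs to the event, else 0.0.
    """
    bag_logit = shot_scores.max()  # max-pooling over instances in the bag
    return F.binary_cross_entropy_with_logits(bag_logit, bag_label.float())
```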
1 code implementation • 30 May 2017 • Hehe Fan, Liang Zheng, Yi Yang
Pedestrian clustering and the CNN model are progressively and simultaneously improved until the algorithm converges.
Ranked #12 on Unsupervised Person Re-Identification on DukeMTMC-reID
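The alternating scheme can be sketched as the loop below; `extract_fn`, `cluster_fn`, and `train_fn` are hypothetical placeholders standing in for the paper's feature extraction, clustering, and fine-tuning steps:

```python
def progressive_clustering(model, unlabeled_data, extract_fn, cluster_fn, train_fn, n_rounds=10):
    """Alternating pseudo-label training loop (illustrative sketch).

    extract_fn : callable, (model, data) -> feature matrix for all images.
    cluster_fn : callable, features -> pseudo-identity labels (e.g. from clustering).
    train_fn   : callable, (model, data, labels) -> fine-tuned model for one round.
    """
    for _ in range(n_rounds):
        features = extract_fn(model, unlabeled_data)            # embed all pedestrian images
        pseudo_labels = cluster_fn(features)                    # group them into pseudo identities
        model = train_fn(model, unlabeled_data, pseudo_labels)  # supervise the CNN with pseudo-labels
    return model                                                # clustering and CNN improve together
```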