no code implementations • 10 Aug 2024 • Yiying Yang, Fukun Yin, Jiayuan Fan, Xin Chen, Wanzhang Li, Gang Yu
As Artificial Intelligence Generated Content (AIGC) advances, a variety of methods have been developed to generate text, images, videos, and 3D objects from single or multimodal inputs, contributing efforts to emulate human-like cognitive content creation.
1 code implementation • 17 Jun 2024 • Mingsheng Li, Lin Zhang, Mingzhen Zhu, Zilong Huang, Gang Yu, Jiayuan Fan, Tao Chen
In this paper, for the first time, we introduce language guidance to the distillation process and propose a new method named Language-Guided Distillation (LGD) system, which uses category names of the target downstream task to help refine the knowledge transferred between the teacher and student.
no code implementations • 11 Jun 2024 • Jiamu Sheng, Jingyi Zhou, Jiong Wang, Peng Ye, Jiayuan Fan
Finally, the adaptive global-local fusion is proposed to dynamically combine global Mamba features and local convolution features for a global-local spectral-spatial representation.
no code implementations • 5 Jun 2024 • Minglei Li, Peng Ye, Yongqi Huang, Lin Zhang, Tao Chen, Tong He, Jiayuan Fan, Wanli Ouyang
Parameter-efficient fine-tuning (PEFT) has become increasingly important as foundation models continue to grow in both popularity and size.
1 code implementation • 2 Apr 2024 • Biao Jiang, Xin Chen, Chi Zhang, Fukun Yin, Zhuoyuan Li, Gang Yu, Jiayuan Fan
However, this proficiency remains largely unexplored in other multimodal generative models, particularly in human motion models.
no code implementations • 3 Mar 2024 • Zhende Song, Chenchen Wang, Jiamu Sheng, Chi Zhang, Gang Yu, Jiayuan Fan, Tao Chen
Development of multimodal models has marked a significant step forward in how machines understand videos.
1 code implementation • CVPR 2024 • Sijin Chen, Xin Chen, Chi Zhang, Mingsheng Li, Gang Yu, Hao Fei, Hongyuan Zhu, Jiayuan Fan, Tao Chen
Recent progress in Large Multimodal Models (LMM) has opened up great possibilities for various applications in the field of human-machine interactions.
no code implementations • 21 Dec 2023 • Jingdong Zhang, Jiayuan Fan, Peng Ye, Bo Zhang, Hancheng Ye, Baopu Li, Yancheng Cai, Tao Chen
Multi-task dense prediction aims at handling multiple pixel-wise prediction tasks within a unified network simultaneously for visual scene understanding.
1 code implementation • 30 Nov 2023 • Sijin Chen, Xin Chen, Chi Zhang, Mingsheng Li, Gang Yu, Hao Fei, Hongyuan Zhu, Jiayuan Fan, Tao Chen
However, developing LMMs that can comprehend, reason, and plan in complex and diverse 3D environments remains a challenging topic, especially considering the demand for understanding permutation-invariant point cloud 3D representations of the 3D scene.
no code implementations • 29 Nov 2023 • Fukun Yin, Xin Chen, Chi Zhang, Biao Jiang, Zibo Zhao, Jiayuan Fan, Gang Yu, Taihao Li, Tao Chen
The advent of large language models, enabling flexibility through instruction-driven approaches, has revolutionized many traditional generative tasks, but large models for 3D data, particularly in comprehensively handling 3D shapes with other modalities, are still under-explored.
no code implementations • 23 Oct 2023 • Yiying Yang, Wen Liu, Fukun Yin, Xin Chen, Gang Yu, Jiayuan Fan, Tao Chen
Recent advancements in implicit neural representations have contributed to high-fidelity surface reconstruction and photorealistic novel view synthesis.
1 code implementation • 15 Jun 2023 • Jingyi Zhou, Jiamu Sheng, Jiayuan Fan, Peng Ye, Tong He, Bin Wang, Tao Chen
To address this issue, we propose a novel diffusion-based feature learning framework that explores Multi-Timestep Multi-Stage Diffusion features for HSI classification for the first time, called MTMSD.
no code implementations • CVPR 2023 • Chong Yu, Tao Chen, Zhongxue Gan, Jiayuan Fan
Moreover, GPUSQ-ViT can boost actual deployment performance by 1. 39-1. 79 times and 3. 22-3. 43 times of latency and throughput on A100 GPU, and 1. 57-1. 69 times and 2. 11-2. 51 times improvement of latency and throughput on AGX Orin.
1 code implementation • 31 Mar 2023 • Chuangguan Ye, Hongyuan Zhu, Yongbin Liao, Yanggang Zhang, Tao Chen, Jiayuan Fan
Due to the emergence of powerful computing resources and large-scale annotated datasets, deep learning has seen wide applications in our daily life.
2 code implementations • 21 Mar 2023 • Hancheng Ye, Bo Zhang, Tao Chen, Jiayuan Fan, Bin Wang
Global channel pruning (GCP) aims to remove a subset of channels (filters) across different layers from a deep model without hurting the performance.
no code implementations • 23 Feb 2023 • Lin Zhan, Jiayuan Fan, Peng Ye, JianJian Cao
To address the above issues, we propose a multi-stage search architecture in order to overcome asymmetric spectral-spatial dimensions and capture significant features.
Hyperspectral Image Classification
Neural Architecture Search
no code implementations • 20 Feb 2023 • Jiamu Sheng, Jiayuan Fan, Peng Ye, JianJian Cao
Despite substantial progress in no-reference image quality assessment (NR-IQA), previous training models often suffer from over-fitting due to the limited scale of used datasets, resulting in model performance bottlenecks.
no code implementations • ICCV 2023 • Chongshan Lu, Fukun Yin, Xin Chen, Tao Chen, Gang Yu, Jiayuan Fan
Meanwhile, a new benchmark for several outdoor NeRF-based tasks is established, such as novel view synthesis, surface reconstruction, and multi-modal NeRF.
no code implementations • 15 Nov 2022 • Weimin Wu, Jiayuan Fan, Tao Chen, Hancheng Ye, Bo Zhang, Baopu Li
To enhance the model, adaptability between domains and reduce the computational cost when deploying the ensemble model, we propose a novel framework, namely Instance aware Model Ensemble With Distillation, IMED, which fuses multiple UDA component models adaptively according to different instances and distills these components into a small model.
no code implementations • 10 Aug 2022 • Peng Ye, Baopu Li, Tao Chen, Jiayuan Fan, Zhen Mei, Chen Lin, Chongyan Zuo, Qinghua Chi, Wanli Ouyan
In this paper, we intend to search an optimal network structure that can run in real-time for this problem.
1 code implementation • 2 Jul 2022 • Bo Zhang, Jiakang Yuan, Baopu Li, Tao Chen, Jiayuan Fan, Botian Shi
Few-shot fine-grained learning aims to classify a query image into one of a set of support categories with fine-grained differences.
1 code implementation • 3 Mar 2022 • Peng Ye, Baopu Li, Yikang Li, Tao Chen, Jiayuan Fan, Wanli Ouyang
Neural Architecture Search~(NAS) has attracted increasingly more attention in recent years because of its capability to design deep neural networks automatically.
1 code implementation • CVPR 2022 • Peng Ye, Baopu Li, Yikang Li, Tao Chen, Jiayuan Fan, Wanli Ouyang
Neural Architecture Search (NAS) has attracted increasingly more attention in recent years because of its capability to design deep neural network automatically.
1 code implementation • 30 Nov 2021 • Yongbin Liao, Hongyuan Zhu, Yanggang Zhang, Chuangguan Ye, Tao Chen, Jiayuan Fan
For stage two, the bounding box proposals with SPCR are grouped into some subsets, and the instance masks are mined inside each subset with a novel semantic propagation module and a property consistency graph module.
no code implementations • 30 Aug 2021 • Bo Zhang, Tao Chen, Bin Wang, Xiaofeng Wu, Liming Zhang, Jiayuan Fan
Unsupervised domain adaptive object detection aims to adapt a well-trained detector from its original source domain with rich labeled data to a new target domain with unlabeled data.
no code implementations • 30 Aug 2021 • Yike Wu, Bo Zhang, Gang Yu, Weixi Zhang, Bin Wang, Tao Chen, Jiayuan Fan
The goal of few-shot fine-grained image classification is to recognize rarely seen fine-grained objects in the query set, given only a few samples of this class in the support set.
no code implementations • 16 Mar 2021 • Qihang Yang, Tao Chen, Jiayuan Fan, Ye Lu, Chongyan Zuo, Qinghua Chi
Due to real-time image semantic segmentation needs on power constrained edge devices, there has been an increasing desire to design lightweight semantic segmentation neural network, to simultaneously reduce computational cost and increase inference speed.
no code implementations • 1 Sep 2020 • Jingchen Sun, Jiming Chen, Tao Chen, Jiayuan Fan, Shibo He
Vision-based dynamic pedestrian intrusion detection (PID), judging whether pedestrians intrude an area-of-interest (AoI) by a moving camera, is an important task in mobile surveillance.
1 code implementation • 7 Apr 2020 • Jingjing Chen, Jichao Zhang, Enver Sangineto, Jiayuan Fan, Tao Chen, Nicu Sebe
In this paper, we propose to alleviate these problems by means of a novel gaze redirection framework which exploits both a numerical and a pictorial direction guidance, jointly with a coarse-to-fine learning strategy.