no code implementations • 9 Jun 2025 • BoYu Chen, Siran Chen, Kunchang Li, Qinglin Xu, Yu Qiao, Yali Wang
To fill this gap, we propose a unified Super Encoding Network (SEN) for video understanding, which builds up such distinct interactions through recursive association of multi-modal encoders in the foundation models.
1 code implementation • 20 Mar 2025 • Xinlong Zhai, Chunchen Wang, Ruijia Wang, Jiazheng Kang, Shujie Li, BoYu Chen, Tengfei Ma, Zikai Zhou, Cheng Yang, Chuan Shi
Extensive experiments on 3 real-world datasets under different extents of input data scarcity and/or label scarcity demonstrate that our model outperforms the state of the art significantly and steadily, with a maximum improvement of 53.53%.
no code implementations • 13 Mar 2025 • BoYu Chen, Zhengrong Yue, Siran Chen, Zikang Wang, Yang Liu, Peng Li, Yali Wang
In order to better address long video tasks, we introduce LVAgent, the first framework enabling multi-round dynamic collaboration of MLLM agents in long video understanding.
Computational Efficiency
Optical Character Recognition (OCR)
1 code implementation • 18 Feb 2025 • BoYu Chen, Zirui Guo, Zidan Yang, Yuluo Chen, Junze Chen, Zhenghao Liu, Chuan Shi, Cheng Yang
Typical RAG approaches split the text database into chunks, organizing them in a flat structure for efficient searches.
no code implementations • 10 Oct 2024 • BoYu Chen, Ameenat L. Solebo, Weiye Bao, Paul Taylor
To that end, we propose an MIQA framework based on a concept from causal inference: Probability of Necessity and Sufficiency (PNS).
no code implementations • 29 Aug 2024 • BoYu Chen, Junjie Liu, Zhu Li, Mengyue Yang
We address these challenges by first conceptualizing multimodal representations as comprising modality-invariant and modality-specific components.
1 code implementation • 25 Jun 2024 • BoYu Chen, Ameenat L. Solebo, Paul Taylor
This framework consists of a zero-shot chamber segmentation module and a cell detection module.
no code implementations • 18 Jun 2024 • BoYu Chen, Peike Li, Yao Yao, Alex Wang
In this paper, we propose a novel method for customized text-to-music generation, which can capture the concept from a two-minute piece of reference music and generate a new piece of music conforming to the concept.
no code implementations • 29 Feb 2024 • BoYu Chen, Siran Chen, Kunchang Li, Qinglin Xu, Yu Qiao, Yali Wang
Finally, we blend external multimodal knowledge in the Adapt stage by inserting multimodal knowledge adaptation modules into networks.
1 code implementation • 29 Oct 2023 • Yao Yao, Peike Li, BoYu Chen, Alex Wang
With rapid advances in generative artificial intelligence, the text-to-music synthesis task has emerged as a promising direction for music generation.
1 code implementation • 15 Aug 2023 • BoYu Chen, Hanxuan Chen, Jiao He, Fengyu Sun, Shangling Jui
We present a simple yet novel parameterized form of linear mapping that achieves remarkable network compression performance: a pseudo SVD called Ternary SVD (TSVD).
2 code implementations • 9 Aug 2023 • Peike Li, BoYu Chen, Yao Yao, Yikai Wang, Allen Wang, Alex Wang
Despite the task's significance, prevailing generative models exhibit limitations in music quality, computational efficiency, and generalization.
Ranked #5 on Text-to-Music Generation on MusicCaps
no code implementations • 25 Jul 2023 • Yi Yu, Wenlian Lu, BoYu Chen
We propose theoretical analyses of a modified natural gradient descent method in the neural network function space based on the eigendecompositions of neural tangent kernel and Fisher information matrix.
no code implementations • 18 Oct 2022 • Jiajun Zhang, BoYu Chen, Zhilong Ji, Jinfeng Bai, Zonghai Hu
This paper describes the approach we have taken in the challenge.
no code implementations • 28 Sep 2022 • BoYu Chen, Yu Qiao, Yali Wang
Second, these activities are naturally distributed in a long-tailed way.
no code implementations • 2 May 2022 • Xinjia Li, BoYu Chen, Wenlian Lu
FedDKD introduces a decentralized knowledge distillation (DKD) module that distills the knowledge of the local models to train the global model by approaching the neural network map average, based on the divergence metric defined in the loss function, rather than only averaging parameters as done in the literature.
1 code implementation • 10 Mar 2022 • BoYu Chen, Peixia Li, Lei Bai, Lei Qiao, Qiuhong Shen, Bo Li, Weihao Gan, Wei Wu, Wanli Ouyang
Exploiting a general-purpose neural architecture to replace hand-wired designs or inductive biases has recently drawn extensive interest.
no code implementations • 23 Nov 2021 • Ru Peng, Nankai Lin, Yi Fang, Shengyi Jiang, Tianyong Hao, BoYu Chen, Junbo Zhao
However, subsequent research pointed out that, limited by the uncontrolled nature of attention computation, the NMT model requires external syntax to capture deep syntactic awareness.
1 code implementation • ICCV 2021 • BoYu Chen, Peixia Li, Baopu Li, Chen Lin, Chuming Li, Ming Sun, Junjie Yan, Wanli Ouyang
We present BN-NAS, neural architecture search with Batch Normalization, to accelerate neural architecture search (NAS).
no code implementations • 7 Aug 2021 • BoYu Chen, Peixia Li, Baopu Li, Chuming Li, Lei Bai, Chen Lin, Ming Sun, Junjie Yan, Wanli Ouyang
Then, a compact set of the possible combinations for different token pooling and attention sharing mechanisms is constructed.
2 code implementations • ICCV 2021 • BoYu Chen, Peixia Li, Chuming Li, Baopu Li, Lei Bai, Chen Lin, Ming Sun, Junjie Yan, Wanli Ouyang
We introduce the first Neural Architecture Search (NAS) method to find a better transformer architecture for image recognition.
Ranked #548 on Image Classification on ImageNet
1 code implementation • 5 Jul 2021 • Xin Cai, BoYu Chen, Jiabei Zeng, Jiajun Zhang, Yunjia Sun, Xiao Wang, Zhilong Ji, Xiao Liu, Xilin Chen, Shiguang Shan
This paper presents a method for gaze estimation from face images.
1 code implementation • ICCV 2021 • Yuanzheng Ci, Chen Lin, Ming Sun, BoYu Chen, Hongwen Zhang, Wanli Ouyang
The automation of neural architecture design has been a coveted alternative to human experts.
no code implementations • ECCV 2018 • Boyu Chen, Dong Wang, Peixia Li, Shuang Wang, Huchuan Lu
In this work, we propose a novel tracking algorithm with real-time performance based on the "Actor-Critic" framework.
1 code implementation • 22 May 2018 • Boyu Chen, Wenlian Lu, Ernest Fokoue
Meta-learning is a promising approach to efficient training of deep neural networks and has attracted increasing interest in recent years.