no code implementations • 2 Nov 2024 • Bin Lei, Yuchen Li, Yiming Zeng, Tao Ren, Yi Luo, Tianyu Shi, Zitian Gao, Zeyu Hu, Weitai Kang, Qiuwu Chen
Despite the impressive capabilities of large language models (LLMs), they currently exhibit two primary limitations, I: They struggle to autonomously solve real-world engineering problems.
1 code implementation • 30 Sep 2024 • Weitai Kang, Haifeng Huang, Yuzhang Shang, Mubarak Shah, Yan Yan
RIG generates two key types of instruction data: 1) the Adversarial Instruction-following data, which features mixed negative and positive samples to enhance the model's discriminative understanding.
no code implementations • 19 Sep 2024 • Yuzhang Shang, Bingxin Xu, Weitai Kang, Mu Cai, Yuheng Li, Zehao Wen, Zhen Dong, Kurt Keutzer, Yong Jae Lee, Yan Yan
In this paper, we first identify the primary challenges in interpolating Video-LLMs: (1) the video encoder and modality alignment projector are fixed, preventing the integration of additional frames into Video-LLMs, and (2) the LLM backbone is limited in its context length, which complicates the processing of an increased number of video tokens.
1 code implementation • 3 Jul 2024 • Weitai Kang, Gaowen Liu, Mubarak Shah, Yan Yan
Specifically, we propose the Multi-layer Multi-task Encoder-Decoder as the target grounding stage, where we learn a regression query and multiple segmentation queries to ground the target by regression and segmentation of the box in each decoding layer, respectively.
no code implementations • 3 Jul 2024 • Weitai Kang, Luowei Zhou, Junyi Wu, Changchang Sun, Yan Yan
Building upon this, we further propose a novel framework named Attention-Driven Constraint Balancing (AttBalance) to optimize the behavior of visual features within language-relevant regions.
no code implementations • 3 Jul 2024 • Weitai Kang, Mengxue Qu, Yunchao Wei, Yan Yan
Building upon this, ACTRESS consists of an active sampling strategy and a selective retraining strategy.
no code implementations • 28 May 2024 • Weitai Kang, Mengxue Qu, Jyoti Kini, Yunchao Wei, Mubarak Shah, Yan Yan
To achieve detection based on human intention, it relies on humans to observe the scene, reason out the target that aligns with their intention ("pillow" in this case), and finally provide a reference to the AI system, such as "A pillow on the couch".
no code implementations • CVPR 2024 • Junyi Wu, Weitai Kang, Hao Tang, Yuan Hong, Yan Yan
In contrast, our proposed SaCo offers a reliable faithfulness measurement, establishing a robust metric for interpretations.
no code implementations • CVPR 2024 • Junyi Wu, Bin Duan, Weitai Kang, Hao Tang, Yan Yan
To incorporate the influence of token transformation into interpretation, we propose TokenTM, a novel post-hoc explanation method that utilizes our introduced measurement of token transformation effects.