Search Results for author: Weitai Kang

Found 9 papers, 2 papers with code

Infant Agent: A Tool-Integrated, Logic-Driven Agent with Cost-Effective API Usage

no code implementations • 2 Nov 2024 • Bin Lei, Yuchen Li, Yiming Zeng, Tao Ren, Yi Luo, Tianyu Shi, Zitian Gao, Zeyu Hu, Weitai Kang, Qiuwu Chen

Despite the impressive capabilities of large language models (LLMs), they currently exhibit two primary limitations, (I): they struggle to autonomously solve real-world engineering problems.

Management · Retrieval

Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning

1 code implementation • 30 Sep 2024 • Weitai Kang, Haifeng Huang, Yuzhang Shang, Mubarak Shah, Yan Yan

RIG generates two key types of instruction data: 1) Adversarial Instruction-following data, which mixes negative and positive samples to enhance the model's discriminative understanding.

Instruction Following · Language Modelling +1
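A minimal sketch of how the adversarial mixing of positive and negative instruction-following samples described above could be assembled; the sample fields, mixing ratio, and labels are illustrative assumptions, not the actual RIG pipeline.

```python
import random
from typing import Dict, List

def mix_adversarial_instructions(
    positives: List[Dict],      # instructions whose referred object exists in the scene
    negatives: List[Dict],      # instructions referring to absent or mismatched objects
    neg_ratio: float = 0.5,     # hypothetical fraction of negatives to mix in
    seed: int = 0,
) -> List[Dict]:
    """Interleave positive and negative instruction-following samples so the model
    must judge whether the referred object actually exists, rather than always
    predicting a grounding target."""
    rng = random.Random(seed)
    n_neg = min(int(len(positives) * neg_ratio), len(negatives))
    mixed = [dict(s, label="positive") for s in positives]
    mixed += [dict(s, label="negative") for s in rng.sample(negatives, n_neg)]
    rng.shuffle(mixed)
    return mixed
```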

Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free Manner

no code implementations • 19 Sep 2024 • Yuzhang Shang, Bingxin Xu, Weitai Kang, Mu Cai, Yuheng Li, Zehao Wen, Zhen Dong, Kurt Keutzer, Yong Jae Lee, Yan Yan

In this paper, we first identify the primary challenges in interpolating Video-LLMs: (1) the video encoder and modality alignment projector are fixed, preventing the integration of additional frames into Video-LLMs, and (2) the LLM backbone is limited in its context length capabilities, which complicates the processing of an increased number of video tokens.
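One generic, training-free workaround for the first constraint is to stretch the frozen encoder's temporal position embeddings so it can ingest more frames; the sketch below illustrates that common trick under assumed tensor shapes, and is not necessarily the interpolation scheme the paper proposes.

```python
import torch
import torch.nn.functional as F

def interpolate_temporal_pos_embed(pos_embed: torch.Tensor, new_frames: int) -> torch.Tensor:
    """Stretch a frozen encoder's learned temporal position table from
    old_frames to new_frames by linear interpolation, so additional frames can
    be fed without retraining.

    pos_embed: (old_frames, dim) learned temporal position embeddings.
    Returns:   (new_frames, dim) interpolated embeddings.
    """
    # interpolate along the frame axis: (1, dim, old_frames) -> (1, dim, new_frames)
    stretched = F.interpolate(
        pos_embed.t().unsqueeze(0), size=new_frames, mode="linear", align_corners=False
    )
    return stretched.squeeze(0).t()
```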

SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding

1 code implementation • 3 Jul 2024 • Weitai Kang, Gaowen Liu, Mubarak Shah, Yan Yan

Specifically, we propose the Multi-layer Multi-task Encoder-Decoder as the target grounding stage, where we learn a regression query and multiple segmentation queries to ground the target by regression and segmentation of the box in each decoding layer, respectively.

object-detection · Object Detection +3
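A minimal PyTorch sketch of the multi-layer, multi-task decoding idea described above: one regression query predicts the box and several segmentation queries predict box masks at every decoding layer. Module names, dimensions, and the number of queries are assumptions for illustration, not the released SegVG code.

```python
import torch
import torch.nn as nn

class MultiTaskGroundingDecoder(nn.Module):
    def __init__(self, d_model: int = 256, n_layers: int = 6,
                 n_seg_queries: int = 4, n_heads: int = 8):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        # query 0 regresses the box; queries 1..K segment it
        self.queries = nn.Embedding(1 + n_seg_queries, d_model)
        self.box_head = nn.Linear(d_model, 4)         # (cx, cy, w, h)
        self.mask_proj = nn.Linear(d_model, d_model)  # projects seg queries before dot product with pixel features

    def forward(self, memory: torch.Tensor, pixel_feats: torch.Tensor):
        # memory: fused vision-language tokens (B, N, C); pixel_feats: (B, H*W, C)
        B = memory.size(0)
        q = self.queries.weight.unsqueeze(0).expand(B, -1, -1)
        boxes_per_layer, masks_per_layer = [], []
        for layer in self.layers:
            q = layer(q, memory)
            boxes_per_layer.append(self.box_head(q[:, 0]).sigmoid())  # box from the regression query
            seg_q = self.mask_proj(q[:, 1:])                          # (B, K, C)
            masks_per_layer.append(torch.einsum("bkc,bnc->bkn", seg_q, pixel_feats))
        # both tasks are supervised at every decoding layer
        return boxes_per_layer, masks_per_layer
```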

Visual Grounding with Attention-Driven Constraint Balancing

no code implementations • 3 Jul 2024 • Weitai Kang, Luowei Zhou, Junyi Wu, Changchang Sun, Yan Yan

Building upon this, we further propose a novel framework named Attention-Driven Constraint Balancing (AttBalance) to optimize the behavior of visual features within language-relevant regions.

Object · object-detection +2

ACTRESS: Active Retraining for Semi-supervised Visual Grounding

no code implementations • 3 Jul 2024 • Weitai Kang, Mengxue Qu, Yunchao Wei, Yan Yan

Building upon this, ACTRESS consists of an active sampling strategy and a selective retraining strategy.

Binary Classification · Visual Grounding
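A rough sketch of one active-sampling plus selective-retraining round in the spirit of the snippet above; the confidence threshold, budget, and the `teacher`/`model` methods are hypothetical placeholders, not ACTRESS's actual criteria or API.

```python
def semi_supervised_round(model, teacher, labeled, unlabeled,
                          conf_thresh: float = 0.9, budget: int = 1000):
    """One round: actively sample confident pseudo-labels from the teacher, then
    selectively retrain the student so it does not overfit to its own earlier
    mistakes."""
    # Active sampling: keep only high-confidence pseudo-labels, up to a budget.
    pseudo = []
    for sample in unlabeled:
        pred = teacher.predict(sample)          # hypothetical API
        if pred.confidence >= conf_thresh:      # hypothetical selection criterion
            pseudo.append((sample, pred.box))
        if len(pseudo) >= budget:
            break

    # Selective retraining: re-initialize part of the student before training on
    # the union of ground-truth and pseudo-labeled data.
    model.reset_head()                          # hypothetical API
    model.fit(labeled + pseudo)                 # hypothetical API
    return model
```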

Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention

no code implementations • 28 May 2024 • Weitai Kang, Mengxue Qu, Jyoti Kini, Yunchao Wei, Mubarak Shah, Yan Yan

To achieve detection based on human intention, it relies on humans to observe the scene, reason out the target that aligns with their intention ("pillow" in this case), and finally provide a reference to the AI system, such as "A pillow on the couch".

3D Object Detection · 3D visual grounding +2

On the Faithfulness of Vision Transformer Explanations

no code implementations • CVPR 2024 • Junyi Wu, Weitai Kang, Hao Tang, Yuan Hong, Yan Yan

In contrast, our proposed SaCo offers a reliable faithfulness measurement, establishing a robust metric for interpretations.

Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer

no code implementations • CVPR 2024 • Junyi Wu, Bin Duan, Weitai Kang, Hao Tang, Yan Yan

To incorporate the influence of token transformation into interpretation, we propose TokenTM, a novel post-hoc explanation method that utilizes our introduced measurement of token transformation effects.
