Search Results for author: Jiaming Han

Found 15 papers, 14 papers with code

Multimodal Long Video Modeling Based on Temporal Dynamic Context

1 code implementation • 14 Apr 2025 • Haoran Hao, Jiaming Han, Yiyuan Zhang, Xiangyu Yue

Secondly, we propose a novel temporal context compressor to reduce the number of tokens within each segment.

Video Understanding
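The temporal context compressor described above can be sketched as simple adaptive average pooling over each segment's token sequence — a minimal, assumed simplification; the paper's actual compressor is a learned module, and the function name here is hypothetical.

```python
import numpy as np

def compress_segment(tokens: np.ndarray, out_len: int) -> np.ndarray:
    """Compress a segment's token sequence (T, D) to (out_len, D) by
    averaging over roughly equal temporal chunks (a toy stand-in for
    the paper's learned temporal context compressor)."""
    T, _ = tokens.shape
    # Chunk boundaries along the time axis, then average each chunk.
    bounds = np.linspace(0, T, out_len + 1).astype(int)
    return np.stack([tokens[bounds[i]:bounds[i + 1]].mean(axis=0)
                     for i in range(out_len)])

segment = np.random.rand(64, 8)        # 64 frame tokens of dimension 8
compressed = compress_segment(segment, 16)
print(compressed.shape)                 # (16, 8)
```

The key property — a fixed token budget per segment regardless of segment length — is what keeps long-video context bounded.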

Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation

no code implementations • 23 Feb 2025 • Yunhai Feng, Jiaming Han, Zhuoran Yang, Xiangyu Yue, Sergey Levine, Jianlan Luo

Solving complex long-horizon robotic manipulation problems requires sophisticated high-level planning capabilities, the ability to reason about the physical world, and the ability to reactively choose appropriate motor skills.

AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?

1 code implementation • 3 Dec 2024 • Kaixiong Gong, Kaituo Feng, Bohao Li, Yibing Wang, Mofan Cheng, Shijia Yang, Jiaming Han, Benyou Wang, Yutong Bai, Zhuoran Yang, Xiangyu Yue

Recently, multimodal large language models (MLLMs), such as GPT-4o, Gemini 1.5 Pro, and Reka Core, have expanded their capabilities to include vision and audio modalities.

Multiple-choice

RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models

1 code implementation • 17 Oct 2024 • Haoran Hao, Jiaming Han, Changsheng Li, Yu-Feng Li, Xiangyu Yue

To further improve generation quality and alignment with user-specific information, we design a pipeline for data collection and create a specialized dataset for personalized training of MLLMs.

Image Captioning • Question Answering +1
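The retrieval step at the heart of retrieval-augmented personalization can be sketched as a nearest-neighbor lookup over a user's concept database — a minimal sketch assuming cosine similarity over precomputed embeddings; RAP's actual pipeline uses a trained multimodal retriever, and the record names here are invented examples.

```python
import numpy as np

def retrieve(query_vec, memory, k=1):
    """Return the top-k user records by cosine similarity to the query
    embedding (toy stand-in for RAP's multimodal retriever)."""
    names = [r["name"] for r in memory]
    V = np.stack([r["vec"] for r in memory])
    sims = V @ query_vec / (np.linalg.norm(V, axis=1) * np.linalg.norm(query_vec))
    top = np.argsort(-sims)[:k]
    return [names[i] for i in top]

# Hypothetical user database of personal concepts.
memory = [
    {"name": "my dog Rex", "vec": np.array([1.0, 0.1, 0.0])},
    {"name": "my bike",    "vec": np.array([0.0, 1.0, 0.2])},
    {"name": "my mug",     "vec": np.array([0.1, 0.0, 1.0])},
]
query = np.array([0.9, 0.2, 0.1])   # embedding of the user's input image
context = "; ".join(retrieve(query, memory, k=1))
print(context)                       # my dog Rex
```

Retrieved records are then injected into the MLLM's prompt so generations reference the user's own concepts.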

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

3 code implementations • 28 Apr 2023 • Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei Zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, Yu Qiao

This strategy effectively alleviates the interference between the two tasks of image-text alignment and instruction following and achieves strong multi-modal reasoning with only a small-scale image-text and instruction dataset.

Instruction Following • model +8

Few-Shot Object Detection via Variational Feature Aggregation

1 code implementation • 31 Jan 2023 • Jiaming Han, Yuqiang Ren, Jian Ding, Ke Yan, Gui-Song Xia

As few-shot object detectors are often trained with abundant base samples and fine-tuned on few-shot novel examples, the learned models are usually biased to base classes and sensitive to the variance of novel examples.

Few-Shot Object Detection • Meta-Learning +3
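The robustness-to-variance idea above can be sketched as modeling each class as a distribution over its few support features and aggregating a sampled class feature with the query feature — an assumed simplification of variational feature aggregation; the paper's module is learned end-to-end, and the fusion-by-concatenation here is a placeholder.

```python
import numpy as np

def variational_aggregate(query_feat, support_feats, rng):
    """Toy sketch of variational feature aggregation: fit a diagonal
    Gaussian over the few support features, draw a class feature from it
    (reparameterized sample), and fuse it with the query feature, which
    reduces sensitivity to the variance of individual novel examples."""
    mu = support_feats.mean(axis=0)
    sigma = support_feats.std(axis=0) + 1e-6
    class_feat = mu + sigma * rng.standard_normal(mu.shape)
    return np.concatenate([query_feat, class_feat])   # placeholder fusion

rng = np.random.default_rng(0)
support = rng.normal(size=(5, 4))   # 5 few-shot examples, feature dim 4
query = rng.normal(size=4)
fused = variational_aggregate(query, support, rng)
print(fused.shape)                   # (8,)
```

Sampling from the class distribution, rather than using any single support example, is what smooths over outlier novel samples.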

ReDet: A Rotation-equivariant Detector for Aerial Object Detection

4 code implementations • CVPR 2021 • Jiaming Han, Jian Ding, Nan Xue, Gui-Song Xia

More precisely, we incorporate rotation-equivariant networks into the detector to extract rotation-equivariant features, which can accurately predict the orientation and lead to a huge reduction of model size.

Ranked #21 on Object Detection In Aerial Images on DOTA (using extra training data)

Object • object-detection +1
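Rotation-equivariant feature extraction can be illustrated with the smallest possible example: convolving with four rotated copies of a single kernel (the C4 group), so rotating the input rotates the response maps and cyclically shifts the orientation channels. This is a toy stand-in for ReDet's rotation-equivariant networks, which use richer groups and learned weight sharing.

```python
import numpy as np

def conv2d(img, k):
    """Valid 2-D cross-correlation with a 3x3 kernel (naive loops)."""
    H, W = img.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            out[i, j] = (img[i:i + 3, j:j + 3] * k).sum()
    return out

def rot_equivariant_features(img, k):
    """One orientation channel per 90-degree rotation of the same kernel:
    a single kernel serves all four orientations, which is the weight
    sharing behind the model-size reduction mentioned above."""
    return np.stack([conv2d(img, np.rot90(k, r)) for r in range(4)])

rng = np.random.default_rng(1)
img = rng.normal(size=(6, 6))
k = rng.normal(size=(3, 3))
feats = rot_equivariant_features(img, k)
print(feats.shape)   # (4, 4, 4): orientation x height x width
```

Equivariance means `rot_equivariant_features(np.rot90(img), k)[1]` equals `np.rot90(feats[0])`: the rotation moves through the network instead of being lost.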

Align Deep Features for Oriented Object Detection

3 code implementations • 21 Aug 2020 • Jiaming Han, Jian Ding, Jie Li, Gui-Song Xia

However, most existing methods rely on heuristically defined anchors with different scales, angles, and aspect ratios, and they usually suffer from severe misalignment between anchor boxes and axis-aligned convolutional features, which leads to a common inconsistency between the classification score and the localization accuracy.

Ranked #26 on Object Detection In Aerial Images on DOTA (using extra training data)

Object • object-detection +2
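The feature-alignment fix for the misalignment described above can be sketched as re-sampling the feature map at the refined anchor's fractional location instead of reading the fixed axis-aligned grid — a minimal sketch using bilinear interpolation at a single point; the paper's alignment module samples a full set of points over the refined rotated anchor.

```python
import numpy as np

def bilinear_sample(fmap, y, x):
    """Bilinearly sample a 2-D feature map at a fractional location: a toy
    stand-in for a feature-alignment module that reads features where the
    refined anchor actually lies, not at the original integer grid cell."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * fmap[y0, x0] +
            (1 - dy) * dx       * fmap[y0, x0 + 1] +
            dy       * (1 - dx) * fmap[y0 + 1, x0] +
            dy       * dx       * fmap[y0 + 1, x0 + 1])

fmap = np.arange(16, dtype=float).reshape(4, 4)
# Anchor refinement moved the center off the integer grid to (1.5, 2.25).
val = bilinear_sample(fmap, 1.5, 2.25)
print(val)   # 8.25
```

Because the classifier and regressor now see features taken at the refined box, the classification score and localization accuracy stay consistent.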
