Search Results for author: Chuofan Ma

Found 5 papers, 5 papers with code

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

1 code implementation | 19 Apr 2024 | Chuofan Ma, Yi Jiang, Jiannan Wu, Zehuan Yuan, Xiaojuan Qi

We introduce Groma, a Multimodal Large Language Model (MLLM) with grounded and fine-grained visual perception ability.

Language Modelling, Large Language Model, +2

Recognize Any Regions

1 code implementation | 2 Nov 2023 | Haosen Yang, Chuofan Ma, Bin Wen, Yi Jiang, Zehuan Yuan, Xiatian Zhu

Understanding the semantics of individual regions or patches within unconstrained images, such as in open-world object detection, represents a critical yet challenging task in computer vision.

Object Detection, Object Recognition, +1

Rethinking Resolution in the Context of Efficient Video Recognition

1 code implementation | 26 Sep 2022 | Chuofan Ma, Qiushan Guo, Yi Jiang, Zehuan Yuan, Ping Luo, Xiaojuan Qi

Our key finding is that the major cause of degradation is not information loss in the down-sampling process, but rather the mismatch between network architecture and input scale.

Knowledge Distillation, Video Recognition

Cannot find the paper you are looking for? You can Submit a new open access paper.