1 code implementation • 19 Apr 2024 • Chuofan Ma, Yi Jiang, Jiannan Wu, Zehuan Yuan, Xiaojuan Qi
We introduce Groma, a Multimodal Large Language Model (MLLM) with grounded and fine-grained visual perception ability.
1 code implementation • 2 Nov 2023 • Haosen Yang, Chuofan Ma, Bin Wen, Yi Jiang, Zehuan Yuan, Xiatian Zhu
Understanding the semantics of individual regions or patches within unconstrained images, such as in open-world object detection, represents a critical yet challenging task in computer vision.
1 code implementation • NeurIPS 2023 • Chuofan Ma, Yi Jiang, Xin Wen, Zehuan Yuan, Xiaojuan Qi
CoDet then leverages visual similarities to discover the co-occurring objects and align them with the shared concept.
Ranked #2 on Open Vocabulary Object Detection on LVIS v1.0 (using extra training data)
1 code implementation • ICCV 2023 • Qiushan Guo, Chuofan Ma, Yi Jiang, Zehuan Yuan, Yizhou Yu, Ping Luo
Learning image classification and image generation using the same set of network parameters is a challenging problem.
1 code implementation • 26 Sep 2022 • Chuofan Ma, Qiushan Guo, Yi Jiang, Zehuan Yuan, Ping Luo, Xiaojuan Qi
Our key finding is that the major cause of degradation is not information loss in the down-sampling process, but rather the mismatch between network architecture and input scale.