no code implementations • 25 Jun 2019 • Jie Zhang, Bohao Li, Kexin Xiang, Xuegang Shi
Some have tried to combine expertise and signal processing to classify ECG signal by heart disease type.
1 code implementation • ECCV 2020 • Boyu Yang, Chang Liu, Bohao Li, Jianbin Jiao, Qixiang Ye
Few-shot segmentation is challenging because objects within the support and query images could significantly differ in appearance and pose.
2 code implementations • CVPR 2021 • Bohao Li, Boyu Yang, Chang Liu, Feng Liu, Rongrong Ji, Qixiang Ye
Few-shot object detection has made substantial progressby representing novel class objects using the feature representation learned upon a set of base class objects.
Ranked #14 on Few-Shot Object Detection on MS-COCO (10-shot)
no code implementations • 25 Sep 2022 • Renrui Zhang, Bohao Li, Wei zhang, Hao Dong, Hongsheng Li, Peng Gao, Yu Qiao
In this paper, we propose CoMo, a Collaboration of pre-trained Models that incorporates diverse prior knowledge from various pre-training paradigms for better few-shot learning.
1 code implementation • 15 Dec 2022 • Bohao Li, Chang Liu, Mengnan Shi, Xiaozhong Chen, Xiangyang Ji, Qixiang Ye
Adapting object detectors learned with sufficient supervision to novel classes under low data regimes is charming yet challenging.
3 code implementations • CVPR 2023 • Renrui Zhang, Xiangfei Hu, Bohao Li, Siyuan Huang, Hanqiu Deng, Hongsheng Li, Yu Qiao, Peng Gao
Our CaFo incorporates CLIP's language-contrastive knowledge, DINO's vision-contrastive knowledge, DALL-E's vision-generative knowledge, and GPT-3's language-generative knowledge.
2 code implementations • 30 Jul 2023 • Bohao Li, Rui Wang, Guangzhi Wang, Yuying Ge, Yixiao Ge, Ying Shan
Based on powerful Large Language Models (LLMs), recent generative Multimodal Large Language Models (MLLMs) have gained prominence as a pivotal research area, exhibiting remarkable capability for both comprehension and generation.
1 code implementation • 28 Nov 2023 • Bohao Li, Yuying Ge, Yixiao Ge, Guangzhi Wang, Rui Wang, Ruimao Zhang, Ying Shan
Multimodal large language models (MLLMs), building upon the foundation of powerful large language models (LLMs), have recently demonstrated exceptional capabilities in generating not only texts but also images given interleaved multimodal inputs (acting like a combination of GPT-4V and DALL-E 3).
1 code implementation • 11 Dec 2023 • Yi Chen, Yuying Ge, Yixiao Ge, Mingyu Ding, Bohao Li, Rui Wang, Ruifeng Xu, Ying Shan, Xihui Liu
Given diverse environmental inputs, including real-time task progress, visual observations, and open-form language instructions, a proficient task planner is expected to predict feasible actions, which is a feat inherently achievable by Multimodal Large Language Models (MLLMs).