3 code implementations • 18 Mar 2024 • Ruyi Xu, Yuan YAO, Zonghao Guo, Junbo Cui, Zanlin Ni, Chunjiang Ge, Tat-Seng Chua, Zhiyuan Liu, Maosong Sun, Gao Huang
To address the challenges, we present LLaVA-UHD, a large multimodal model that can efficiently perceive images in any aspect ratio and high resolution.
1 code implementation • 31 Jan 2024 • Yuzhong Zhao, Yue Liu, Zonghao Guo, Weijia Wu, Chen Gong, Fang Wan, Qixiang Ye
The multimodal model is constrained to generate captions within a few sub-spaces containing the control words, which increases the chance of producing less frequent captions and alleviates the caption degeneration issue.
Ranked #1 on Dense Captioning on Visual Genome
no code implementations • CVPR 2023 • Mingxiang Liao, Zonghao Guo, Yuze Wang, Peng Yuan, Bailan Feng, Fang Wan
Pointly supervised instance segmentation (PSIS) learns to segment objects using a single point within the object extent as supervision.
no code implementations • 13 Aug 2022 • Yongqiang Mao, Zonghao Guo, Xiaonan Lu, Zhiqiang Yuan, Haowen Guo
With prototype-to-point globalization (Pr2PoG), global perception is embedded into local point features based on similarity weights from sparse prototypes to dense point features.
3 code implementations • ICCV 2023 • Feng Liu, Xiaosong Zhang, Zhiliang Peng, Zonghao Guo, Fang Wan, Xiangyang Ji, Qixiang Ye
Except for the backbone networks, however, other components such as the detector head and the feature pyramid network (FPN) remain trained from scratch, which hinders fully tapping the potential of representation models.
Ranked #3 on Few-Shot Object Detection on MS-COCO (30-shot)
no code implementations • 11 Apr 2022 • Yongqiang Mao, Xian Sun, Kaiqiang Chen, Wenhui Diao, Zonghao Guo, Xiaonan Lu, Kun Fu
Because a single receptive field is used, semantic segmentation of point clouds struggles to express multi-receptive-field features, which leads to the misclassification of instances with similar spatial structures.
1 code implementation • 6 Oct 2021 • Zhiliang Peng, Wei Huang, Zonghao Guo, Xiaosong Zhang, Jianbin Jiao, Qixiang Ye
We propose to jointly optimize the empirical risks of the unbalanced and balanced domains and to approximate their domain divergence by intra-class and inter-class distances, with the aim of adapting models trained on the long-tailed distribution to general distributions in an interpretable way.
2 code implementations • CVPR 2021 • Zonghao Guo, Chang Liu, Xiaosong Zhang, Jianbin Jiao, Xiangyang Ji, Qixiang Ye
Detecting oriented and densely packed objects remains challenging because of spatial feature aliasing caused by the intersection of receptive fields between objects.
Ranked #34 on Object Detection In Aerial Images on DOTA (using extra training data)