no code implementations • 28 May 2024 • Xiaohao Xu, Tianyi Zhang, Jinrong Yang, Matthew Johnson-Roberson, Xiaonan Huang
The original multi-modal signals serve as reconstruction targets for the rendered feature maps, facilitating self-supervised representation learning.
1 code implementation • 11 Dec 2023 • Haoran Wei, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, Jinrong Yang, Jianjian Sun, Chunrui Han, Xiangyu Zhang
Accordingly, we propose Vary, an efficient and effective method to scale up the vision vocabulary of LVLMs.
Ranked #77 on Visual Question Answering on MM-Vet
no code implementations • 30 Nov 2023 • En Yu, Liang Zhao, Yana Wei, Jinrong Yang, Dongming Wu, Lingyu Kong, Haoran Wei, Tiancai Wang, Zheng Ge, Xiangyu Zhang, Wenbing Tao
Then, FIT requires MLLMs to first predict trajectories of related objects and then reason about potential future events based on them.
Ranked #92 on Visual Question Answering on MM-Vet
1 code implementation • 20 Sep 2023 • Runpei Dong, Chunrui Han, Yuang Peng, Zekun Qi, Zheng Ge, Jinrong Yang, Liang Zhao, Jianjian Sun, HongYu Zhou, Haoran Wei, Xiangwen Kong, Xiangyu Zhang, Kaisheng Ma, Li Yi
This paper presents DreamLLM, a learning framework that first achieves versatile Multimodal Large Language Models (MLLMs) empowered with frequently overlooked synergy between multimodal comprehension and creation.
Ranked #5 on Visual Question Answering on MMBench
1 code implementation • 22 Jul 2023 • Jing Hao, Moyun Liu, Jinrong Yang, Kuo Feng Hung
Comprehensive experiments are conducted on the large-scale glass segmentation dataset GSD-S. Our GEM establishes a new state-of-the-art performance with the help of these two VFMs, surpassing the best-reported method GlassSemNet with an IoU improvement of 2. 1%.
no code implementations • 18 Jul 2023 • Zhuoling Li, Chunrui Han, Zheng Ge, Jinrong Yang, En Yu, Haoqian Wang, Hengshuang Zhao, Xiangyu Zhang
Besides, GroupLane with ResNet18 still surpasses PersFormer by 4. 9% F1 score, while the inference speed is nearly 7x faster and the FLOPs is only 13. 3% of it.
no code implementations • 18 Jul 2023 • Liang Zhao, En Yu, Zheng Ge, Jinrong Yang, Haoran Wei, HongYu Zhou, Jianjian Sun, Yuang Peng, Runpei Dong, Chunrui Han, Xiangyu Zhang
Based on precise referring instruction, we propose ChatSpot, a unified end-to-end multimodal large language model that supports diverse forms of interactivity including mouse clicks, drag-and-drop, and drawing boxes, which provides a more flexible and seamless interactive experience.
no code implementations • 30 Jun 2023 • Weixin Mao, Jinrong Yang, Zheng Ge, Lin Song, HongYu Zhou, Tiezheng Mao, Zeming Li, Osamu Yoshie
In light of the success of sample mining techniques in 2D object detection, we propose a simple yet effective mining strategy for improving depth perception in 3D object detection.
no code implementations • 9 Apr 2023 • Yinhao Li, Jinrong Yang, Jianjian Sun, Han Bao, Zheng Ge, Li Xiao
Bounded by the inherent ambiguity of depth perception, contemporary multi-view 3D object detection methods fall into the performance bottleneck.
no code implementations • 10 Mar 2023 • Chunrui Han, Jinrong Yang, Jianjian Sun, Zheng Ge, Runpei Dong, HongYu Zhou, Weixin Mao, Yuang Peng, Xiangyu Zhang
In this paper, we explore an embarrassingly simple long-term recurrent fusion strategy built upon the LSS-based methods and find it already able to enjoy the merits from both sides, i. e., rich long-term information and efficient fusion pipeline.
no code implementations • 3 Dec 2022 • En Yu, Songtao Liu, Zhuoling Li, Jinrong Yang, Zeming Li, Shoudong Han, Wenbing Tao
VLM joints the information in the generated visual prompts and the textual prompts from a pre-defined Trackbook to obtain instance-level pseudo textual description, which is domain invariant to different tracking scenes.
no code implementations • 15 Nov 2022 • Jinrong Yang, Tiancai Wang, Zheng Ge, Weixin Mao, Xiaoping Li, Xiangyu Zhang
We propose a temporal 2D transformation to bridge the 3D predictions with temporal 2D labels.
3 code implementations • 21 Sep 2022 • Yinhao Li, Han Bao, Zheng Ge, Jinrong Yang, Jianjian Sun, Zeming Li
To this end, we introduce an effective temporal stereo method to dynamically select the scale of matching candidates, enable to significantly reduce computation overhead.
Ranked #11 on 3D Object Detection on nuScenes Camera Only
no code implementations • 1 Sep 2022 • Pan Wang, Liangliang Ren, Shengkai Wu, Jinrong Yang, En Yu, Hangcheng Yu, Xiaoping Li
The point cloud based 3D single object tracking has drawn increasing attention.
no code implementations • 23 Aug 2022 • Jinrong Yang, En Yu, Zeming Li, Xiaoping Li, Wenbing Tao
Recent advanced works generally employ a series of object attributes, e. g., position, size, velocity, and appearance, to provide the clues for the association in 3D MOT.
1 code implementation • 22 Jul 2022 • Jinrong Yang, Lin Song, Songtao Liu, Weixin Mao, Zeming Li, Xiaoping Li, Hongbin Sun, Jian Sun, Nanning Zheng
Many point-based 3D detectors adopt point-feature sampling strategies to drop some points for efficient inference.
no code implementations • 21 Jul 2022 • Jinrong Yang, Songtao Liu, Zeming Li, Xiaoping Li, Jian Sun
In this paper, we explore the performance of real time models on this metric and endow the models with the capacity of predicting the future, significantly improving the results for streaming perception.
2 code implementations • 21 Jun 2022 • Yinhao Li, Zheng Ge, Guanyi Yu, Jinrong Yang, Zengran Wang, Yukang Shi, Jianjian Sun, Zeming Li
In this research, we propose a new 3D object detector with a trustworthy depth estimation, dubbed BEVDepth, for camera-based Bird's-Eye-View (BEV) 3D object detection.
Ranked #4 on 3D Object Detection on Rope3D
no code implementations • 12 Jun 2022 • Lijun Gou, Jinrong Yang, Hangcheng Yu, Pan Wang, Xiaoping Li, Chao Deng
Then, a Semantic Consistency Feature Alignment Model (SCFAM) based on mixed-classes $H-divergence$ was also presented.
1 code implementation • CVPR 2022 • Jinrong Yang, Songtao Liu, Zeming Li, Xiaoping Li, Jian Sun
In this paper, instead of searching trade-offs between accuracy and speed like previous works, we point out that endowing real-time models with the ability to predict the future is the key to dealing with this problem.
Ranked #1 on Real-Time Object Detection on Argoverse-HD (Full-Stack, Val) (sAP metric, using extra training data)
1 code implementation • NeurIPS 2021 • Xu Luo, Longhui Wei, Liangjian Wen, Jinrong Yang, Lingxi Xie, Zenglin Xu, Qi Tian
The category gap between training and evaluation has been characterised as one of the main obstacles to the success of Few-Shot Learning (FSL).
no code implementations • 25 Mar 2021 • Shengkai Wu, Jinrong Yang, Hangcheng Yu, Lijun Gou, Xiaoping Li
This results in two problems: (1) only one anchor is assigned to most of the slender objects which leads to insufficient supervision information for the slender objects during training and the performance on the slender objects is hurt; (2) IoU can not accurately represent the alignment degree between the receptive field of the feature at the anchor's center and the object.
1 code implementation • 19 Mar 2021 • Lijun Gou, Shengkai Wu, Jinrong Yang, Hangcheng Yu, Chenxi Lin, Xiaoping Li, Chao Deng
To solve this problem, a novel image synthesis method is proposed to replace the foreground texture of the source datasets with the texture of the target datasets.
1 code implementation • 25 Feb 2021 • Jinrong Yang, Shengkai Wu, Lijun Gou, Hangcheng Yu, Chenxi Lin, Jiazhuo Wang, Minxuan Li, Xiaoping Li
In this paper, we present a large-scale carton dataset named Stacked Carton Dataset(SCD) with the goal of advancing the state-of-the-art in carton detection.
no code implementations • 15 Aug 2019 • Shengkai Wu, Jinrong Yang, Xinggang Wang, Xiaoping Li
The IoU-balanced localization loss decreases the gradient of examples with low IoU and increases the gradient of examples with high IoU, which can improve the localization accuracy of models.