Search Results for author: Jinrong Yang

Found 23 papers, 9 papers with code

Merlin:Empowering Multimodal LLMs with Foresight Minds

no code implementations30 Nov 2023 En Yu, Liang Zhao, Yana Wei, Jinrong Yang, Dongming Wu, Lingyu Kong, Haoran Wei, Tiancai Wang, Zheng Ge, Xiangyu Zhang, Wenbing Tao

Then, FIT requires MLLMs to first predict trajectories of related objects and then reason about potential future events based on them.

Visual Question Answering

DreamLLM: Synergistic Multimodal Comprehension and Creation

1 code implementation20 Sep 2023 Runpei Dong, Chunrui Han, Yuang Peng, Zekun Qi, Zheng Ge, Jinrong Yang, Liang Zhao, Jianjian Sun, HongYu Zhou, Haoran Wei, Xiangwen Kong, Xiangyu Zhang, Kaisheng Ma, Li Yi

This paper presents DreamLLM, a learning framework that first achieves versatile Multimodal Large Language Models (MLLMs) empowered with frequently overlooked synergy between multimodal comprehension and creation.

 Ranked #1 on Visual Question Answering on MMBench (GPT-3.5 score metric)

multimodal generation Visual Question Answering +2

GroupLane: End-to-End 3D Lane Detection with Channel-wise Grouping

no code implementations18 Jul 2023 Zhuoling Li, Chunrui Han, Zheng Ge, Jinrong Yang, En Yu, Haoqian Wang, Hengshuang Zhao, Xiangyu Zhang

Besides, GroupLane with ResNet18 still surpasses PersFormer by 4. 9% F1 score, while the inference speed is nearly 7x faster and the FLOPs is only 13. 3% of it.

3D Lane Detection

ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning

no code implementations18 Jul 2023 Liang Zhao, En Yu, Zheng Ge, Jinrong Yang, Haoran Wei, HongYu Zhou, Jianjian Sun, Yuang Peng, Runpei Dong, Chunrui Han, Xiangyu Zhang

Based on precise referring instruction, we propose ChatSpot, a unified end-to-end multimodal large language model that supports diverse forms of interactivity including mouse clicks, drag-and-drop, and drawing boxes, which provides a more flexible and seamless interactive experience.

Instruction Following Language Modelling +1

GMM: Delving into Gradient Aware and Model Perceive Depth Mining for Monocular 3D Detection

no code implementations30 Jun 2023 Weixin Mao, Jinrong Yang, Zheng Ge, Lin Song, HongYu Zhou, Tiezheng Mao, Zeming Li, Osamu Yoshie

In light of the success of sample mining techniques in 2D object detection, we propose a simple yet effective mining strategy for improving depth perception in 3D object detection.

3D Object Detection Depth Estimation +3

BEVStereo++: Accurate Depth Estimation in Multi-view 3D Object Detection via Dynamic Temporal Stereo

no code implementations9 Apr 2023 Yinhao Li, Jinrong Yang, Jianjian Sun, Han Bao, Zheng Ge, Li Xiao

Bounded by the inherent ambiguity of depth perception, contemporary multi-view 3D object detection methods fall into the performance bottleneck.

3D Object Detection Depth Estimation +2

Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception

no code implementations10 Mar 2023 Chunrui Han, Jinrong Yang, Jianjian Sun, Zheng Ge, Runpei Dong, HongYu Zhou, Weixin Mao, Yuang Peng, Xiangyu Zhang

In this paper, we explore an embarrassingly simple long-term recurrent fusion strategy built upon the LSS-based methods and find it already able to enjoy the merits from both sides, i. e., rich long-term information and efficient fusion pipeline.

motion prediction object-detection +1

Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Representation

no code implementations3 Dec 2022 En Yu, Songtao Liu, Zhuoling Li, Jinrong Yang, Zeming Li, Shoudong Han, Wenbing Tao

VLM joints the information in the generated visual prompts and the textual prompts from a pre-defined Trackbook to obtain instance-level pseudo textual description, which is domain invariant to different tracking scenes.

Domain Generalization Multi-Object Tracking +1

Towards 3D Object Detection with 2D Supervision

no code implementations15 Nov 2022 Jinrong Yang, Tiancai Wang, Zheng Ge, Weixin Mao, Xiaoping Li, Xiangyu Zhang

We propose a temporal 2D transformation to bridge the 3D predictions with temporal 2D labels.

3D Object Detection Object +1

BEVStereo: Enhancing Depth Estimation in Multi-view 3D Object Detection with Dynamic Temporal Stereo

3 code implementations21 Sep 2022 Yinhao Li, Han Bao, Zheng Ge, Jinrong Yang, Jianjian Sun, Zeming Li

To this end, we introduce an effective temporal stereo method to dynamically select the scale of matching candidates, enable to significantly reduce computation overhead.

3D Object Detection Depth Estimation +1

Quality Matters: Embracing Quality Clues for Robust 3D Multi-Object Tracking

no code implementations23 Aug 2022 Jinrong Yang, En Yu, Zeming Li, Xiaoping Li, Wenbing Tao

Recent advanced works generally employ a series of object attributes, e. g., position, size, velocity, and appearance, to provide the clues for the association in 3D MOT.

3D Multi-Object Tracking 3D Object Detection +2

DBQ-SSD: Dynamic Ball Query for Efficient 3D Object Detection

1 code implementation22 Jul 2022 Jinrong Yang, Lin Song, Songtao Liu, Weixin Mao, Zeming Li, Xiaoping Li, Hongbin Sun, Jian Sun, Nanning Zheng

Many point-based 3D detectors adopt point-feature sampling strategies to drop some points for efficient inference.

3D Object Detection object-detection

StreamYOLO: Real-time Object Detection for Streaming Perception

no code implementations21 Jul 2022 Jinrong Yang, Songtao Liu, Zeming Li, Xiaoping Li, Jian Sun

In this paper, we explore the performance of real time models on this metric and endow the models with the capacity of predicting the future, significantly improving the results for streaming perception.

Autonomous Driving Object +2

BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection

2 code implementations21 Jun 2022 Yinhao Li, Zheng Ge, Guanyi Yu, Jinrong Yang, Zengran Wang, Yukang Shi, Jianjian Sun, Zeming Li

In this research, we propose a new 3D object detector with a trustworthy depth estimation, dubbed BEVDepth, for camera-based Bird's-Eye-View (BEV) 3D object detection.

3D Object Detection Depth Estimation +1

Real-time Object Detection for Streaming Perception

1 code implementation CVPR 2022 Jinrong Yang, Songtao Liu, Zeming Li, Xiaoping Li, Jian Sun

In this paper, instead of searching trade-offs between accuracy and speed like previous works, we point out that endowing real-time models with the ability to predict the future is the key to dealing with this problem.

 Ranked #1 on Real-Time Object Detection on Argoverse-HD (Full-Stack, Val) (sAP metric, using extra training data)

Autonomous Driving Object +2

Gaussian Guided IoU: A Better Metric for Balanced Learning on Object Detection

no code implementations25 Mar 2021 Shengkai Wu, Jinrong Yang, Hangcheng Yu, Lijun Gou, Xiaoping Li

This results in two problems: (1) only one anchor is assigned to most of the slender objects which leads to insufficient supervision information for the slender objects during training and the performance on the slender objects is hurt; (2) IoU can not accurately represent the alignment degree between the receptive field of the feature at the anchor's center and the object.

object-detection Object Detection

Carton dataset synthesis method for domain shift based on foreground texture decoupling and replacement

1 code implementation19 Mar 2021 Lijun Gou, Shengkai Wu, Jinrong Yang, Hangcheng Yu, Chenxi Lin, Xiaoping Li, Chao Deng

To solve this problem, a novel image synthesis method is proposed to replace the foreground texture of the source datasets with the texture of the target datasets.

Image Generation object-detection +1

SCD: A Stacked Carton Dataset for Detection and Segmentation

1 code implementation25 Feb 2021 Jinrong Yang, Shengkai Wu, Lijun Gou, Hangcheng Yu, Chenxi Lin, Jiazhuo Wang, Minxuan Li, Xiaoping Li

In this paper, we present a large-scale carton dataset named Stacked Carton Dataset(SCD) with the goal of advancing the state-of-the-art in carton detection.

Instance Segmentation Semantic Segmentation

IoU-balanced Loss Functions for Single-stage Object Detection

no code implementations15 Aug 2019 Shengkai Wu, Jinrong Yang, Xinggang Wang, Xiaoping Li

The IoU-balanced localization loss decreases the gradient of examples with low IoU and increases the gradient of examples with high IoU, which can improve the localization accuracy of models.

Classification General Classification +3

Cannot find the paper you are looking for? You can Submit a new open access paper.