Search Results for author: Jinrong Yang

Found 23 papers, 9 papers with code

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models

1 code implementation • 11 Dec 2023 • Haoran Wei, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, Jinrong Yang, Jianjian Sun, Chunrui Han, Xiangyu Zhang

Accordingly, we propose Vary, an efficient and effective method to scale up the vision vocabulary of LVLMs.

Ranked #50 on Visual Question Answering on MM-Vet

Optical Character Recognition (OCR), Visual Question Answering

1,539

Merlin: Empowering Multimodal LLMs with Foresight Minds

no code implementations • 30 Nov 2023 • En Yu, Liang Zhao, Yana Wei, Jinrong Yang, Dongming Wu, Lingyu Kong, Haoran Wei, Tiancai Wang, Zheng Ge, Xiangyu Zhang, Wenbing Tao

Then, FIT requires MLLMs to first predict trajectories of related objects and then reason about potential future events based on them.

Ranked #59 on Visual Question Answering on MM-Vet

Visual Question Answering

DreamLLM: Synergistic Multimodal Comprehension and Creation

1 code implementation • 20 Sep 2023 • Runpei Dong, Chunrui Han, Yuang Peng, Zekun Qi, Zheng Ge, Jinrong Yang, Liang Zhao, Jianjian Sun, HongYu Zhou, Haoran Wei, Xiangwen Kong, Xiangyu Zhang, Kaisheng Ma, Li Yi

This paper presents DreamLLM, a learning framework that first achieves versatile Multimodal Large Language Models (MLLMs) empowered with frequently overlooked synergy between multimodal comprehension and creation.

Ranked #1 on Visual Question Answering on MMBench (GPT-3.5 score metric)

multimodal generation, Visual Question Answering, +2

300

GroupLane: End-to-End 3D Lane Detection with Channel-wise Grouping

no code implementations • 18 Jul 2023 • Zhuoling Li, Chunrui Han, Zheng Ge, Jinrong Yang, En Yu, Haoqian Wang, Hengshuang Zhao, Xiangyu Zhang

Besides, GroupLane with ResNet18 still surpasses PersFormer by 4.9% F1 score, while running nearly 7x faster with only 13.3% of its FLOPs.

3D Lane Detection

ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning

no code implementations • 18 Jul 2023 • Liang Zhao, En Yu, Zheng Ge, Jinrong Yang, Haoran Wei, HongYu Zhou, Jianjian Sun, Yuang Peng, Runpei Dong, Chunrui Han, Xiangyu Zhang

Based on precise referring instruction, we propose ChatSpot, a unified end-to-end multimodal large language model that supports diverse forms of interactivity including mouse clicks, drag-and-drop, and drawing boxes, which provides a more flexible and seamless interactive experience.

Instruction Following, Language Modelling, +1

GMM: Delving into Gradient Aware and Model Perceive Depth Mining for Monocular 3D Detection

no code implementations • 30 Jun 2023 • Weixin Mao, Jinrong Yang, Zheng Ge, Lin Song, HongYu Zhou, Tiezheng Mao, Zeming Li, Osamu Yoshie

In light of the success of sample mining techniques in 2D object detection, we propose a simple yet effective mining strategy for improving depth perception in 3D object detection.

3D Object Detection, Depth Estimation, +3

BEVStereo++: Accurate Depth Estimation in Multi-view 3D Object Detection via Dynamic Temporal Stereo

no code implementations • 9 Apr 2023 • Yinhao Li, Jinrong Yang, Jianjian Sun, Han Bao, Zheng Ge, Li Xiao

Bounded by the inherent ambiguity of depth perception, contemporary multi-view 3D object detection methods have hit a performance bottleneck.

3D Object Detection, Depth Estimation, +2

Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception

no code implementations • 10 Mar 2023 • Chunrui Han, Jinrong Yang, Jianjian Sun, Zheng Ge, Runpei Dong, HongYu Zhou, Weixin Mao, Yuang Peng, Xiangyu Zhang

In this paper, we explore an embarrassingly simple long-term recurrent fusion strategy built upon the LSS-based methods and find that it already enjoys the merits of both sides, i.e., rich long-term information and an efficient fusion pipeline.

motion prediction, object-detection, +1

Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Representation

no code implementations • 3 Dec 2022 • En Yu, Songtao Liu, Zhuoling Li, Jinrong Yang, Zeming Li, Shoudong Han, Wenbing Tao

VLM joins the information in the generated visual prompts with the textual prompts from a pre-defined Trackbook to obtain instance-level pseudo textual descriptions, which are domain-invariant across different tracking scenes.

Domain Generalization, Multi-Object Tracking, +1

Towards 3D Object Detection with 2D Supervision

no code implementations • 15 Nov 2022 • Jinrong Yang, Tiancai Wang, Zheng Ge, Weixin Mao, Xiaoping Li, Xiangyu Zhang

We propose a temporal 2D transformation to bridge the 3D predictions with temporal 2D labels.

3D Object Detection, Object, +1

BEVStereo: Enhancing Depth Estimation in Multi-view 3D Object Detection with Dynamic Temporal Stereo

3 code implementations • 21 Sep 2022 • Yinhao Li, Han Bao, Zheng Ge, Jinrong Yang, Jianjian Sun, Zeming Li

To this end, we introduce an effective temporal stereo method that dynamically selects the scale of matching candidates, significantly reducing computational overhead.

Ranked #11 on 3D Object Detection on nuScenes Camera Only

3D Object Detection, Depth Estimation, +1

655

Implicit and Efficient Point Cloud Completion for 3D Single Object Tracking

no code implementations • 1 Sep 2022 • Pan Wang, Liangliang Ren, Shengkai Wu, Jinrong Yang, En Yu, Hangcheng Yu, Xiaoping Li

The point cloud based 3D single object tracking has drawn increasing attention.

3D Single Object Tracking, Object Tracking, +2

Quality Matters: Embracing Quality Clues for Robust 3D Multi-Object Tracking

no code implementations • 23 Aug 2022 • Jinrong Yang, En Yu, Zeming Li, Xiaoping Li, Wenbing Tao

Recent advanced works generally employ a series of object attributes, e.g., position, size, velocity, and appearance, to provide clues for association in 3D MOT.

3D Multi-Object Tracking, 3D Object Detection, +2

DBQ-SSD: Dynamic Ball Query for Efficient 3D Object Detection

1 code implementation • 22 Jul 2022 • Jinrong Yang, Lin Song, Songtao Liu, Weixin Mao, Zeming Li, Xiaoping Li, Hongbin Sun, Jian Sun, Nanning Zheng

Many point-based 3D detectors adopt point-feature sampling strategies to drop some points for efficient inference.
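The snippet does not detail Dynamic Ball Query itself. For background, farthest point sampling (FPS) is one widely used point-sampling strategy in point-based 3D detectors (for example, PointNet++-style backbones). The sketch below is a plain-Python illustration of FPS under that assumption, not the DBQ-SSD method; `farthest_point_sampling` is a hypothetical helper name.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Return indices of k points chosen by farthest point sampling (illustrative)."""
    n = len(points)
    if k >= n:
        return list(range(n))  # nothing to drop
    selected = [start]
    # Distance from each point to its nearest already-selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    min_dist[start] = -1.0  # mark as selected so it is never picked again
    while len(selected) < k:
        # Pick the point farthest from the current selection.
        nxt = max(range(n), key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist[nxt] = -1.0
        # Refresh nearest-selected distances using the newly chosen point.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


pts = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0)]
print(farthest_point_sampling(pts, 3))
```

This naive version performs O(n·k) distance computations; the unselected points are the ones "dropped" for cheaper downstream processing.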

3D Object Detection, object-detection

StreamYOLO: Real-time Object Detection for Streaming Perception

no code implementations • 21 Jul 2022 • Jinrong Yang, Songtao Liu, Zeming Li, Xiaoping Li, Jian Sun

In this paper, we explore the performance of real-time models on this metric and endow the models with the capacity to predict the future, significantly improving the results for streaming perception.

Autonomous Driving, Object, +2

BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection

2 code implementations • 21 Jun 2022 • Yinhao Li, Zheng Ge, Guanyi Yu, Jinrong Yang, Zengran Wang, Yukang Shi, Jianjian Sun, Zeming Li

In this research, we propose a new 3D object detector with a trustworthy depth estimation, dubbed BEVDepth, for camera-based Bird's-Eye-View (BEV) 3D object detection.

Ranked #4 on 3D Object Detection on Rope3D

3D Object Detection, Depth Estimation, +1

655

A Semantic Consistency Feature Alignment Object Detection Model Based on Mixed-Class Distribution Metrics

no code implementations • 12 Jun 2022 • Lijun Gou, Jinrong Yang, Hangcheng Yu, Pan Wang, Xiaoping Li, Chao Deng

Then, a Semantic Consistency Feature Alignment Model (SCFAM) based on the mixed-class $\mathcal{H}$-divergence is also presented.

Instance Segmentation, Object, +4

Real-time Object Detection for Streaming Perception

1 code implementation • CVPR 2022 • Jinrong Yang, Songtao Liu, Zeming Li, Xiaoping Li, Jian Sun

In this paper, instead of searching trade-offs between accuracy and speed like previous works, we point out that endowing real-time models with the ability to predict the future is the key to dealing with this problem.

Ranked #1 on Real-Time Object Detection on Argoverse-HD (Full-Stack, Val) (sAP metric, using extra training data)

Autonomous Driving, Object, +2

297

Rectifying the Shortcut Learning of Background for Few-Shot Learning

1 code implementation • NeurIPS 2021 • Xu Luo, Longhui Wei, Liangjian Wen, Jinrong Yang, Lingxi Xie, Zenglin Xu, Qi Tian

The category gap between training and evaluation has been characterised as one of the main obstacles to the success of Few-Shot Learning (FSL).

Ranked #20 on Few-Shot Image Classification on Mini-Imagenet 5-way (5-shot)

Few-Shot Image Classification, Few-Shot Learning

101

Gaussian Guided IoU: A Better Metric for Balanced Learning on Object Detection

no code implementations • 25 Mar 2021 • Shengkai Wu, Jinrong Yang, Hangcheng Yu, Lijun Gou, Xiaoping Li

This results in two problems: (1) most slender objects are assigned only one anchor, which provides insufficient supervision for them during training and hurts their detection performance; (2) IoU cannot accurately represent the alignment between the receptive field of the feature at the anchor's center and the object.

object-detection, Object Detection

Carton dataset synthesis method for domain shift based on foreground texture decoupling and replacement

1 code implementation • 19 Mar 2021 • Lijun Gou, Shengkai Wu, Jinrong Yang, Hangcheng Yu, Chenxi Lin, Xiaoping Li, Chao Deng

To solve this problem, a novel image synthesis method is proposed to replace the foreground texture of the source datasets with the texture of the target datasets.

Image Generation, object-detection, +1

SCD: A Stacked Carton Dataset for Detection and Segmentation

1 code implementation • 25 Feb 2021 • Jinrong Yang, Shengkai Wu, Lijun Gou, Hangcheng Yu, Chenxi Lin, Jiazhuo Wang, Minxuan Li, Xiaoping Li

In this paper, we present a large-scale carton dataset named Stacked Carton Dataset (SCD) with the goal of advancing the state-of-the-art in carton detection.

Instance Segmentation, Semantic Segmentation

Paper
Code

IoU-balanced Loss Functions for Single-stage Object Detection

no code implementations • 15 Aug 2019 • Shengkai Wu, Jinrong Yang, Xinggang Wang, Xiaoping Li

The IoU-balanced localization loss decreases the gradient of examples with low IoU and increases the gradient of examples with high IoU, which can improve the localization accuracy of models.
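The snippet describes the IoU-balanced localization loss only qualitatively. As an illustration, here is a minimal sketch that assumes each example's weight is proportional to its IoU raised to an exponent `lam`, renormalized so the total loss magnitude is unchanged. `smooth_l1`, `iou_balanced_loc_loss`, `beta`, and `lam` are illustrative names and defaults, not the paper's code.

```python
def smooth_l1(x, beta=1.0 / 9):
    """Smooth L1 (Huber-style) penalty on a single regression residual."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def iou_balanced_loc_loss(residuals, ious, lam=1.5):
    """Reweight per-example localization losses by IoU (illustrative sketch).

    residuals: one list of box-regression residuals per positive example.
    ious:      IoU of each example's regressed box with its ground truth.
    lam:       hypothetical exponent controlling how strongly high-IoU
               examples are up-weighted.
    Returns (total_loss, effective_weights).
    """
    raw = [sum(smooth_l1(r) for r in res) for res in residuals]
    weights = [iou ** lam for iou in ious]
    # Rescale so the weighted total equals the unweighted total; only the
    # relative emphasis between examples changes.
    total_raw = sum(raw)
    total_weighted = sum(w * l for w, l in zip(weights, raw))
    scale = total_raw / total_weighted if total_weighted > 0 else 0.0
    effective = [scale * w for w in weights]
    # Weights are treated as constants, so each example's gradient is scaled
    # by its effective weight: smaller for low IoU, larger for high IoU.
    loss = sum(e * l for e, l in zip(effective, raw))
    return loss, effective


loss, w = iou_balanced_loc_loss([[0.5], [0.5]], [0.2, 0.8])
```

With equal residuals and IoUs of 0.2 and 0.8, the high-IoU example receives 8 times the weight of the low-IoU one, since (0.8 / 0.2)^1.5 = 8, while the total loss equals the unweighted sum.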

Classification, General Classification, +3
