Search Results for author: Yanwei Li

Found 24 papers, 17 papers with code

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

2 code implementations • 27 Mar 2024 • Yanwei Li, Yuechen Zhang, Chengyao Wang, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia

We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i.e., high-resolution visual tokens, high-quality data, and VLM-guided generation.

Ranked #8 on Visual Question Answering on MM-Vet

Image Comprehension Visual Dialog +1

2,792

RL-GPT: Integrating Reinforcement Learning and Code-as-policy

no code implementations • 29 Feb 2024 • Shaoteng Liu, Haoqi Yuan, Minda Hu, Yanwei Li, Yukang Chen, Shu Liu, Zongqing Lu, Jiaya Jia

To seamlessly integrate both modalities, we introduce a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent.

reinforcement-learning Reinforcement Learning (RL)

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

2 code implementations • 28 Nov 2023 • Yanwei Li, Chengyao Wang, Jiaya Jia

Current VLMs, while proficient in tasks like image captioning and visual question answering, face computational burdens when processing long videos due to the excessive visual tokens.

Ranked #5 on Video-based Generative Performance Benchmarking on VideoInstruct

Image Captioning Video-based Generative Performance Benchmarking +2

832

LISA: Reasoning Segmentation via Large Language Model

2 code implementations • 1 Aug 2023 • Xin Lai, Zhuotao Tian, Yukang Chen, Yanwei Li, Yuhui Yuan, Shu Liu, Jiaya Jia

In this work, we propose a new segmentation task -- reasoning segmentation.

Language Modelling Large Language Model +3

1,443

Democratizing Pathological Image Segmentation with Lay Annotators via Molecular-empowered Learning

1 code implementation • 31 May 2023 • Ruining Deng, Yanwei Li, Peize Li, Jiacheng Wang, Lucas W. Remedios, Saydolimkhon Agzamkhodjaev, Zuhayr Asad, Quan Liu, Can Cui, Yaohong Wang, Yihan Wang, Yucheng Tang, Haichun Yang, Yuankai Huo

The contribution of this paper is threefold: (1) we propose a molecular-empowered learning scheme for multi-class cell segmentation using partial labels from lay annotators; (2) the proposed method integrates gigapixel-level molecular-morphology cross-modality registration, molecular-informed annotation, and a molecular-oriented segmentation model, achieving significantly superior performance with 3 lay annotators compared with 2 experienced pathologists; (3) a deep corrective learning (learning with imperfect labels) method is proposed to further improve segmentation performance using partially annotated noisy data.

Cell Segmentation Image Segmentation +3

GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction

1 code implementation • NeurIPS 2023 • Rui Yang, Lin Song, Yanwei Li, Sijie Zhao, Yixiao Ge, Xiu Li, Ying Shan

This paper aims to efficiently enable Large Language Models (LLMs) to use multimodal tools.

Image Generation Instruction Following +3

725

End-to-end 3D Tracking with Decoupled Queries

no code implementations • ICCV 2023 • Yanwei Li, Zhiding Yu, Jonah Philion, Anima Anandkumar, Sanja Fidler, Jiaya Jia, Jose Alvarez

In this work, we present an end-to-end framework for camera-based 3D multi-object tracking, called DQTrack.

3D Multi-Object Tracking

Diversified Dynamic Routing for Vision Tasks

no code implementations • 26 Sep 2022 • Botos Csaba, Adel Bibi, Yanwei Li, Philip Torr, Ser-Nam Lim

Deep learning models for vision tasks are trained on large datasets under the assumption that there exists a universal representation that can be used to make predictions for all samples.

Instance Segmentation object-detection +2

Unifying Voxel-based Representation with Transformer for 3D Object Detection

1 code implementation • 1 Jun 2022 • Yanwei Li, Yilun Chen, Xiaojuan Qi, Zeming Li, Jian Sun, Jiaya Jia

To this end, the modality-specific space is first designed to represent different inputs in the voxel feature space.

3D Object Detection Object +3

214

Voxel Field Fusion for 3D Object Detection

1 code implementation • CVPR 2022 • Yanwei Li, Xiaojuan Qi, Yukang Chen, LiWei Wang, Zeming Li, Jian Sun, Jiaya Jia

In this work, we present a conceptually simple yet effective framework for cross-modality 3D object detection, named voxel field fusion.

3D Object Detection Data Augmentation +2

Focal Sparse Convolutional Networks for 3D Object Detection

2 code implementations • CVPR 2022 • Yukang Chen, Yanwei Li, Xiangyu Zhang, Jian Sun, Jiaya Jia

In this paper, we introduce two new modules to enhance the capability of Sparse CNNs, both of which are based on making feature sparsity learnable with position-wise importance prediction.

3D Object Detection Object +1

359

Multi-Scale Aligned Distillation for Low-Resolution Detection

2 code implementations • CVPR 2021 • Lu Qi, Jason Kuen, Jiuxiang Gu, Zhe Lin, Yi Wang, Yukang Chen, Yanwei Li, Jiaya Jia

However, this option traditionally hurts detection performance considerably.

Knowledge Distillation object-detection +1

129

Fully Convolutional Networks for Panoptic Segmentation with Point-based Supervision

1 code implementation • 17 Aug 2021 • Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, Yukang Chen, Lu Qi, LiWei Wang, Zeming Li, Jian Sun, Jiaya Jia

In particular, Panoptic FCN encodes each object instance or stuff category with the proposed kernel generator and produces the prediction by convolving the high-resolution feature directly.

Panoptic Segmentation Segmentation +1

388

Scale-aware Automatic Augmentation for Object Detection

1 code implementation • CVPR 2021 • Yukang Chen, Yanwei Li, Tao Kong, Lu Qi, Ruihang Chu, Lei Li, Jiaya Jia

We propose Scale-aware AutoAug to learn data augmentation policies for object detection.

Data Augmentation Instance Segmentation +5

196

Fine-Grained Dynamic Head for Object Detection

1 code implementation • NeurIPS 2020 • Lin Song, Yanwei Li, Zhengkai Jiang, Zeming Li, Hongbin Sun, Jian Sun, Nanning Zheng

To this end, we propose a fine-grained dynamic head to conditionally select a pixel-level combination of FPN features from different scales for each instance, which further unlocks the capacity of multi-scale feature representation.

Object object-detection +1

Rethinking Learnable Tree Filter for Generic Feature Transform

1 code implementation • NeurIPS 2020 • Lin Song, Yanwei Li, Zhengkai Jiang, Zeming Li, Xiangyu Zhang, Hongbin Sun, Jian Sun, Nanning Zheng

The Learnable Tree Filter presents a remarkable approach to model structure-preserving relations for semantic segmentation.

Instance Segmentation object-detection +3

Fully Convolutional Networks for Panoptic Segmentation

6 code implementations • CVPR 2021 • Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, LiWei Wang, Zeming Li, Jian Sun, Jiaya Jia

In this paper, we present a conceptually simple, strong, and efficient framework for panoptic segmentation, called Panoptic FCN.

Ranked #1 on Panoptic Segmentation on COCO minival (SQ metric)

Panoptic Segmentation Segmentation

388

Dynamic Scale Training for Object Detection

4 code implementations • 26 Apr 2020 • Yukang Chen, Peizhen Zhang, Zeming Li, Yanwei Li, Xiangyu Zhang, Lu Qi, Jian Sun, Jiaya Jia

We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate the scale variation challenge in object detection.

Instance Segmentation Model Optimization +4

Learning Dynamic Routing for Semantic Segmentation

1 code implementation • CVPR 2020 • Yanwei Li, Lin Song, Yukang Chen, Zeming Li, Xiangyu Zhang, Xingang Wang, Jian Sun

To demonstrate the superiority of the dynamic property, we compare with several static architectures, which can be modeled as special cases in the routing space.

Segmentation Semantic Segmentation

378

Learnable Tree Filter for Structure-preserving Feature Transform

1 code implementation • NeurIPS 2019 • Lin Song, Yanwei Li, Zeming Li, Gang Yu, Hongbin Sun, Jian Sun, Nanning Zheng

To this end, tree filtering modules are embedded to formulate a unified framework for semantic segmentation.

Semantic Segmentation

140

FastPose: Towards Real-time Pose Estimation and Tracking via Scale-normalized Multi-task Networks

no code implementations • 15 Aug 2019 • Jiabin Zhang, Zheng Zhu, Wei Zou, Peng Li, Yanwei Li, Hu Su, Guan Huang

Given the results of MTN, we adopt an occlusion-aware Re-ID feature strategy in the pose tracking module, where pose information is utilized to infer the occlusion state to make better use of Re-ID features.

Human Detection Multi-Person Pose Estimation +3

State-aware Re-identification Feature for Multi-target Multi-camera Tracking

no code implementations • 4 Jun 2019 • Peng Li, Jiabin Zhang, Zheng Zhu, Yanwei Li, Lu Jiang, Guan Huang

Multi-target Multi-camera Tracking (MTMCT) aims to extract the trajectories from videos captured by a set of cameras.

Identity-Enhanced Network for Facial Expression Recognition

no code implementations • 11 Dec 2018 • Yanwei Li, Xingang Wang, Shilei Zhang, Lingxi Xie, Wenqi Wu, Hongyuan Yu, Zheng Zhu

Facial expression recognition is a challenging task, arguably because of large intra-class variations and high inter-class similarities.

Facial Expression Recognition Facial Expression Recognition (FER) +1

Attention-guided Unified Network for Panoptic Segmentation

no code implementations • CVPR 2019 • Yanwei Li, Xinze Chen, Zheng Zhu, Lingxi Xie, Guan Huang, Dalong Du, Xingang Wang

This paper studies panoptic segmentation, a recently proposed task which segments foreground (FG) objects at the instance level as well as background (BG) contents at the semantic level.

Ranked #24 on Panoptic Segmentation on COCO test-dev

Panoptic Segmentation Segmentation
