Search Results for author: Shilong Liu

Found 36 papers, 26 papers with code

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

1 code implementation • 21 Mar 2024 • Qing Jiang, Feng Li, Zhaoyang Zeng, Tianhe Ren, Shilong Liu, Lei Zhang

Recognizing the complementary strengths and weaknesses of both text and visual prompts, we introduce T-Rex2 that synergizes both prompts within a single model through contrastive learning.

Contrastive Learning Descriptive +3

1,876

Paper
Code

TAPTR: Tracking Any Point with Transformers as Detection

no code implementations • 19 Mar 2024 • Hongyang Li, Hao Zhang, Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Lei Zhang

Based on the observation that point tracking bears a great resemblance to object detection and tracking, we borrow designs from DETR-like algorithms to address the task of TAP.

object-detection Object Detection +2

Paper
Add Code

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

1 code implementation • 25 Jan 2024 • Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang, Feng Li, Jie Yang, Hongyang Li, Qing Jiang, Lei Zhang

We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to combine with the segment anything model (SAM).

Segmentation

13,548

Paper
Code

Interfacing Foundation Models' Embeddings

1 code implementation • 12 Dec 2023 • Xueyan Zou, Linjie Li, JianFeng Wang, Jianwei Yang, Mingyu Ding, Zhengyuan Yang, Feng Li, Hao Zhang, Shilong Liu, Arul Aravinthan, Yong Jae Lee, Lijuan Wang

The proposed interface is adaptive to new tasks, and new models.

Decoder Image Segmentation +3

Paper
Code

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

1 code implementation • 5 Dec 2023 • Hao Zhang, Hongyang Li, Feng Li, Tianhe Ren, Xueyan Zou, Shilong Liu, Shijia Huang, Jianfeng Gao, Lei Zhang, Chunyuan Li, Jianwei Yang

To address this issue, we have created GVC data that allows for the combination of grounding and chat capabilities.

Decoder

243

Paper
Code

Visual In-Context Prompting

3 code implementations • 22 Nov 2023 • Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao

In-context prompting in large language models (LLMs) has become a prevalent approach to improve zero-shot capabilities, but this idea is less explored in the vision domain.

Decoder Segmentation +1

1,934

Paper
Code

T-Rex: Counting by Visual Prompting

no code implementations • 22 Nov 2023 • Qing Jiang, Feng Li, Tianhe Ren, Shilong Liu, Zhaoyang Zeng, Kent Yu, Lei Zhang

Guided by the visual feedback from T-Rex, users can also interactively refine the counting results by prompting on missing or falsely-detected objects.

Object Object Counting +4

Paper
Add Code

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

1 code implementation • 9 Nov 2023 • Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, Tianhe Ren, Xueyan Zou, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang, Jianfeng Gao, Chunyuan Li

LLaVA-Plus is a general-purpose multimodal assistant that expands the capabilities of large multimodal models.

Ranked #1 on LMM real-life tasks on Leaderboard

Instruction Following LLM real-life tasks +3

630

Paper
Code

InstructPix2NeRF: Instructed 3D Portrait Editing from a Single Image

1 code implementation • 6 Nov 2023 • Jianhui Li, Shilong Liu, Zidong Liu, Yikai Wang, Kaiwen Zheng, Jinghui Xu, Jianmin Li, Jun Zhu

With the success of Neural Radiance Field (NeRF) in 3D-aware portrait editing, a variety of works have achieved promising results regarding both quality and 3D consistency.

Paper
Code

TOSS:High-quality Text-guided Novel View Synthesis from a Single Image

no code implementations • 16 Oct 2023 • Yukai Shi, Jianan Wang, He Cao, Boshi Tang, Xianbiao Qi, Tianyu Yang, Yukun Huang, Shilong Liu, Lei Zhang, Heung-Yeung Shum

In this paper, we present TOSS, which introduces text to the task of novel view synthesis (NVS) from just a single RGB image.

Image-to-Image Translation Novel View Synthesis

Paper
Add Code

Neural Interactive Keypoint Detection

1 code implementation • ICCV 2023 • Jie Yang, Ailing Zeng, Feng Li, Shilong Liu, Ruimao Zhang, Lei Zhang

Click-Pose explores how user feedback can cooperate with a neural keypoint detector to correct the predicted keypoints in an interactive way for a faster and more effective annotation process.

Decoder Keypoint Detection

Paper
Code

A Unified Interactive Model Evaluation for Classification, Object Detection, and Instance Segmentation in Computer Vision

no code implementations • 9 Aug 2023 • Changjian Chen, Yukai Guo, Fengyuan Tian, Shilong Liu, Weikai Yang, Zhaowei Wang, Jing Wu, Hang Su, Hanspeter Pfister, Shixia Liu

Existing model evaluation tools mainly focus on evaluating classification models, leaving a gap in evaluating more complex models, such as object detection.

Instance Segmentation object-detection +2

Paper
Add Code

DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting

no code implementations • ICCV 2023 • Hongyang Li, Hao Zhang, Zhaoyang Zeng, Shilong Liu, Feng Li, Tianhe Ren, Lei Zhang

Existing feature lifting approaches, such as Lift-Splat-based and 2D attention-based, either use estimated depth to get pseudo LiDAR features and then splat them to a 3D space, which is a one-pass operation without feature refinement, or ignore depth and lift features by 2D attention mechanisms, which achieve finer semantics while suffering from a depth ambiguity problem.

3D Object Detection object-detection

Paper
Add Code

Semantic-SAM: Segment and Recognize Anything at Any Granularity

1 code implementation • 10 Jul 2023 • Feng Li, Hao Zhang, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao

In this paper, we introduce Semantic-SAM, a universal image segmentation model to enable segment and recognize anything at any desired granularity.

Image Segmentation Segmentation +1

1,934

Paper
Code

detrex: Benchmarking Detection Transformers

1 code implementation • 12 Jun 2023 • Tianhe Ren, Shilong Liu, Feng Li, Hao Zhang, Ailing Zeng, Jie Yang, Xingyu Liao, Ding Jia, Hongyang Li, He Cao, Jianan Wang, Zhaoyang Zeng, Xianbiao Qi, Yuhui Yuan, Jianwei Yang, Lei Zhang

To address this issue, we develop a unified, highly modular, and lightweight codebase called detrex, which supports a majority of the mainstream DETR-based instance recognition algorithms, covering various fundamental tasks, including object detection, segmentation, and pose estimation.

Benchmarking object-detection +2

1,828

Paper
Code

Recognize Anything: A Strong Image Tagging Model

2 code implementations • 6 Jun 2023 • Youcai Zhang, Xinyu Huang, Jinyu Ma, Zhaoyang Li, Zhaochuan Luo, Yanchun Xie, Yuzhuo Qin, Tong Luo, Yaqian Li, Shilong Liu, Yandong Guo, Lei Zhang

We are releasing the RAM at \url{https://recognize-anything. github. io/} to foster the advancements of large models in computer vision.

Semantic Parsing

2,437

Paper
Code

A Strong and Reproducible Object Detector with Only Public Datasets

3 code implementations • 25 Apr 2023 • Tianhe Ren, Jianwei Yang, Shilong Liu, Ailing Zeng, Feng Li, Hao Zhang, Hongyang Li, Zhaoyang Zeng, Lei Zhang

This work presents Focal-Stable-DINO, a strong and reproducible object detection model which achieves 64. 6 AP on COCO val2017 and 64. 8 AP on COCO test-dev using only 700M parameters without any test time augmentation.

Ranked #5 on Object Detection on COCO minival (using extra training data)

object-detection Object Detection

650

Paper
Code

PREIM3D: 3D Consistent Precise Image Attribute Editing from a Single Image

1 code implementation • CVPR 2023 • Jianhui Li, Jianmin Li, Haoji Zhang, Shilong Liu, Zhengyi Wang, Zihao Xiao, Kaiwen Zheng, Jun Zhu

As for imprecise image editing, we attribute the problem to the gap between the latent space of real images and that of generated images.

Attribute

Paper
Code

Detection Transformer with Stable Matching

2 code implementations • ICCV 2023 • Shilong Liu, Tianhe Ren, Jiayu Chen, Zhaoyang Zeng, Hao Zhang, Feng Li, Hongyang Li, Jun Huang, Hang Su, Jun Zhu, Lei Zhang

We point out that the unstable matching in DETR is caused by a multi-optimization path problem, which is highlighted by the one-to-one matching design in DETR.

Decoder Position

179

Paper
Code

A Simple Framework for Open-Vocabulary Segmentation and Detection

2 code implementations • ICCV 2023 • Hao Zhang, Feng Li, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianfeng Gao, Jianwei Yang, Lei Zhang

We present OpenSeeD, a simple Open-vocabulary Segmentation and Detection framework that jointly learns from different segmentation and detection datasets.

Ranked #2 on Instance Segmentation on ADE20K val (using extra training data)

Instance Segmentation Panoptic Segmentation +2

1,249

Paper
Code

MP-Former: Mask-Piloted Transformer for Image Segmentation

1 code implementation • CVPR 2023 • Hao Zhang, Feng Li, Huaizhe xu, Shijia Huang, Shilong Liu, Lionel M. Ni, Lei Zhang

We present a mask-piloted Transformer which improves masked-attention in Mask2Former for image segmentation.

Decoder Image Segmentation +2

106

Paper
Code

Lite DETR : An Interleaved Multi-Scale Encoder for Efficient DETR

1 code implementation • 13 Mar 2023 • Feng Li, Ailing Zeng, Shilong Liu, Hao Zhang, Hongyang Li, Lei Zhang, Lionel M. Ni

Recent DEtection TRansformer-based (DETR) models have obtained remarkable performance.

object-detection Object Detection

176

Paper
Code

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

7 code implementations • 9 Mar 2023 • Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang

To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion.

Ranked #1 on Zero-Shot Object Detection on MSCOCO

Decoder Referring Expression +3

125,385

Paper
Code

Introducing Depth into Transformer-based 3D Object Detection

no code implementations • 25 Feb 2023 • Hao Zhang, Hongyang Li, Ailing Zeng, Feng Li, Shilong Liu, Xingyu Liao, Lei Zhang

To address the second issue, we introduce an auxiliary learning task called Depth-aware Negative Suppression loss.

3D Object Detection Auxiliary Learning +3

Paper
Add Code

Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation

3 code implementations • 3 Feb 2023 • Jie Yang, Ailing Zeng, Shilong Liu, Feng Li, Ruimao Zhang, Lei Zhang

This paper presents a novel end-to-end framework with Explicit box Detection for multi-person Pose estimation, called ED-Pose, where it unifies the contextual learning between human-level (global) and keypoint-level (local) information.

Ranked #2 on 2D Human Pose Estimation on Human-Art

2D Human Pose Estimation Decoder +4

138

Paper
Code

Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR

no code implementations • CVPR 2023 • Feng Li, Ailing Zeng, Shilong Liu, Hao Zhang, Hongyang Li, Lei Zhang, Lionel M. Ni

Recent DEtection TRansformer-based (DETR) models have obtained remarkable performance.

object-detection Object Detection

Paper
Add Code

DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding

1 code implementation • 28 Nov 2022 • Shilong Liu, Yaoyuan Liang, Feng Li, Shijia Huang, Hao Zhang, Hang Su, Jun Zhu, Lei Zhang

As phrase extraction can be regarded as a $1$D text segmentation problem, we formulate PEG as a dual detection problem and propose a novel DQ-DETR model, which introduces dual queries to probe different features from image and text for object prediction and phrase mask prediction.

Ranked #7 on Referring Expression Comprehension on RefCOCO

object-detection Object Detection +4

Paper
Code

A Unified Mutual Supervision Framework for Referring Expression Segmentation and Generation

no code implementations • 15 Nov 2022 • Shijia Huang, Feng Li, Hao Zhang, Shilong Liu, Lei Zhang, LiWei Wang

Our mutual supervision contains two directions.

Reference Expression Generation Referring Expression +2

Paper
Add Code

Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation

9 code implementations • CVPR 2023 • Feng Li, Hao Zhang, Huaizhe xu, Shilong Liu, Lei Zhang, Lionel M. Ni, Heung-Yeung Shum

In this paper we present Mask DINO, a unified object detection and segmentation framework.

Ranked #1 on Panoptic Segmentation on COCO test-dev

Image Segmentation Instance Segmentation +3

12,088

Paper
Code

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

15 code implementations • 7 Mar 2022 • Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum

Compared to other models on the leaderboard, DINO significantly reduces its model size and pre-training data size while achieving better results.

Ranked #1 on Real-Time Object Detection on COCO 2017 val

Real-Time Object Detection

13,548

Paper
Code

Vision-Language Intelligence: Tasks, Representation Learning, and Large Models

no code implementations • 3 Mar 2022 • Feng Li, Hao Zhang, Yi-Fan Zhang, Shilong Liu, Jian Guo, Lionel M. Ni, Pengchuan Zhang, Lei Zhang

This survey is inspired by the remarkable progress in both computer vision and natural language processing, and recent trends shifting from single modality processing to multiple modality comprehension.

Few-Shot Learning Representation Learning

Paper
Add Code

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising

16 code implementations • CVPR 2022 • Feng Li, Hao Zhang, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang

Our method is universal and can be easily plugged into any DETR-like methods by adding dozens of lines of code to achieve a remarkable improvement.

Decoder Object Detection

1,987

Paper
Code

DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR

7 code implementations • ICLR 2022 • Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang

We present in this paper a novel query formulation using dynamic anchor boxes for DETR (DEtection TRansformer) and offer a deeper understanding of the role of queries in DETR.

Ranked #11 on 2D Object Detection on SARDet-100K

Object Detection

1,828

Paper
Code

Unrestricted Adversarial Attacks on ImageNet Competition

1 code implementation • 17 Oct 2021 • Yuefeng Chen, Xiaofeng Mao, Yuan He, Hui Xue, Chao Li, Yinpeng Dong, Qi-An Fu, Xiao Yang, Tianyu Pang, Hang Su, Jun Zhu, Fangcheng Liu, Chao Zhang, Hongyang Zhang, Yichi Zhang, Shilong Liu, Chang Liu, Wenzhao Xiang, Yajie Wang, Huipeng Zhou, Haoran Lyu, Yidan Xu, Zixuan Xu, Taoyu Zhu, Wenjun Li, Xianfeng Gao, Guoqiu Wang, Huanqian Yan, Ying Guo, Chaoning Zhang, Zheng Fang, Yang Wang, Bingyang Fu, Yunfei Zheng, Yekui Wang, Haorong Luo, Zhen Yang

Many works have investigated the adversarial attacks or defenses under the settings where a bounded and imperceptible perturbation can be added to the input.

Adversarial Attack

Paper
Code

Query2Label: A Simple Transformer Way to Multi-Label Classification

2 code implementations • 22 Jul 2021 • Shilong Liu, Lei Zhang, Xiao Yang, Hang Su, Jun Zhu

The use of Transformer is rooted in the need of extracting local discriminative features adaptively for different labels, which is a strongly desired property due to the existence of multiple objects in one image.

Ranked #1 on Multi-Label Classification on PASCAL VOC 2012

Classification Decoder +1

382

Paper
Code

Unsupervised Part Segmentation through Disentangling Appearance and Shape

no code implementations • CVPR 2021 • Shilong Liu, Lei Zhang, Xiao Yang, Hang Su, Jun Zhu

We study the problem of unsupervised discovery and segmentation of object parts, which, as an intermediate local representation, are capable of finding intrinsic object structure and providing more explainable recognition results.

Ranked #1 on Unsupervised Facial Landmark Detection on AFLW Unaligned

Disentanglement Object +3

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.