1 code implementation • 21 Mar 2024 • Qing Jiang, Feng Li, Zhaoyang Zeng, Tianhe Ren, Shilong Liu, Lei Zhang
Recognizing the complementary strengths and weaknesses of both text and visual prompts, we introduce T-Rex2 that synergizes both prompts within a single model through contrastive learning.
no code implementations • 19 Mar 2024 • Hongyang Li, Hao Zhang, Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Lei Zhang
Based on the observation that point tracking bears a great resemblance to object detection and tracking, we borrow designs from DETR-like algorithms to address the task of TAP.
1 code implementation • 25 Jan 2024 • Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang, Feng Li, Jie Yang, Hongyang Li, Qing Jiang, Lei Zhang
We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to combine with the segment anything model (SAM).
1 code implementation • 12 Dec 2023 • Xueyan Zou, Linjie Li, JianFeng Wang, Jianwei Yang, Mingyu Ding, Zhengyuan Yang, Feng Li, Hao Zhang, Shilong Liu, Arul Aravinthan, Yong Jae Lee, Lijuan Wang
The proposed interface is adaptive to new tasks, and new models.
1 code implementation • 5 Dec 2023 • Hao Zhang, Hongyang Li, Feng Li, Tianhe Ren, Xueyan Zou, Shilong Liu, Shijia Huang, Jianfeng Gao, Lei Zhang, Chunyuan Li, Jianwei Yang
To address this issue, we have created GVC data that allows for the combination of grounding and chat capabilities.
3 code implementations • 22 Nov 2023 • Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao
In-context prompting in large language models (LLMs) has become a prevalent approach to improve zero-shot capabilities, but this idea is less explored in the vision domain.
no code implementations • 22 Nov 2023 • Qing Jiang, Feng Li, Tianhe Ren, Shilong Liu, Zhaoyang Zeng, Kent Yu, Lei Zhang
Guided by the visual feedback from T-Rex, users can also interactively refine the counting results by prompting on missing or falsely-detected objects.
1 code implementation • 9 Nov 2023 • Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, Tianhe Ren, Xueyan Zou, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang, Jianfeng Gao, Chunyuan Li
LLaVA-Plus is a general-purpose multimodal assistant that expands the capabilities of large multimodal models.
Ranked #1 on LMM real-life tasks on Leaderboard
1 code implementation • 6 Nov 2023 • Jianhui Li, Shilong Liu, Zidong Liu, Yikai Wang, Kaiwen Zheng, Jinghui Xu, Jianmin Li, Jun Zhu
With the success of Neural Radiance Field (NeRF) in 3D-aware portrait editing, a variety of works have achieved promising results regarding both quality and 3D consistency.
no code implementations • 16 Oct 2023 • Yukai Shi, Jianan Wang, He Cao, Boshi Tang, Xianbiao Qi, Tianyu Yang, Yukun Huang, Shilong Liu, Lei Zhang, Heung-Yeung Shum
In this paper, we present TOSS, which introduces text to the task of novel view synthesis (NVS) from just a single RGB image.
1 code implementation • ICCV 2023 • Jie Yang, Ailing Zeng, Feng Li, Shilong Liu, Ruimao Zhang, Lei Zhang
Click-Pose explores how user feedback can cooperate with a neural keypoint detector to correct the predicted keypoints in an interactive way for a faster and more effective annotation process.
no code implementations • 9 Aug 2023 • Changjian Chen, Yukai Guo, Fengyuan Tian, Shilong Liu, Weikai Yang, Zhaowei Wang, Jing Wu, Hang Su, Hanspeter Pfister, Shixia Liu
Existing model evaluation tools mainly focus on evaluating classification models, leaving a gap in evaluating more complex models, such as object detection.
no code implementations • ICCV 2023 • Hongyang Li, Hao Zhang, Zhaoyang Zeng, Shilong Liu, Feng Li, Tianhe Ren, Lei Zhang
Existing feature lifting approaches, such as Lift-Splat-based and 2D attention-based, either use estimated depth to get pseudo LiDAR features and then splat them to a 3D space, which is a one-pass operation without feature refinement, or ignore depth and lift features by 2D attention mechanisms, which achieve finer semantics while suffering from a depth ambiguity problem.
1 code implementation • 10 Jul 2023 • Feng Li, Hao Zhang, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao
In this paper, we introduce Semantic-SAM, a universal image segmentation model to enable segment and recognize anything at any desired granularity.
1 code implementation • 12 Jun 2023 • Tianhe Ren, Shilong Liu, Feng Li, Hao Zhang, Ailing Zeng, Jie Yang, Xingyu Liao, Ding Jia, Hongyang Li, He Cao, Jianan Wang, Zhaoyang Zeng, Xianbiao Qi, Yuhui Yuan, Jianwei Yang, Lei Zhang
To address this issue, we develop a unified, highly modular, and lightweight codebase called detrex, which supports a majority of the mainstream DETR-based instance recognition algorithms, covering various fundamental tasks, including object detection, segmentation, and pose estimation.
2 code implementations • 6 Jun 2023 • Youcai Zhang, Xinyu Huang, Jinyu Ma, Zhaoyang Li, Zhaochuan Luo, Yanchun Xie, Yuzhuo Qin, Tong Luo, Yaqian Li, Shilong Liu, Yandong Guo, Lei Zhang
We are releasing the RAM at \url{https://recognize-anything. github. io/} to foster the advancements of large models in computer vision.
3 code implementations • 25 Apr 2023 • Tianhe Ren, Jianwei Yang, Shilong Liu, Ailing Zeng, Feng Li, Hao Zhang, Hongyang Li, Zhaoyang Zeng, Lei Zhang
This work presents Focal-Stable-DINO, a strong and reproducible object detection model which achieves 64. 6 AP on COCO val2017 and 64. 8 AP on COCO test-dev using only 700M parameters without any test time augmentation.
Ranked #5 on Object Detection on COCO minival (using extra training data)
1 code implementation • CVPR 2023 • Jianhui Li, Jianmin Li, Haoji Zhang, Shilong Liu, Zhengyi Wang, Zihao Xiao, Kaiwen Zheng, Jun Zhu
As for imprecise image editing, we attribute the problem to the gap between the latent space of real images and that of generated images.
2 code implementations • ICCV 2023 • Shilong Liu, Tianhe Ren, Jiayu Chen, Zhaoyang Zeng, Hao Zhang, Feng Li, Hongyang Li, Jun Huang, Hang Su, Jun Zhu, Lei Zhang
We point out that the unstable matching in DETR is caused by a multi-optimization path problem, which is highlighted by the one-to-one matching design in DETR.
2 code implementations • ICCV 2023 • Hao Zhang, Feng Li, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianfeng Gao, Jianwei Yang, Lei Zhang
We present OpenSeeD, a simple Open-vocabulary Segmentation and Detection framework that jointly learns from different segmentation and detection datasets.
Ranked #2 on Instance Segmentation on ADE20K val (using extra training data)
1 code implementation • CVPR 2023 • Hao Zhang, Feng Li, Huaizhe xu, Shijia Huang, Shilong Liu, Lionel M. Ni, Lei Zhang
We present a mask-piloted Transformer which improves masked-attention in Mask2Former for image segmentation.
1 code implementation • 13 Mar 2023 • Feng Li, Ailing Zeng, Shilong Liu, Hao Zhang, Hongyang Li, Lei Zhang, Lionel M. Ni
Recent DEtection TRansformer-based (DETR) models have obtained remarkable performance.
7 code implementations • 9 Mar 2023 • Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang
To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion.
Ranked #1 on Zero-Shot Object Detection on MSCOCO
no code implementations • 25 Feb 2023 • Hao Zhang, Hongyang Li, Ailing Zeng, Feng Li, Shilong Liu, Xingyu Liao, Lei Zhang
To address the second issue, we introduce an auxiliary learning task called Depth-aware Negative Suppression loss.
3 code implementations • 3 Feb 2023 • Jie Yang, Ailing Zeng, Shilong Liu, Feng Li, Ruimao Zhang, Lei Zhang
This paper presents a novel end-to-end framework with Explicit box Detection for multi-person Pose estimation, called ED-Pose, where it unifies the contextual learning between human-level (global) and keypoint-level (local) information.
Ranked #2 on 2D Human Pose Estimation on Human-Art
no code implementations • CVPR 2023 • Feng Li, Ailing Zeng, Shilong Liu, Hao Zhang, Hongyang Li, Lei Zhang, Lionel M. Ni
Recent DEtection TRansformer-based (DETR) models have obtained remarkable performance.
1 code implementation • 28 Nov 2022 • Shilong Liu, Yaoyuan Liang, Feng Li, Shijia Huang, Hao Zhang, Hang Su, Jun Zhu, Lei Zhang
As phrase extraction can be regarded as a $1$D text segmentation problem, we formulate PEG as a dual detection problem and propose a novel DQ-DETR model, which introduces dual queries to probe different features from image and text for object prediction and phrase mask prediction.
Ranked #7 on Referring Expression Comprehension on RefCOCO
no code implementations • 15 Nov 2022 • Shijia Huang, Feng Li, Hao Zhang, Shilong Liu, Lei Zhang, LiWei Wang
Our mutual supervision contains two directions.
9 code implementations • CVPR 2023 • Feng Li, Hao Zhang, Huaizhe xu, Shilong Liu, Lei Zhang, Lionel M. Ni, Heung-Yeung Shum
In this paper we present Mask DINO, a unified object detection and segmentation framework.
Ranked #1 on Panoptic Segmentation on COCO test-dev
15 code implementations • 7 Mar 2022 • Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum
Compared to other models on the leaderboard, DINO significantly reduces its model size and pre-training data size while achieving better results.
Ranked #1 on Real-Time Object Detection on COCO 2017 val
no code implementations • 3 Mar 2022 • Feng Li, Hao Zhang, Yi-Fan Zhang, Shilong Liu, Jian Guo, Lionel M. Ni, Pengchuan Zhang, Lei Zhang
This survey is inspired by the remarkable progress in both computer vision and natural language processing, and recent trends shifting from single modality processing to multiple modality comprehension.
16 code implementations • CVPR 2022 • Feng Li, Hao Zhang, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang
Our method is universal and can be easily plugged into any DETR-like methods by adding dozens of lines of code to achieve a remarkable improvement.
7 code implementations • ICLR 2022 • Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang
We present in this paper a novel query formulation using dynamic anchor boxes for DETR (DEtection TRansformer) and offer a deeper understanding of the role of queries in DETR.
Ranked #11 on 2D Object Detection on SARDet-100K
1 code implementation • 17 Oct 2021 • Yuefeng Chen, Xiaofeng Mao, Yuan He, Hui Xue, Chao Li, Yinpeng Dong, Qi-An Fu, Xiao Yang, Tianyu Pang, Hang Su, Jun Zhu, Fangcheng Liu, Chao Zhang, Hongyang Zhang, Yichi Zhang, Shilong Liu, Chang Liu, Wenzhao Xiang, Yajie Wang, Huipeng Zhou, Haoran Lyu, Yidan Xu, Zixuan Xu, Taoyu Zhu, Wenjun Li, Xianfeng Gao, Guoqiu Wang, Huanqian Yan, Ying Guo, Chaoning Zhang, Zheng Fang, Yang Wang, Bingyang Fu, Yunfei Zheng, Yekui Wang, Haorong Luo, Zhen Yang
Many works have investigated the adversarial attacks or defenses under the settings where a bounded and imperceptible perturbation can be added to the input.
2 code implementations • 22 Jul 2021 • Shilong Liu, Lei Zhang, Xiao Yang, Hang Su, Jun Zhu
The use of Transformer is rooted in the need of extracting local discriminative features adaptively for different labels, which is a strongly desired property due to the existence of multiple objects in one image.
Ranked #1 on Multi-Label Classification on PASCAL VOC 2012
no code implementations • CVPR 2021 • Shilong Liu, Lei Zhang, Xiao Yang, Hang Su, Jun Zhu
We study the problem of unsupervised discovery and segmentation of object parts, which, as an intermediate local representation, are capable of finding intrinsic object structure and providing more explainable recognition results.