no code implementations • 26 Dec 2024 • Yabing Wang, Zhuotao Tian, Qingpei Guo, Zheng Qin, Sanping Zhou, Ming Yang, Le Wang
It consists of a query adaption module, which can be seamlessly integrated into CLIP and generates the referential query that provides prior context for the decoder, along with a task-specific decoder.
1 code implementation • 5 Dec 2024 • Senqiao Yang, Yukang Chen, Zhuotao Tian, Chengyao Wang, Jingyao Li, Bei Yu, Jiaya Jia
To address this, we introduce VisionZip, a simple yet effective method that selects a set of informative tokens for input to the language model, reducing visual token redundancy and improving efficiency while maintaining model performance.
Ranked #172 on Visual Question Answering on MM-Vet
1 code implementation • 4 Nov 2024 • Yijun Liu, Jiequan Cui, Zhuotao Tian, Senqiao Yang, Qingdong He, Xiaoling Wang, Jingyong Su
We observe that, with the cross-entropy loss, model predictions are optimized to align with the corresponding labels via increasing logit magnitude or refining logit direction.
1 code implementation • 11 Jul 2024 • Tong Shao, Zhuotao Tian, Hang Zhao, Jingyong Su
CLIP, as a vision-language model, has significantly advanced Open-Vocabulary Semantic Segmentation (OVSS) with its zero-shot capabilities.
1 code implementation • 7 Jul 2024 • Longxiang Tang, Zhuotao Tian, Kai Li, Chunming He, Hantao Zhou, Hengshuang Zhao, Xiu Li, Jiaya Jia
To address this problem efficiently, we propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework, retaining pre-trained knowledge of VLMs from a perspective of avoiding information interference.
1 code implementation • 26 Jun 2024 • Xin Lai, Zhuotao Tian, Yukang Chen, Senqiao Yang, Xiangru Peng, Jiaya Jia
Mathematical reasoning presents a significant challenge for Large Language Models (LLMs) due to the extensive and precise chain of reasoning required for accuracy.
Ranked #11 on Arithmetic Reasoning on GSM8K (using extra training data)
1 code implementation • 11 Apr 2024 • Bohao Peng, Zhuotao Tian, Shu Liu, MingChang Yang, Jiaya Jia
In this study, we introduce the Scalable Language Model (SLM) to overcome these limitations within a more challenging and generalized setting, representing a significant advancement toward practical applications for continual learning.
1 code implementation • CVPR 2024 • Senqiao Yang, Zhuotao Tian, Li Jiang, Jiaya Jia
This paper introduces Unified Language-driven Zero-shot Domain Adaptation (ULDA), a novel task setting that enables a single model to adapt to diverse target domains without explicit domain-ID knowledge.
1 code implementation • CVPR 2024 • Bohao Peng, Xiaoyang Wu, Li Jiang, Yukang Chen, Hengshuang Zhao, Zhuotao Tian, Jiaya Jia
This exploration led to the creation of Omni-Adaptive 3D CNNs (OA-CNNs), a family of networks that integrates a lightweight module to greatly enhance the adaptivity of sparse CNNs at minimal computational cost.
Ranked #5 on 3D Semantic Segmentation on SemanticKITTI (val mIoU metric)
1 code implementation • CVPR 2024 • Chengyao Wang, Li Jiang, Xiaoyang Wu, Zhuotao Tian, Bohao Peng, Hengshuang Zhao, Jiaya Jia
To address this issue, we propose GroupContrast, a novel approach that combines segment grouping and semantic-aware contrastive learning.
no code implementations • CVPR 2024 • Sitong Wu, Haoru Tan, Zhuotao Tian, Yukang Chen, Xiaojuan Qi, Jiaya Jia
We discover that the lack of consideration for sample-wise affinity consistency across modalities in existing training objectives is the central cause.
1 code implementation • 28 Dec 2023 • Senqiao Yang, Tianyuan Qu, Xin Lai, Zhuotao Tian, Bohao Peng, Shu Liu, Jiaya Jia
While LISA effectively bridges the gap between segmentation and large language models to enable reasoning segmentation, it has certain limitations: it cannot distinguish different instances of the target region, and it is constrained by pre-defined textual response formats.
1 code implementation • CVPR 2024 • Xiaoyang Wu, Zhuotao Tian, Xin Wen, Bohao Peng, Xihui Liu, Kaicheng Yu, Hengshuang Zhao
In contrast, such privilege has not yet fully benefited 3D deep learning, mainly due to the limited availability of large-scale 3D datasets.
Ranked #3 on 3D Semantic Segmentation on SemanticKITTI (val mIoU metric, using extra training data)
1 code implementation • 6 Aug 2023 • Zhenhua Ning, Zhuotao Tian, Guangming Lu, Wenjie Pei
Although extensive research has been conducted on 3D point cloud segmentation, effectively adapting generic models to novel categories remains a formidable challenge.
2 code implementations • CVPR 2024 • Xin Lai, Zhuotao Tian, Yukang Chen, Yanwei Li, Yuhui Yuan, Shu Liu, Jiaya Jia
In this work, we propose a new segmentation task -- reasoning segmentation.
no code implementations • 27 Jun 2023 • Bohao Peng, Zhuotao Tian, Xiaoyang Wu, Chengyao Wang, Shu Liu, Jingyong Su, Jiaya Jia
We hope our work can benefit broader industrial applications where novel classes with limited annotations must be identified reasonably well.
4 code implementations • 23 May 2023 • Jiequan Cui, Zhuotao Tian, Zhisheng Zhong, Xiaojuan Qi, Bei Yu, Hanwang Zhang
In this paper, we delve deeper into the Kullback-Leibler (KL) Divergence loss and mathematically prove that it is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss that consists of 1) a weighted Mean Square Error (wMSE) loss and 2) a Cross-Entropy loss incorporating soft labels.
1 code implementation • CVPR 2023 • Bohao Peng, Zhuotao Tian, Xiaoyang Wu, Chengyao Wang, Shu Liu, Jingyong Su, Jiaya Jia
Few-shot semantic segmentation (FSS) aims to build class-agnostic models that segment unseen classes with only a handful of annotations.
Ranked #8 on Few-Shot Semantic Segmentation on PASCAL-5i (1-Shot)
2 code implementations • 21 Mar 2023 • Zhuotao Tian, Jiequan Cui, Li Jiang, Xiaojuan Qi, Xin Lai, Yixin Chen, Shu Liu, Jiaya Jia
Semantic segmentation remains challenging because it must parse diverse contexts across different scenes, so a fixed classifier may not adequately handle the varying feature distributions encountered during testing.
no code implementations • 28 Sep 2022 • Jianhui Liu, Yukang Chen, Xiaoqing Ye, Zhuotao Tian, Xiao Tan, Xiaojuan Qi
3D scenes are dominated by a large number of background points, which are redundant for a detection task that mainly needs to focus on foreground objects.
4 code implementations • 26 Sep 2022 • Jiequan Cui, Zhisheng Zhong, Zhuotao Tian, Shu Liu, Bei Yu, Jiaya Jia
Based on theoretical analysis, we observe that supervised contrastive loss tends to be biased toward high-frequency classes and thus increases the difficulty of imbalanced learning.
Ranked #7 on Long-tail Learning on iNaturalist 2018
1 code implementation • 21 Sep 2022 • Dong Zhang, Yi Lin, Hao Chen, Zhuotao Tian, Xin Yang, Jinhui Tang, Kwang Ting Cheng
Over the past few years, the rapid development of deep learning technologies for computer vision has significantly improved the performance of medical image segmentation (MedISeg).
1 code implementation • 20 Jul 2022 • Xin Lai, Zhuotao Tian, Xiaogang Xu, Yingcong Chen, Shu Liu, Hengshuang Zhao, LiWei Wang, Jiaya Jia
Unsupervised domain adaptation in semantic segmentation has been proposed to alleviate the reliance on expensive pixel-wise annotations.
5 code implementations • 5 Apr 2022 • Jiequan Cui, Yuhui Yuan, Zhisheng Zhong, Zhuotao Tian, Han Hu, Stephen Lin, Jiaya Jia
In this paper, we study the problem of class imbalance in semantic segmentation.
Ranked #22 on Semantic Segmentation on ADE20K
no code implementations • 2 Mar 2022 • Yixin Chen, Zhuotao Tian, Pengguang Chen, Shu Liu, Jiaya Jia
We revisit the one- and two-stage detector distillation tasks and present a simple and efficient semantic-aware framework to fill the gap between them.
2 code implementations • ICCV 2021 • Li Jiang, Shaoshuai Shi, Zhuotao Tian, Xin Lai, Shu Liu, Chi-Wing Fu, Jiaya Jia
To address the high cost and challenges of 3D point-level labeling, we present a method for semi-supervised point cloud semantic segmentation that uses unlabeled point clouds in training to boost model performance.
1 code implementation • 28 Sep 2021 • Xiaoliu Luo, Zhuotao Tian, Taiping Zhang, Bei Yu, Yuan Yan Tang, Jiaya Jia
In this work, we revisit the prior mask guidance proposed in "Prior Guided Feature Enrichment Network for Few-Shot Segmentation".
2 code implementations • CVPR 2021 • Xin Lai, Zhuotao Tian, Li Jiang, Shu Liu, Hengshuang Zhao, LiWei Wang, Jiaya Jia
Semantic segmentation has made tremendous progress in recent years.
5 code implementations • 26 Jan 2021 • Jiequan Cui, Shu Liu, Zhuotao Tian, Zhisheng Zhong, Jiaya Jia
From this perspective, the trivial solution of using different branches for the head, medium, and tail classes and then summing their outputs as the final result is not feasible.
Ranked #22 on Long-tail Learning on CIFAR-10-LT (ρ=10)
1 code implementation • CVPR 2022 • Zhuotao Tian, Xin Lai, Li Jiang, Shu Liu, Michelle Shu, Hengshuang Zhao, Jiaya Jia
Then, since context is essential for semantic segmentation, we propose the Context-Aware Prototype Learning (CAPL) that significantly improves performance by 1) leveraging the co-occurrence prior knowledge from support samples, and 2) dynamically enriching contextual information to the classifier, conditioned on the content of each query image.
3 code implementations • 4 Aug 2020 • Zhuotao Tian, Hengshuang Zhao, Michelle Shu, Zhicheng Yang, Ruiyu Li, Jiaya Jia
It consists of novel designs of (1) a training-free prior mask generation method that not only retains generalization power but also improves model performance and (2) a Feature Enrichment Module (FEM) that overcomes spatial inconsistency by adaptively enriching query features with support features and prior masks.
Ranked #71 on Few-Shot Semantic Segmentation on COCO-20i (1-shot)
no code implementations • 27 Jun 2019 • Zhuotao Tian, Hengshuang Zhao, Michelle Shu, Jiaze Wang, Ruiyu Li, Xiaoyong Shen, Jiaya Jia
Although salient object detection has been intensively studied, false predictions and unclear boundaries are still major issues.
no code implementations • CVPR 2019 • Zhuotao Tian, Michelle Shu, Pengyuan Lyu, Ruiyu Li, Chao Zhou, Xiaoyong Shen, Jiaya Jia
We address the problem of detecting scene text in arbitrary shapes, which is a challenging task due to the high variety and complexity of the scene.