Search Results for author: Jiale Cao

Found 37 papers, 19 papers with code

VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

no code implementations · 7 Nov 2024 · Shehan Munasinghe, Hanan Gani, Wenqi Zhu, Jiale Cao, Eric Xing, Fahad Shahbaz Khan, Salman Khan

To enable fine-grained grounding, we curate a multimodal dataset featuring detailed visually-grounded conversations using a semiautomatic annotation pipeline, resulting in a diverse set of 38k video-QA triplets along with 83k objects and 671k masks.

Decoder, Language Modeling (+5 more)

DB-SAM: Delving into High Quality Universal Medical Image Segmentation

1 code implementation · 5 Oct 2024 · Chao Qin, Jiale Cao, Huazhu Fu, Fahad Shahbaz Khan, Rao Muhammad Anwer

Across 21 3D medical image segmentation tasks, our proposed DB-SAM achieves an absolute gain of 8.8% over a recent medical SAM adapter in the literature.

Image Segmentation, Medical Image Segmentation (+2 more)

iSeg: An Iterative Refinement-based Framework for Training-free Segmentation

1 code implementation · 5 Sep 2024 · Lin Sun, Jiale Cao, Jin Xie, Fahad Shahbaz Khan, Yanwei Pang

Leveraging the entropy-reduced self-attention module, iSeg steadily improves the refined cross-attention map through iterative refinement.

Image Generation, Segmentation (+1 more)

Parameter-Efficient Fine-Tuning for Continual Learning: A Neural Tangent Kernel Perspective

no code implementations · 24 Jul 2024 · Jingren Liu, Zhong Ji, Yunlong Yu, Jiale Cao, Yanwei Pang, Jungong Han, Xuelong Li

This work provides a theoretical foundation for understanding and improving PEFT-CL models, offering insights into the interplay between feature representation, task orthogonality, and generalization, contributing to the development of more efficient continual learning systems.

Continual Learning, parameter-efficient fine-tuning

Multi-Granularity Language-Guided Multi-Object Tracking

1 code implementation · 7 Jun 2024 · Yuhao Li, Muzammal Naseer, Jiale Cao, Yu Zhu, Jinqiu Sun, Yanning Zhang, Fahad Shahbaz Khan

To this end, we propose a new multi-object tracking framework, named LG-MOT, that explicitly leverages language information at different levels of granularity (scene- and instance-level) and combines it with standard visual features to obtain discriminative representations.

Multi-Object Tracking, Object (+1 more)

VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection

no code implementations · 15 Apr 2024 · Bonan Ding, Jin Xie, Jing Nie, Jiale Cao, Xuelong Li, Yanwei Pang

Therefore, an effective solution involves transforming monocular images into LiDAR-like representations and employing a LiDAR-based 3D object detector to predict the 3D coordinates of objects.

Autonomous Driving, Monocular 3D Object Detection (+2 more)
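Turning a monocular image into a LiDAR-like representation, as described above, is commonly done by back-projecting an estimated per-pixel depth map through the camera intrinsics (the "pseudo-LiDAR" idea). The sketch below is a minimal illustration under stated assumptions, not VFMM3D's code (the entry lists no implementation): it assumes a pinhole camera with focal lengths `fx`, `fy` and principal point `cx`, `cy`, and a depth map already predicted by some monocular depth estimator. The function name `depth_to_points` is hypothetical.

```python
import numpy as np


def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an H x W depth map into an N x 3 camera-frame point cloud.

    Hypothetical helper: assumes the pinhole model
        X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth[v, u]
    Pixels with non-positive depth are treated as invalid and dropped.
    """
    h, w = depth.shape
    # u holds column indices, v holds row indices; both have shape (h, w)
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    # Flatten row-major into one (x, y, z) point per pixel
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]
```

The resulting N x 3 array (camera coordinates, Z pointing forward) could then be passed to a LiDAR-based 3D detector, typically after transforming it into the LiDAR frame with the sensor extrinsics.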

Implicit and Explicit Language Guidance for Diffusion-based Visual Perception

no code implementations · 11 Apr 2024 · Hefeng Wang, Jiale Cao, Jin Xie, Aiping Yang, Yanwei Pang

The explicit branch uses the ground-truth labels of the corresponding images as text prompts to condition the feature extraction of the diffusion model.

Depth Estimation, Image Generation (+1 more)

SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior

no code implementations · 29 Mar 2024 · Zhongrui Yu, Haoran Wang, Jinze Yang, Hanzhang Wang, Zeke Xie, Yunfeng Cai, Jiale Cao, Zhong Ji, Mingming Sun

To tackle this problem, we propose a novel approach that enhances the capacity of 3DGS (3D Gaussian Splatting) by leveraging a prior from a diffusion model together with complementary multi-modal data.

Autonomous Driving, Neural Rendering (+1 more)

CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation

1 code implementation · 19 Mar 2024 · Wenqi Zhu, Jiale Cao, Jin Xie, Shuangming Yang, Yanwei Pang

Experiments on various video instance segmentation datasets demonstrate the effectiveness of the proposed method, especially for novel categories.

Decoder, Instance Segmentation (+5 more)

SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation

1 code implementation · CVPR 2024 · Bin Xie, Jiale Cao, Jin Xie, Fahad Shahbaz Khan, Yanwei Pang

In this paper, we propose a simple encoder-decoder, named SED, for open-vocabulary semantic segmentation, which comprises a hierarchical encoder-based cost map generation and a gradual fusion decoder with category early rejection.

Decoder, Open Vocabulary Semantic Segmentation (+2 more)

Global Context Aggregation Network for Lightweight Saliency Detection of Surface Defects

no code implementations · 22 Sep 2023 · Feng Yan, Xiaoheng Jiang, Yang Lu, Lisha Cui, Shupan Li, Jiale Cao, Mingliang Xu, Dacheng Tao

To this end, we develop a Global Context Aggregation Network (GCANet), built on an encoder-decoder structure, for lightweight saliency detection of surface defects.

Decoder, Defect Detection (+1 more)

CINFormer: Transformer network with multi-stage CNN feature injection for surface defect segmentation

no code implementations · 22 Sep 2023 · Xiaoheng Jiang, Kaiyi Guo, Yang Lu, Feng Yan, Hao Liu, Jiale Cao, Mingliang Xu, Dacheng Tao

To address these issues, we propose CINFormer, a UNet-like transformer network with multi-stage CNN (convolutional neural network) feature injection for surface defect segmentation.

Defect Detection

A Spatial-Temporal Deformable Attention based Framework for Breast Lesion Detection in Videos

1 code implementation · 9 Sep 2023 · Chao Qin, Jiale Cao, Huazhu Fu, Rao Muhammad Anwer, Fahad Shahbaz Khan

Existing video-based breast lesion detection approaches typically perform temporal feature aggregation of deep backbone features based on the self-attention operation.

Decoder, Lesion Detection

DFormer: Diffusion-guided Transformer for Universal Image Segmentation

1 code implementation · 6 Jun 2023 · Hefeng Wang, Jiale Cao, Rao Muhammad Anwer, Jin Xie, Fahad Shahbaz Khan, Yanwei Pang

Our DFormer outperforms the recent diffusion-based panoptic segmentation method Pix2Seq-D with a gain of 3.6% on the MS COCO val2017 set.

Decoder, Denoising (+4 more)

Transformer-based stereo-aware 3D object detection from binocular images

no code implementations · 24 Apr 2023 · Hanqing Sun, Yanwei Pang, Jiale Cao, Jin Xie, Xuelong Li

In this paper, we explore the model design of Transformers in binocular 3D object detection, focusing particularly on extracting and encoding task-specific image correspondence information.

3D Object Detection, Object (+1 more)

LEAPS: End-to-End One-Step Person Search With Learnable Proposals

no code implementations · 21 Mar 2023 · Zhiqiang Dong, Jiale Cao, Rao Muhammad Anwer, Jin Xie, Fahad Khan, Yanwei Pang

Given a set of sparse and learnable proposals, LEAPS employs a dynamic person search head to directly perform person detection and corresponding re-id feature generation without non-maximum suppression post-processing.

Human Detection, Person Search

3D Vision with Transformers: A Survey

1 code implementation · 8 Aug 2022 · Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang

The success of the transformer architecture in natural language processing has recently attracted attention in the computer vision field.

Pose Estimation, Survey

PSTR: End-to-End One-Step Person Search With Transformers

1 code implementation · CVPR 2022 · Jiale Cao, Yanwei Pang, Rao Muhammad Anwer, Hisham Cholakkal, Jin Xie, Mubarak Shah, Fahad Shahbaz Khan

We propose a novel one-step transformer-based person search framework, PSTR, that jointly performs person detection and re-identification (re-id) in a single architecture.

Decoder, Human Detection (+1 more)

Video Instance Segmentation via Multi-scale Spatio-temporal Split Attention Transformer

1 code implementation · 24 Mar 2022 · Omkar Thawakar, Sanath Narayan, Jiale Cao, Hisham Cholakkal, Rao Muhammad Anwer, Muhammad Haris Khan, Salman Khan, Michael Felsberg, Fahad Shahbaz Khan

When using the ResNet50 backbone, our MS-STS achieves a mask AP of 50.1%, outperforming the best reported results in the literature by 2.7% and by 4.8% at the higher overlap threshold of AP_75, while being comparable in model size and speed on YouTube-VIS 2019 val.

Instance Segmentation, Semantic Segmentation (+2 more)

Shape Prior Non-Uniform Sampling Guided Real-time Stereo 3D Object Detection

no code implementations · 18 Jun 2021 · Aqi Gao, Jiale Cao, Yanwei Pang

Compared with the baseline RTS3D, our proposed method achieves a 2.57% improvement in AP3d with almost no extra network parameters.

3D Object Detection, Object (+1 more)

Co-mining: Self-Supervised Learning for Sparsely Annotated Object Detection

1 code implementation · 3 Dec 2020 · Tiancai Wang, Tong Yang, Jiale Cao, Xiangyu Zhang

Object detectors usually achieve promising results with the supervision of complete instance annotations.

Multi-View Learning, Object (+4 more)

From Handcrafted to Deep Features for Pedestrian Detection: A Survey

2 code implementations · 1 Oct 2020 · Jiale Cao, Yanwei Pang, Jin Xie, Fahad Shahbaz Khan, Ling Shao

In addition to single-spectral pedestrian detection, we also review multi-spectral pedestrian detection, which provides features that are more robust to illumination variation.

Pedestrian Detection, Survey

SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation

1 code implementation · ECCV 2020 · Jiale Cao, Rao Muhammad Anwer, Hisham Cholakkal, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao

In terms of real-time capabilities, SipMask outperforms YOLACT with an absolute gain of 3.0% (mask AP) under similar settings, while operating at comparable speed on a Titan Xp.

object-detection, Object Detection (+4 more)

NETNet: Neighbor Erasing and Transferring Network for Better Single Shot Object Detection

no code implementations · CVPR 2020 · Yazhao Li, Yanwei Pang, Jianbing Shen, Jiale Cao, Ling Shao

With this observation, we propose a new Neighbor Erasing and Transferring (NET) mechanism to reconfigure the pyramid features and explore scale-aware features.

Object, object-detection (+1 more)

Hierarchical Shot Detector

1 code implementation · ICCV 2019 · Jiale Cao, Yanwei Pang, Jungong Han, Xuelong Li

To further address the second problem, a hierarchical shot detector (HSD) is proposed, which stacks two ROC modules and one feature-enhanced module.

General Classification, object-detection (+2 more)

Triply Supervised Decoder Networks for Joint Detection and Segmentation

no code implementations · CVPR 2019 · Jiale Cao, Yanwei Pang, Xuelong Li

Experimental results on the VOC2007 and VOC2012 datasets demonstrate that the proposed TripleNet is able to improve both the detection and segmentation accuracies without adding extra computational costs.

Decoder, object-detection (+4 more)

Exploring Multi-Branch and High-Level Semantic Networks for Improving Pedestrian Detection

no code implementations · 3 Apr 2018 · Jiale Cao, Yanwei Pang, Xuelong Li

In this paper, we propose a multi-branch and high-level semantic network by gradually splitting a base network into multiple different branches.

object-detection, Object Detection (+1 more)

Learning Multilayer Channel Features for Pedestrian Detection

no code implementations · 1 Mar 2016 · Jiale Cao, Yanwei Pang, Xuelong Li

For example, the CNN classifies these proposals using fully-connected-layer features, while the proposal scores and the features in the inner layers of the CNN are ignored.

Pedestrian Detection

Learning Sampling Distributions for Efficient Object Detection

no code implementations · 23 Aug 2015 · Yanwei Pang, Jiale Cao, Xuelong Li

Multistage particle windows (MPW), proposed by Gualdi et al., is an algorithm for fast and accurate object detection.

Face Detection, Object (+2 more)

Cascade Learning by Optimally Partitioning

no code implementations · 18 Aug 2015 · Yanwei Pang, Jiale Cao, Xuelong Li

iCascade searches for the optimal number r_i of weak classifiers in each stage i by directly minimizing the computational cost of the cascade.

Face Detection, object-detection (+1 more)
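A common way to model the computation cost of a cascade is the expected number of weak classifiers evaluated per candidate window: stage i costs r_i evaluations, but only for windows that survived every earlier rejection check. The sketch below uses that generic model with a hypothetical survival function `survive(n)` (the fraction of windows still alive after a rejection check placed after n weak classifiers) and finds the stage sizes r_1..r_K by brute force. It only illustrates the idea of choosing r_i to minimize cost; it is not the iCascade algorithm, and it ignores the detection-rate constraints a real cascade must also satisfy. The names `cascade_cost` and `best_partition` are hypothetical.

```python
from itertools import combinations


def cascade_cost(stage_sizes, survive):
    """Expected number of weak classifiers evaluated per window.

    stage_sizes: r_1..r_K, the number of weak classifiers in each stage.
    survive(n):  hypothetical fraction of windows still alive after a
                 rejection check following the first n weak classifiers;
                 survive(0) must be 1.
    """
    cost, used = 0.0, 0
    for r in stage_sizes:
        # Every window that reaches this stage pays for all r classifiers
        cost += r * survive(used)
        used += r
    return cost


def best_partition(total, num_stages, survive):
    """Brute-force the split of `total` weak classifiers into `num_stages`
    non-empty stages that minimizes cascade_cost.

    Returns (cost, sizes), or None if num_stages > total.
    """
    best = None
    # Choosing num_stages - 1 cut points in 1..total-1 enumerates every split
    for cuts in combinations(range(1, total), num_stages - 1):
        bounds = (0, *cuts, total)
        sizes = [b - a for a, b in zip(bounds, bounds[1:])]
        cost = cascade_cost(sizes, survive)
        if best is None or cost < best[0]:
            best = (cost, sizes)
    return best
```

With `survive = lambda n: 0.9 ** n`, six weak classifiers and two stages, the split [3, 3] gives the lowest expected cost (3 + 3 × 0.729 = 5.187), ahead of [2, 4] (5.24) and [4, 2] (5.3122). Brute force enumerates C(T-1, K-1) splits, so it only suits small T and K.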
