Search Results for author: Zhaowei Cai

Found 18 papers, 12 papers with code

Mixed-Query Transformer: A Unified Image Segmentation Architecture

no code implementations • 6 Apr 2024 • Pei Wang, Zhaowei Cai, Hao Yang, Ashwin Swaminathan, R. Manmatha, Stefano Soatto

Existing unified image segmentation models either employ a unified architecture across multiple tasks but use separate weights tailored to each dataset, or apply a single set of weights to multiple datasets but are limited to a single task.

Data Augmentation Image Segmentation +2

Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts

1 code implementation • 11 May 2023 • Zhaoyang Zhang, Yantao Shen, Kunyu Shi, Zhaowei Cai, Jun Fang, Siqi Deng, Hao Yang, Davide Modolo, Zhuowen Tu, Stefano Soatto

We present a vision-language model whose parameters are jointly trained on all tasks and fully shared among multiple heterogeneous tasks which may interfere with each other, resulting in a single model which we named Musketeer.

Language Modelling

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

1 code implementation • CVPR 2023 • Jiang Liu, Hui Ding, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha

In this work, instead of directly predicting the pixel-level segmentation masks, the problem of referring image segmentation is formulated as sequential polygon generation, and the predicted polygons can be later converted into segmentation masks.

Ranked #1 on Referring Expression Segmentation on ReferIt (using extra training data)

Image Segmentation Quantization +6

Semi-supervised Vision Transformers at Scale

1 code implementation • 11 Aug 2022 • Zhaowei Cai, Avinash Ravichandran, Paolo Favaro, Manchen Wang, Davide Modolo, Rahul Bhotika, Zhuowen Tu, Stefano Soatto

We study semi-supervised learning (SSL) for vision transformers (ViT), an under-explored topic despite the wide adoption of the ViT architectures to different tasks.

Ranked #3 on Semi-Supervised Image Classification on ImageNet - 10% labeled data

Inductive Bias Semi-Supervised Image Classification

Masked Vision and Language Modeling for Multi-modal Representation Learning

no code implementations • 3 Aug 2022 • Gukyeong Kwon, Zhaowei Cai, Avinash Ravichandran, Erhan Bas, Rahul Bhotika, Stefano Soatto

Instead of developing masked language modeling (MLM) and masked image modeling (MIM) independently, we propose to build joint masked vision and language modeling, where the masked signal of one modality is reconstructed with the help from another modality.

Language Modelling Masked Language Modeling +1

Rethinking Few-Shot Object Detection on a Multi-Domain Benchmark

1 code implementation • 22 Jul 2022 • Kibok Lee, Hao Yang, Satyaki Chakraborty, Zhaowei Cai, Gurumurthy Swaminathan, Avinash Ravichandran, Onkar Dabeer

Most existing works on few-shot object detection (FSOD) focus on a setting where both pre-training and few-shot learning datasets are from a similar domain.

Few-Shot Learning Few-Shot Object Detection +1

X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks

no code implementations • 12 Apr 2022 • Zhaowei Cai, Gukyeong Kwon, Avinash Ravichandran, Erhan Bas, Zhuowen Tu, Rahul Bhotika, Stefano Soatto

In this paper, we study the challenging instance-wise vision-language tasks, where the free-form language is required to align with the objects instead of the whole image.

Omni-DETR: Omni-Supervised Object Detection with Transformers

1 code implementation • CVPR 2022 • Pei Wang, Zhaowei Cai, Hao Yang, Gurumurthy Swaminathan, Nuno Vasconcelos, Bernt Schiele, Stefano Soatto

Omni-supervised object detection is enabled by a unified architecture, Omni-DETR, based on recent progress in student-teacher frameworks and end-to-end transformer-based object detection.

Ranked #14 on Semi-Supervised Object Detection on COCO 2% labeled data

Object object-detection +2

Contrastive Neighborhood Alignment

no code implementations • 6 Jan 2022 • Pengkai Zhu, Zhaowei Cai, Yuanjun Xiong, Zhuowen Tu, Luis Goncalves, Vijay Mahadevan, Stefano Soatto

We present Contrastive Neighborhood Alignment (CNA), a manifold learning approach to maintain the topology of learned features whereby data points that are mapped to nearby representations by the source (teacher) model are also mapped to neighbors by the target (student) model.

Exponential Moving Average Normalization for Self-supervised and Semi-supervised Learning

1 code implementation • CVPR 2021 • Zhaowei Cai, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Zhuowen Tu, Stefano Soatto

We present a plug-in replacement for batch normalization (BN) called exponential moving average normalization (EMAN), which improves the performance of existing student-teacher based self- and semi-supervised learning techniques.

Ranked #2 on Semi-Supervised Image Classification on ImageNet - 0.2% labeled data

Self-Supervised Learning Semi-Supervised Image Classification

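To make the EMAN summary above more concrete, here is a minimal sketch of the idea. It assumes that the teacher's normalization statistics are updated as an exponential moving average of the student's statistics instead of being computed from the current batch. The function names, the dictionary layout, and the momentum value are hypothetical choices for illustration and are not taken from the paper's code.

```python
def ema_update(teacher, student, momentum=0.999):
    """Element-wise EMA: momentum * teacher + (1 - momentum) * student."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def eman_step(teacher_bn, student_bn, momentum=0.999):
    """Update teacher normalization statistics from the student's statistics.

    Both arguments are dicts with per-channel 'mean' and 'var' lists
    (a hypothetical layout chosen for this sketch).
    """
    return {
        "mean": ema_update(teacher_bn["mean"], student_bn["mean"], momentum),
        "var": ema_update(teacher_bn["var"], student_bn["var"], momentum),
    }


# Toy two-channel example with an exaggerated momentum of 0.9.
teacher_bn = {"mean": [0.0, 0.0], "var": [1.0, 1.0]}
student_bn = {"mean": [1.0, -1.0], "var": [2.0, 0.5]}
teacher_bn = eman_step(teacher_bn, student_bn, momentum=0.9)
print(teacher_bn)
```

In this sketch the teacher never sees batch statistics directly, so its normalization changes smoothly as the student trains.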

Rethinking Differentiable Search for Mixed-Precision Neural Networks

2 code implementations • CVPR 2020 • Zhaowei Cai, Nuno Vasconcelos

Low-precision networks, with weights and activations quantized to low bit-width, are widely used to accelerate inference on edge devices.

Combinatorial Optimization

Cascade R-CNN: High Quality Object Detection and Instance Segmentation

4 code implementations • 24 Jun 2019 • Zhaowei Cai, Nuno Vasconcelos

In object detection, the intersection over union (IoU) threshold is frequently used to define positives/negatives.

Ranked #4 on Instance Segmentation on BDD100K val

Instance Segmentation object-detection +3

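The IoU thresholding mentioned in the Cascade R-CNN summary above can be sketched as follows. This is an illustrative example only: the `(x1, y1, x2, y2)` box format, the function names, and the default threshold of 0.5 are assumptions for this sketch, not details from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width and height are clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def label_proposals(proposals, gt_boxes, iou_threshold=0.5):
    """Mark a proposal positive if its best IoU with any ground-truth box
    reaches the threshold, and negative otherwise."""
    labels = []
    for p in proposals:
        best = max((iou(p, g) for g in gt_boxes), default=0.0)
        labels.append(best >= iou_threshold)
    return labels


proposals = [(0, 0, 2, 2), (1, 1, 3, 3)]
gt_boxes = [(0, 0, 2, 2)]
print(label_proposals(proposals, gt_boxes, iou_threshold=0.5))
```

Raising `iou_threshold` keeps only tighter proposals as positives, and lowering it admits looser ones.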

Towards Universal Object Detection by Domain Attention

1 code implementation • CVPR 2019 • Xudong Wang, Zhaowei Cai, Dashan Gao, Nuno Vasconcelos

Experiments on a newly established universal object detection benchmark of 11 diverse datasets show that the proposed detector outperforms a bank of individual detectors, a multi-domain detector, and a baseline universal detector, with a 1.3x parameter increase over a single-domain baseline detector.

Object object-detection +1

Cascade R-CNN: Delving into High Quality Object Detection

8 code implementations • CVPR 2018 • Zhaowei Cai, Nuno Vasconcelos

In object detection, an intersection over union (IoU) threshold is required to define positives and negatives.

Ranked #4 on 2D Object Detection on SARDet-100K

Object Object Detection +1

Deep Learning with Low Precision by Half-wave Gaussian Quantization

1 code implementation • CVPR 2017 • Zhaowei Cai, Xiaodong He, Jian Sun, Nuno Vasconcelos

The problem of quantizing the activations of a deep neural network is considered.

Quantization

A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection

1 code implementation • 25 Jul 2016 • Zhaowei Cai, Quanfu Fan, Rogerio S. Feris, Nuno Vasconcelos

A unified deep neural network, denoted the multi-scale CNN (MS-CNN), is proposed for fast multi-scale object detection.

Ranked #24 on Pedestrian Detection on Caltech

Face Detection Feature Upsampling +4

UA-DETRAC: A New Benchmark and Protocol for Multi-Object Detection and Tracking

no code implementations • 13 Nov 2015 • Longyin Wen, Dawei Du, Zhaowei Cai, Zhen Lei, Ming-Ching Chang, Honggang Qi, Jongwoo Lim, Ming-Hsuan Yang, Siwei Lyu

In this work, we perform a comprehensive quantitative study of the effects of object detection accuracy on overall MOT performance, using the new large-scale University at Albany DETection and tRACking (UA-DETRAC) benchmark dataset.

Multi-Object Tracking Object +2

Learning Complexity-Aware Cascades for Deep Pedestrian Detection

no code implementations • ICCV 2015 • Zhaowei Cai, Mohammad Saberian, Nuno Vasconcelos

CompACT cascades are shown to seek an optimal trade-off between accuracy and complexity by pushing features of higher complexity to the later cascade stages, where only a few difficult candidate patches remain to be classified.

Ranked #26 on Pedestrian Detection on Caltech

Pedestrian Detection
