Search Results for author: Zhaowei Cai

Found 17 papers, 11 papers with code

Musketeer (All for One, and One for All): A Generalist Vision-Language Model with Task Explanation Prompts

no code implementations · 11 May 2023 · Zhaoyang Zhang, Yantao Shen, Kunyu Shi, Zhaowei Cai, Jun Fang, Siqi Deng, Hao Yang, Davide Modolo, Zhuowen Tu, Stefano Soatto

We present a sequence-to-sequence vision-language model whose parameters are jointly trained on all tasks (all for one) and fully shared among multiple tasks (one for all), resulting in a single model which we named Musketeer.

Language Modelling

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

1 code implementation · CVPR 2023 · Jiang Liu, Hui Ding, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha

In this work, instead of directly predicting the pixel-level segmentation masks, the problem of referring image segmentation is formulated as sequential polygon generation, and the predicted polygons can be later converted into segmentation masks.

Ranked #1 on Referring Expression Segmentation on ReferIt (using extra training data)

Image Segmentation · Quantization · +6 more
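PolyFormer's abstract says the predicted polygons can later be converted into segmentation masks. As a rough illustration of that conversion step only, the sketch below rasterizes polygons with an even-odd (ray-casting) point-in-polygon test at each pixel center. The function names and the pixel-center convention are assumptions for this example; PolyFormer's released code may rasterize polygons differently (for example, with an existing library routine).

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Even-odd rule: toggle 'inside' each time a ray cast to the right crosses an edge."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygons_to_mask(polygons: List[List[Point]], height: int, width: int) -> List[List[int]]:
    """Hypothetical helper: mark a pixel as foreground if its center lies in any polygon."""
    mask = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            cx, cy = col + 0.5, row + 0.5  # pixel-center convention (an assumption)
            if any(point_in_polygon(cx, cy, poly) for poly in polygons):
                mask[row][col] = 1
    return mask


square = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
mask = polygons_to_mask([square], height=4, width=4)
for row in mask:
    print(row)
```

The 4×4 mask printed here has ones only in its central 2×2 block. Accepting a list of polygons lets a single object be drawn as several disconnected parts.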

Semi-supervised Vision Transformers at Scale

1 code implementation · 11 Aug 2022 · Zhaowei Cai, Avinash Ravichandran, Paolo Favaro, Manchen Wang, Davide Modolo, Rahul Bhotika, Zhuowen Tu, Stefano Soatto

We study semi-supervised learning (SSL) for vision transformers (ViT), an under-explored topic despite the wide adoption of ViT architectures across different tasks.

Inductive Bias · Semi-Supervised Image Classification

Masked Vision and Language Modeling for Multi-modal Representation Learning

no code implementations · 3 Aug 2022 · Gukyeong Kwon, Zhaowei Cai, Avinash Ravichandran, Erhan Bas, Rahul Bhotika, Stefano Soatto

Instead of developing masked language modeling (MLM) and masked image modeling (MIM) independently, we propose joint masked vision and language modeling, in which the masked signal of one modality is reconstructed with help from the other modality.

Language Modelling · Masked Language Modeling · +1 more

Rethinking Few-Shot Object Detection on a Multi-Domain Benchmark

1 code implementation · 22 Jul 2022 · Kibok Lee, Hao Yang, Satyaki Chakraborty, Zhaowei Cai, Gurumurthy Swaminathan, Avinash Ravichandran, Onkar Dabeer

Most existing works on few-shot object detection (FSOD) focus on a setting where both pre-training and few-shot learning datasets are from a similar domain.

Few-Shot Learning · Few-Shot Object Detection · +1 more

X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks

no code implementations · 12 Apr 2022 · Zhaowei Cai, Gukyeong Kwon, Avinash Ravichandran, Erhan Bas, Zhuowen Tu, Rahul Bhotika, Stefano Soatto

In this paper, we study challenging instance-wise vision-language tasks, in which free-form language must be aligned with individual objects rather than with the whole image.

Omni-DETR: Omni-Supervised Object Detection with Transformers

1 code implementation · CVPR 2022 · Pei Wang, Zhaowei Cai, Hao Yang, Gurumurthy Swaminathan, Nuno Vasconcelos, Bernt Schiele, Stefano Soatto

Omni-supervised object detection is enabled by a unified architecture, Omni-DETR, built on recent progress in student-teacher frameworks and end-to-end transformer-based object detection.

Object · object-detection · +2 more

Contrastive Neighborhood Alignment

no code implementations · 6 Jan 2022 · Pengkai Zhu, Zhaowei Cai, Yuanjun Xiong, Zhuowen Tu, Luis Goncalves, Vijay Mahadevan, Stefano Soatto

We present Contrastive Neighborhood Alignment (CNA), a manifold learning approach to maintain the topology of learned features whereby data points that are mapped to nearby representations by the source (teacher) model are also mapped to neighbors by the target (student) model.
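CNA is described as keeping points that are neighbors under the teacher model as neighbors under the student model. The sketch below shows one way to express that idea as a contrastive objective. Each point's k nearest teacher-space neighbors (by cosine similarity) are treated as positives, and the student is penalized when it assigns them low probability relative to all other points. This is an illustrative assumption, not the paper's loss; the neighbor count `k`, the `temperature`, and all function names are hypothetical.

```python
import math
from typing import List, Set

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def teacher_neighbors(feats: List[Vector], i: int, k: int) -> Set[int]:
    """Indices of the k points most similar to point i in the teacher's feature space."""
    sims = sorted(
        ((cosine(feats[i], feats[j]), j) for j in range(len(feats)) if j != i),
        reverse=True,
    )
    return {j for _, j in sims[:k]}


def neighborhood_alignment_loss(teacher: List[Vector], student: List[Vector],
                                k: int = 1, temperature: float = 0.1) -> float:
    """Contrastive loss: the teacher's neighbors of each point act as positives for the student."""
    n = len(student)
    total = 0.0
    for i in range(n):
        positives = teacher_neighbors(teacher, i, k)
        logits = {j: cosine(student[i], student[j]) / temperature for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        # Average negative log-probability the student assigns to the teacher-defined neighbors.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
    return total / n


teacher = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned_student = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
scrambled_student = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
aligned = neighborhood_alignment_loss(teacher, aligned_student)
scrambled = neighborhood_alignment_loss(teacher, scrambled_student)
print(aligned, scrambled)
```

The aligned student keeps the teacher's two neighborhoods and gets a loss near zero. The scrambled student swaps neighbors across the two groups and gets a much larger loss.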

Exponential Moving Average Normalization for Self-supervised and Semi-supervised Learning

1 code implementation · CVPR 2021 · Zhaowei Cai, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Zhuowen Tu, Stefano Soatto

We present a plug-in replacement for batch normalization (BN) called exponential moving average normalization (EMAN), which improves the performance of existing student-teacher based self- and semi-supervised learning techniques.

Self-Supervised Learning · Semi-Supervised Image Classification
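EMAN is described as a plug-in replacement for BN in student-teacher training. As we understand the method, the teacher's BN statistics are not computed from the current batch. Instead, they are updated as an exponential moving average (EMA) of the student's running statistics, in the same way the teacher's weights track the student's. The scalar, single-channel sketch below shows only that update rule. `ToyBNLayer`, `eman_step`, and the default momentum of 0.999 are illustrative assumptions, and a real implementation would apply the same update per channel to framework tensors.

```python
import math
from typing import Dict


class ToyBNLayer:
    """Hypothetical one-channel batch-norm layer: affine parameters plus running statistics."""

    def __init__(self, weight: float, bias: float, running_mean: float, running_var: float):
        self.params: Dict[str, float] = {"weight": weight, "bias": bias}
        self.buffers: Dict[str, float] = {"running_mean": running_mean, "running_var": running_var}


def ema_update(target: Dict[str, float], source: Dict[str, float], momentum: float) -> None:
    for name, value in source.items():
        target[name] = momentum * target[name] + (1.0 - momentum) * value


def eman_step(teacher: ToyBNLayer, student: ToyBNLayer, momentum: float = 0.999) -> None:
    """EMAN-style update: the teacher's BN statistics follow the student's by EMA,
    just like its parameters, instead of being recomputed from the current batch."""
    ema_update(teacher.params, student.params, momentum)
    ema_update(teacher.buffers, student.buffers, momentum)


def teacher_forward(layer: ToyBNLayer, x: float, eps: float = 1e-5) -> float:
    """The teacher normalizes with its EMA statistics; no batch statistics are involved."""
    mean, var = layer.buffers["running_mean"], layer.buffers["running_var"]
    return layer.params["weight"] * (x - mean) / math.sqrt(var + eps) + layer.params["bias"]


student = ToyBNLayer(weight=2.0, bias=0.0, running_mean=1.0, running_var=4.0)
teacher = ToyBNLayer(weight=1.0, bias=0.0, running_mean=0.0, running_var=1.0)
eman_step(teacher, student, momentum=0.5)
print(teacher.params, teacher.buffers)
```

With `momentum=0.5` the teacher moves halfway toward the student, giving weight 1.5, running mean 0.5, and running variance 2.5. In practice the momentum is kept close to 1 so the teacher changes slowly.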

Rethinking Differentiable Search for Mixed-Precision Neural Networks

2 code implementations · CVPR 2020 · Zhaowei Cai, Nuno Vasconcelos

Low-precision networks, with weights and activations quantized to low bit-width, are widely used to accelerate inference on edge devices.

Combinatorial Optimization
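To make "quantized to low bit-width" concrete, the sketch below implements a simple symmetric uniform quantizer and measures the rounding error at several bit-widths. The quantizer form, the clipping range `max_abs`, and the helper names are assumptions for illustration, and the paper's quantizers may differ. The paper's actual contribution is searching which bit-width each layer should receive, and this sketch does not implement that search.

```python
from typing import List


def quantize_symmetric(x: float, bits: int, max_abs: float = 1.0) -> float:
    """Uniform symmetric quantizer: clip to [-max_abs, max_abs], snap to an integer grid, rescale."""
    if bits < 2:
        raise ValueError("a signed symmetric grid needs at least 2 bits")
    levels = 2 ** (bits - 1) - 1              # e.g. 4 bits -> integer codes -7..7
    clipped = max(-max_abs, min(max_abs, x))
    code = round(clipped * levels / max_abs)  # Python round: nearest integer, ties to even
    return code * max_abs / levels


def max_error(values: List[float], bits: int) -> float:
    """Largest absolute quantization error over a list of values."""
    return max(abs(v - quantize_symmetric(v, bits)) for v in values)


weights = [-0.93, -0.41, 0.05, 0.27, 0.66]
for bits in (2, 4, 8):
    print(bits, [round(quantize_symmetric(w, bits), 3) for w in weights], max_error(weights, bits))
```

At 2 bits every weight collapses to -1, 0, or 1. At 8 bits the error stays below half a grid step (1/254 here), which is why a mixed-precision network spends its bits on the layers most sensitive to this error.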

Cascade R-CNN: High Quality Object Detection and Instance Segmentation

4 code implementations · 24 Jun 2019 · Zhaowei Cai, Nuno Vasconcelos

In object detection, the intersection over union (IoU) threshold is frequently used to define positives/negatives.

Instance Segmentation · object-detection · +4 more
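The IoU-based labeling mentioned in the Cascade R-CNN abstracts can be sketched as follows. The sketch computes each proposal's best IoU against the ground-truth boxes and marks the proposal positive when that IoU reaches a threshold. Cascade R-CNN trains a sequence of detection stages with increasing thresholds (0.5, 0.6, and 0.7 in the paper). This minimal sketch covers only the labeling rule, not the box regression that refines proposals between stages, and it ignores the separate background/ignore ranges some detectors use. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_proposals(proposals: Sequence[Box], gt_boxes: Sequence[Box], threshold: float) -> List[int]:
    """1 (positive) if a proposal's best IoU with any ground-truth box reaches the threshold, else 0."""
    labels = []
    for p in proposals:
        best = max((iou(p, g) for g in gt_boxes), default=0.0)
        labels.append(1 if best >= threshold else 0)
    return labels


gt = [(0.0, 0.0, 10.0, 10.0)]
proposals = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 6.0), (5.0, 5.0, 15.0, 15.0)]
for threshold in (0.5, 0.6, 0.7):
    print(threshold, label_proposals(proposals, gt, threshold))
```

The three proposals have IoUs of 1.0, 0.6, and about 0.14. Raising the threshold from 0.6 to 0.7 turns the second proposal into a negative, which shows how a stricter threshold leaves fewer positives for training.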

Towards Universal Object Detection by Domain Attention

1 code implementation · CVPR 2019 · Xudong Wang, Zhaowei Cai, Dashan Gao, Nuno Vasconcelos

Experiments on a newly established universal object detection benchmark of 11 diverse datasets show that the proposed detector outperforms a bank of individual detectors, a multi-domain detector, and a baseline universal detector, with a 1.3x parameter increase over a single-domain baseline detector.

Object · object-detection · +1 more

Cascade R-CNN: Delving into High Quality Object Detection

8 code implementations · CVPR 2018 · Zhaowei Cai, Nuno Vasconcelos

In object detection, an intersection over union (IoU) threshold is required to define positives and negatives.

Object · Object Detection · +1 more

A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection

1 code implementation · 25 Jul 2016 · Zhaowei Cai, Quanfu Fan, Rogerio S. Feris, Nuno Vasconcelos

A unified deep neural network, denoted the multi-scale CNN (MS-CNN), is proposed for fast multi-scale object detection.

Face Detection · Object · +3 more

UA-DETRAC: A New Benchmark and Protocol for Multi-Object Detection and Tracking

no code implementations · 13 Nov 2015 · Longyin Wen, Dawei Du, Zhaowei Cai, Zhen Lei, Ming-Ching Chang, Honggang Qi, Jongwoo Lim, Ming-Hsuan Yang, Siwei Lyu

In this work, we perform a comprehensive quantitative study of the effects of object detection accuracy on overall multi-object tracking (MOT) performance, using the new large-scale University at Albany DETection and tRACking (UA-DETRAC) benchmark dataset.

Multi-Object Tracking · Object · +2 more

Learning Complexity-Aware Cascades for Deep Pedestrian Detection

no code implementations · ICCV 2015 · Zhaowei Cai, Mohammad Saberian, Nuno Vasconcelos

CompACT cascades are shown to seek an optimal trade-off between accuracy and complexity by pushing features of higher complexity to the later cascade stages, where only a few difficult candidate patches remain to be classified.

Pedestrian Detection
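The CompACT abstract relies on the standard cascade property that costly stages only see the few candidates that survive the cheap ones. The sketch below is a generic cascade with early rejection and cost accounting, not CompACT's learning algorithm. CompACT learns which features go to which stage by trading accuracy against complexity. Here the stage ordering, costs, scoring functions, and thresholds are hand-picked hypothetical values.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Stage:
    cost: float                                # hypothetical cost of evaluating this stage on one patch
    score: Callable[[Sequence[float]], float]  # weak-learner response added to the running score
    threshold: float                           # reject the patch if the running score drops below this


def run_cascade(patches: Sequence[Sequence[float]], stages: List[Stage]) -> Tuple[List[int], float]:
    """Return the indices of patches that pass every stage and the total evaluation cost."""
    accepted: List[int] = []
    total_cost = 0.0
    for idx, patch in enumerate(patches):
        running = 0.0
        for stage in stages:
            total_cost += stage.cost
            running += stage.score(patch)
            if running < stage.threshold:
                break  # early rejection: later, costlier stages never run on this patch
        else:
            accepted.append(idx)  # survived every stage
    return accepted, total_cost


stages = [
    Stage(cost=1.0, score=lambda p: p[0], threshold=0.0),   # cheap feature first
    Stage(cost=10.0, score=lambda p: p[1], threshold=0.0),  # expensive feature last
]
patches = [[-1.0, 5.0], [0.5, 0.2], [0.5, -2.0], [-0.3, 1.0]]
accepted, cost = run_cascade(patches, stages)
print(accepted, cost)
```

Only patch 1 survives both stages, and the total cost is 24.0, compared with 44.0 if every patch ran every stage. The expensive second stage runs only on the two patches that pass the cheap first one.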
