no code implementations • 6 Apr 2024 • Pei Wang, Zhaowei Cai, Hao Yang, Ashwin Swaminathan, R. Manmatha, Stefano Soatto
Existing unified image segmentation models either employ a unified architecture across multiple tasks but use separate weights tailored to each dataset, or apply a single set of weights to multiple datasets but are limited to a single task.
1 code implementation • 11 May 2023 • Zhaoyang Zhang, Yantao Shen, Kunyu Shi, Zhaowei Cai, Jun Fang, Siqi Deng, Hao Yang, Davide Modolo, Zhuowen Tu, Stefano Soatto
We present a vision-language model whose parameters are jointly trained on all tasks and fully shared among multiple heterogeneous tasks which may interfere with each other, resulting in a single model which we named Musketeer.
1 code implementation • CVPR 2023 • Jiang Liu, Hui Ding, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha
In this work, instead of directly predicting the pixel-level segmentation masks, the problem of referring image segmentation is formulated as sequential polygon generation, and the predicted polygons can be later converted into segmentation masks.
Ranked #1 on Referring Expression Segmentation on ReferIt (using extra training data)
1 code implementation • 11 Aug 2022 • Zhaowei Cai, Avinash Ravichandran, Paolo Favaro, Manchen Wang, Davide Modolo, Rahul Bhotika, Zhuowen Tu, Stefano Soatto
We study semi-supervised learning (SSL) for vision transformers (ViT), an under-explored topic despite the wide adoption of the ViT architectures to different tasks.
no code implementations • 3 Aug 2022 • Gukyeong Kwon, Zhaowei Cai, Avinash Ravichandran, Erhan Bas, Rahul Bhotika, Stefano Soatto
Instead of developing masked language modeling (MLM) and masked image modeling (MIM) independently, we propose to build joint masked vision and language modeling, where the masked signal of one modality is reconstructed with the help from another modality.
1 code implementation • 22 Jul 2022 • Kibok Lee, Hao Yang, Satyaki Chakraborty, Zhaowei Cai, Gurumurthy Swaminathan, Avinash Ravichandran, Onkar Dabeer
Most existing works on few-shot object detection (FSOD) focus on a setting where both pre-training and few-shot learning datasets are from a similar domain.
no code implementations • 12 Apr 2022 • Zhaowei Cai, Gukyeong Kwon, Avinash Ravichandran, Erhan Bas, Zhuowen Tu, Rahul Bhotika, Stefano Soatto
In this paper, we study the challenging instance-wise vision-language tasks, where the free-form language is required to align with the objects instead of the whole image.
1 code implementation • CVPR 2022 • Pei Wang, Zhaowei Cai, Hao Yang, Gurumurthy Swaminathan, Nuno Vasconcelos, Bernt Schiele, Stefano Soatto
This is enabled by a unified architecture, Omni-DETR, based on the recent progress on student-teacher framework and end-to-end transformer based object detection.
Ranked #14 on Semi-Supervised Object Detection on COCO 2% labeled data
no code implementations • 6 Jan 2022 • Pengkai Zhu, Zhaowei Cai, Yuanjun Xiong, Zhuowen Tu, Luis Goncalves, Vijay Mahadevan, Stefano Soatto
We present Contrastive Neighborhood Alignment (CNA), a manifold learning approach to maintain the topology of learned features whereby data points that are mapped to nearby representations by the source (teacher) model are also mapped to neighbors by the target (student) model.
1 code implementation • CVPR 2021 • Zhaowei Cai, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Zhuowen Tu, Stefano Soatto
We present a plug-in replacement for batch normalization (BN) called exponential moving average normalization (EMAN), which improves the performance of existing student-teacher based self- and semi-supervised learning techniques.
Self-Supervised Learning Semi-Supervised Image Classification
2 code implementations • CVPR 2020 • Zhaowei Cai, Nuno Vasconcelos
Low-precision networks, with weights and activations quantized to low bit-width, are widely used to accelerate inference on edge devices.
4 code implementations • 24 Jun 2019 • Zhaowei Cai, Nuno Vasconcelos
In object detection, the intersection over union (IoU) threshold is frequently used to define positives/negatives.
Ranked #4 on Instance Segmentation on BDD100K val
1 code implementation • CVPR 2019 • Xudong Wang, Zhaowei Cai, Dashan Gao, Nuno Vasconcelos
Experiments, on a newly established universal object detection benchmark of 11 diverse datasets, show that the proposed detector outperforms a bank of individual detectors, a multi-domain detector, and a baseline universal detector, with a 1. 3x parameter increase over a single-domain baseline detector.
8 code implementations • CVPR 2018 • Zhaowei Cai, Nuno Vasconcelos
In object detection, an intersection over union (IoU) threshold is required to define positives and negatives.
Ranked #4 on 2D Object Detection on SARDet-100K
1 code implementation • CVPR 2017 • Zhaowei Cai, Xiaodong He, Jian Sun, Nuno Vasconcelos
The problem of quantizing the activations of a deep neural network is considered.
1 code implementation • 25 Jul 2016 • Zhaowei Cai, Quanfu Fan, Rogerio S. Feris, Nuno Vasconcelos
A unified deep neural network, denoted the multi-scale CNN (MS-CNN), is proposed for fast multi-scale object detection.
Ranked #24 on Pedestrian Detection on Caltech
no code implementations • 13 Nov 2015 • Longyin Wen, Dawei Du, Zhaowei Cai, Zhen Lei, Ming-Ching Chang, Honggang Qi, Jongwoo Lim, Ming-Hsuan Yang, Siwei Lyu
In this work, we perform a comprehensive quantitative study on the effects of object detection accuracy to the overall MOT performance, using the new large-scale University at Albany DETection and tRACking (UA-DETRAC) benchmark dataset.
no code implementations • ICCV 2015 • Zhaowei Cai, Mohammad Saberian, Nuno Vasconcelos
CompACT cascades are shown to seek an optimal trade-off between accuracy and complexity by pushing features of higher complexity to the later cascade stages, where only a few difficult candidate patches remain to be classified.
Ranked #26 on Pedestrian Detection on Caltech