Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance

1 code implementation 24 Feb 2022 Zhuoning Yuan, Yuexin Wu, Zihao Qiu, Xianzhi Du, Lijun Zhang, Denny Zhou, Tianbao Yang

From the optimization perspective, we explain why existing methods such as SimCLR require a large batch size to achieve satisfactory results.
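For context, SimCLR's NT-Xent loss is usually computed with negatives taken only from the current mini-batch. The sketch below is a minimal pure-Python version of that standard in-batch loss, not the global contrastive objective or optimizer proposed in this paper; the function name `nt_xent_loss` and the default temperature are illustrative assumptions. It shows that the denominator for each anchor is estimated from the batch alone, so its quality depends on the batch size N.

```python
import math


def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss with in-batch negatives (illustrative sketch).

    z1[i] and z2[i] are embeddings of two augmented views of example i.
    For each of the 2N anchors, the positive is its paired view and the
    denominator sums over the other 2N - 1 embeddings in the mini-batch,
    so the "negatives" term is estimated from this batch alone.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    z = [normalize(v) for v in list(z1) + list(z2)]  # 2N unit vectors
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same example
        # cosine similarity (dot product of unit vectors) scaled by temperature
        sims = [sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(2 * n)]
        # denominator over all other embeddings in the batch (self excluded)
        denom = sum(math.exp(s) for j, s in enumerate(sims) if j != i)
        total += math.log(denom) - sims[pos]
    return total / (2 * n)


# Two examples whose views are identical and mutually orthogonal.
print(nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

Because every anchor only ever sees 2N - 2 negatives, shrinking N changes what the loss estimates, not just its variance.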

Auto-scaling Vision Transformers without Training

1 code implementation ICLR 2022 Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wang, Denny Zhou

The motivation comes from two pain points: 1) the lack of efficient and principled methods for designing and scaling ViTs; 2) the tremendous computational cost of training ViTs, which is much heavier than that of their convolutional counterparts.

A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation

no code implementations 17 Dec 2021 Wuyang Chen, Xianzhi Du, Fan Yang, Lucas Beyer, Xiaohua Zhai, Tsung-Yi Lin, Huizhong Chen, Jing Li, Xiaodan Song, Zhangyang Wang, Denny Zhou

In this paper, we comprehensively study three architecture design choices for ViT (spatial reduction, doubled channels, and multiscale features) and demonstrate that a vanilla ViT architecture can serve as a single-scale backbone for object localization and instance segmentation without handcrafted multiscale features, maintaining the original ViT design philosophy.

Towards a Unified Foundation Model: Jointly Pre-Training Transformers on Unpaired Images and Text

no code implementations 14 Dec 2021 Qing Li, Boqing Gong, Yin Cui, Dan Kondratyuk, Xianzhi Du, Ming-Hsuan Yang, Matthew Brown

The experiments show that the resulting unified foundation transformer works surprisingly well on both vision-only and text-only tasks, and that the proposed knowledge distillation and gradient masking strategy can effectively lift performance close to the level of separately trained models.

Revisiting 3D ResNets for Video Recognition

1 code implementation 3 Sep 2021 Xianzhi Du, Yeqing Li, Yin Cui, Rui Qian, Jing Li, Irwan Bello

Recent work by Bello shows that training and scaling strategies may matter more than model architectures for visual recognition.

Dilated SpineNet for Semantic Segmentation

no code implementations 23 Mar 2021 Abdullah Rashwan, Xianzhi Du, Xiaoqi Yin, Jing Li

Scale-permuted networks have shown promising results on object bounding box detection and instance segmentation.

Revisiting ResNets: Improved Training and Scaling Strategies

4 code implementations NeurIPS 2021 Irwan Bello, William Fedus, Xianzhi Du, Ekin D. Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shlens, Barret Zoph

Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1.7x-2.7x faster than EfficientNets on TPUs while achieving similar accuracies on ImageNet.

SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

5 code implementations CVPR 2020 Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song

We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that is learned on an object detection task by Neural Architecture Search.

TW-SMNet: Deep Multitask Learning of Tele-Wide Stereo Matching

no code implementations 11 Jun 2019 Mostafa El-Khamy, Haoyu Ren, Xianzhi Du, Jungwon Lee

In this paper, we introduce the problem of estimating the real-world depth of elements in a scene captured by two cameras with different fields of view: the first field of view (FOV) is a wide FOV (WFOV) captured by a wide-angle lens, and the second FOV lies within the first and is captured by a tele zoom lens.
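As background, for a rectified stereo pair of identical cameras, depth follows from disparity through the textbook pinhole relation Z = f * B / d (focal length f in pixels, baseline B in meters, disparity d in pixels). The minimal sketch below applies only that relation; it assumes rectified, same-FOV cameras and does not model the wide/tele FOV mismatch this paper targets. The helper name `disparity_to_depth` is hypothetical.

```python
def disparity_to_depth(disparities_px, focal_px, baseline_m):
    """Convert per-pixel disparities to depths via Z = f * B / d.

    Assumes a rectified stereo pair sharing one focal length (pixels)
    and a known baseline (meters). Non-positive disparities (no valid
    match) map to None instead of raising a division error.
    """
    depths = []
    for d in disparities_px:
        depths.append(focal_px * baseline_m / d if d > 0 else None)
    return depths


print(disparity_to_depth([50.0, 25.0, 0.0], focal_px=700.0, baseline_m=0.5))
# [7.0, 14.0, None]
```

Halving the disparity doubles the depth, which is why small disparity errors on distant objects translate into large depth errors.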

AMNet: Deep Atrous Multiscale Stereo Disparity Estimation Networks

no code implementations 19 Apr 2019 Xianzhi Du, Mostafa El-Khamy, Jungwon Lee

A stacked atrous multiscale network is proposed to aggregate rich multiscale contextual information from the cost volume, which allows the disparity to be estimated with high accuracy at multiple scales.
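The atrous idea can be illustrated in one dimension. The sketch below is a generic dilated convolution with several dilation rates whose outputs are summed; it is not AMNet's architecture, which operates on a cost volume with learned weights per branch. Sharing one kernel across branches and summing them are simplifying assumptions, and the function names are hypothetical.

```python
def dilated_conv1d_same(signal, kernel, dilation):
    """1D atrous (dilated) cross-correlation with zero 'same' padding.

    Kernel taps are spaced `dilation` samples apart, so a k-tap kernel
    covers (k - 1) * dilation + 1 input samples without extra weights.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("use an odd kernel size so 'same' padding is symmetric")
    pad = (k - 1) * dilation // 2
    padded = [0.0] * pad + [float(x) for x in signal] + [0.0] * pad
    return [
        sum(w * padded[i + t * dilation] for t, w in enumerate(kernel))
        for i in range(len(signal))
    ]


def multiscale_atrous_sum(signal, kernel, dilations=(1, 2, 4)):
    """Run the same kernel at several dilation rates and sum the branches,
    mixing context from several receptive-field sizes at each position."""
    total = [0.0] * len(signal)
    for d in dilations:
        branch = dilated_conv1d_same(signal, kernel, d)
        total = [a + b for a, b in zip(total, branch)]
    return total


impulse = [0, 0, 0, 1, 0, 0, 0]
print(dilated_conv1d_same(impulse, [1, 1, 1], 2))         # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
print(multiscale_atrous_sum(impulse, [1, 1, 1], (1, 2)))  # [0.0, 1.0, 1.0, 2.0, 1.0, 1.0, 0.0]
```

The impulse response makes the spacing visible: with dilation 2, the three taps land two samples apart, widening the receptive field at no extra parameter cost.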

Fused Deep Neural Networks for Efficient Pedestrian Detection

no code implementations 2 May 2018 Xianzhi Du, Mostafa El-Khamy, Vlad I. Morariu, Jungwon Lee, Larry Davis

The classification system further classifies the generated candidates based on the opinions of multiple deep verification networks and a fusion network that uses a novel soft-rejection fusion method to adjust the confidence in the detection results.

Boundary-sensitive Network for Portrait Segmentation

no code implementations 22 Dec 2017 Xianzhi Du, Xiaolong Wang, Dawei Li, Jingwen Zhu, Serafettin Tasci, Cameron Upright, Stephen Walsh, Larry Davis

Compared to the general semantic segmentation problem, portrait segmentation has a higher precision requirement in boundary areas.

Fused DNN: A deep neural network fusion approach to fast and robust pedestrian detection

no code implementations 11 Oct 2016 Xianzhi Du, Mostafa El-Khamy, Jungwon Lee, Larry S. Davis

A single-shot deep convolutional network is trained as an object detector to generate all possible pedestrian candidates of different sizes and occlusion levels.

A Graphical Model Approach for Matching Partial Signatures

no code implementations CVPR 2015 Xianzhi Du, David Doermann, Wael Abd-Almageed

In this paper, we present a novel partial signature matching method using graphical models.

