Search Results for author: Saining Xie

Found 33 papers, 27 papers with code

CiT: Curation in Training for Effective Vision-Language Data

1 code implementation · 5 Jan 2023 · Hu Xu, Saining Xie, Po-Yao Huang, Licheng Yu, Russell Howes, Gargi Ghosh, Luke Zettlemoyer, Christoph Feichtenhofer

Large vision-language models are generally applicable to many downstream tasks, but come at an exorbitant training cost that only large institutions can afford.

ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders

6 code implementations · 2 Jan 2023 · Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie

This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets on various recognition benchmarks, including ImageNet classification, COCO detection, and ADE20K segmentation.

Ranked #2 on Object Detection on COCO 2017 val (box AP metric)

Object Detection, Representation Learning, +2 more

Scalable Diffusion Models with Transformers

1 code implementation · 19 Dec 2022 · William Peebles, Saining Xie

We explore a new class of diffusion models based on the transformer architecture.

Image Generation

Exploring Long-Sequence Masked Autoencoders

1 code implementation · 13 Oct 2022 · Ronghang Hu, Shoubhik Debnath, Saining Xie, Xinlei Chen

Masked Autoencoding (MAE) has emerged as an effective approach for pre-training representations across multiple domains.

Object Detection, Semantic Segmentation

A ConvNet for the 2020s

40 code implementations · CVPR 2022 · Zhuang Liu, Hanzi Mao, Chao-yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie

The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model.

Ranked #4 on Domain Generalization on ImageNet-Sketch (using extra training data)

Domain Generalization, Image Classification, +2 more

A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision

No code implementations · 27 Dec 2021 · Ajinkya Tejankar, Maziar Sanjabi, Bichen Wu, Saining Xie, Madian Khabsa, Hamed Pirsiavash, Hamed Firooz

In this paper, we focus on teasing out what parts of the language supervision are essential for training zero-shot image classification models.

Classification, Image Captioning, +3 more

SLIP: Self-supervision meets Language-Image Pre-training

1 code implementation · 23 Dec 2021 · Norman Mu, Alexander Kirillov, David Wagner, Saining Xie

Across ImageNet and a battery of additional datasets, we find that SLIP improves accuracy by a large margin.

Multi-Task Learning, Representation Learning, +1 more

Benchmarking Detection Transfer Learning with Vision Transformers

1 code implementation · 22 Nov 2021 · Yanghao Li, Saining Xie, Xinlei Chen, Piotr Dollár, Kaiming He, Ross Girshick

The complexity of object detection methods can make this benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive.

Benchmarking, object-detection, +3 more

Pri3D: Can 3D Priors Help 2D Representation Learning?

1 code implementation · ICCV 2021 · Ji Hou, Saining Xie, Benjamin Graham, Angela Dai, Matthias Nießner

Inspired by these advances in geometric understanding, we aim to imbue image-based perception with representations learned under geometric constraints.

Contrastive Learning, Instance Segmentation, +4 more

On Interaction Between Augmentations and Corruptions in Natural Corruption Robustness

1 code implementation · NeurIPS 2021 · Eric Mintun, Alexander Kirillov, Saining Xie

Invariance to a broad array of image corruptions, such as warping, noise, or color shifts, is an important aspect of building robust models in computer vision.

Is Robustness Robust? On the interaction between augmentations and corruptions

No code implementations · 1 Jan 2021 · Eric Mintun, Alexander Kirillov, Saining Xie

Invariance to a broad array of image corruptions, such as warping, noise, or color shifts, is an important aspect of building robust models in computer vision.

Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts

1 code implementation · CVPR 2021 · Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie

The rapid progress in 3D scene understanding has come with growing demand for data; however, collecting and annotating 3D scenes (e.g., point clouds) is notoriously hard.

3D Semantic Segmentation, Instance Segmentation, +1 more

PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding

1 code implementation · ECCV 2020 · Saining Xie, Jiatao Gu, Demi Guo, Charles R. Qi, Leonidas J. Guibas, Or Litany

To this end, we select a suite of diverse datasets and tasks to measure the effect of unsupervised pre-training on a large source set of 3D scenes.

Point Cloud Pre-training, Representation Learning, +3 more

Graph Structure of Neural Networks

3 code implementations · ICML 2020 · Jiaxuan You, Jure Leskovec, Kaiming He, Saining Xie

Neural networks are often represented as graphs of connections between neurons.

Are Labels Necessary for Neural Architecture Search?

2 code implementations · ECCV 2020 · Chenxi Liu, Piotr Dollár, Kaiming He, Ross Girshick, Alan Yuille, Saining Xie

Existing neural network architectures in computer vision -- whether designed by humans or by machines -- were typically found using both images and their associated labels.

Neural Architecture Search

Neural Architecture Search by Learning Action Space for Monte Carlo Tree Search

No code implementations · 25 Sep 2019 · Linnan Wang, Saining Xie, Teng Li, Rodrigo Fonseca, Yuandong Tian

As a result, using a manually designed action space to perform NAS often leads to sample-inefficient exploration of architectures and can therefore be suboptimal.

Neural Architecture Search

Sample-Efficient Neural Architecture Search by Learning Action Space

1 code implementation · 17 Jun 2019 · Linnan Wang, Saining Xie, Teng Li, Rodrigo Fonseca, Yuandong Tian

To improve the sample efficiency, this paper proposes Latent Action Neural Architecture Search (LaNAS), which learns actions to recursively partition the search space into good or bad regions that contain networks with similar performance metrics.

Neural Architecture Search

On Network Design Spaces for Visual Recognition

4 code implementations · ICCV 2019 · Ilija Radosavovic, Justin Johnson, Saining Xie, Wan-Yen Lo, Piotr Dollár

Compared to current methodologies of comparing point and curve estimates of model families, distribution estimates paint a more complete picture of the entire design landscape.

Neural Architecture Search
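The snippet above contrasts point and curve estimates with distribution estimates. As we understand the paper, its main distribution estimate is the error empirical distribution function (EDF), F(e) = (1/n) Σ 1[e_i < e], i.e. the fraction of n sampled models whose error falls below e. The Python below is a minimal sketch of that formula only; the function name `error_edf` and the error values are hypothetical and are not taken from the paper or its code.

```python
from bisect import bisect_left


def error_edf(errors, thresholds):
    """Return F(e) for each threshold e: the fraction of models with error strictly below e."""
    ordered = sorted(errors)
    n = len(ordered)
    # bisect_left on a sorted list gives the count of values strictly less than e
    return [bisect_left(ordered, e) / n for e in thresholds]


# Hypothetical top-1 errors (%) for five models sampled from one design space
errors = [38.2, 41.5, 35.9, 44.0, 39.7]
print(error_edf(errors, [36.0, 40.0, 45.0]))  # → [0.2, 0.6, 1.0]
```

Computing F over a grid of thresholds for two design spaces gives two curves; under this reading, the space whose curve rises earlier has a larger share of low-error models.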

Sample-Efficient Neural Architecture Search by Learning Action Space for Monte Carlo Tree Search

1 code implementation · 1 Jan 2019 · Linnan Wang, Saining Xie, Teng Li, Rodrigo Fonseca, Yuandong Tian

To improve the sample efficiency, this paper proposes Latent Action Neural Architecture Search (LaNAS), which learns actions to recursively partition the search space into good or bad regions that contain networks with similar performance metrics.

Image Classification, Neural Architecture Search

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

1 code implementation · ECCV 2018 · Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu, Kevin Murphy

Despite the steady progress in video analysis led by the adoption of convolutional neural networks (CNNs), the relative improvement has been less drastic than that in 2D static image classification.

Ranked #23 on Action Recognition on UCF101 (using extra training data)

Action Classification, Action Detection, +6 more

Top-Down Learning for Structured Labeling with Convolutional Pseudoprior

No code implementations · 23 Nov 2015 · Saining Xie, Xun Huang, Zhuowen Tu

Current practice in convolutional neural networks (CNNs) remains largely bottom-up, and the role of top-down processes in CNNs for pattern analysis and visual inference is not very clear.

Hyper-Class Augmented and Regularized Deep Learning for Fine-Grained Image Classification

No code implementations · CVPR 2015 · Saining Xie, Tianbao Yang, Xiaoyu Wang, Yuanqing Lin

We demonstrate the success of the proposed framework on two small-scale fine-grained datasets (Stanford Dogs and Stanford Cars) and on a large-scale car dataset that we collected.

Fine-Grained Image Classification, General Classification, +3 more

Holistically-Nested Edge Detection

16 code implementations · ICCV 2015 · Saining Xie, Zhuowen Tu

We develop a new edge detection algorithm that tackles two important issues in this long-standing vision problem: (1) holistic image training and prediction; and (2) multi-scale and multi-level feature learning.

Boundary Detection, Edge Detection

Deeply-Supervised Nets

1 code implementation · 18 Sep 2014 · Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, Zhuowen Tu

Our proposed deeply-supervised nets (DSN) method simultaneously minimizes classification error while making the learning process of hidden layers direct and transparent.

Classification, General Classification, +1 more
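The DSN snippet describes making the learning of hidden layers "direct and transparent". One way to express that idea, and our high-level reading of the paper, is to attach a companion classifier to each hidden layer and add its loss, scaled by a weight alpha_m, to the output-layer loss. The Python below is a simplified sketch under that assumption: it uses softmax cross-entropy for every term, which may differ from the loss used in the paper, and it omits any additional terms (such as regularization) that the paper's exact objective may include. The function names `cross_entropy` and `deeply_supervised_loss` are hypothetical.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, using a numerically stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def deeply_supervised_loss(output_logits, companion_logits, label, alphas):
    """Output-layer loss plus alpha-weighted companion losses from hidden layers.

    Simplified sketch: each hidden layer m contributes alphas[m] * CE(companion_logits[m]).
    """
    if len(companion_logits) != len(alphas):
        raise ValueError("need exactly one weight per companion classifier")
    total = cross_entropy(output_logits, label)
    for alpha, logits in zip(alphas, companion_logits):
        total += alpha * cross_entropy(logits, label)
    return total


# Two-class toy example: uninformative logits everywhere, two companion classifiers
loss = deeply_supervised_loss([0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]], 1, [0.5, 0.5])
print(round(loss, 4))  # → 1.3863
```

Setting every alpha_m to zero recovers ordinary output-only training, which makes the role of the companion terms easy to isolate in experiments.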
