1 code implementation • 5 Jan 2023 • Hu Xu, Saining Xie, Po-Yao Huang, Licheng Yu, Russell Howes, Gargi Ghosh, Luke Zettlemoyer, Christoph Feichtenhofer
Large vision-language models are generally applicable to many downstream tasks, but come at an exorbitant training cost that only large institutions can afford.
6 code implementations • 2 Jan 2023 • Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie
This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets on various recognition benchmarks, including ImageNet classification, COCO detection, and ADE20K segmentation.
Ranked #2 on Object Detection on COCO 2017 val (box AP metric)
1 code implementation • 19 Dec 2022 • William Peebles, Saining Xie
We explore a new class of diffusion models based on the transformer architecture.
Ranked #2 on Image Generation on ImageNet 512x512
1 code implementation • 13 Oct 2022 • Ronghang Hu, Shoubhik Debnath, Saining Xie, Xinlei Chen
Masked Autoencoding (MAE) has emerged as an effective approach for pre-training representations across multiple domains.
40 code implementations • CVPR 2022 • Zhuang Liu, Hanzi Mao, Chao-yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie
The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model.
Ranked #4 on Domain Generalization on ImageNet-Sketch (using extra training data)
no code implementations • 27 Dec 2021 • Ajinkya Tejankar, Maziar Sanjabi, Bichen Wu, Saining Xie, Madian Khabsa, Hamed Pirsiavash, Hamed Firooz
In this paper, we focus on teasing out what parts of the language supervision are essential for training zero-shot image classification models.
1 code implementation • 23 Dec 2021 • Norman Mu, Alexander Kirillov, David Wagner, Saining Xie
Across ImageNet and a battery of additional datasets, we find that SLIP improves accuracy by a large margin.
4 code implementations • CVPR 2022 • Chen Wei, Haoqi Fan, Saining Xie, Chao-yuan Wu, Alan Yuille, Christoph Feichtenhofer
We present Masked Feature Prediction (MaskFeat) for self-supervised pre-training of video models.
Ranked #4 on Action Recognition on AVA v2.2 (using extra training data)
1 code implementation • 22 Nov 2021 • Yanghao Li, Saining Xie, Xinlei Chen, Piotr Dollár, Kaiming He, Ross Girshick
The complexity of object detection methods can make this benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive.
35 code implementations • CVPR 2022 • Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
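The masking step the abstract describes can be sketched in a few lines. This is a minimal NumPy illustration of random patch masking, not the paper's implementation (which operates on ViT token sequences in PyTorch); the patch grid size and masking ratio below follow the common ViT-B/16 setup as an assumption:

```python
import numpy as np

def random_masking(patches, mask_ratio=0.75, seed=0):
    """Randomly drop a fraction of patches; return the visible patches,
    the binary mask, and the indices needed to restore positions."""
    rng = np.random.default_rng(seed)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    order = rng.permutation(n)           # random shuffle of patch indices
    keep_idx = np.sort(order[:n_keep])   # indices of visible patches
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False               # True = masked (to be reconstructed)
    return patches[keep_idx], mask, keep_idx

# a 14x14 grid of flattened 16x16x3 patches, as in a 224x224 ViT input
patches = np.arange(196 * 768, dtype=np.float32).reshape(196, 768)
visible, mask, keep_idx = random_masking(patches)
# the encoder would see only `visible` (49 of 196 patches);
# the decoder reconstructs pixels wherever `mask` is True
```

With a 75% ratio the encoder processes only a quarter of the tokens, which is where MAE's pre-training speedup comes from.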
1 code implementation • ICCV 2021 • Ji Hou, Saining Xie, Benjamin Graham, Angela Dai, Matthias Nießner
Inspired by these advances in geometric understanding, we aim to imbue image-based perception with representations learned under geometric constraints.
7 code implementations • ICCV 2021 • Xinlei Chen, Saining Xie, Kaiming He
In this work, we go back to basics and investigate the effects of several fundamental components for training self-supervised ViTs.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
1 code implementation • NeurIPS 2021 • Eric Mintun, Alexander Kirillov, Saining Xie
Invariance to a broad array of image corruptions, such as warping, noise, or color shifts, is an important aspect of building robust models in computer vision.
1 code implementation • CVPR 2021 • Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie
The rapid progress in 3D scene understanding has come with a growing demand for data; however, collecting and annotating 3D scenes (e.g., point clouds) is notoriously hard.
Ranked #2 on 3D Semantic Segmentation on ScanNet200
1 code implementation • ECCV 2020 • Saining Xie, Jiatao Gu, Demi Guo, Charles R. Qi, Leonidas J. Guibas, Or Litany
To this end, we select a suite of diverse datasets and tasks to measure the effect of unsupervised pre-training on a large source set of 3D scenes.
3 code implementations • ICML 2020 • Jiaxuan You, Jure Leskovec, Kaiming He, Saining Xie
Neural networks are often represented as graphs of connections between neurons.
1 code implementation • CVPR 2020 • Alvin Wan, Xiaoliang Dai, Peizhao Zhang, Zijian He, Yuandong Tian, Saining Xie, Bichen Wu, Matthew Yu, Tao Xu, Kan Chen, Peter Vajda, Joseph E. Gonzalez
We propose a masking mechanism for feature map reuse, so that memory and computational costs stay nearly constant as the search space expands.
Ranked #67 on Neural Architecture Search on ImageNet
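The feature-map-reuse idea in the abstract can be sketched as follows: instead of instantiating one feature map per candidate channel width, every width candidate is a zero-padding mask over a single shared feature map, and the search output is their weighted sum. This is a toy NumPy sketch; the candidate widths and weights below are made-up values, and the real method uses Gumbel-Softmax weights inside a network:

```python
import numpy as np

def masked_channel_search(feature, widths, weights):
    """Search over channel counts with one shared feature map: each
    candidate width is a binary channel mask, and the output is the
    weighted sum -- memory stays that of a single feature map."""
    n_channels = feature.shape[0]
    out = np.zeros_like(feature)
    for w, g in zip(widths, weights):
        mask = np.zeros(n_channels)
        mask[:w] = 1.0                       # keep first w channels, zero the rest
        out += g * (feature * mask[:, None])
    return out

feature = np.ones((8, 4))                    # (channels, spatial), all ones for clarity
out = masked_channel_search(feature, widths=[2, 4, 8],
                            weights=[0.5, 0.3, 0.2])
```

Because all candidates share `feature`, adding more widths to the search space adds only a mask and a scalar weight, which is why cost stays nearly constant as the space grows.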
2 code implementations • ECCV 2020 • Chenxi Liu, Piotr Dollár, Kaiming He, Ross Girshick, Alan Yuille, Saining Xie
Existing neural network architectures in computer vision -- whether designed by humans or by machines -- were typically found using both images and their associated labels.
44 code implementations • CVPR 2020 • Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick
This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning.
Ranked #11 on Contrastive Learning on imagenet-1k
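The on-the-fly dictionary the abstract mentions rests on two mechanisms: a FIFO queue of encoded keys and a momentum-updated key encoder. The sketch below shows both in isolation with NumPy; the encoders are stand-in weight vectors rather than real networks, and the dimensions are arbitrary:

```python
import numpy as np

class MomentumQueue:
    """Minimal sketch of MoCo's two ingredients: a momentum-updated
    key encoder and a fixed-size FIFO dictionary of key features."""
    def __init__(self, dim=8, queue_size=16, momentum=0.999):
        self.m = momentum
        self.q_weights = np.zeros(dim)           # query encoder (trained by SGD)
        self.k_weights = np.zeros(dim)           # key encoder (momentum copy)
        self.queue = np.zeros((queue_size, dim)) # dictionary of past keys
        self.ptr = 0

    def momentum_update(self):
        # the key encoder slowly tracks the query encoder, which keeps
        # the keys in the queue consistent with each other
        self.k_weights = self.m * self.k_weights + (1 - self.m) * self.q_weights

    def enqueue(self, keys):
        # replace the oldest keys with the newest mini-batch
        n = keys.shape[0]
        idx = (self.ptr + np.arange(n)) % self.queue.shape[0]
        self.queue[idx] = keys
        self.ptr = (self.ptr + n) % self.queue.shape[0]

mq = MomentumQueue()
mq.q_weights = np.ones(8)
mq.momentum_update()          # key encoder moves 0.1% toward the query encoder
mq.enqueue(np.ones((4, 8)))   # a mini-batch of 4 keys enters the dictionary
```

Decoupling the dictionary size from the mini-batch size is what lets the dictionary be large while each step stays cheap.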
3 code implementations • ICLR 2020 • Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, Yannis Kalantidis
The long-tail distribution of the visual world poses great challenges for deep-learning-based classification models in handling class imbalance.
Ranked #3 on
Long-tail learning with class descriptors
on CUB-LT
no code implementations • 25 Sep 2019 • Linnan Wang, Saining Xie, Teng Li, Rodrigo Fonseca, Yuandong Tian
As a result, using manually designed action space to perform NAS often leads to sample-inefficient explorations of architectures and thus can be sub-optimal.
1 code implementation • 17 Jun 2019 • Linnan Wang, Saining Xie, Teng Li, Rodrigo Fonseca, Yuandong Tian
To improve the sample efficiency, this paper proposes Latent Action Neural Architecture Search (LaNAS), which learns actions to recursively partition the search space into good or bad regions that contain networks with similar performance metrics.
4 code implementations • ICCV 2019 • Ilija Radosavovic, Justin Johnson, Saining Xie, Wan-Yen Lo, Piotr Dollár
Compared to current methodologies of comparing point and curve estimates of model families, distribution estimates paint a more complete picture of the entire design landscape.
10 code implementations • ICCV 2019 • Saining Xie, Alexander Kirillov, Ross Girshick, Kaiming He
In this paper, we explore a more diverse set of connectivity patterns through the lens of randomly wired neural networks.
Ranked #115 on Neural Architecture Search on ImageNet
1 code implementation • 1 Jan 2019 • Linnan Wang, Saining Xie, Teng Li, Rodrigo Fonseca, Yuandong Tian
To improve the sample efficiency, this paper proposes Latent Action Neural Architecture Search (LaNAS), which learns actions to recursively partition the search space into good or bad regions that contain networks with similar performance metrics.
Ranked #15 on Image Classification on CIFAR-10
no code implementations • CVPR 2018 • Saining Xie, Sainan Liu, Zeyu Chen, Zhuowen Tu
We tackle the problem of point cloud recognition.
1 code implementation • ECCV 2018 • Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu, Kevin Murphy
Despite the steady progress in video analysis led by the adoption of convolutional neural networks (CNNs), the relative improvement has been less drastic than that in 2D static image classification.
Ranked #23 on Action Recognition on UCF101 (using extra training data)
48 code implementations • CVPR 2017 • Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He
Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set.
Ranked #3 on Image Classification on GasHisSDB
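The "homogeneous, multi-branch" design in the abstract is an aggregated transformation: the same simple transform topology is repeated `cardinality` times and the results are summed, plus a residual shortcut. Below is a NumPy sketch with plain linear maps standing in for the bottleneck branches; in the actual ResNeXt block the branches are 1x1-3x3-1x1 convolutions realized as a grouped convolution:

```python
import numpy as np

def resnext_block(x, branch_weights):
    """Aggregated transformation: sum the outputs of `cardinality`
    identical-topology branches, then add the residual shortcut."""
    aggregated = sum(w @ x for w in branch_weights)  # C parallel branches
    return x + aggregated                            # residual connection

rng = np.random.default_rng(0)
cardinality = 32                                     # the one key hyper-parameter
x = rng.standard_normal(16)
branches = [rng.standard_normal((16, 16)) * 0.01 for _ in range(cardinality)]
y = resnext_block(x, branches)
```

Because every branch shares the same shape, the block exposes cardinality as a single dial, which is the sense in which the design has "only a few hyper-parameters to set".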
no code implementations • 23 Nov 2015 • Saining Xie, Xun Huang, Zhuowen Tu
Current practice in convolutional neural networks (CNNs) remains largely bottom-up, and the role of top-down processes in CNNs for pattern analysis and visual inference remains unclear.
no code implementations • CVPR 2015 • Saining Xie, Tianbao Yang, Xiaoyu Wang, Yuanqing Lin
We demonstrate the success of the proposed framework on two small-scale fine-grained datasets (Stanford Dogs and Stanford Cars) and on a large-scale car dataset that we collected.
16 code implementations • ICCV 2015 • Saining Xie, Zhuowen Tu
We develop a new edge detection algorithm that tackles two important issues in this long-standing vision problem: (1) holistic image training and prediction; and (2) multi-scale and multi-level feature learning.
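The multi-scale, multi-level idea in the abstract can be sketched as side-output fusion: each network stage emits its own edge map (a "side output"), and the final prediction is a learned weighted combination of all of them. This toy NumPy version uses fabricated logits and fusion weights; the real HED attaches a supervised loss to every side output as well:

```python
import numpy as np

def fuse_side_outputs(side_outputs, fusion_weights):
    """Weighted fusion of per-stage edge-map logits, followed by a
    sigmoid to produce per-pixel edge probabilities."""
    fused = sum(w * s for w, s in zip(fusion_weights, side_outputs))
    return 1.0 / (1.0 + np.exp(-fused))

# three fake side-output logit maps at the same resolution
sides = [np.full((4, 4), v) for v in (2.0, 0.0, -2.0)]
edge_map = fuse_side_outputs(sides, fusion_weights=[0.5, 0.3, 0.2])
```

Fusing predictions from shallow and deep stages is what lets the detector combine fine localization with high-level object boundaries in a single forward pass.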
1 code implementation • 18 Sep 2014 • Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, Zhuowen Tu
Our proposed deeply-supervised nets (DSN) method simultaneously minimizes classification error while making the learning process of hidden layers direct and transparent.
Ranked #25 on Image Classification on SVHN
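The "direct and transparent" supervision of hidden layers described above amounts to companion objectives: a classification loss is attached to each hidden layer and added, with a weight, to the output-layer loss. A minimal sketch, with made-up per-layer loss values and weights (the paper decays the companion weights during training):

```python
def deeply_supervised_loss(final_loss, hidden_losses, alphas):
    """DSN-style companion objectives: the total loss is the output
    loss plus weighted classification losses on the hidden layers,
    so every intermediate layer receives a direct gradient signal."""
    return final_loss + sum(a * l for a, l in zip(alphas, hidden_losses))

total = deeply_supervised_loss(
    final_loss=1.0,
    hidden_losses=[0.8, 0.6, 0.4],   # fake per-layer companion losses
    alphas=[0.3, 0.3, 0.3],          # companion weights (assumed values)
)
```

The companion terms act both as extra regularization and as a shortcut for gradients to reach early layers.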