Search Results for author: Alan Yuille

Found 245 papers, 128 papers with code

Boundary Detection Benchmarking: Beyond F-Measures

no code implementations • CVPR 2013 • Xiaodi Hou, Alan Yuille, Christof Koch

For an ill-posed problem like boundary detection, human labeled datasets play a critical role.

Bottom-Up Segmentation for Top-Down Detection

no code implementations • CVPR 2013 • Sanja Fidler, Roozbeh Mottaghi, Alan Yuille, Raquel Urtasun

When employing the parts, we outperform the original DPM [14] in 19 out of 20 classes, achieving an improvement of 8% AP.

Clustering object-detection +3

The Role of Context for Object Detection and Semantic Segmentation in the Wild

no code implementations • CVPR 2014 • Roozbeh Mottaghi, Xianjie Chen, Xiaobai Liu, Nam-Gyu Cho, Seong-Whan Lee, Sanja Fidler, Raquel Urtasun, Alan Yuille

In this paper we study the role of context in existing state-of-the-art detection and segmentation approaches.

object-detection Object Detection +2

Detect What You Can: Detecting and Representing Objects using Holistic Models and Body Parts

no code implementations • CVPR 2014 • Xianjie Chen, Roozbeh Mottaghi, Xiaobai Liu, Sanja Fidler, Raquel Urtasun, Alan Yuille

Our model automatically decouples the holistic object or body parts from the model when they are hard to detect.

Object Semantic Part Detection

Parsing Semantic Parts of Cars Using Graphical Models and Segment Appearance Consistency

no code implementations • 9 Jun 2014 • Wenhao Lu, Xiaochen Lian, Alan Yuille

A novel mixture of graphical models is proposed, which dynamically couples the landmarks to a hierarchy of segments.

Human-Machine CRFs for Identifying Bottlenecks in Holistic Scene Understanding

no code implementations • 16 Jun 2014 • Roozbeh Mottaghi, Sanja Fidler, Alan Yuille, Raquel Urtasun, Devi Parikh

Recent trends in image understanding have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, contextual reasoning, and local appearance based classifiers.

Object object-detection +4

Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations

no code implementations • NeurIPS 2014 • Xianjie Chen, Alan Yuille

More precisely, we specify a graphical model for human pose which exploits the fact that local image measurements can be used both to detect parts (or joints) and also to predict the spatial relationships between them (Image Dependent Pairwise Relations).

Ranked #18 on Pose Estimation on Leeds Sports Poses

Pose Estimation

Parsing Occluded People by Flexible Compositions

no code implementations • CVPR 2015 • Xianjie Chen, Alan Yuille

We model humans using a graphical model which has a tree structure building on recent work [32, 6] and exploit the connectivity prior that, even in the presence of occlusion, the visible nodes form a connected subtree of the graphical model.

Semantic Part Segmentation using Compositional Model combining Shape and Appearance

no code implementations • CVPR 2015 • Jianyu Wang, Alan Yuille

This is more challenging than standard object detection, object segmentation and pose estimation tasks because semantic parts of animals often have similar appearance and highly varying shapes.

Object object-detection +4

Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

2 code implementations • 20 Dec 2014 • Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan Yuille

In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions.

8k Image Captioning +1

Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images

1 code implementation • ICCV 2015 • Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan Yuille

In particular, we propose a transposed weight sharing scheme, which not only improves performance on image captioning, but also makes the model more suitable for the novel concept learning task.

Image Captioning Novel Concepts +1

Joint Object and Part Segmentation using Deep Learned Potentials

no code implementations • ICCV 2015 • Peng Wang, Xiaohui Shen, Zhe Lin, Scott Cohen, Brian Price, Alan Yuille

Segmenting semantic objects from images and parsing them into their respective semantic parts are fundamental steps towards detailed object understanding in computer vision.

Object Segmentation +1

Pose-Guided Human Parsing with Deep Learned Features

no code implementations • 17 Aug 2015 • Fangting Xia, Jun Zhu, Peng Wang, Alan Yuille

Parsing human body into semantic regions is crucial to human-centric analysis.

Human Parsing

Generation and Comprehension of Unambiguous Object Descriptions

1 code implementation • CVPR 2016 • Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan Yuille, Kevin Murphy

We propose a method that can generate an unambiguous description (known as a referring expression) of a specific object or region in an image, and which can also comprehend or interpret such an expression to infer which object is being described.

Image Captioning Object +1

DOC: Deep OCclusion Estimation From a Single Image

no code implementations • 20 Nov 2015 • Peng Wang, Alan Yuille

In this paper we propose a deep network architecture, called DOC, which acts on a single image, detects object boundaries and estimates the border ownership (i.e., which side of the boundary is foreground and which is background).

Occlusion Estimation

Ground-truth dataset and baseline evaluations for image base-detail separation algorithms

no code implementations • 21 Nov 2015 • Xuan Dong, Boyan Bonev, Weixin Li, Weichao Qiu, Xianjie Chen, Alan Yuille

Base-detail separation is a fundamental computer vision problem consisting of modeling a smooth base layer with the coarse structures, and a detail layer containing the texture-like structures.

Unsupervised learning of object semantic parts from internal states of CNNs by population encoding

1 code implementation • 21 Nov 2015 • Jianyu Wang, Zhishuai Zhang, Cihang Xie, Vittal Premachandran, Alan Yuille

We address the key question of how object part representations can be found from the internal states of CNNs that are trained for high-level tasks, such as object classification.

Clustering Keypoint Detection +1

Fidelity-Naturalness Evaluation of Single Image Super Resolution

no code implementations • 21 Nov 2015 • Xuan Dong, Yu Zhu, Weixin Li, Lingxi Xie, Alex Wong, Alan Yuille

In this paper, we propose to use both fidelity (the difference from the original images) and naturalness (human visual perception of super-resolved images) for evaluation.

Image Quality Assessment Image Super-Resolution

Multi-Instance Visual-Semantic Embedding

no code implementations • 22 Dec 2015 • Zhou Ren, Hailin Jin, Zhe Lin, Chen Fang, Alan Yuille

Visual-semantic embedding models have been recently proposed and shown to be effective for image classification and zero-shot learning, by mapping images into a continuous semantic label space.

General Classification Image Classification +1

InterActive: Inter-Layer Activeness Propagation

no code implementations • CVPR 2016 • Lingxi Xie, Liang Zheng, Jingdong Wang, Alan Yuille, Qi Tian

An increasing number of computer vision tasks can be tackled with deep features, which are the intermediate outputs of a pre-trained Convolutional Neural Network.

Descriptive General Classification

Attention Correctness in Neural Image Captioning

no code implementations • 31 May 2016 • Chenxi Liu, Junhua Mao, Fei Sha, Alan Yuille

Attention mechanisms have recently been introduced in deep learning for various tasks in natural language processing and computer vision.

Image Captioning

Geometric Neural Phrase Pooling: Modeling the Spatial Co-occurrence of Neurons

no code implementations • 21 Jul 2016 • Lingxi Xie, Qi Tian, John Flynn, Jingdong Wang, Alan Yuille

For this, we consider the neurons in the hidden layer as neural words, and construct a set of geometric neural phrases on top of them.

Image Classification

UnrealCV: Connecting Computer Vision to Unreal Engine

1 code implementation • 5 Sep 2016 • Weichao Qiu, Alan Yuille

Computer graphics can not only generate synthetic images and ground truth but also offers the possibility of constructing virtual worlds in which: (i) an agent can perceive, navigate, and take actions guided by AI algorithms, (ii) properties of the worlds can be modified (e.g., material and reflectance), (iii) physical simulations can be performed, and (iv) algorithms can be learnt and evaluated.

Navigate Physical Simulations

DeepSkeleton: Learning Multi-task Scale-associated Deep Side Outputs for Object Skeleton Extraction in Natural Images

1 code implementation • 13 Sep 2016 • Wei Shen, Kai Zhao, Yuan Jiang, Yan Wang, Xiang Bai, Alan Yuille

By observing the relationship between the receptive field sizes of the different layers in the network and the skeleton scales they can capture, we introduce two scale-associated side outputs to each stage of the network.

Multi-Task Learning Object +3

Symmetric Non-Rigid Structure from Motion for Category-Specific Object Structure Estimation

no code implementations • 22 Sep 2016 • Yuan Gao, Alan Yuille

This paper addresses the estimation of 3D structures of symmetric objects from multiple images of the same object category, e.g., different cars, seen from various viewpoints.

Training and Evaluating Multimodal Word Embeddings with Large-scale Web Annotated Images

no code implementations • NeurIPS 2016 • Junhua Mao, Jiajing Xu, Yushi Jing, Alan Yuille

In this paper, we focus on training and evaluating effective word embeddings with both text and visual information.

Image Retrieval Sentence +1

UnrealStereo: Controlling Hazardous Factors to Analyze Stereo Vision

no code implementations • 14 Dec 2016 • Yi Zhang, Weichao Qiu, Qi Chen, Xiaolin Hu, Alan Yuille

We generate a large synthetic image dataset with automatically computed hazardous regions and analyze algorithms on these regions.

Image Generation

MAT: A Multimodal Attentive Translator for Image Captioning

no code implementations • 18 Feb 2017 • Chang Liu, Fuchun Sun, Changhu Wang, Feng Wang, Alan Yuille

In this way, the sequential representation of an image can be naturally translated to a sequence of words, as the target sequence of the RNN model.

Caption Generation Image Captioning +2

Label Distribution Learning Forests

no code implementations • NeurIPS 2017 • Wei Shen, Kai Zhao, Yilu Guo, Alan Yuille

This paper presents label distribution learning forests (LDLFs) - a novel label distribution learning algorithm based on differentiable decision trees, which have several advantages: 1) Decision trees have the potential to model any general form of label distributions by a mixture of leaf node predictions.

Ranked #11 on Age Estimation on MORPH album2 (Caucasian)

Age Estimation Representation Learning
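
The "mixture of leaf node predictions" idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the depth-2 tree, the sigmoid routing, the function name `soft_tree_predict`, and all numeric values are assumptions chosen only for illustration. Each split node routes an input softly to the left or right, and the predicted label distribution is the average of the leaf distributions, weighted by the probability of reaching each leaf.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def soft_tree_predict(split_scores, leaf_distributions):
    """Hypothetical depth-2 soft decision tree: 3 split nodes, 4 leaves.

    split_scores: raw scores for (root, left child, right child); in a deep
    model these would come from network outputs (assumption).
    leaf_distributions: 4 label distributions, one per leaf.
    """
    root, left, right = (sigmoid(s) for s in split_scores)
    # Probability of reaching a leaf = product of routing decisions on its path.
    routing = [
        root * left,
        root * (1.0 - left),
        (1.0 - root) * right,
        (1.0 - root) * (1.0 - right),
    ]
    n_labels = len(leaf_distributions[0])
    # The output is a mixture of the leaf distributions.
    return [
        sum(routing[k] * leaf_distributions[k][c] for k in range(4))
        for c in range(n_labels)
    ]


# Illustrative leaf distributions over three labels (made-up numbers).
leaves = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.0, 0.1, 0.9],
]
prediction = soft_tree_predict([0.5, -1.0, 2.0], leaves)
```

Because the routing probabilities sum to one and every leaf holds a valid distribution, the mixture is itself a valid label distribution.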

Deep Collaborative Learning for Visual Recognition

no code implementations • 3 Mar 2017 • Yan Wang, Lingxi Xie, Ya Zhang, Wenjun Zhang, Alan Yuille

We formulate the function of a convolutional layer as learning a large visual vocabulary, and propose an alternative way, namely Deep Collaborative Learning (DCL), to reduce the computational complexity.

General Classification Image Classification

Genetic CNN

1 code implementation • ICCV 2017 • Lingxi Xie, Alan Yuille

The deep Convolutional Neural Network (CNN) is the state-of-the-art solution for large-scale visual recognition.

Object Recognition

SORT: Second-Order Response Transform for Visual Recognition

no code implementations • ICCV 2017 • Yan Wang, Lingxi Xie, Chenxi Liu, Ya Zhang, Wenjun Zhang, Alan Yuille

In this paper, we reveal the importance and benefits of introducing second-order operations into deep neural networks.

Recurrent Multimodal Interaction for Referring Image Segmentation

1 code implementation • ICCV 2017 • Chenxi Liu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, Alan Yuille

In this paper we are interested in the problem of image segmentation given natural language descriptions, i.e., referring expressions.

Image Segmentation Segmentation +1

Multi-stage Multi-recursive-input Fully Convolutional Networks for Neuronal Boundary Detection

no code implementations • ICCV 2017 • Wei Shen, Bin Wang, Yuan Jiang, Yan Wang, Alan Yuille

This design is biologically plausible, as it resembles how the human visual system compares different possible segmentation solutions to resolve ambiguous boundaries.

Boundary Detection Segmentation

Adversarial Examples for Semantic Segmentation and Object Detection

2 code implementations • ICCV 2017 • Cihang Xie, Jian-Yu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, Alan Yuille

Our observation is that both segmentation and detection are based on classifying multiple targets on an image (e.g., the basic target is a pixel or a receptive field in segmentation, and an object proposal in detection), which inspires us to optimize a loss function over a set of pixels/proposals for generating adversarial perturbations.

Adversarial Attack Object +4

Transfer of View-manifold Learning to Similarity Perception of Novel Objects

no code implementations • 31 Mar 2017 • Xingyu Lin, Hao Wang, Zhihao Li, Yimeng Zhang, Alan Yuille, Tai Sing Lee

We develop a model of perceptual similarity judgment based on re-training a deep convolution neural network (DCNN) that learns to associate different views of each 3D object to capture the notion of object persistence and continuity in our visual experience.

Metric Learning Object

ScaleNet: Guiding Object Proposal Generation in Supermarkets and Beyond

no code implementations • ICCV 2017 • Siyuan Qiao, Wei Shen, Weichao Qiu, Chenxi Liu, Alan Yuille

We argue that estimation of object scales in images is helpful for generating object proposals, especially for supermarket images where object scales are usually within a small range.

Object Object Proposal Generation

Few-Shot Image Recognition by Predicting Parameters from Activations

1 code implementation • CVPR 2018 • Siyuan Qiao, Chenxi Liu, Wei Shen, Alan Yuille

In this paper, we are interested in the few-shot learning problem.

Ranked #66 on Few-Shot Image Classification on Mini-Imagenet 5-way (5-shot)

Few-Shot Image Classification Few-Shot Learning

Detecting Semantic Parts on Partially Occluded Objects

no code implementations • 25 Jul 2017 • Jianyu Wang, Cihang Xie, Zhishuai Zhang, Jun Zhu, Lingxi Xie, Alan Yuille

Our approach detects semantic parts by accumulating the confidence of local visual cues.

Clustering Semantic Part Detection

Joint Multi-Person Pose Estimation and Semantic Part Segmentation

no code implementations • CVPR 2017 • Fangting Xia, Peng Wang, Xianjie Chen, Alan Yuille

To refine part segments, the refined pose and the original part potential are integrated through a Part FCN, where the skeleton feature from pose serves as additional regularization cues for part segments.

Ranked #5 on Human Part Segmentation on PASCAL-Part

Human Detection Multi-Person Pose Estimation

Mitigating Adversarial Effects Through Randomization

2 code implementations • ICLR 2018 • Cihang Xie, Jian-Yu Wang, Zhishuai Zhang, Zhou Ren, Alan Yuille

Convolutional neural networks have demonstrated high accuracy on various tasks in recent years.

Adversarial Defense Image Classification

Visual Concepts and Compositional Voting

no code implementations • 13 Nov 2017 • Jianyu Wang, Zhishuai Zhang, Cihang Xie, Yuyin Zhou, Vittal Premachandran, Jun Zhu, Lingxi Xie, Alan Yuille

We use clustering algorithms to study the population activities of the features and extract a set of visual concepts which we show are visually tight and correspond to semantic parts of vehicles.

Clustering Semantic Part Detection

Few-shot Learning by Exploiting Visual Concepts within CNNs

no code implementations • 22 Nov 2017 • Boyang Deng, Qing Liu, Siyuan Qiao, Alan Yuille

In this work, we address these limitations of CNNs by developing novel, flexible, and interpretable models for few-shot learning.

Few-Shot Learning

Gradually Updated Neural Networks for Large-Scale Image Recognition

no code implementations • ICML 2018 • Siyuan Qiao, Zhishuai Zhang, Wei Shen, Bo Wang, Alan Yuille

Our method introduces computation orderings to the channels within convolutional layers or blocks, based on which the outputs are computed gradually in a channel-wise manner.
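
A toy sketch of what a channel-wise computation ordering could look like is given below. It is only a guess at the general idea from the abstract sentence above, not the paper's method: the helper `gradual_update`, the update rule `mean_plus_self`, and the grouping scheme are all hypothetical. Channels are split into ordered groups, and each group is recomputed from a state that already contains the updated values of the groups before it.

```python
def gradual_update(channels, update_fn, num_groups):
    """Update channel groups one after another (hypothetical sketch).

    Assumes len(channels) is divisible by num_groups.
    """
    state = list(channels)
    group_size = len(state) // num_groups
    for g in range(num_groups):
        lo, hi = g * group_size, (g + 1) * group_size
        # Channels inside one group are computed from the same state;
        # later groups see the already-updated values of earlier groups.
        new_values = [update_fn(state, c) for c in range(lo, hi)]
        state[lo:hi] = new_values
    return state


def mean_plus_self(state, c):
    # Made-up update rule: channel value plus the mean over all channels.
    return state[c] + sum(state) / len(state)


# One group: every channel is updated from the same input (simultaneous).
simultaneous = gradual_update([1.0, 2.0, 3.0, 4.0], mean_plus_self, num_groups=1)
# Two groups: the second group is computed after the first has been updated.
gradual = gradual_update([1.0, 2.0, 3.0, 4.0], mean_plus_self, num_groups=2)
```

With a single group the update is the usual simultaneous one. With two groups the last two channels differ, because they are computed from a state in which the first two channels have already changed.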

Progressive Neural Architecture Search

18 code implementations • ECCV 2018 • Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, Kevin Murphy

We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary algorithms.

Ranked #15 on Neural Architecture Search on NAS-Bench-201, ImageNet-16-120 (Accuracy (Val) metric)

Evolutionary Algorithms General Classification +3

Deep Regression Forests for Age Estimation

2 code implementations • CVPR 2018 • Wei Shen, Yilu Guo, Yan Wang, Kai Zhao, Bo Wang, Alan Yuille

Age estimation from facial images is typically cast as a nonlinear regression problem.

Ranked #6 on Age Estimation on FGNET

Age Estimation regression

Unleashing the Potential of CNNs for Interpretable Few-Shot Learning

no code implementations • ICLR 2018 • Boyang Deng, Qing Liu, Siyuan Qiao, Alan Yuille

Our models are based on the idea of encoding objects in terms of visual concepts, which are interpretable visual cues represented by the feature vectors within CNNs.

Few-Shot Learning

Deep Co-Training for Semi-Supervised Image Recognition

1 code implementation • ECCV 2018 • Siyuan Qiao, Wei Shen, Zhishuai Zhang, Bo Wang, Alan Yuille

We present Deep Co-Training, a deep learning based method inspired by the Co-Training framework.

Improving Transferability of Adversarial Examples with Input Diversity

2 code implementations • CVPR 2019 • Cihang Xie, Zhishuai Zhang, Yuyin Zhou, Song Bai, Jian-Yu Wang, Zhou Ren, Alan Yuille

We hope that our proposed attack strategy can serve as a strong benchmark baseline for evaluating the robustness of networks to adversaries and the effectiveness of different defense methods in the future.

Adversarial Attack Image Classification

Scene Graph Parsing as Dependency Parsing

2 code implementations • NAACL 2018 • Yu-Siang Wang, Chenxi Liu, Xiaohui Zeng, Alan Yuille

The scene graphs generated by our learned neural dependency parser achieve an F-score similarity of 49.67% to ground truth graphs on our evaluation set, surpassing best previous approaches by 5%.

Dependency Parsing Image Retrieval +2

Adversarial Attacks and Defences Competition

1 code implementation • 31 Mar 2018 • Alexey Kurakin, Ian Goodfellow, Samy Bengio, Yinpeng Dong, Fangzhou Liao, Ming Liang, Tianyu Pang, Jun Zhu, Xiaolin Hu, Cihang Xie, Jian-Yu Wang, Zhishuai Zhang, Zhou Ren, Alan Yuille, Sangxia Huang, Yao Zhao, Yuzhe Zhao, Zhonglin Han, Junjiajia Long, Yerkebulan Berdibekov, Takuya Akiba, Seiya Tokui, Motoki Abe

To accelerate research on adversarial examples and robustness of machine learning classifiers, Google Brain organized a NIPS 2017 competition that encouraged researchers to develop new methods to generate adversarial examples as well as to develop new ways to defend against them.

BIG-bench Machine Learning

SampleAhead: Online Classifier-Sampler Communication for Learning from Synthesized Data

no code implementations • 1 Apr 2018 • Qi Chen, Weichao Qiu, Yi Zhang, Lingxi Xie, Alan Yuille

But this raises an important problem in active vision: given an infinite data space, how to effectively sample a finite subset to train a visual classifier?

Classification General Classification

Knowledge Distillation in Generations: More Tolerant Teachers Educate Better Students

no code implementations • 15 May 2018 • Chenglin Yang, Lingxi Xie, Siyuan Qiao, Alan Yuille

We focus on the problem of training a deep neural network in generations.

General Classification Image Classification +1

Resisting Large Data Variations via Introspective Transformation Network

no code implementations • 16 May 2018 • Yunhan Zhao, Ye Tian, Charless Fowlkes, Wei Shen, Alan Yuille

Experimental results verify that our approach significantly improves the ability of deep networks to resist large variations between training and testing data and achieves classification accuracy improvements on several benchmark datasets, including MNIST, affNIST, SVHN, CIFAR-10 and miniImageNet.

Data Augmentation Few-Shot Learning

PCL: Proposal Cluster Learning for Weakly Supervised Object Detection

4 code implementations • 9 Jul 2018 • Peng Tang, Xinggang Wang, Song Bai, Wei Shen, Xiang Bai, Wenyu Liu, Alan Yuille

The iterative instance classifier refinement is implemented online using multiple streams in convolutional neural networks, where the first is an MIL network and the others are for instance classifier refinement supervised by the preceding one.

Ranked #1 on Weakly Supervised Object Detection on ImageNet

Multiple Instance Learning Object +3

Rethinking Monocular Depth Estimation with Adversarial Training

no code implementations • 22 Aug 2018 • Richard Chen, Faisal Mahmood, Alan Yuille, Nicholas J. Durr

Most existing approaches treat depth estimation as a regression problem with a local pixel-wise loss function.

Monocular Depth Estimation

Weakly Supervised Region Proposal Network and Object Detection

no code implementations • ECCV 2018 • Peng Tang, Xinggang Wang, Angtian Wang, Yongluan Yan, Wenyu Liu, Junzhou Huang, Alan Yuille

The Convolutional Neural Network (CNN) based region proposal generation method (i.e., region proposal network), trained using bounding box annotations, is an essential component in modern fully supervised object detectors.

Object object-detection +2

Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding

1 code implementation • 14 Oct 2018 • Chenxu Luo, Zhenheng Yang, Peng Wang, Yang Wang, Wei Xu, Ram Nevatia, Alan Yuille

Performance on the five tasks of depth estimation, optical flow estimation, odometry, moving object segmentation and scene flow estimation shows that our approach outperforms other SoTA methods.

Ranked #1 on Scene Flow Estimation on KITTI 2015 Scene Flow Training

Depth Estimation Optical Flow Estimation +2

OriNet: A Fully Convolutional Network for 3D Human Pose Estimation

1 code implementation • 12 Nov 2018 • Chenxu Luo, Xiao Chu, Alan Yuille

We use limb orientations as a new way to represent 3D poses and bind the orientation together with the bounding box of each limb region to better associate images and predictions.

Ranked #76 on 3D Human Pose Estimation on MPI-INF-3DHP (AUC metric)

3D Human Pose Estimation

Semantic Part Detection via Matching: Learning to Generalize to Novel Viewpoints from Limited Training Data

1 code implementation • ICCV 2019 • Yutong Bai, Qing Liu, Lingxi Xie, Weichao Qiu, Yan Zheng, Alan Yuille

In particular, this enables images in the training dataset to be matched to a virtual 3D model of the object (for simplicity, we assume that the object viewpoint can be estimated by standard techniques).

Clustering Object +1

Robust Face Detection via Learning Small Faces on Hard Images

1 code implementation • 28 Nov 2018 • Zhishuai Zhang, Wei Shen, Siyuan Qiao, Yan Wang, Bo Wang, Alan Yuille

In this paper, we propose that the robustness of a face detector against hard faces can be improved by learning small faces on hard images.

Ranked #8 on Face Detection on WIDER Face (Hard)

Face Detection

3D Semi-Supervised Learning with Uncertainty-Aware Multi-View Co-Training

no code implementations • 29 Nov 2018 • Yingda Xia, Fengze Liu, Dong Yang, Jinzheng Cai, Lequan Yu, Zhuotun Zhu, Daguang Xu, Alan Yuille, Holger Roth

Meanwhile, a fully-supervised method based on our approach achieved state-of-the-art performances on both the LiTS liver tumor segmentation and the Medical Segmentation Decathlon (MSD) challenge, demonstrating the robustness and value of our framework, even when fully supervised training is feasible.

Image Segmentation Medical Image Segmentation +3

Neural Rejuvenation: Improving Deep Network Training by Enhancing Computational Resource Utilization

1 code implementation • CVPR 2019 • Siyuan Qiao, Zhe Lin, Jianming Zhang, Alan Yuille

By simply replacing standard optimizers with Neural Rejuvenation, we are able to improve the performances of neural networks by a very large margin while using similar training efforts and maintaining their original resource usages.

Network Pruning Neural Architecture Search

Learning Transferable Adversarial Examples via Ghost Networks

1 code implementation • 9 Dec 2018 • Yingwei Li, Song Bai, Yuyin Zhou, Cihang Xie, Zhishuai Zhang, Alan Yuille

The critical principle of ghost networks is to apply feature-level perturbations to an existing model to potentially create a huge set of diverse models.

Adversarial Attack

Feature Denoising for Improving Adversarial Robustness

2 code implementations • CVPR 2019 • Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan Yuille, Kaiming He

This study suggests that adversarial perturbations on images lead to noise in the features constructed by these networks.

Ranked #1 on Adversarial Defense on CAAD 2018

Adversarial Defense Adversarial Robustness +2

ELASTIC: Improving CNNs with Dynamic Scaling Policies

1 code implementation • CVPR 2019 • Huiyu Wang, Aniruddha Kembhavi, Ali Farhadi, Alan Yuille, Mohammad Rastegari

We formulate the scaling policy as a non-linear function inside the network's structure that (a) is learned from data, (b) is instance specific, (c) does not add extra computation, and (d) can be applied on any network architecture.

General Classification Multi-Label Classification +1

CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions

3 code implementations • CVPR 2019 • Runtao Liu, Chenxi Liu, Yutong Bai, Alan Yuille

Yet there has been evidence that current benchmark datasets suffer from bias, and current state-of-the-art models cannot be easily evaluated on their intermediate reasoning process.

Ranked #1 on Referring Expression Segmentation on CLEVR-Ref+

Image Segmentation object-detection +8

Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation

12 code implementations • CVPR 2019 • Chenxi Liu, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Wei Hua, Alan Yuille, Li Fei-Fei

Therefore, we propose to search the network level structure in addition to the cell level structure, which forms a hierarchical architecture search space.

Ranked #7 on Semantic Segmentation on PASCAL VOC 2012 val

Image Classification Image Segmentation +3

Micro-Batch Training with Batch-Channel Normalization and Weight Standardization

8 code implementations • 25 Mar 2019 • Siyuan Qiao, Huiyu Wang, Chenxi Liu, Wei Shen, Alan Yuille

Batch Normalization (BN) has become an out-of-box technique to improve deep network training.

Ranked #76 on Instance Segmentation on COCO minival

Image Classification Instance Segmentation +5

An Alarm System For Segmentation Algorithm Based On Shape Model

no code implementations • ICLR 2019 • Fengze Liu, Yingda Xia, Dong Yang, Alan Yuille, Daguang Xu

Motivated by this, in this paper, we learn a feature space using the shape information, which is a strong prior shared among different datasets and robust to the appearance variation of input data. The shape feature is captured using a Variational Auto-Encoder (VAE) network that is trained with only the ground truth masks.

Segmentation

Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval

1 code implementation • ICCV 2019 • Qing Liu, Lingxi Xie, Huiyu Wang, Alan Yuille

Sketch-based image retrieval (SBIR) is widely recognized as an important vision problem which implies a wide range of real-world applications.

Domain Adaptation Retrieval +2

Prior-aware Neural Network for Partially-Supervised Multi-Organ Segmentation

no code implementations • ICCV 2019 • Yuyin Zhou, Zhe Li, Song Bai, Chong Wang, Xinlei Chen, Mei Han, Elliot Fishman, Alan Yuille

Accurate multi-organ abdominal CT segmentation is essential to many clinical applications such as computer-aided intervention.

Medical Image Segmentation Organ Segmentation +2

Structured Prediction using cGANs with Fusion Discriminator

no code implementations • ICLR 2019 • Faisal Mahmood, Wenhao Xu, Nicholas J. Durr, Jeremiah W. Johnson, Alan Yuille

We propose the fusion discriminator, a single unified framework for incorporating conditional information into a generative adversarial network (GAN) for a variety of distinct structured prediction tasks, including image synthesis, semantic segmentation, and depth estimation.

Depth Estimation Generative Adversarial Network +3

Robustness of Object Recognition under Extreme Occlusion in Humans and Computational Models

1 code implementation • 11 May 2019 • Hongru Zhu, Peng Tang, Jeongho Park, Soojin Park, Alan Yuille

We test both humans and the above-mentioned computational models in a challenging task of object recognition under extreme occlusion, where target objects are heavily occluded by irrelevant real objects in real backgrounds.

Object Object Recognition

Combining Compositional Models and Deep Networks For Robust Object Classification under Occlusion

no code implementations • 28 May 2019 • Adam Kortylewski, Qing Liu, Huiyu Wang, Zhishuai Zhang, Alan Yuille

In this work, we combine DCNNs and compositional object models to retain the best of both approaches: a discriminative model that is robust to partial occlusion and mask attacks.

General Classification Image Classification +1

V-NAS: Neural Architecture Search for Volumetric Medical Image Segmentation

no code implementations • 6 Jun 2019 • Zhuotun Zhu, Chenxi Liu, Dong Yang, Alan Yuille, Daguang Xu

Deep learning algorithms, in particular 2D and 3D fully convolutional neural networks (FCNs), have rapidly become the mainstream methodology for volumetric medical image segmentation.

Image Segmentation Neural Architecture Search +3

Intriguing properties of adversarial training at scale

no code implementations • ICLR 2020 • Cihang Xie, Alan Yuille

This two-domain hypothesis may explain the issue of BN when training with a mixture of clean and adversarial images, as estimating normalization statistics of this mixture distribution is challenging.

Adversarial Robustness

Multi-Scale Attentional Network for Multi-Focal Segmentation of Active Bleed after Pelvic Fractures

no code implementations • 23 Jun 2019 • Yuyin Zhou, David Dreizin, Yingwei Li, Zhishuai Zhang, Yan Wang, Alan Yuille

Trauma is the worldwide leading cause of death and disability in those younger than 45 years, and pelvic fractures are a major source of morbidity and mortality.

Segmentation

Paper
Add Code

Deep Differentiable Random Forests for Age Estimation

no code implementations • 23 Jul 2019 • Wei Shen, Yilu Guo, Yan Wang, Kai Zhao, Bo Wang, Alan Yuille

Both of them connect split nodes to the top layer of convolutional neural networks (CNNs) and deal with inhomogeneous data by jointly learning input-dependent data partitions at the split nodes and age distributions at the leaf nodes.

Age Estimation regression

Paper
Add Code

FusionNet: Incorporating Shape and Texture for Abnormality Detection in 3D Abdominal CT Scans

no code implementations • 21 Aug 2019 • Fengze Liu, Yuyin Zhou, Elliot Fishman, Alan Yuille

Second, a FusionNet is proposed to take both the binary mask and CT image as input and perform a binary classification.

3D Classification Anomaly Detection +4

Paper
Add Code

Hyper-Pairing Network for Multi-Phase Pancreatic Ductal Adenocarcinoma Segmentation

no code implementations • 3 Sep 2019 • Yuyin Zhou, Yingwei Li, Zhishuai Zhang, Yan Wang, Angtian Wang, Elliot Fishman, Alan Yuille, Seyoun Park

Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal cancers with an overall five-year survival rate of 8%.

Paper
Add Code

TDAPNet: Prototype Network with Recurrent Top-Down Attention for Robust Object Classification under Partial Occlusion

no code implementations • 9 Sep 2019 • Mingqing Xiao, Adam Kortylewski, Ruihai Wu, Siyuan Qiao, Wei Shen, Alan Yuille

Despite the great success of deep convolutional neural networks in object classification, they suffer a severe drop in generalization performance under occlusion due to the inconsistency between training and testing data.

General Classification Object +1

Paper
Add Code

Universal Physical Camouflage Attacks on Object Detectors

2 code implementations • CVPR 2020 • Lifeng Huang, Chengying Gao, Yuyin Zhou, Cihang Xie, Alan Yuille, Changqing Zou, Ning Liu

In this paper, we study physical adversarial attacks on object detectors in the wild.

Object Region Proposal

Paper
Code

Grouped Spatial-Temporal Aggregation for Efficient Action Recognition

1 code implementation • ICCV 2019 • Chenxu Luo, Alan Yuille

This decomposition is more parameter-efficient and enables us to quantitatively analyze the contributions of spatial and temporal features in different layers.

Action Recognition

Paper
Code

Localizing Occluders with Compositional Convolutional Networks

no code implementations • 18 Nov 2019 • Adam Kortylewski, Qing Liu, Huiyu Wang, Zhishuai Zhang, Alan Yuille

Our experimental results demonstrate that the proposed extensions increase the model's performance at localizing occluders as well as at classifying partially occluded objects.

Paper
Add Code

Adversarial Examples Improve Image Recognition

6 code implementations • CVPR 2020 • Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan Yuille, Quoc V. Le

We show that AdvProp improves a wide range of models on various image recognition tasks and performs better when the models are bigger.

Ranked #4 on Domain Generalization on VizWiz-Classification

Domain Generalization Image Classification

29,814

Paper
Code

Rethinking Normalization and Elimination Singularity in Neural Networks

1 code implementation • 21 Nov 2019 • Siyuan Qiao, Huiyu Wang, Chenxi Liu, Wei Shen, Alan Yuille

To address this issue, we propose BatchChannel Normalization (BCN), which uses batch knowledge to avoid the elimination singularities in the training of channel-normalized models.

Image Classification Instance Segmentation +4

Paper
Code

Identifying Model Weakness with Adversarial Examiner

no code implementations • 25 Nov 2019 • Michelle Shu, Chenxi Liu, Weichao Qiu, Alan Yuille

Different from the existing strategy to always give the same (distribution of) test data, the adversarial examiner will dynamically select the next test data to hand out based on the testing history so far, with the goal being to undermine the model's performance.

Autonomous Driving

Paper
Add Code

Deeply Shape-guided Cascade for Instance Segmentation

1 code implementation • CVPR 2021 • Hao Ding, Siyuan Qiao, Alan Yuille, Wei Shen

The key to a successful cascade architecture for precise instance segmentation is to fully leverage the relationship between bounding box detection and mask segmentation across multiple stages.

Instance Segmentation Region Proposal +2

Paper
Code

RSA: Randomized Simulation as Augmentation for Robust Human Action Recognition

no code implementations • 3 Dec 2019 • Yi Zhang, Xinyue Wei, Weichao Qiu, Zihao Xiao, Gregory D. Hager, Alan Yuille

In this paper, we propose the Randomized Simulation as Augmentation (RSA) framework which augments real-world training data with synthetic data to improve the robustness of action recognition networks.

Action Recognition Temporal Action Localization

Paper
Add Code

DASZL: Dynamic Action Signatures for Zero-shot Learning

no code implementations • 8 Dec 2019 • Tae Soo Kim, Jonathan D. Jones, Michael Peven, Zihao Xiao, Jin Bai, Yi Zhang, Weichao Qiu, Alan Yuille, Gregory D. Hager

There are many realistic applications of activity recognition where the set of potential activity descriptions is combinatorially large.

Action Detection Activity Detection +3

Paper
Add Code

Identity Preserve Transform: Understand What Activity Classification Models Have Learnt

no code implementations • 13 Dec 2019 • Jialing Lyu, Weichao Qiu, Xinyue Wei, Yi Zhang, Alan Yuille, Zheng-Jun Zha

This can explain why an activity classification model usually fails to generalize to datasets it is not trained on.

Classification General Classification

Paper
Add Code

Learning from Synthetic Animals

2 code implementations • CVPR 2020 • Jiteng Mu, Weichao Qiu, Gregory Hager, Alan Yuille

Despite great success in human parsing, progress for parsing other deformable articulated objects, like animals, is still limited by the lack of labeled data.

Domain Adaptation Human Parsing +1

Paper
Code

AtomNAS: Fine-Grained End-to-End Neural Architecture Search

1 code implementation • ICLR 2020 • Jieru Mei, Yingwei Li, Xiaochen Lian, Xiaojie Jin, Linjie Yang, Alan Yuille, Jianchao Yang

We propose a fine-grained search space comprised of atomic blocks, a minimal search unit that is much smaller than the ones used in recent NAS algorithms.

Ranked #61 on Neural Architecture Search on ImageNet

Neural Architecture Search

223

Paper
Code

Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots

no code implementations • ECCV 2020 • Qi Chen, Lin Sun, Zhixin Wang, Kui Jia, Alan Yuille

Accurate 3D object detection in LiDAR based point clouds suffers from the challenges of data sparsity and irregularities.

Ranked #3 on 3D Object Detection on KITTI Pedestrians Moderate

3D Object Detection Object +2

Paper
Add Code

When Radiology Report Generation Meets Knowledge Graph

no code implementations • 19 Feb 2020 • Yixiao Zhang, Xiaosong Wang, Ziyue Xu, Qihang Yu, Alan Yuille, Daguang Xu

In addition, we proposed a new evaluation metric for radiology image reporting with the assistance of the same composed graph.

Graph Embedding Image Captioning

Paper
Add Code

Compositional Convolutional Neural Networks: A Deep Architecture with Innate Robustness to Partial Occlusion

1 code implementation • CVPR 2020 • Adam Kortylewski, Ju He, Qing Liu, Alan Yuille

Inspired by the success of compositional models at classifying partially occluded objects, we propose to integrate compositional models and DCNNs into a unified deep model with innate robustness to partial occlusion.

General Classification

109

Paper
Code

Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation

5 code implementations • ECCV 2020 • Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

In this paper, we attempt to remove this constraint by factorizing 2D self-attention into two 1D self-attentions.

Ranked #4 on Panoptic Segmentation on Cityscapes val (using extra training data)

Image Classification Panoptic Segmentation +1

1,143

Paper
Code
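
The factorization mentioned in the Axial-DeepLab entry above (one 2D self-attention replaced by two 1D self-attentions) can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `attention_1d` and `axial_attention` are hypothetical, and the sketch assumes a single head with no learned projections or positional terms. Each position attends to H + W others instead of H × W.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def attention_1d(seq):
    """Dot-product self-attention over a list of feature vectors.

    Assumption of this sketch: queries, keys, and values are the inputs
    themselves (no learned projections, one head).
    """
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        w = softmax(scores)
        out.append([sum(wi * v[c] for wi, v in zip(w, seq)) for c in range(d)])
    return out


def axial_attention(fmap):
    """Apply 1D attention along the height axis, then along the width axis.

    fmap[i][j] is the feature vector at row i, column j.
    """
    H, W = len(fmap), len(fmap[0])
    # Height axis: every column is treated as a sequence of length H.
    cols = [attention_1d([fmap[i][j] for i in range(H)]) for j in range(W)]
    after_h = [[cols[j][i] for j in range(W)] for i in range(H)]
    # Width axis: every row is treated as a sequence of length W.
    return [attention_1d(row) for row in after_h]
```

Applying the two passes in sequence lets information from any position reach any other position, while each pass only compares positions that share a row or a column.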

Synthesize then Compare: Detecting Failures and Anomalies for Semantic Segmentation

1 code implementation • ECCV 2020 • Yingda Xia, Yi Zhang, Fengze Liu, Wei Shen, Alan Yuille

The ability to detect failures and anomalies is a fundamental requirement for building reliable systems for computer vision applications, especially safety-critical applications of semantic segmentation, such as autonomous driving and medical image analysis.

Ranked #8 on Anomaly Detection on Road Anomaly (using extra training data)

Anomaly Detection Autonomous Driving +3

Paper
Code

Are Labels Necessary for Neural Architecture Search?

2 code implementations • ECCV 2020 • Chenxi Liu, Piotr Dollár, Kaiming He, Ross Girshick, Alan Yuille, Saining Xie

Existing neural network architectures in computer vision -- whether designed by humans or by machines -- were typically found using both images and their associated labels.

Neural Architecture Search

2,109

Paper
Code

Neural Architecture Search for Lightweight Non-Local Networks

2 code implementations • CVPR 2020 • Yingwei Li, Xiaojie Jin, Jieru Mei, Xiaochen Lian, Linjie Yang, Cihang Xie, Qihang Yu, Yuyin Zhou, Song Bai, Alan Yuille

However, embedding NL blocks in mobile neural networks has rarely been explored, mainly due to the following challenges: 1) NL blocks generally have a heavy computation cost, which makes them difficult to apply where computational resources are limited, and 2) discovering an optimal configuration for embedding NL blocks into mobile neural networks is an open problem.

Ranked #60 on Neural Architecture Search on ImageNet

Image Classification Neural Architecture Search

104

Paper
Code

Context-Aware Group Captioning via Self-Attention and Contrastive Features

no code implementations • CVPR 2020 • Zhuowan Li, Quan Tran, Long Mai, Zhe Lin, Alan Yuille

In this paper, we introduce a new task, context-aware group captioning, which aims to describe a group of target images in the context of another group of related reference images.

Image Captioning

Paper
Add Code

PatchAttack: A Black-box Texture-based Attack with Reinforcement Learning

2 code implementations • ECCV 2020 • Chenglin Yang, Adam Kortylewski, Cihang Xie, Yinzhi Cao, Alan Yuille

PatchAttack induces misclassifications by superimposing small textured patches on the input image.

Adversarial Defense Clustering +2

Paper
Code

Organ at Risk Segmentation for Head and Neck Cancer using Stratified Learning and Neural Architecture Search

no code implementations • CVPR 2020 • Dazhou Guo, Dakai Jin, Zhuotun Zhu, Tsung-Ying Ho, Adam P. Harrison, Chun-Hung Chao, Jing Xiao, Alan Yuille, Chien-Yu Lin, Le Lu

This is the goal of our work, where we introduce stratified organ at risk segmentation (SOARS), an approach that stratifies OARs into anchor, mid-level, and small & hard (S&H) categories.

Anatomy Neural Architecture Search +1

Paper
Add Code

Domain Adaptive Relational Reasoning for 3D Multi-Organ Segmentation

no code implementations • 18 May 2020 • Shuhao Fu, Yongyi Lu, Yan Wang, Yuyin Zhou, Wei Shen, Elliot Fishman, Alan Yuille

In this paper, we present a novel unsupervised domain adaptation (UDA) method, named Domain Adaptive Relational Reasoning (DARR), to generalize 3D multi-organ segmentation models to medical data collected from different scanners and/or protocols (domains).

Organ Segmentation Relational Reasoning +3

Paper
Add Code

Robust Object Detection under Occlusion with Context-Aware CompositionalNets

no code implementations • CVPR 2020 • Angtian Wang, Yihong Sun, Adam Kortylewski, Alan Yuille

In this work, we propose to overcome two limitations of CompositionalNets which will enable them to detect partially occluded objects: 1) CompositionalNets, as well as other DCNN architectures, do not explicitly separate the representation of the context from the object itself.

Object object-detection +1

Paper
Add Code

JSSR: A Joint Synthesis, Segmentation, and Registration System for 3D Multi-Modal Image Alignment of Large-scale Pathological CT Scans

no code implementations • ECCV 2020 • Fengze Liu, Jingzheng Cai, Yuankai Huo, Chi-Tung Cheng, Ashwin Raju, Dakai Jin, Jing Xiao, Alan Yuille, Le Lu, Chien-Hung Liao, Adam P. Harrison

We extensively evaluate our JSSR system on a large-scale medical image dataset containing 1,485 patient CT imaging studies of four different phases (i.e., 5,940 3D CT scans with pathological livers) on the registration, segmentation and synthesis tasks.

Image Registration Multi-Task Learning +2

Paper
Add Code

Detecting Scatteredly-Distributed, Small, and Critically Important Objects in 3D Oncology Imaging via Decision Stratification

no code implementations • 27 May 2020 • Zhuotun Zhu, Ke Yan, Dakai Jin, Jinzheng Cai, Tsung-Ying Ho, Adam P. Harrison, Dazhou Guo, Chun-Hung Chao, Xianghua Ye, Jing Xiao, Alan Yuille, Le Lu

We focus on the detection and segmentation of oncology-significant (or suspicious cancer metastasized) lymph nodes (OSLNs), which has not been studied before as a computational task.

Paper
Add Code

DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution

6 code implementations • CVPR 2021 • Siyuan Qiao, Liang-Chieh Chen, Alan Yuille

In this paper, we explore this mechanism in the backbone design for object detection.

Ranked #2 on Object Detection on AI-TOD

Instance Segmentation Object +4

27,836

Paper
Code

Smooth Adversarial Training

1 code implementation • 25 Jun 2020 • Cihang Xie, Mingxing Tan, Boqing Gong, Alan Yuille, Quoc V. Le

SAT also works well with larger networks: it helps EfficientNet-L1 to achieve 82.2% accuracy and 58.6% robustness on ImageNet, outperforming the previous state-of-the-art defense by 9.5% for accuracy and 11.6% for robustness.

Ranked #1 on Adversarial Defense on ImageNet (non-targeted PGD, max perturbation=4)

Adversarial Defense Adversarial Robustness

Paper
Code

Compositional Convolutional Neural Networks: A Robust and Interpretable Model for Object Recognition under Occlusion

no code implementations • 28 Jun 2020 • Adam Kortylewski, Qing Liu, Angtian Wang, Yihong Sun, Alan Yuille

The structure of the compositional model enables CompositionalNets to decompose images into objects and context, as well as to further decompose object representations in terms of individual parts and the objects' pose.

Image Classification object-detection +2

Paper
Add Code

Uncertainty-aware multi-view co-training for semi-supervised medical image segmentation and domain adaptation

no code implementations • 28 Jun 2020 • Yingda Xia, Dong Yang, Zhiding Yu, Fengze Liu, Jinzheng Cai, Lequan Yu, Zhuotun Zhu, Daguang Xu, Alan Yuille, Holger Roth

Experiments on the NIH pancreas segmentation dataset and a multi-organ segmentation dataset show state-of-the-art performance of the proposed framework on semi-supervised medical image segmentation.

Image Segmentation Organ Segmentation +6

Paper
Add Code

Probabilistic Multi-modal Trajectory Prediction with Lane Attention for Autonomous Vehicles

no code implementations • 6 Jul 2020 • Chenxu Luo, Lin Sun, Dariush Dabiri, Alan Yuille

As for vehicles, their trajectories are significantly influenced by the lane geometry and how to effectively use the lane information is of active interest.

Autonomous Vehicles Motion Forecasting +1

Paper
Add Code

ASAP-Net: Attention and Structure Aware Point Cloud Sequence Segmentation

1 code implementation • 12 Aug 2020 • Hanwen Cao, Yongyi Lu, Cewu Lu, Bo Pang, Gongshen Liu, Alan Yuille

In this paper, we further improve spatio-temporal point cloud feature learning with a flexible module called ASAP, which considers both attention and structure information across frames; we find these to be two important factors for successful segmentation in dynamic point clouds.

Segmentation

Paper
Code

Lymph Node Gross Tumor Volume Detection and Segmentation via Distance-based Gating using 3D CT/PET Imaging in Radiotherapy

no code implementations • 27 Aug 2020 • Zhuotun Zhu, Dakai Jin, Ke Yan, Tsung-Ying Ho, Xianghua Ye, Dazhou Guo, Chun-Hung Chao, Jing Xiao, Alan Yuille, Le Lu

Finding, identifying and segmenting suspicious cancer metastasized lymph nodes from 3D multi-modality imaging is a clinical task of paramount importance.

Paper
Add Code

Lymph Node Gross Tumor Volume Detection in Oncology Imaging via Relationship Learning Using Graph Neural Network

no code implementations • 29 Aug 2020 • Chun-Hung Chao, Zhuotun Zhu, Dazhou Guo, Ke Yan, Tsung-Ying Ho, Jinzheng Cai, Adam P. Harrison, Xianghua Ye, Jing Xiao, Alan Yuille, Min Sun, Le Lu, Dakai Jin

Specifically, we first utilize a 3D convolutional neural network with ROI-pooling to extract the GTV$_{LN}$'s instance-wise appearance features.

Clinical Knowledge

Paper
Add Code

CoKe: Localized Contrastive Learning for Robust Keypoint Detection

no code implementations • 29 Sep 2020 • Yutong Bai, Angtian Wang, Adam Kortylewski, Alan Yuille

In this paper, we introduce a contrastive learning framework for keypoint detection (CoKe).

Contrastive Learning Keypoint Detection +1

Paper
Add Code

CO2: Consistent Contrast for Unsupervised Visual Representation Learning

no code implementations • ICLR 2021 • Chen Wei, Huiyu Wang, Wei Shen, Alan Yuille

Regarding the similarity of the query crop to each crop from other images as "unlabeled", the consistency term takes the corresponding similarity of a positive crop as a pseudo label, and encourages consistency between these two similarities.

Contrastive Learning Image Classification +5

Paper
Add Code
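
The consistency term described in the CO2 entry above can be sketched as follows. This is a rough illustration rather than the paper's exact loss: it assumes roughly unit-norm embeddings, a softmax with a temperature of 0.1 over similarities to crops from other images, and a one-directional KL divergence as the consistency measure. The function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def similarity_distribution(anchor, others, temperature=0.1):
    """Softmax over dot-product similarities between one crop and other crops."""
    sims = [sum(a * b for a, b in zip(anchor, o)) / temperature for o in others]
    return softmax(sims)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def consistency_loss(query, positive, negatives, temperature=0.1):
    """Treat the positive crop's similarities as a pseudo label for the query's."""
    pseudo_label = similarity_distribution(positive, negatives, temperature)
    prediction = similarity_distribution(query, negatives, temperature)
    return kl_divergence(pseudo_label, prediction)
```

The loss is zero when the query and the positive crop relate to the other images in the same way, and grows as their similarity patterns diverge.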

Shape-Texture Debiased Neural Network Training

1 code implementation • ICLR 2021 • Yingwei Li, Qihang Yu, Mingxing Tan, Jieru Mei, Peng Tang, Wei Shen, Alan Yuille, Cihang Xie

To prevent models from exclusively attending to a single cue in representation learning, we augment training data with images with conflicting shape and texture information (e.g., an image of chimpanzee shape but with lemon texture) and, most importantly, provide the corresponding supervision from shape and texture simultaneously.

Ranked #601 on Image Classification on ImageNet

Adversarial Robustness Data Augmentation +2

107

Paper
Code

Amodal Segmentation through Out-of-Task and Out-of-Distribution Generalization with a Bayesian Model

1 code implementation • CVPR 2022 • Yihong Sun, Adam Kortylewski, Alan Yuille

Moreover, by leveraging an outlier process, Bayesian models can further generalize out-of-distribution to segment partially occluded objects and to predict their amodal object boundaries.

Amodal Instance Segmentation Object +2

Paper
Code

Can Temporal Information Help with Contrastive Self-Supervised Learning?

no code implementations • 25 Nov 2020 • Yutong Bai, Haoqi Fan, Ishan Misra, Ganesh Venkatesh, Yongyi Lu, Yuyin Zhou, Qihang Yu, Vikas Chandra, Alan Yuille

To this end, we present Temporal-aware Contrastive self-supervised learning (TaCo) as a general paradigm to enhance video CSL.

Data Augmentation Representation Learning +2

Paper
Add Code

Batch Normalization with Enhanced Linear Transformation

1 code implementation • 28 Nov 2020 • Yuhui Xu, Lingxi Xie, Cihang Xie, Jieru Mei, Siyuan Qiao, Wei Shen, Hongkai Xiong, Alan Yuille

Batch normalization (BN) is a fundamental unit in modern deep networks, in which a linear transformation module was designed for improving BN's flexibility of fitting complex data distributions.

Paper
Code

MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers

3 code implementations • CVPR 2021 • Huiyu Wang, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

As a result, MaX-DeepLab shows a significant 7.1% PQ gain in the box-free regime on the challenging COCO dataset, closing the gap between box-based and box-free methods for the first time.

Ranked #12 on Panoptic Segmentation on COCO test-dev

Panoptic Segmentation

988

Paper
Code

Robustness Out of the Box: Compositional Representations Naturally Defend Against Black-Box Patch Attacks

no code implementations • 1 Dec 2020 • Christian Cosgrove, Adam Kortylewski, Chenglin Yang, Alan Yuille

Second, we find that compositional deep networks, which have part-based representations that lead to innate robustness to natural occlusion, are robust to patch attacks on PASCAL3D+ and the German Traffic Sign Recognition Benchmark, without adversarial training.

Traffic Sign Recognition

Paper
Add Code

Unsupervised Part Discovery via Feature Alignment

no code implementations • 1 Dec 2020 • Mengqi Guo, Yutong Bai, Zhishuai Zhang, Adam Kortylewski, Alan Yuille

Specifically, given a training image, we find a set of similar images that show instances of the same object category in the same pose, through an affine alignment of their corresponding feature maps.

Object Object Recognition

Paper
Add Code

Robust Instance Segmentation through Reasoning about Multi-Object Occlusion

1 code implementation • CVPR 2021 • Xiaoding Yuan, Adam Kortylewski, Yihong Sun, Alan Yuille

The improved segmentation masks are, in turn, integrated into the network in a top-down manner to improve the image classification.

Image Classification Instance Segmentation +3

Paper
Code

ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation

1 code implementation • CVPR 2021 • Siyuan Qiao, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

We name this joint task Depth-aware Video Panoptic Segmentation, and propose a new evaluation metric along with two derived datasets for it, which will be made available to the public.

Ranked #1 on Video Panoptic Segmentation on Cityscapes-VPS (using extra training data)

Depth-aware Video Panoptic Segmentation Monocular Depth Estimation +2

212

Paper
Code

Mask Guided Matting via Progressive Refinement Network

1 code implementation • CVPR 2021 • Qihang Yu, Jianming Zhang, He Zhang, Yilin Wang, Zhe Lin, Ning Xu, Yutong Bai, Alan Yuille

We propose Mask Guided (MG) Matting, a robust matting framework that takes a general coarse mask as guidance.

Image Matting

316

Paper
Code

Meticulous Object Segmentation

1 code implementation • 13 Dec 2020 • Chenglin Yang, Yilin Wang, Jianming Zhang, He Zhang, Zhe Lin, Alan Yuille

To evaluate segmentation quality near object boundaries, we propose the Meticulosity Quality (MQ) score considering both the mask coverage and boundary precision.

2k 4k +4

Paper
Code

CORL: Compositional Representation Learning for Few-Shot Classification

no code implementations • 28 Jan 2021 • Ju He, Adam Kortylewski, Alan Yuille

In particular, during meta-learning, we train a knowledge base that consists of a dictionary of component representations and a dictionary of component activation maps that encode common spatial activation patterns of components.

Classification Few-Shot Image Classification +3

Paper
Add Code

NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation

1 code implementation • ICLR 2021 • Angtian Wang, Adam Kortylewski, Alan Yuille

Using differentiable rendering we estimate the 3D object pose by minimizing the reconstruction error between NeMo and the feature representation of the target image.

3D Pose Estimation Contrastive Learning

Paper
Code

Occluded Video Instance Segmentation: A Benchmark

2 code implementations • 2 Feb 2021 • Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

On the OVIS dataset, the highest AP achieved by state-of-the-art algorithms is only 16.3, which reveals that we are still at a nascent stage for understanding objects, instances, and videos in a real-world scenario.

Ranked #39 on Video Instance Segmentation on OVIS validation

Instance Segmentation Segmentation +3

Paper
Code

CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning

1 code implementation • CVPR 2021 • Chen Wei, Kihyuk Sohn, Clayton Mellina, Alan Yuille, Fan Yang

Semi-supervised learning on class-imbalanced data, although a realistic problem, has been understudied.

Paper
Code

Understanding Catastrophic Forgetting and Remembering in Continual Learning with Optimal Relevance Mapping

1 code implementation • 22 Feb 2021 • Prakhar Kaushik, Alex Gain, Adam Kortylewski, Alan Yuille

Additionally, current approaches that deal with forgetting ignore the problem of catastrophic remembering, i.e., the worsening ability to discriminate between data from different tasks.

Ranked #1 on Continual Learning on ImageNet-50 (5 tasks)

Continual Learning

Paper
Code

Weakly Supervised Instance Segmentation for Videos with Temporal Mask Consistency

no code implementations • CVPR 2021 • Qing Liu, Vignesh Ramanathan, Dhruv Mahajan, Alan Yuille, Zhenheng Yang

However, existing approaches which rely only on image-level class labels predominantly suffer from errors due to (a) partial segmentation of objects and (b) missing object predictions.

Instance Segmentation Relation Network +3

Paper
Add Code

Learning Part Segmentation through Unsupervised Domain Adaptation from Synthetic Vehicles

1 code implementation • CVPR 2022 • Qing Liu, Adam Kortylewski, Zhishuai Zhang, Zizhang Li, Mengqi Guo, Qihao Liu, Xiaoding Yuan, Jiteng Mu, Weichao Qiu, Alan Yuille

We believe our dataset provides a rich testbed to study UDA for part segmentation and will help to significantly push forward research in this area.

Geometric Matching Segmentation +2

Paper
Code

CateNorm: Categorical Normalization for Robust Medical Image Segmentation

1 code implementation • 29 Mar 2021 • Junfei Xiao, Lequan Yu, Zongwei Zhou, Yutong Bai, Lei Xing, Alan Yuille, Yuyin Zhou

We propose a new normalization strategy, named categorical normalization (CateNorm), to normalize the activations according to categorical statistics.

Image Segmentation Medical Image Segmentation +2

Paper
Code

A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation

1 code implementation • ICCV 2021 • Jiteng Mu, Weichao Qiu, Adam Kortylewski, Alan Yuille, Nuno Vasconcelos, Xiaolong Wang

To deal with the large shape variance, we introduce Articulated Signed Distance Functions (A-SDF) to represent articulated shapes with a disentangled latent space, where we have separate codes for encoding shape and articulation.

Test-time Adaptation

Paper
Code

Self-Supervised Pillar Motion Learning for Autonomous Driving

1 code implementation • CVPR 2021 • Chenxu Luo, Xiaodong Yang, Alan Yuille

Autonomous driving can benefit from motion behavior comprehension when interacting with diverse traffic participants in highly dynamic environments.

Autonomous Driving Motion Estimation

118

Paper
Code

Auto-FedAvg: Learnable Federated Averaging for Multi-Institutional Medical Image Segmentation

no code implementations • 20 Apr 2021 • Yingda Xia, Dong Yang, Wenqi Li, Andriy Myronenko, Daguang Xu, Hirofumi Obinata, Hitoshi Mori, Peng An, Stephanie Harmon, Evrim Turkbey, Baris Turkbey, Bradford Wood, Francesca Patella, Elvira Stellato, Gianpaolo Carrafiello, Anna Ierardi, Alan Yuille, Holger Roth

In this work, we design a new data-driven approach, namely Auto-FedAvg, where aggregation weights are dynamically adjusted, depending on data distributions across data silos and the current training progress of the models.

Federated Learning Image Segmentation +3

Paper
Add Code
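
The idea in the Auto-FedAvg entry above, aggregating client models with adjustable rather than fixed weights, can be illustrated with a small sketch. The listing does not say how the weights are parameterized or learned, so the softmax over per-client logits and the function name `aggregate` are assumptions made for illustration only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def aggregate(client_params, logits):
    """Weighted average of client parameter vectors.

    client_params: one flat list of floats per client, all the same length.
    logits: one adjustable score per client; softmax turns the scores into
    non-negative aggregation weights that sum to one (an assumption of
    this sketch, not a detail taken from the paper).
    """
    weights = softmax(logits)
    dim = len(client_params[0])
    return [
        sum(w * params[k] for w, params in zip(weights, client_params))
        for k in range(dim)
    ]
```

With equal logits this reduces to a plain unweighted average of the clients; raising one client's logit shifts the global model toward that client's parameters.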

Visual analogy: Deep learning versus compositional models

no code implementations • 14 May 2021 • Nicholas Ichien, Qing Liu, Shuhao Fu, Keith J. Holyoak, Alan Yuille, Hongjing Lu

We compared human performance to that of two recent deep learning models (Siamese Network and Relation Network) directly trained to solve these analogy problems, as well as to that of a compositional model that assesses relational similarity between part-based representations.

Relation Network Visual Analogies

Paper
Add Code

Rethinking Re-Sampling in Imbalanced Semi-Supervised Learning

1 code implementation • 1 Jun 2021 • Ju He, Adam Kortylewski, Shaokang Yang, Shuai Liu, Cheng Yang, Changhu Wang, Alan Yuille

In particular, we decouple the training of the representation and the classifier, and systematically investigate the effects of different data re-sampling techniques when training the whole network including a classifier as well as fine-tuning the feature extractor only.

Paper
Code

Glance-and-Gaze Vision Transformer

1 code implementation • NeurIPS 2021 • Qihang Yu, Yingda Xia, Yutong Bai, Yongyi Lu, Alan Yuille, Wei Shen

It is motivated by the Glance and Gaze behavior of human beings when recognizing objects in natural scenes, with the ability to efficiently model both long-range dependencies and local context.

Paper
Code

Simulated Adversarial Testing of Face Recognition Models

no code implementations • CVPR 2022 • Nataniel Ruiz, Adam Kortylewski, Weichao Qiu, Cihang Xie, Sarah Adel Bargal, Alan Yuille, Stan Sclaroff

In this work, we propose a framework for learning how to test machine learning algorithms using simulators in an adversarial manner in order to find weaknesses in the model before deploying it in critical scenarios.

BIG-bench Machine Learning Face Recognition

Paper
Add Code

Progressive Stage-wise Learning for Unsupervised Feature Representation Enhancement

no code implementations • CVPR 2021 • Zefan Li, Chenxi Liu, Alan Yuille, Bingbing Ni, Wenjun Zhang, Wen Gao

For a given unsupervised task, we design multilevel tasks and define different learning stages for the deep network.

Paper
Add Code

Locally Enhanced Self-Attention: Combining Self-Attention and Convolution as Local and Context Terms

3 code implementations • 12 Jul 2021 • Chenglin Yang, Siyuan Qiao, Adam Kortylewski, Alan Yuille

Self-Attention has become prevalent in computer vision models.

Instance Segmentation object-detection +2

Paper
Code

Exploring Simple 3D Multi-Object Tracking for Autonomous Driving

3 code implementations • ICCV 2021 • Chenxu Luo, Xiaodong Yang, Alan Yuille

3D multi-object tracking in LiDAR point clouds is a key ingredient for self-driving vehicles.

3D Multi-Object Tracking Autonomous Driving +4

163

Paper
Code

RobustART: Benchmarking Robustness on Architecture Design and Training Techniques

1 code implementation • 11 Sep 2021 • Shiyu Tang, Ruihao Gong, Yan Wang, Aishan Liu, Jiakai Wang, Xinyun Chen, Fengwei Yu, Xianglong Liu, Dawn Song, Alan Yuille, Philip H. S. Torr, Dacheng Tao

Thus, we propose RobustART, the first comprehensive Robustness investigation benchmark on ImageNet regarding ARchitecture design (49 human-designed off-the-shelf architectures and 1200+ networks from neural architecture search) and Training techniques (10+ techniques, e.g., data augmentation) towards diverse noises (adversarial, natural, and system noises).

Adversarial Robustness Benchmarking +2

143

Paper
Code

SAME: Deformable Image Registration based on Self-supervised Anatomical Embeddings

no code implementations • 23 Sep 2021 • Fengze Liu, Ke Yan, Adam Harrison, Dazhou Guo, Le Lu, Alan Yuille, Lingyun Huang, Guotong Xie, Jing Xiao, Xianghua Ye, Dakai Jin

In this work, we introduce a fast and accurate method for unsupervised 3D medical image registration.

Image Registration Medical Image Registration

Paper
Add Code

Image BERT Pre-training with Online Tokenizer

no code implementations • ICLR 2022 • Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, Tao Kong

The success of language Transformers is primarily attributed to the pretext task of masked language modeling (MLM), where texts are first tokenized into semantically meaningful pieces.

Image Classification Instance Segmentation +5

Paper
Add Code

Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images

1 code implementation • ICCV 2021 • Zhuowan Li, Elias Stengel-Eskin, Yixiao Zhang, Cihang Xie, Quan Tran, Benjamin Van Durme, Alan Yuille

Our experiments show CCO substantially boosts the performance of neural symbolic methods on real images.

Question Answering Visual Question Answering

Paper
Code

Nuisance-Label Supervision: Robustness Improvement by Free Labels

no code implementations • 14 Oct 2021 • Xinyue Wei, Weichao Qiu, Yi Zhang, Zihao Xiao, Alan Yuille

Nuisance factors are those irrelevant to a task, and an ideal model should be invariant to them.

Action Recognition Data Augmentation

Paper
Add Code

A Light-weight Interpretable Compositional Model for Nuclei Detection and Weakly-Supervised Segmentation

no code implementations • 26 Oct 2021 • Yixiao Zhang, Adam Kortylewski, Qing Liu, Seyoun Park, Benjamin Green, Elizabeth Engle, Guillermo Almodovar, Ryan Walk, Sigfredo Soto-Diaz, Janis Taube, Alex Szalay, Alan Yuille

It requires annotations only on isolated nuclei, rather than on all nuclei in the dataset.

Segmentation Weakly supervised segmentation

Paper
Add Code

Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose

1 code implementation • NeurIPS 2021 • Angtian Wang, Shenxiao Mei, Alan Yuille, Adam Kortylewski

The model is initialized from a few labelled images and is subsequently used to synthesize feature representations of unseen 3D views.

3D Pose Estimation Few-Shot Learning

Paper
Code

Are Transformers More Robust Than CNNs?

1 code implementation • NeurIPS 2021 • Yutong Bai, Jieru Mei, Alan Yuille, Cihang Xie

Transformers have emerged as a powerful tool for visual recognition.

Ranked #1 on Adversarial Robustness on Stylized ImageNet

Adversarial Robustness

174

Paper
Code

Searching for TrioNet: Combining Convolution with Local and Global Self-Attention

no code implementations • 15 Nov 2021 • Huaijin Pi, Huiyu Wang, Yingwei Li, Zizhang Li, Alan Yuille

In order to effectively search in this huge architecture space, we propose Hierarchical Sampling for better training of the supernet.

Neural Architecture Search

Paper
Add Code

Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge

no code implementations • 15 Nov 2021 • Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

To promote the development of occlusion understanding, we collect a large-scale dataset called OVIS for video instance segmentation in the occluded scenario.

Instance Segmentation Object Recognition +3

Paper
Add Code

iBOT: Image BERT Pre-Training with Online Tokenizer

1 code implementation • 15 Nov 2021 • Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, Tao Kong

We present a self-supervised framework iBOT that can perform masked prediction with an online tokenizer.

Ranked #1 on Unsupervised Image Classification on ImageNet

Instance Segmentation Language Modelling +6

620

Paper
Code

TransMix: Attend to Mix for Vision Transformers

2 code implementations • CVPR 2022 • Jie-Neng Chen, Shuyang Sun, Ju He, Philip Torr, Alan Yuille, Song Bai

The confidence of the label will be larger if the corresponding input image is weighted higher by the attention map.

Instance Segmentation object-detection +3

573

Paper
Code

Learning from Temporal Gradient for Semi-supervised Action Recognition

1 code implementation • CVPR 2022 • Junfei Xiao, Longlong Jing, Lin Zhang, Ju He, Qi She, Zongwei Zhou, Alan Yuille, Yingwei Li

Our method achieves state-of-the-art performance on three video action recognition benchmarks (i.e., Kinetics-400, UCF-101, and HMDB-51) under several typical semi-supervised settings (i.e., different ratios of labeled data).

Action Recognition Temporal Action Localization

Paper
Code

OOD-CV: A Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images

no code implementations • 29 Nov 2021 • Bingchen Zhao, Shaozuo Yu, Wufei Ma, Mingxin Yu, Shenxiao Mei, Angtian Wang, Ju He, Alan Yuille, Adam Kortylewski

One reason is that existing robustness benchmarks are limited, as they either rely on synthetic data or ignore the effects of individual nuisance factors.

3D Pose Estimation Benchmarking +5

Paper
Add Code

PartImageNet: A Large, High-Quality Dataset of Parts

1 code implementation • 2 Dec 2021 • Ju He, Shuo Yang, Shaokang Yang, Adam Kortylewski, Xiaoding Yuan, Jie-Neng Chen, Shuai Liu, Cheng Yang, Qihang Yu, Alan Yuille

To help address this problem, we propose PartImageNet, a large, high-quality dataset with part segmentation annotations.

Activity Recognition Few-Shot Learning +6

109

Paper
Code

MT-TransUNet: Mediating Multi-Task Tokens in Transformers for Skin Lesion Segmentation and Classification

1 code implementation • 3 Dec 2021 • Jingye Chen, Jieneng Chen, Zongwei Zhou, Bin Li, Alan Yuille, Yongyi Lu

However, these approaches formulated skin cancer diagnosis as a simple classification task, dismissing the potential benefit from lesion segmentation.

Classification Computational Efficiency +4

Paper
Code

Masked Feature Prediction for Self-Supervised Visual Pre-Training

5 code implementations • CVPR 2022 • Chen Wei, Haoqi Fan, Saining Xie, Chao-yuan Wu, Alan Yuille, Christoph Feichtenhofer

We present Masked Feature Prediction (MaskFeat) for self-supervised pre-training of video models.

Ranked #8 on Action Recognition on AVA v2.2 (using extra training data)

Action Classification Action Recognition +1

6,276

Paper
Code

Lite Vision Transformer with Enhanced Self-Attention

1 code implementation • CVPR 2022 • Chenglin Yang, Yilin Wang, Jianming Zhang, He Zhang, Zijun Wei, Zhe Lin, Alan Yuille

We propose Lite Vision Transformer (LVT), a novel light-weight transformer network with two enhanced self-attention mechanisms to improve model performance for mobile deployment.

Panoptic Segmentation Segmentation

128

Paper
Code

Point-Level Region Contrast for Object Detection Pre-Training

1 code implementation • CVPR 2022 • Yutong Bai, Xinlei Chen, Alexander Kirillov, Alan Yuille, Alexander C. Berg

In this work we present point-level region contrast, a self-supervised pre-training approach for the task of object detection.

Contrastive Learning Knowledge Distillation +2

Paper
Code

DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection

1 code implementation • CVPR 2022 • Yingwei Li, Adams Wei Yu, Tianjian Meng, Ben Caine, Jiquan Ngiam, Daiyi Peng, Junyang Shen, Bo Wu, Yifeng Lu, Denny Zhou, Quoc V. Le, Alan Yuille, Mingxing Tan

In this paper, we propose two novel techniques: InverseAug, which inverts geometry-related augmentations, e.g., rotation, to enable accurate geometric alignment between lidar points and image pixels, and LearnableAlign, which leverages cross-attention to dynamically capture the correlations between image and lidar features during fusion.

3D Object Detection Autonomous Driving +2

2,778

Paper
Code

CP2: Copy-Paste Contrastive Pretraining for Semantic Segmentation

1 code implementation • 22 Mar 2022 • Feng Wang, Huiyu Wang, Chen Wei, Alan Yuille, Wei Shen

Recent advances in self-supervised contrastive learning yield good image-level representation, which favors classification tasks but usually neglects pixel-level detailed information, leading to unsatisfactory transfer performance to dense prediction tasks such as semantic segmentation.

Contrastive Learning Representation Learning +2

Paper
Code

SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering

1 code implementation • CVPR 2022 • Vipul Gupta, Zhuowan Li, Adam Kortylewski, Chenyu Zhang, Yingwei Li, Alan Yuille

By swapping the context object features, the model reliance on context can be suppressed effectively.

Data Augmentation Question Answering +1

Paper
Code

Fast AdvProp

1 code implementation • ICLR 2022 • Jieru Mei, Yucheng Han, Yutong Bai, Yixiao Zhang, Yingwei Li, Xianhang Li, Alan Yuille, Cihang Xie

Specifically, our modifications in Fast AdvProp are guided by the hypothesis that disentangled learning with adversarial examples is the key for performance improvements, while other training recipes (e.g., paired clean and adversarial training samples, multi-step adversarial attackers) could be largely simplified.

Data Augmentation object-detection +1

Paper
Code

In Defense of Image Pre-Training for Spatiotemporal Recognition

1 code implementation • 3 May 2022 • Xianhang Li, Huiyu Wang, Chen Wei, Jieru Mei, Alan Yuille, Yuyin Zhou, Cihang Xie

Inspired by this observation, we hypothesize that the key to effectively leveraging image pre-training lies in the decomposition of learning spatial and temporal features, and revisiting image pre-training as the appearance prior to initializing 3D kernels.

STS Video Recognition

Paper
Code

VoGE: A Differentiable Volume Renderer using Gaussian Ellipsoids for Analysis-by-Synthesis

1 code implementation • 30 May 2022 • Angtian Wang, Peng Wang, Jian Sun, Adam Kortylewski, Alan Yuille

Gaussian reconstruction kernels were proposed by Westover (1990) and studied by the computer graphics community in the 1990s; they give an alternative representation of 3D object geometry to meshes and point clouds.

Pose Estimation

Paper
Code

A Simple Data Mixing Prior for Improving Self-Supervised Learning

1 code implementation • CVPR 2022 • Sucheng Ren, Huiyu Wang, Zhengqi Gao, Shengfeng He, Alan Yuille, Yuyin Zhou, Cihang Xie

More notably, our SDMP is the first method that successfully leverages data mixing to improve (rather than hurt) the performance of Vision Transformers in the self-supervised setting.

Representation Learning Self-Supervised Learning

Paper
Code

CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation

2 code implementations • CVPR 2022 • Qihang Yu, Huiyu Wang, Dahun Kim, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

We propose Clustering Mask Transformer (CMT-DeepLab), a transformer-based framework for panoptic segmentation designed around clustering.

Ranked #6 on Panoptic Segmentation on COCO test-dev

Clustering Panoptic Segmentation +1

Paper
Code

Unsupervised Domain Adaptation through Shape Modeling for Medical Image Segmentation

1 code implementation • 6 Jul 2022 • Yuan YAO, Fengze Liu, Zongwei Zhou, Yan Wang, Wei Shen, Alan Yuille, Yongyi Lu

Previous methods proposed Variational Autoencoder (VAE) based models to learn the distribution of shape for a particular organ and used it to automatically evaluate the quality of a segmentation prediction by fitting it into the learned shape distribution.

Image Segmentation Pancreas Segmentation +3

Paper
Code

kMaX-DeepLab: k-means Mask Transformer

2 code implementations • 8 Jul 2022 • Qihang Yu, Huiyu Wang, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

However, we observe that most existing transformer-based vision models simply borrow the idea from NLP, neglecting the crucial difference between languages and images, particularly the extremely large sequence length of spatially flattened pixel features.

Ranked #2 on Panoptic Segmentation on COCO test-dev

Clustering Object Detection +1

988

Paper
Code

In Defense of Online Models for Video Instance Segmentation

1 code implementation • 21 Jul 2022 • Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance.

Ranked #9 on Video Instance Segmentation on YouTube-VIS validation (using extra training data)

Contrastive Learning Instance Segmentation +5

592

Paper
Code

Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation

no code implementations • 29 Jul 2022 • Qihao Liu, Yi Zhang, Song Bai, Alan Yuille

Inspired by the remarkable ability of humans to infer occluded joints from visible cues, we develop a method to explicitly model this process that significantly improves bottom-up multi-person human pose estimation with or without occlusions.

Ranked #10 on 3D Multi-Person Pose Estimation (absolute) on MuPoTS-3D

3D Human Pose Estimation 3D Multi-Person Pose Estimation (absolute) +2

Paper
Add Code

Masked Autoencoders Enable Efficient Knowledge Distillers

1 code implementation • CVPR 2023 • Yutong Bai, Zeyu Wang, Junfei Xiao, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie

For example, by distilling the knowledge from an MAE pre-trained ViT-L into a ViT-B, our method achieves 84.0% ImageNet top-1 accuracy, outperforming the baseline of directly distilling a fine-tuned ViT-L by 1.2%.

Knowledge Distillation

Paper
Code

Robust Category-Level 6D Pose Estimation with Coarse-to-Fine Rendering of Neural Features

1 code implementation • 12 Sep 2022 • Wufei Ma, Angtian Wang, Alan Yuille, Adam Kortylewski

We consider the problem of category-level 6D pose estimation from a single RGB image.

6D Pose Estimation Contrastive Learning +1

Paper
Code

MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models

2 code implementations • 4 Oct 2022 • Chenglin Yang, Siyuan Qiao, Qihang Yu, Xiaoding Yuan, Yukun Zhu, Alan Yuille, Hartwig Adam, Liang-Chieh Chen

The tiny-MOAT family is also benchmarked on downstream tasks, serving as a baseline for the community.

Ranked #1 on Object Detection on MS COCO

Image Classification Instance Segmentation +2

988

Paper
Code

Context-Enhanced Stereo Transformer

1 code implementation • 21 Oct 2022 • Weiyu Guo, Zhaoshuo Li, Yongkui Yang, Zheng Wang, Russell H. Taylor, Mathias Unberath, Alan Yuille, Yingwei Li

We construct our stereo depth estimation model, Context Enhanced Stereo Transformer (CSTR), by plugging CEP into the state-of-the-art stereo depth estimation method Stereo Transformer.

Stereo Depth Estimation Stereo Matching

Paper
Code

1st Place Solution of The Robust Vision Challenge 2022 Semantic Segmentation Track

1 code implementation • 23 Oct 2022 • Junfei Xiao, Zhichao Xu, Shiyi Lan, Zhiding Yu, Alan Yuille, Anima Anandkumar

The model is trained on a composite dataset consisting of images from 9 datasets (ADE20K, Cityscapes, Mapillary Vistas, ScanNet, VIPER, WildDash 2, IDD, BDD, and COCO) with a simple dataset balancing strategy.

Segmentation Semantic Segmentation

Paper
Code

Delving into Masked Autoencoders for Multi-Label Thorax Disease Classification

1 code implementation • 23 Oct 2022 • Junfei Xiao, Yutong Bai, Alan Yuille, Zongwei Zhou

We hope that this study can direct future research on the application of Transformers to a larger variety of medical imaging tasks.

Computational Efficiency Transfer Learning

Paper
Code

Synthetic Tumors Make AI Segment Tumors Better

1 code implementation • 26 Oct 2022 • Qixin Hu, Junfei Xiao, Yixiong Chen, Shuwen Sun, Jie-Neng Chen, Alan Yuille, Zongwei Zhou

We develop a novel strategy to generate synthetic tumors.

Tumor Segmentation

269

Paper
Code

SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training

no code implementations • ICCV 2023 • Yuanze Lin, Chen Wei, Huiyu Wang, Alan Yuille, Cihang Xie

Coupling all these designs allows our method to achieve both competitive performance on text-to-video retrieval and video question answering tasks and pre-training costs that are lower by 1.9X or more.

Question Answering Retrieval +3

Paper
Add Code

LUMix: Improving Mixup by Better Modelling Label Uncertainty

no code implementations • 29 Nov 2022 • Shuyang Sun, Jie-Neng Chen, Ruifei He, Alan Yuille, Philip Torr, Song Bai

LUMix is simple, as it can be implemented in just a few lines of code, and can be universally applied to any deep network, e.g., CNNs and Vision Transformers, with minimal computational cost.

Data Augmentation

Paper
Add Code

Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models

no code implementations • 1 Dec 2022 • Zhuowan Li, Cihang Xie, Benjamin Van Durme, Alan Yuille

Despite the impressive advancements achieved through vision-and-language pretraining, it remains unclear whether this joint learning paradigm can help understand each individual modality.

Attribute Representation Learning

Paper
Add Code

Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning

2 code implementations • CVPR 2023 • Zhuowan Li, Xingrui Wang, Elias Stengel-Eskin, Adam Kortylewski, Wufei Ma, Benjamin Van Durme, Alan Yuille

Visual Question Answering (VQA) models often perform poorly on out-of-distribution data and struggle on domain generalization.

Domain Generalization Question Answering +2

Paper
Code

AsyInst: Asymmetric Affinity with DepthGrad and Color for Box-Supervised Instance Segmentation

no code implementations • 7 Dec 2022 • Siwei Yang, Longlong Jing, Junfei Xiao, Hang Zhao, Alan Yuille, Yingwei Li

Through systematic analysis, we found that the commonly used pairwise affinity loss has two limitations: (1) it works with color affinity but leads to inferior performance with other modalities such as depth gradient, (2) the original affinity loss does not prevent trivial predictions as intended but actually accelerates this process due to the affinity loss term being symmetric.

Box-supervised Instance Segmentation Segmentation +2

Paper
Add Code

Unleashing the Power of Visual Prompting At the Pixel Level

1 code implementation • 20 Dec 2022 • Junyang Wu, Xianhang Li, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie

This paper presents a simple and effective visual prompting method for adapting pre-trained models to downstream recognition tasks.

Visual Prompting

Paper
Code

Learning Road Scene-level Representations via Semantic Region Prediction

no code implementations • 2 Jan 2023 • Zihao Xiao, Alan Yuille, Yi-Ting Chen

In this work, we tackle two vital tasks in automated driving systems, i.e., driver intent prediction and risk object identification from egocentric images.

Paper
Add Code

CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection

2 code implementations • ICCV 2023 • Jie Liu, Yixiao Zhang, Jie-Neng Chen, Junfei Xiao, Yongyi Lu, Bennett A. Landman, Yixuan Yuan, Alan Yuille, Yucheng Tang, Zongwei Zhou

The proposed model is developed from an assembly of 14 datasets, using a total of 3,410 CT scans for training and then evaluated on 6,162 external CT scans from 3 additional datasets.

Ranked #1 on Organ Segmentation on BTCV

Organ Segmentation Segmentation +1

471

Paper
Code

Benchmarking Robustness in Neural Radiance Fields

no code implementations • 10 Jan 2023 • Chen Wang, Angtian Wang, Junbo Li, Alan Yuille, Cihang Xie

We find that NeRF-based models are significantly degraded in the presence of corruption, and are more sensitive to a different set of corruptions than image recognition models.

Benchmarking Camera Calibration +2

Paper
Add Code

CancerUniT: Towards a Single Unified Model for Effective Detection, Segmentation, and Diagnosis of Eight Major Cancers Using a Large Collection of CT Scans

no code implementations • ICCV 2023 • Jieneng Chen, Yingda Xia, Jiawen Yao, Ke Yan, Jianpeng Zhang, Le Lu, Fakai Wang, Bo Zhou, Mingyan Qiu, Qihang Yu, Mingze Yuan, Wei Fang, Yuxing Tang, Minfeng Xu, Jian Zhou, Yuqian Zhao, Qifeng Wang, Xianghua Ye, Xiaoli Yin, Yu Shi, Xin Chen, Jingren Zhou, Alan Yuille, Zaiyi Liu, Ling Zhang

Human readers or radiologists routinely perform full-body multi-organ multi-disease detection and diagnosis in clinical practice, while most medical AI systems are built to focus on single organs with a narrow list of a few diseases.

Organ Segmentation Representation Learning +1

Paper
Add Code

PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation

1 code implementation • CVPR 2023 • Qihao Liu, Adam Kortylewski, Alan Yuille

We introduce a learning-based testing method, termed PoseExaminer, that automatically diagnoses HPS algorithms by searching over the parameter space of human pose images to find the failure modes.

Multi-agent Reinforcement Learning

Paper
Code

InstMove: Instance Motion for Object-centric Video Segmentation

1 code implementation • CVPR 2023 • Qihao Liu, Junfeng Wu, Yi Jiang, Xiang Bai, Alan Yuille, Song Bai

A common solution is to use optical flow to provide motion information, but essentially it only considers pixel-level motion, which still relies on appearance similarity and hence is often inaccurate under occlusion and fast movement.

Object Optical Flow Estimation +3