UFO²: A Unified Framework towards Omni-supervised Object Detection

1 code implementation ECCV 2020 Zhongzheng Ren, Zhiding Yu, Xiaodong Yang, Ming-Yu Liu, Alexander G. Schwing, Jan Kautz

Existing work on object detection often relies on a single form of annotation: the model is trained using either accurate yet costly bounding boxes or cheaper but less expressive image-level tags.

Object object-detection +1

LITA: Language Instructed Temporal-Localization Assistant

1 code implementation 27 Mar 2024 De-An Huang, Shijia Liao, Subhashree Radhakrishnan, Hongxu Yin, Pavlo Molchanov, Zhiding Yu, Jan Kautz

In addition to leveraging existing video datasets with timestamps, we propose a new task, Reasoning Temporal Localization (RTL), along with the dataset, ActivityNet-RTL, for learning and evaluating this task.

Instruction Following Temporal Localization +2

Improving Distant 3D Object Detection Using 2D Box Supervision

no code implementations 14 Mar 2024 Zetong Yang, Zhiding Yu, Chris Choy, Renhao Wang, Anima Anandkumar, Jose M. Alvarez

This mapping allows the depth estimation of distant objects conditioned on their 2D boxes, making long-range 3D detection with 2D supervision feasible.

3D Object Detection Depth Estimation +2

T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching

1 code implementation 21 Feb 2024 Zizheng Pan, Bohan Zhuang, De-An Huang, Weili Nie, Zhiding Yu, Chaowei Xiao, Jianfei Cai, Anima Anandkumar

Sampling from diffusion probabilistic models (DPMs) is often expensive for high-quality image generation and typically requires many steps with a large model.

Image Generation

Fully Attentional Networks with Self-emerging Token Labeling

1 code implementation ICCV 2023 Bingyin Zhao, Zhiding Yu, Shiyi Lan, Yutao Cheng, Anima Anandkumar, Yingjie Lao, Jose M. Alvarez

With the proposed STL framework, our best model based on FAN-L-Hybrid (77.3M parameters) achieves 84.8% Top-1 accuracy and 42.1% mCE on ImageNet-1K and ImageNet-C, and sets a new state-of-the-art for ImageNet-A (46.1%) and ImageNet-R (56.6%) without using extra data, outperforming the original FAN counterpart by significant margins.

Semantic Segmentation

A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties

1 code implementation 21 Dec 2023 Junfei Xiao, Ziqi Zhou, Wenxuan Li, Shiyi Lan, Jieru Mei, Zhiding Yu, Alan Yuille, Yuyin Zhou, Cihang Xie

Instead of relying solely on category-specific annotations, ProLab uses descriptive properties grounded in common sense knowledge for supervising segmentation models.

Common Sense Reasoning Descriptive +1

Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

1 code implementation 5 Dec 2023 Zhiqi Li, Zhiding Yu, Shiyi Lan, Jiahan Li, Jan Kautz, Tong Lu, Jose M. Alvarez

We initially observed that the nuScenes dataset, characterized by relatively simple driving scenarios, leads to an under-utilization of perception information in end-to-end models incorporating ego status, such as the ego vehicle's velocity.

Autonomous Driving

FocalFormer3D: Focusing on Hard Instance for 3D Object Detection

1 code implementation 8 Aug 2023 Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Animashree Anandkumar, Jiaya Jia, Jose Alvarez

For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall.

3D Object Detection Autonomous Driving +2

FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation

1 code implementation 4 Jul 2023 Zhiqi Li, Zhiding Yu, David Austin, Mingsheng Fang, Shiyi Lan, Jan Kautz, Jose M. Alvarez

This technical report summarizes the winning solution for the 3D Occupancy Prediction Challenge, which was held in conjunction with the CVPR 2023 Workshop on End-to-End Autonomous Driving and the CVPR 2023 Workshop on Vision-Centric Autonomous Driving.

Autonomous Driving Prediction Of Occupancy Grid Maps

Differentially Private Video Activity Recognition

no code implementations 27 Jun 2023 Zelun Luo, Yuliang Zou, Yijin Yang, Zane Durante, De-An Huang, Zhiding Yu, Chaowei Xiao, Li Fei-Fei, Animashree Anandkumar

In recent years, differential privacy has seen significant advancements in image classification; however, its application to video activity recognition remains under-explored.

Activity Recognition Classification +2

Real-Time Radiance Fields for Single-Image Portrait View Synthesis

no code implementations 3 May 2023 Alex Trevithick, Matthew Chan, Michael Stengel, Eric R. Chan, Chao Liu, Zhiding Yu, Sameh Khamis, Manmohan Chandraker, Ravi Ramamoorthi, Koki Nagano

We present a one-shot method to infer and render a photorealistic 3D representation from a single unposed image (e.g., face portrait) in real-time.

Data Augmentation Novel View Synthesis

Vision Transformers Are Good Mask Auto-Labelers

no code implementations CVPR 2023 Shiyi Lan, Xitong Yang, Zhiding Yu, Zuxuan Wu, Jose M. Alvarez, Anima Anandkumar

We propose Mask Auto-Labeler (MAL), a high-quality Transformer-based mask auto-labeling framework for instance segmentation using only box annotations.

Instance Segmentation Segmentation +1

FocalFormer3D: Focusing on Hard Instance for 3D Object Detection

1 code implementation ICCV 2023 Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Anima Anandkumar, Jiaya Jia, Jose M. Alvarez

For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall.

3D Object Detection Autonomous Driving +2

1st Place Solution of The Robust Vision Challenge 2022 Semantic Segmentation Track

1 code implementation 23 Oct 2022 Junfei Xiao, Zhichao Xu, Shiyi Lan, Zhiding Yu, Alan Yuille, Anima Anandkumar

The model is trained on a composite dataset consisting of images from 9 datasets (ADE20K, Cityscapes, Mapillary Vistas, ScanNet, VIPER, WildDash 2, IDD, BDD, and COCO) with a simple dataset balancing strategy.

Segmentation Semantic Segmentation

Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models

2 code implementations 15 Sep 2022 Manli Shu, Weili Nie, De-An Huang, Zhiding Yu, Tom Goldstein, Anima Anandkumar, Chaowei Xiao

In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.

Image Classification Zero-shot Generalization

PointDP: Diffusion-driven Purification against Adversarial Attacks on 3D Point Cloud Recognition

no code implementations 21 Aug 2022 Jiachen Sun, Weili Nie, Zhiding Yu, Z. Morley Mao, Chaowei Xiao

The 3D point cloud is becoming a critical data representation in many real-world applications like autonomous driving, robotics, and medical imaging.

Autonomous Driving

MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training

2 code implementations 3 Aug 2022 De-An Huang, Zhiding Yu, Anima Anandkumar

By only training a query-based image instance segmentation model, MinVIS outperforms the previous best result on the challenging Occluded VIS dataset by over 10% AP.

Instance Segmentation Segmentation +2

Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions

1 code implementation CVPR 2022 Huaizu Jiang, Xiaojian Ma, Weili Nie, Zhiding Yu, Yuke Zhu, Song-Chun Zhu, Anima Anandkumar

A significant gap remains between today's visual pattern recognition models and human-level visual cognition, especially when it comes to few-shot learning and compositional reasoning of novel concepts.

Benchmarking Few-Shot Image Classification +5

Understanding The Robustness in Vision Transformers

2 code implementations 26 Apr 2022 Daquan Zhou, Zhiding Yu, Enze Xie, Chaowei Xiao, Anima Anandkumar, Jiashi Feng, Jose M. Alvarez

Our study is motivated by the intriguing properties of the emerging visual grouping in Vision Transformers, which indicates that self-attention may promote robustness through improved mid-level representations.

Ranked #4 on Domain Generalization on ImageNet-R (using extra training data)

Domain Generalization Image Classification +3

RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning

1 code implementation ICLR 2022 Xiaojian Ma, Weili Nie, Zhiding Yu, Huaizu Jiang, Chaowei Xiao, Yuke Zhu, Song-Chun Zhu, Anima Anandkumar

This task remains challenging for current deep learning algorithms since it requires addressing three key technical problems jointly: 1) identifying object entities and their properties, 2) inferring semantic relations between pairs of entities, and 3) generalizing to novel object-relation combinations, i.e., systematic generalization.

Human-Object Interaction Detection Object +5

M²BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation

no code implementations 11 Apr 2022 Enze Xie, Zhiding Yu, Daquan Zhou, Jonah Philion, Anima Anandkumar, Sanja Fidler, Ping Luo, Jose M. Alvarez

In this paper, we propose M²BEV, a unified framework that jointly performs 3D object detection and map segmentation in the Bird's-Eye View (BEV) space with multi-camera image inputs.

3D Object Detection object-detection +1

CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs

1 code implementation CVPR 2022 Jiteng Mu, Shalini De Mello, Zhiding Yu, Nuno Vasconcelos, Xiaolong Wang, Jan Kautz, Sifei Liu

We represent the correspondence maps of different images as warped coordinate frames transformed from a canonical coordinate frame, i.e., the correspondence map, which describes the structure (e.g., the shape of a face), is controlled via a transformation.


FreeSOLO: Learning to Segment Objects without Annotations

1 code implementation CVPR 2022 Xinlong Wang, Zhiding Yu, Shalini De Mello, Jan Kautz, Anima Anandkumar, Chunhua Shen, Jose M. Alvarez

FreeSOLO further demonstrates superiority as a strong pre-training method, outperforming state-of-the-art self-supervised pre-training methods by +9.8% AP when fine-tuning instance segmentation with only 5% COCO masks.

Instance Segmentation object-detection +4

Adversarially Robust 3D Point Cloud Recognition Using Self-Supervisions

no code implementations NeurIPS 2021 Jiachen Sun, Yulong Cao, Christopher B. Choy, Zhiding Yu, Anima Anandkumar, Zhuoqing Morley Mao, Chaowei Xiao

In this paper, we systematically study the impact of various self-supervised learning proxy tasks on different architectures and threat models for 3D point clouds with adversarial training.

Adversarial Robustness Autonomous Driving +1

Scaling Fair Learning to Hundreds of Intersectional Groups

no code implementations 29 Sep 2021 Eric Zhao, De-An Huang, Hao Liu, Zhiding Yu, Anqi Liu, Olga Russakovsky, Anima Anandkumar

In real-world applications, however, there are multiple protected attributes yielding a large number of intersectional protected groups.

Attribute Fairness +1

Learning Contrastive Representation for Semantic Correspondence

no code implementations 22 Sep 2021 Taihong Xiao, Sifei Liu, Shalini De Mello, Zhiding Yu, Jan Kautz, Ming-Hsuan Yang

Dense correspondence across semantically related images has been extensively studied, but still faces two challenges: 1) large variations in appearance, scale and pose exist even for objects from the same category, and 2) labeling pixel-level dense correspondences is labor intensive and infeasible to scale.

Contrastive Learning Semantic correspondence

SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies

1 code implementation 17 Jun 2021 Linxi Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, Anima Anandkumar

A student network then learns to mimic the expert policy by supervised learning with strong augmentations, making its representation more robust against visual variations compared to the expert.

Autonomous Driving Image Augmentation +3

Taxonomy of Machine Learning Safety: A Survey and Primer

no code implementations 9 Jun 2021 Sina Mohseni, Haotao Wang, Zhiding Yu, Chaowei Xiao, Zhangyang Wang, Jay Yadawa

The open-world deployment of Machine Learning (ML) algorithms in safety-critical applications such as autonomous vehicles needs to address a variety of ML vulnerabilities such as interpretability, verifiability, and performance limitations.

Autonomous Vehicles BIG-bench Machine Learning +1

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

23 code implementations NeurIPS 2021 Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo

We present SegFormer, a simple, efficient yet powerful semantic segmentation framework which unifies Transformers with lightweight multilayer perceptron (MLP) decoders.

C++ code Semantic Segmentation +1

Image-Level or Object-Level? A Tale of Two Resampling Strategies for Long-Tailed Detection

1 code implementation 12 Apr 2021 Nadine Chang, Zhiding Yu, Yu-Xiong Wang, Anima Anandkumar, Sanja Fidler, Jose M. Alvarez

As a result, image resampling alone is not enough to yield a sufficiently balanced distribution at the object level.


Transferable Unsupervised Robust Representation Learning

no code implementations 1 Jan 2021 De-An Huang, Zhiding Yu, Anima Anandkumar

We upend this view and show that URRL improves both the natural accuracy of unsupervised representation learning and its robustness to corruptions and adversarial noise.

Data Augmentation Representation Learning +1

UFO²: A Unified Framework towards Omni-supervised Object Detection

no code implementations 21 Oct 2020 Zhongzheng Ren, Zhiding Yu, Xiaodong Yang, Ming-Yu Liu, Alexander G. Schwing, Jan Kautz

Existing work on object detection often relies on a single form of annotation: the model is trained using either accurate yet costly bounding boxes or cheaper but less expressive image-level tags.

object-detection Object Detection

Learning Calibrated Uncertainties for Domain Shift: A Distributionally Robust Learning Approach

no code implementations 8 Oct 2020 Haoxuan Wang, Zhiding Yu, Yisong Yue, Anima Anandkumar, Anqi Liu, Junchi Yan

We propose a framework for learning calibrated uncertainties under domain shifts, where the source (training) distribution differs from the target (test) distribution.

Density Ratio Estimation Unsupervised Domain Adaptation

Distributionally Robust Learning for Unsupervised Domain Adaptation

no code implementations 28 Sep 2020 Haoxuan Wang, Anqi Liu, Zhiding Yu, Yisong Yue, Anima Anandkumar

This formulation motivates the use of two neural networks that are jointly trained: a discriminative network between the source and target domains for density-ratio estimation, in addition to the standard classification network.

Density Ratio Estimation Unsupervised Domain Adaptation

Delving Deeper into Anti-aliasing in ConvNets

2 code implementations 21 Aug 2020 Xueyan Zou, Fanyi Xiao, Zhiding Yu, Yong Jae Lee

Aliasing refers to the phenomenon that high-frequency signals degenerate into completely different ones after sampling.

Instance Segmentation Segmentation +1
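
The aliasing definition in the entry above can be illustrated with a minimal sketch. This is a textbook demonstration of the phenomenon, not the anti-aliasing method proposed in the paper; the sampling rate, the two frequencies, and the helper name sample_sine are assumptions chosen for illustration. A 9 Hz sine sampled at 10 Hz lies above the 5 Hz Nyquist limit, and its samples coincide with those of a sign-flipped 1 Hz sine.

```python
import math


def sample_sine(freq_hz, rate_hz, n_samples):
    """Sample sin(2*pi*freq*t) at t = k / rate for k = 0 .. n_samples - 1."""
    return [
        math.sin(2 * math.pi * freq_hz * k / rate_hz)
        for k in range(n_samples)
    ]


# Assumed sampling rate for this sketch; the Nyquist limit is RATE_HZ / 2 = 5 Hz.
RATE_HZ = 10

high = sample_sine(9, RATE_HZ, 20)  # 9 Hz: above the Nyquist limit
low = sample_sine(1, RATE_HZ, 20)   # 1 Hz: below the Nyquist limit

# After sampling, the 9 Hz signal is indistinguishable from a sign-flipped
# 1 Hz signal: sin(2*pi*9*k/10) = -sin(2*pi*1*k/10) for every integer k.
max_gap = max(abs(h + l) for h, l in zip(high, low))
print(f"largest |high + low| over all samples: {max_gap:.2e}")
```

The printed gap is at the level of floating-point rounding error, which shows that the high-frequency content has turned into a different, low-frequency signal once sampled.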

Joint Disentangling and Adaptation for Cross-Domain Person Re-Identification

1 code implementation ECCV 2020 Yang Zou, Xiaodong Yang, Zhiding Yu, B. V. K. Vijaya Kumar, Jan Kautz

To this end, we propose a joint learning framework that disentangles id-related/unrelated features and enforces adaptation to work on the id-related feature space exclusively.

Person Re-Identification Unsupervised Domain Adaptation

Unsupervised Controllable Generation with Self-Training

no code implementations 17 Jul 2020 Grigorios G. Chrysos, Jean Kossaifi, Zhiding Yu, Anima Anandkumar

Instead, we propose an unsupervised framework to learn a distribution of latent codes that control the generator through self-training.


Neural Networks with Recurrent Generative Feedback

1 code implementation NeurIPS 2020 Yujia Huang, James Gornet, Sihui Dai, Zhiding Yu, Tan Nguyen, Doris Y. Tsao, Anima Anandkumar

This mechanism can be interpreted as a form of self-consistency between the maximum a posteriori (MAP) estimation of an internal generative model and the external environment.

Adversarial Robustness

Transposer: Universal Texture Synthesis Using Feature Maps as Transposed Convolution Filter

no code implementations 14 Jul 2020 Guilin Liu, Rohan Taori, Ting-Chun Wang, Zhiding Yu, Shiqiu Liu, Fitsum A. Reda, Karan Sapra, Andrew Tao, Bryan Catanzaro

Specifically, we directly treat the whole encoded feature map of the input texture as transposed convolution filters and the features' self-similarity map, which captures the auto-correlation information, as input to the transposed convolution.

Texture Synthesis

Uncertainty-aware multi-view co-training for semi-supervised medical image segmentation and domain adaptation

no code implementations 28 Jun 2020 Yingda Xia, Dong Yang, Zhiding Yu, Fengze Liu, Jinzheng Cai, Lequan Yu, Zhuotun Zhu, Daguang Xu, Alan Yuille, Holger Roth

Experiments on the NIH pancreas segmentation dataset and a multi-organ segmentation dataset show state-of-the-art performance of the proposed framework on semi-supervised medical image segmentation.

Image Segmentation Organ Segmentation +6

Confidence Regularized Self-Training

2 code implementations ICCV 2019 Yang Zou, Zhiding Yu, Xiaofeng Liu, B. V. K. Vijaya Kumar, Jinsong Wang

Recent advances in domain adaptation show that deep self-training presents a powerful means for unsupervised domain adaptation.

Image Classification Semantic Segmentation +2

Regularizing Neural Networks via Minimizing Hyperspherical Energy

1 code implementation CVPR 2020 Rongmei Lin, Weiyang Liu, Zhen Liu, Chen Feng, Zhiding Yu, James M. Rehg, Li Xiong, Le Song

Inspired by the Thomson problem in physics, where the distribution of multiple mutually repelling electrons on a unit sphere can be modeled via minimizing some potential energy, hyperspherical energy minimization has demonstrated its potential in regularizing neural networks and improving their generalization power.

Partial Convolution based Padding

4 code implementations 28 Nov 2018 Guilin Liu, Kevin J. Shih, Ting-Chun Wang, Fitsum A. Reda, Karan Sapra, Zhiding Yu, Andrew Tao, Bryan Catanzaro

In this paper, we present a simple yet effective padding scheme that can be used as a drop-in module for existing convolutional neural networks.

General Classification Semantic Segmentation

Domain Adaptation for Semantic Segmentation via Class-Balanced Self-Training

1 code implementation 18 Oct 2018 Yang Zou, Zhiding Yu, B. V. K. Vijaya Kumar, Jinsong Wang

In this paper, we propose a novel UDA framework based on an iterative self-training procedure, where the problem is formulated as latent variable loss minimization and can be solved by alternately generating pseudo labels on target data and re-training the model with these labels.

Pseudo Label Semantic Segmentation +2

Simultaneous Edge Alignment and Learning

3 code implementations ECCV 2018 Zhiding Yu, Weiyang Liu, Yang Zou, Chen Feng, Srikumar Ramalingam, B. V. K. Vijaya Kumar, Jan Kautz

Edge detection is among the most fundamental vision problems for its role in perceptual grouping and its wide applications.

Edge Detection Representation Learning

Learning towards Minimum Hyperspherical Energy

4 code implementations NeurIPS 2018 Weiyang Liu, Rongmei Lin, Zhen Liu, Lixin Liu, Zhiding Yu, Bo Dai, Le Song

In light of this intuition, we reduce the redundancy regularization problem to generic energy minimization, and propose a minimum hyperspherical energy (MHE) objective as generic regularization for neural networks.

Decoupled Networks

1 code implementation CVPR 2018 Weiyang Liu, Zhen Liu, Zhiding Yu, Bo Dai, Rongmei Lin, Yisen Wang, James M. Rehg, Le Song

Inner product-based convolution has been a central component of convolutional neural networks (CNNs) and the key to learning visual representations.

Learning Strict Identity Mappings in Deep Residual Networks

1 code implementation CVPR 2018 Xin Yu, Zhiding Yu, Srikumar Ramalingam

A family of super deep networks, referred to as residual networks or ResNet, achieved record-beating performance in various visual tasks such as image recognition, object detection, and semantic segmentation.

object-detection Object Detection +1

Deep Hyperspherical Learning

no code implementations NeurIPS 2017 Weiyang Liu, Yan-Ming Zhang, Xingguo Li, Zhiding Yu, Bo Dai, Tuo Zhao, Le Song

In light of such challenges, we propose hyperspherical convolution (SphereConv), a novel learning framework that gives angular representations on hyperspheres.

Representation Learning

CASENet: Deep Category-Aware Semantic Edge Detection

11 code implementations CVPR 2017 Zhiding Yu, Chen Feng, Ming-Yu Liu, Srikumar Ramalingam

To this end, we propose a novel end-to-end deep semantic edge learning architecture based on ResNet and a new skip-layer architecture where category-wise edge activations at the top convolution layer share and are fused with the same set of bottom layer features.

Edge Detection Object Proposal Generation +1

SphereFace: Deep Hypersphere Embedding for Face Recognition

20 code implementations CVPR 2017 Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, Le Song

This paper addresses the deep face recognition (FR) problem under the open-set protocol, where ideal face features are expected to have smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space.

Face Identification Face Recognition +1

Large-Margin Softmax Loss for Convolutional Neural Networks

2 code implementations 7 Dec 2016 Weiyang Liu, Yandong Wen, Zhiding Yu, Meng Yang

Cross-entropy loss together with softmax is arguably one of the most commonly used supervision components in convolutional neural networks (CNNs).

General Classification

Structured Hough Voting for Vision-based Highway Border Detection

no code implementations 18 Nov 2014 Zhiding Yu, Wende Zhang, B. V. K. Vijaya Kumar, Dan Levi

We propose a vision-based highway border detection algorithm using structured Hough voting.

Multi-Task Regularization with Covariance Dictionary for Linear Classifiers

no code implementations 21 Oct 2013 Fanyi Xiao, Ruikun Luo, Zhiding Yu

In this paper we propose a multi-task linear classifier learning problem called D-SVM (Dictionary SVM).

Transfer Learning valid

Constructing the L2-Graph for Robust Subspace Learning and Subspace Clustering

no code implementations 5 Sep 2012 Xi Peng, Zhiding Yu, Huajin Tang, Zhang Yi

Under the framework of graph-based learning, the key to robust subspace clustering and subspace learning is to obtain a good similarity graph that eliminates the effects of errors and retains only connections between the data points from the same subspace (i.e., intra-subspace data points).

Clustering Image Clustering +1

