1 code implementation • ECCV 2020 • Zhongzheng Ren, Zhiding Yu, Xiaodong Yang, Ming-Yu Liu, Alexander G. Schwing, Jan Kautz
Existing work on object detection often relies on a single form of annotation: the model is trained using either accurate yet costly bounding boxes or cheaper but less expressive image-level tags.
no code implementations • 18 Mar 2025 • Nvidia, Johan Bjorck, Fernando Castañeda, Nikita Cherniadev, Xingye Da, Runyu Ding, Linxi "Jim" Fan, Yu Fang, Dieter Fox, Fengyuan Hu, Spencer Huang, Joel Jang, Zhenyu Jiang, Jan Kautz, Kaushil Kundalia, Lawrence Lao, Zhiqi Li, Zongyu Lin, Kevin Lin, Guilin Liu, Edith Llontop, Loic Magne, Ajay Mandlekar, Avnish Narayan, Soroush Nasiriany, Scott Reed, You Liang Tan, Guanzhi Wang, Zu Wang, Jing Wang, Qi Wang, Jiannan Xiang, Yuqi Xie, Yinzhen Xu, Zhenjia Xu, Seonghyeon Ye, Zhiding Yu, Ao Zhang, Hao Zhang, Yizhou Zhao, Ruijie Zheng, Yuke Zhu
A robot foundation model, trained on massive and diverse data sources, is essential for enabling robots to reason about novel situations, robustly handle real-world variability, and rapidly learn new tasks.
no code implementations • 15 Mar 2025 • Zhenxin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Zuxuan Wu, Jose M. Alvarez
Hydra-NeXt surpasses the previous state of the art by 22.98 DS and 17.49 SR, marking a significant advancement in autonomous driving.
no code implementations • 14 Mar 2025 • Chonghao Sima, Kashyap Chitta, Zhiding Yu, Shiyi Lan, Ping Luo, Andreas Geiger, Hongyang Li, Jose M. Alvarez
In this work, we propose Centaur (Cluster Entropy for Test-time trAining using Uncertainty) which updates a planner's behavior via test-time training, without relying on hand-engineered rules or cost functions.
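The cluster-entropy idea can be illustrated with a toy sketch (all names, shapes, and numbers below are invented for illustration, not Centaur's implementation): sample candidate plans, assign them to trajectory clusters, and take the entropy of the assignment distribution as an uncertainty signal. Low entropy suggests a decisive planner.

```python
import numpy as np

def cluster_entropy(trajs, centers):
    """Illustrative stand-in for a cluster-entropy uncertainty signal:
    assign each sampled plan to its nearest cluster by endpoint, then
    compute the entropy of the cluster-assignment distribution."""
    end = trajs[:, -1, :]                               # endpoint of each plan
    d = np.linalg.norm(end[:, None] - centers[None], axis=2)
    assign = d.argmin(1)
    p = np.bincount(assign, minlength=len(centers)) / len(trajs)
    p = p[p > 0]
    return -(p * np.log(p)).sum()

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0]])           # two trajectory modes
# Decisive: all plans end near one mode.  Uncertain: an even split.
decisive = rng.normal(0, 0.3, (50, 5, 2))
split = np.vstack([decisive[:25], decisive[:25] + np.array([10.0, 0.0])])
assert cluster_entropy(decisive, centers) < cluster_entropy(split, centers)
```

A test-time update would then minimize this entropy with respect to the planner's parameters; the sketch only shows the signal itself.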
no code implementations • 5 Mar 2025 • Zi Wang, Shiyi Lan, Xinglong Sun, Nadine Chang, Zhenxin Li, Zhiding Yu, Jose M. Alvarez
In this paper, we propose SafeFusion, a training framework to learn from collision data.
no code implementations • 19 Feb 2025 • Payman Behnam, Yaosheng Fu, Ritchie Zhao, Po-An Tsai, Zhiding Yu, Alexey Tumanov
To address this challenge, we present RocketKV, a training-free KV cache compression strategy designed specifically to reduce both memory bandwidth and capacity demand of KV cache during the decode phase.
no code implementations • 13 Feb 2025 • Ming-Chang Chiu, Fuxiao Liu, Karan Sapra, Andrew Tao, Yaser Yacoob, Xuezhe Ma, Zhiding Yu, Guilin Liu
The enhancement of Visual Language Models (VLMs) has traditionally relied on knowledge distillation from larger, more capable models.
no code implementations • 7 Feb 2025 • Yue Zhao, Fuzhao Xue, Scott Reed, Linxi Fan, Yuke Zhu, Jan Kautz, Zhiding Yu, Philipp Krähenbühl, De-An Huang
We validate the effectiveness of QLIP for multimodal understanding and text-conditioned image generation with a single model.
1 code implementation • 20 Jan 2025 • Zhiqi Li, Guo Chen, Shilong Liu, Shihao Wang, Vibashan VS, Yishen Ji, Shiyi Lan, Hao Zhang, Yilin Zhao, Subhashree Radhakrishnan, Nadine Chang, Karan Sapra, Amala Sanjay Deshmukh, Tuomas Rintamaki, Matthieu Le, Ilia Karmanov, Lukas Voegtle, Philipp Fischer, De-An Huang, Timo Roman, Tong Lu, Jose M. Alvarez, Bryan Catanzaro, Jan Kautz, Andrew Tao, Guilin Liu, Zhiding Yu
Recently, promising progress has been made by open-source vision-language models (VLMs) in bringing their capabilities closer to those of proprietary frontier models.
no code implementations • 11 Dec 2024 • Jihao Liu, Zhiding Yu, Shiyi Lan, Shihao Wang, Rongyao Fang, Jan Kautz, Hongsheng Li, Jose M. Alvarez
This paper presents StreamChat, a novel approach that enhances the interaction capabilities of Large Multimodal Models (LMMs) with streaming video content.
no code implementations • 2 Oct 2024 • Kyle Vedder, Neehar Peri, Ishan Khatri, Siyi Li, Eric Eaton, Mehmet Kocamaz, Yue Wang, Zhiding Yu, Deva Ramanan, Joachim Pehserl
We reframe scene flow as the task of estimating a continuous space-time ODE that describes motion for an entire observation sequence, represented with a neural prior.
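A minimal sketch of this formulation, using a tiny hand-rolled MLP as a stand-in for the neural prior and simple Euler integration (both are assumptions for illustration, not the paper's architecture):

```python
import numpy as np

def velocity(xyz, t, params):
    """Tiny MLP v(x, t): a stand-in for a neural prior representing
    a continuous space-time motion field."""
    W1, b1, W2, b2 = params
    inp = np.concatenate([xyz, np.full((len(xyz), 1), t)], axis=1)
    h = np.tanh(inp @ W1 + b1)
    return h @ W2 + b2

def flow(points, t0, t1, params, steps=10):
    """Euler-integrate the ODE dx/dt = v(x, t) from t0 to t1,
    carrying every point through the whole observation window."""
    dt = (t1 - t0) / steps
    x = points.copy()
    for k in range(steps):
        x = x + dt * velocity(x, t0 + k * dt, params)
    return x

rng = np.random.default_rng(0)
params = (rng.normal(0, 0.1, (4, 16)), np.zeros(16),
          rng.normal(0, 0.1, (16, 3)), np.zeros(3))
pts = rng.normal(0, 1, (100, 3))
moved = flow(pts, 0.0, 1.0, params)
assert moved.shape == pts.shape
```

In the actual method the prior's parameters would be optimized so that integrated points align with the observed point clouds; the sketch only shows the ODE view of flow.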
2 code implementations • 28 Aug 2024 • Min Shi, Fuxiao Liu, Shihao Wang, Shijia Liao, Subhashree Radhakrishnan, Yilin Zhao, De-An Huang, Hongxu Yin, Karan Sapra, Yaser Yacoob, Humphrey Shi, Bryan Catanzaro, Andrew Tao, Jan Kautz, Zhiding Yu, Guilin Liu
The ability to accurately interpret complex visual information is a crucial topic of multimodal large language models (MLLMs).
no code implementations • 9 Jul 2024 • Barath Lakshmanan, Joshua Chen, Shiyi Lan, Maying Shen, Zhiding Yu, Jose M. Alvarez
The cornerstone of autonomous vehicles (AV) is a solid perception system, where camera encoders play a crucial role.
2 code implementations • 11 Jun 2024 • Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu, Yu-Gang Jiang, Jose M. Alvarez
We propose Hydra-MDP, a novel paradigm employing multiple teachers in a teacher-student model.
no code implementations • 29 May 2024 • Hanrong Ye, De-An Huang, Yao Lu, Zhiding Yu, Wei Ping, Andrew Tao, Jan Kautz, Song Han, Dan Xu, Pavlo Molchanov, Hongxu Yin
We introduce X-VILA, an omni-modality model designed to extend the capabilities of large language models (LLMs) by incorporating image, video, and audio modalities.
1 code implementation • 27 May 2024 • Yiming Li, Zehong Wang, Yue Wang, Zhiding Yu, Zan Gojcic, Marco Pavone, Chen Feng, Jose M. Alvarez
Humans naturally retain memories of permanent elements, while ephemeral moments often slip through the cracks of memory.
1 code implementation • 2 May 2024 • Shihao Wang, Zhiding Yu, Xiaohui Jiang, Shiyi Lan, Min Shi, Nadine Chang, Jan Kautz, Ying Li, Jose M. Alvarez
We further propose OmniDrive-nuScenes, a new visual question-answering dataset challenging the true 3D situational awareness of a model with comprehensive visual question-answering (VQA) tasks, including scene description, traffic regulation, 3D grounding, counterfactual reasoning, decision making and planning.
no code implementations • 1 Apr 2024 • Shuaiyi Huang, De-An Huang, Zhiding Yu, Shiyi Lan, Subhashree Radhakrishnan, Jose M. Alvarez, Abhinav Shrivastava, Anima Anandkumar
Video instance segmentation (VIS) is a challenging vision task that aims to detect, segment, and track objects in videos.
1 code implementation • 27 Mar 2024 • De-An Huang, Shijia Liao, Subhashree Radhakrishnan, Hongxu Yin, Pavlo Molchanov, Zhiding Yu, Jan Kautz
In addition to leveraging existing video datasets with timestamps, we propose a new task, Reasoning Temporal Localization (RTL), along with the dataset, ActivityNet-RTL, for learning and evaluating this task.
no code implementations • CVPR 2024 • Zetong Yang, Zhiding Yu, Chris Choy, Renhao Wang, Anima Anandkumar, Jose M. Alvarez
This mapping allows the depth estimation of distant objects conditioned on their 2D boxes, making long-range 3D detection with 2D supervision feasible.
1 code implementation • 21 Feb 2024 • Zizheng Pan, Bohan Zhuang, De-An Huang, Weili Nie, Zhiding Yu, Chaowei Xiao, Jianfei Cai, Anima Anandkumar
Sampling from diffusion probabilistic models (DPMs) is often expensive for high-quality image generation and typically requires many steps with a large model.
1 code implementation • ICCV 2023 • Bingyin Zhao, Zhiding Yu, Shiyi Lan, Yutao Cheng, Anima Anandkumar, Yingjie Lao, Jose M. Alvarez
With the proposed STL framework, our best model based on FAN-L-Hybrid (77.3M parameters) achieves 84.8% Top-1 accuracy and 42.1% mCE on ImageNet-1K and ImageNet-C, and sets a new state of the art for ImageNet-A (46.1%) and ImageNet-R (56.6%) without using extra data, outperforming the original FAN counterpart by significant margins.
Ranked #16 on Domain Generalization on ImageNet-C
1 code implementation • 21 Dec 2023 • Junfei Xiao, Ziqi Zhou, Wenxuan Li, Shiyi Lan, Jieru Mei, Zhiding Yu, Alan Yuille, Yuyin Zhou, Cihang Xie
Instead of relying solely on category-specific annotations, ProLab uses descriptive properties grounded in common sense knowledge for supervising segmentation models.
1 code implementation • CVPR 2024 • Zhiqi Li, Zhiding Yu, Shiyi Lan, Jiahan Li, Jan Kautz, Tong Lu, Jose M. Alvarez
We initially observed that the nuScenes dataset, characterized by relatively simple driving scenarios, leads to an under-utilization of perception information in end-to-end models incorporating ego status, such as the ego vehicle's velocity.
1 code implementation • 8 Aug 2023 • Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Animashree Anandkumar, Jiaya Jia, Jose Alvarez
For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall.
Ranked #8 on 3D Object Detection on nuScenes
1 code implementation • ICCV 2023 • Zhiqi Li, Zhiding Yu, Wenhai Wang, Anima Anandkumar, Tong Lu, Jose M. Alvarez
Currently, the two most prominent VTM paradigms are forward projection and backward projection.
1 code implementation • 4 Jul 2023 • Zhiqi Li, Zhiding Yu, David Austin, Mingsheng Fang, Shiyi Lan, Jan Kautz, Jose M. Alvarez
This technical report summarizes the winning solution for the 3D Occupancy Prediction Challenge, held in conjunction with the CVPR 2023 Workshop on End-to-End Autonomous Driving and the CVPR 2023 Workshop on Vision-Centric Autonomous Driving.
Ranked #2 on Prediction Of Occupancy Grid Maps on Occ3D-nuScenes
no code implementations • 27 Jun 2023 • Zelun Luo, Yuliang Zou, Yijin Yang, Zane Durante, De-An Huang, Zhiding Yu, Chaowei Xiao, Li Fei-Fei, Animashree Anandkumar
In recent years, differential privacy has seen significant advancements in image classification; however, its application to video activity recognition remains under-explored.
1 code implementation • 15 Jun 2023 • Yiming Li, Sihang Li, Xinhao Liu, Moonjun Gong, Kenan Li, Nuo Chen, Zijun Wang, Zhiheng Li, Tao Jiang, Fisher Yu, Yue Wang, Hang Zhao, Zhiding Yu, Chen Feng
Monocular scene understanding is a foundational component of autonomous systems.
3D Semantic Scene Completion • 3D Semantic Scene Completion from a single 2D image
no code implementations • 3 May 2023 • Alex Trevithick, Matthew Chan, Michael Stengel, Eric R. Chan, Chao Liu, Zhiding Yu, Sameh Khamis, Manmohan Chandraker, Ravi Ramamoorthi, Koki Nagano
We present a one-shot method to infer and render a photorealistic 3D representation from a single unposed image (e.g., a face portrait) in real time.
2 code implementations • 4 Mar 2023 • Shikun Liu, Linxi Fan, Edward Johns, Zhiding Yu, Chaowei Xiao, Anima Anandkumar
Recent vision-language models have shown impressive multi-modal generation capabilities.
Ranked #1 on Image Captioning on nocaps val
1 code implementation • CVPR 2023 • Yiming Li, Zhiding Yu, Christopher Choy, Chaowei Xiao, Jose M. Alvarez, Sanja Fidler, Chen Feng, Anima Anandkumar
To enable such capability in AI systems, we propose VoxFormer, a Transformer-based semantic scene completion framework that can output complete 3D volumetric semantics from only 2D images.
3D geometry • 3D Semantic Scene Completion from a single RGB image
no code implementations • 9 Feb 2023 • Zhuolin Yang, Wei Ping, Zihan Liu, Vijay Korthikanti, Weili Nie, De-An Huang, Linxi Fan, Zhiding Yu, Shiyi Lan, Bo Li, Ming-Yu Liu, Yuke Zhu, Mohammad Shoeybi, Bryan Catanzaro, Chaowei Xiao, Anima Anandkumar
Augmenting pretrained language models (LMs) with a vision encoder (e.g., Flamingo) has obtained state-of-the-art results in image-to-text generation.
no code implementations • CVPR 2023 • Shiyi Lan, Xitong Yang, Zhiding Yu, Zuxuan Wu, Jose M. Alvarez, Anima Anandkumar
We propose Mask Auto-Labeler (MAL), a high-quality Transformer-based mask auto-labeling framework for instance segmentation using only box annotations.
no code implementations • ICCV 2023 • Yanwei Li, Zhiding Yu, Jonah Philion, Anima Anandkumar, Sanja Fidler, Jiaya Jia, Jose Alvarez
In this work, we present an end-to-end framework for camera-based 3D multi-object tracking, called DQTrack.
1 code implementation • ICCV 2023 • Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Anima Anandkumar, Jiaya Jia, Jose M. Alvarez
For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall.
1 code implementation • 23 Oct 2022 • Junfei Xiao, Zhichao Xu, Shiyi Lan, Zhiding Yu, Alan Yuille, Anima Anandkumar
The model is trained on a composite dataset consisting of images from 9 datasets (ADE20K, Cityscapes, Mapillary Vistas, ScanNet, VIPER, WildDash 2, IDD, BDD, and COCO) with a simple dataset balancing strategy.
1 code implementation • 15 Sep 2022 • Manli Shu, Weili Nie, De-An Huang, Zhiding Yu, Tom Goldstein, Anima Anandkumar, Chaowei Xiao
In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.
no code implementations • 21 Aug 2022 • Jiachen Sun, Weili Nie, Zhiding Yu, Z. Morley Mao, Chaowei Xiao
3D Point cloud is becoming a critical data representation in many real-world applications like autonomous driving, robotics, and medical imaging.
2 code implementations • 3 Aug 2022 • De-An Huang, Zhiding Yu, Anima Anandkumar
By only training a query-based image instance segmentation model, MinVIS outperforms the previous best result on the challenging Occluded VIS dataset by over 10% AP.
no code implementations • CVPR 2022 • Rafid Mahmood, James Lucas, David Acuna, Daiqing Li, Jonah Philion, Jose M. Alvarez, Zhiding Yu, Sanja Fidler, Marc T. Law
Given a small training data set and a learning algorithm, how much more data is necessary to reach a target validation or test performance?
1 code implementation • CVPR 2022 • Huaizu Jiang, Xiaojian Ma, Weili Nie, Zhiding Yu, Yuke Zhu, Song-Chun Zhu, Anima Anandkumar
A significant gap remains between today's visual pattern recognition models and human-level visual cognition especially when it comes to few-shot learning and compositional reasoning of novel concepts.
Ranked #1 on Few-Shot Image Classification on Bongard-HOI
2 code implementations • 26 Apr 2022 • Daquan Zhou, Zhiding Yu, Enze Xie, Chaowei Xiao, Anima Anandkumar, Jiashi Feng, Jose M. Alvarez
Our study is motivated by the intriguing properties of the emerging visual grouping in Vision Transformers, which indicates that self-attention may promote robustness through improved mid-level representations.
Ranked #4 on Domain Generalization on ImageNet-R (using extra training data)
1 code implementation • ICLR 2022 • Xiaojian Ma, Weili Nie, Zhiding Yu, Huaizu Jiang, Chaowei Xiao, Yuke Zhu, Song-Chun Zhu, Anima Anandkumar
This task remains challenging for current deep learning algorithms since it requires addressing three key technical problems jointly: 1) identifying object entities and their properties, 2) inferring semantic relations between pairs of entities, and 3) generalizing to novel object-relation combinations, i.e., systematic generalization.
Ranked #1 on Zero-Shot Human-Object Interaction Detection on HICO
no code implementations • 11 Apr 2022 • Enze Xie, Zhiding Yu, Daquan Zhou, Jonah Philion, Anima Anandkumar, Sanja Fidler, Ping Luo, Jose M. Alvarez
In this paper, we propose M$^2$BEV, a unified framework that jointly performs 3D object detection and map segmentation in the Birds Eye View~(BEV) space with multi-camera image inputs.
1 code implementation • CVPR 2022 • Jiteng Mu, Shalini De Mello, Zhiding Yu, Nuno Vasconcelos, Xiaolong Wang, Jan Kautz, Sifei Liu
We represent the correspondence maps of different images as warped coordinate frames transformed from a canonical coordinate frame, i.e., the correspondence map, which describes the structure (e.g., the shape of a face), is controlled via a transformation.
1 code implementation • CVPR 2022 • Xinlong Wang, Zhiding Yu, Shalini De Mello, Jan Kautz, Anima Anandkumar, Chunhua Shen, Jose M. Alvarez
FreeSOLO further demonstrates superiority as a strong pre-training method, outperforming state-of-the-art self-supervised pre-training methods by +9.8% AP when fine-tuning instance segmentation with only 5% COCO masks.
6 code implementations • 28 Jan 2022 • Jiachen Sun, Qingzhao Zhang, Bhavya Kailkhura, Zhiding Yu, Chaowei Xiao, Z. Morley Mao
Deep neural networks on 3D point cloud data have been widely used in the real world, especially in safety-critical applications.
Ranked #1 on 3D Point Cloud Data Augmentation on ModelNet40-C
3D Point Cloud Classification • 3D Point Cloud Data Augmentation
no code implementations • NeurIPS 2021 • Jiachen Sun, Yulong Cao, Christopher B. Choy, Zhiding Yu, Anima Anandkumar, Zhuoqing Morley Mao, Chaowei Xiao
In this paper, we systematically study the impact of various self-supervised learning proxy tasks on different architectures and threat models for 3D point clouds with adversarial training.
no code implementations • NeurIPS 2021 • Zhiding Yu, Rui Huang, Wonmin Byeon, Sifei Liu, Guilin Liu, Thomas Breuel, Anima Anandkumar, Jan Kautz
It is therefore interesting to study how these two tasks can be coupled to benefit each other.
1 code implementation • NeurIPS 2021 • Haotao Wang, Chaowei Xiao, Jean Kossaifi, Zhiding Yu, Anima Anandkumar, Zhangyang Wang
Diversity and hardness are two complementary dimensions of data augmentation to achieve robustness.
no code implementations • 29 Sep 2021 • Eric Zhao, De-An Huang, Hao liu, Zhiding Yu, Anqi Liu, Olga Russakovsky, Anima Anandkumar
In real-world applications, however, there are multiple protected attributes yielding a large number of intersectional protected groups.
no code implementations • 22 Sep 2021 • Taihong Xiao, Sifei Liu, Shalini De Mello, Zhiding Yu, Jan Kautz, Ming-Hsuan Yang
Dense correspondence across semantically related images has been extensively studied, but still faces two challenges: 1) large variations in appearance, scale and pose exist even for objects from the same category, and 2) labeling pixel-level dense correspondences is labor intensive and infeasible to scale.
3 code implementations • CVPR 2022 • Zhiqi Li, Wenhai Wang, Enze Xie, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo, Tong Lu
Specifically, we supervise the attention modules in the mask decoder in a layer-wise manner.
Ranked #4 on Panoptic Segmentation on COCO test-dev
no code implementations • CVPR 2022 • Ismail Elezi, Zhiding Yu, Anima Anandkumar, Laura Leal-Taixe, Jose M. Alvarez
Deep neural networks have reached high accuracy on object detection but their success hinges on large amounts of labeled data.
1 code implementation • 17 Jun 2021 • Linxi Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, Anima Anandkumar
A student network then learns to mimic the expert policy by supervised learning with strong augmentations, making its representation more robust against visual variations compared to the expert.
no code implementations • 9 Jun 2021 • Sina Mohseni, Haotao Wang, Zhiding Yu, Chaowei Xiao, Zhangyang Wang, Jay Yadawa
The open-world deployment of Machine Learning (ML) algorithms in safety-critical applications such as autonomous vehicles needs to address a variety of ML vulnerabilities such as interpretability, verifiability, and performance limitations.
26 code implementations • NeurIPS 2021 • Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo
We present SegFormer, a simple, efficient yet powerful semantic segmentation framework which unifies Transformers with lightweight multilayer perception (MLP) decoders.
Ranked #1 on Semantic Segmentation on COCO-Stuff full
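The lightweight all-MLP decoder described above can be sketched as follows; the shapes, widths, and random weights are illustrative assumptions, not SegFormer's exact configuration:

```python
import numpy as np

def mlp_decode(feats, dim=64, num_classes=19, seed=0):
    """Sketch of an all-MLP decoder: project each multi-scale feature
    map to a common width, upsample everything to the finest (1/4)
    scale, concatenate, fuse, and classify per pixel.  Weights are
    random stand-ins; in a real model they are learned."""
    rng = np.random.default_rng(seed)
    H, W = feats[0].shape[:2]                             # finest resolution
    unified = []
    for f in feats:
        c = f.shape[2]
        proj = f @ rng.normal(0, 0.02, (c, dim))          # per-pixel linear
        s = H // f.shape[0]
        proj = proj.repeat(s, axis=0).repeat(s, axis=1)   # nearest upsample
        unified.append(proj)
    fused = np.concatenate(unified, axis=2) @ rng.normal(0, 0.02, (len(feats) * dim, dim))
    return fused @ rng.normal(0, 0.02, (dim, num_classes))

# A feature pyramid at 1/4, 1/8, 1/16, 1/32 scale with growing channels.
rng = np.random.default_rng(1)
feats = [rng.normal(size=(32 // 2**i, 32 // 2**i, 32 * 2**i)) for i in range(4)]
logits = mlp_decode(feats)
assert logits.shape == (32, 32, 19)
```

The design choice worth noting is that fusion is purely per-pixel linear plus upsampling: no convolutions or attention in the decoder.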
3 code implementations • ICCV 2021 • Shiyi Lan, Zhiding Yu, Christopher Choy, Subhashree Radhakrishnan, Guilin Liu, Yuke Zhu, Larry S. Davis, Anima Anandkumar
We introduce DiscoBox, a novel framework that jointly learns instance segmentation and semantic correspondence using bounding box supervision.
1 code implementation • 12 Apr 2021 • Nadine Chang, Zhiding Yu, Yu-Xiong Wang, Anima Anandkumar, Sanja Fidler, Jose M. Alvarez
As a result, image resampling alone is not enough to yield a sufficiently balanced distribution at the object level.
2 code implementations • ICLR 2021 • Wuyang Chen, Zhiding Yu, Shalini De Mello, Sifei Liu, Jose M. Alvarez, Zhangyang Wang, Anima Anandkumar
Training on synthetic data can be beneficial for label or data-scarce scenarios.
no code implementations • 1 Jan 2021 • De-An Huang, Zhiding Yu, Anima Anandkumar
We upend this view and show that URRL improves both the natural accuracy of unsupervised representation learning and its robustness to corruptions and adversarial noise.
no code implementations • 21 Oct 2020 • Zhongzheng Ren, Zhiding Yu, Xiaodong Yang, Ming-Yu Liu, Alexander G. Schwing, Jan Kautz
Existing work on object detection often relies on a single form of annotation: the model is trained using either accurate yet costly bounding boxes or cheaper but less expressive image-level tags.
no code implementations • 8 Oct 2020 • Haoxuan Wang, Zhiding Yu, Yisong Yue, Anima Anandkumar, Anqi Liu, Junchi Yan
We propose a framework for learning calibrated uncertainties under domain shifts, where the source (training) distribution differs from the target (test) distribution.
1 code implementation • NeurIPS 2020 • Weili Nie, Zhiding Yu, Lei Mao, Ankit B. Patel, Yuke Zhu, Animashree Anandkumar
Inspired by the original one hundred BPs, we propose a new benchmark Bongard-LOGO for human-level concept learning and reasoning.
no code implementations • 28 Sep 2020 • Haoxuan Wang, Anqi Liu, Zhiding Yu, Yisong Yue, Anima Anandkumar
This formulation motivates the use of two neural networks that are jointly trained: a discriminative network between the source and target domains for density-ratio estimation, in addition to the standard classification network.
2 code implementations • 21 Aug 2020 • Xueyan Zou, Fanyi Xiao, Zhiding Yu, Yong Jae Lee
Aliasing refers to the phenomenon that high frequency signals degenerate into completely different ones after sampling.
1 code implementation • ECCV 2020 • Yang Zou, Xiaodong Yang, Zhiding Yu, B. V. K. Vijaya Kumar, Jan Kautz
To this end, we propose a joint learning framework that disentangles id-related/unrelated features and enforces adaptation to work on the id-related feature space exclusively.
Ranked #7 on Unsupervised Domain Adaptation on Market to MSMT
no code implementations • 17 Jul 2020 • Grigorios G. Chrysos, Jean Kossaifi, Zhiding Yu, Anima Anandkumar
Instead, we propose an unsupervised framework to learn a distribution of latent codes that control the generator through self-training.
1 code implementation • NeurIPS 2020 • Yujia Huang, James Gornet, Sihui Dai, Zhiding Yu, Tan Nguyen, Doris Y. Tsao, Anima Anandkumar
This mechanism can be interpreted as a form of self-consistency between the maximum a posteriori (MAP) estimation of an internal generative model and the external environment.
no code implementations • 14 Jul 2020 • Guilin Liu, Rohan Taori, Ting-Chun Wang, Zhiding Yu, Shiqiu Liu, Fitsum A. Reda, Karan Sapra, Andrew Tao, Bryan Catanzaro
Specifically, we directly treat the whole encoded feature map of the input texture as transposed convolution filters and the features' self-similarity map, which captures the auto-correlation information, as input to the transposed convolution.
1 code implementation • ICML 2020 • Wuyang Chen, Zhiding Yu, Zhangyang Wang, Anima Anandkumar
Models trained on synthetic images often face degraded generalization to real data.
no code implementations • 28 Jun 2020 • Yingda Xia, Dong Yang, Zhiding Yu, Fengze Liu, Jinzheng Cai, Lequan Yu, Zhuotun Zhu, Daguang Xu, Alan Yuille, Holger Roth
Experiments on the NIH pancreas segmentation dataset and a multi-organ segmentation dataset show state-of-the-art performance of the proposed framework on semi-supervised medical image segmentation.
2 code implementations • CVPR 2020 • Zhongzheng Ren, Zhiding Yu, Xiaodong Yang, Ming-Yu Liu, Yong Jae Lee, Alexander G. Schwing, Jan Kautz
Weakly supervised learning has emerged as a compelling tool for object detection by reducing the need for strong supervision during training.
Ranked #1 on Weakly Supervised Object Detection on COCO test-dev
no code implementations • ICML 2020 • Beidi Chen, Weiyang Liu, Zhiding Yu, Jan Kautz, Anshumali Shrivastava, Animesh Garg, Anima Anandkumar
We also find that AVH has a statistically significant correlation with human visual hardness.
2 code implementations • ICCV 2019 • Yang Zou, Zhiding Yu, Xiaofeng Liu, B. V. K. Vijaya Kumar, Jinsong Wang
Recent advances in domain adaptation show that deep self-training presents a powerful means for unsupervised domain adaptation.
Ranked #18 on Image-to-Image Translation on SYNTHIA-to-Cityscapes
1 code implementation • CVPR 2020 • Rongmei Lin, Weiyang Liu, Zhen Liu, Chen Feng, Zhiding Yu, James M. Rehg, Li Xiong, Le Song
Inspired by the Thomson problem in physics where the distribution of multiple propelling electrons on a unit sphere can be modeled via minimizing some potential energy, hyperspherical energy minimization has demonstrated its potential in regularizing neural networks and improving their generalization power.
12 code implementations • CVPR 2019 • Zhedong Zheng, Xiaodong Yang, Zhiding Yu, Liang Zheng, Yi Yang, Jan Kautz
To this end, we propose a joint learning framework that couples re-id learning and data generation end-to-end.
Ranked #1 on Person Re-Identification on UAV-Human
Image-to-Image Translation • Unsupervised Domain Adaptation
4 code implementations • 28 Nov 2018 • Guilin Liu, Kevin J. Shih, Ting-Chun Wang, Fitsum A. Reda, Karan Sapra, Zhiding Yu, Andrew Tao, Bryan Catanzaro
In this paper, we present a simple yet effective padding scheme that can be used as a drop-in module for existing convolutional neural networks.
1 code implementation • 18 Oct 2018 • Yang Zou, Zhiding Yu, B. V. K. Vijaya Kumar, Jinsong Wang
In this paper, we propose a novel UDA framework based on an iterative self-training procedure, where the problem is formulated as latent variable loss minimization, and can be solved by alternatively generating pseudo labels on target data and re-training the model with these labels.
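The alternating structure of self-training can be sketched with a toy nearest-centroid classifier and a confidence threshold (a deliberately simplified stand-in; the paper's formulation is class-balanced self-training for segmentation):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy domains: two Gaussian classes; the target domain is shifted.
ys = rng.integers(0, 2, 300)
Xs = rng.normal(0, 0.5, (300, 2)) + 3.0 * ys[:, None]        # source
yt = rng.integers(0, 2, 300)
Xt = rng.normal(0, 0.5, (300, 2)) + 3.0 * yt[:, None] + 0.6  # shifted target

def fit_centroids(X, y):
    return np.stack([X[y == c].mean(0) for c in (0, 1)])

def predict(C, X):
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    return d.argmin(1), d

C = fit_centroids(Xs, ys)
for _ in range(3):                       # alternate pseudo-labeling / retraining
    pseudo, d = predict(C, Xt)
    margin = np.abs(d[:, 0] - d[:, 1])
    keep = margin > 1.0                  # keep only confident pseudo labels
    X_all = np.vstack([Xs, Xt[keep]])
    y_all = np.concatenate([ys, pseudo[keep]])
    C = fit_centroids(X_all, y_all)

pred, _ = predict(C, Xt)
acc = (pred == yt).mean()
assert acc > 0.9
```

The threshold plays the role of the latent-variable selection step: only target samples the current model is confident about enter the next round of training.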
1 code implementation • ECCV 2018 • Yang Zou, Zhiding Yu, B. V. K. Vijaya Kumar, Jinsong Wang
Recent deep networks have achieved state-of-the-art performance on a variety of semantic segmentation tasks.
3 code implementations • ECCV 2018 • Zhiding Yu, Weiyang Liu, Yang Zou, Chen Feng, Srikumar Ramalingam, B. V. K. Vijaya Kumar, Jan Kautz
Edge detection is among the most fundamental vision problems for its role in perceptual grouping and its wide applications.
4 code implementations • NeurIPS 2018 • Weiyang Liu, Rongmei Lin, Zhen Liu, Lixin Liu, Zhiding Yu, Bo Dai, Le Song
In light of this intuition, we reduce the redundancy regularization problem to generic energy minimization, and propose a minimum hyperspherical energy (MHE) objective as generic regularization for neural networks.
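The MHE objective can be sketched as a sum of inverse-distance potentials between unit-normalized neuron directions (a naive O(n^2) illustration; the power `s` and the epsilon are assumptions, not the paper's exact hyperparameters):

```python
import numpy as np

def hyperspherical_energy(W, s=1.0, eps=1e-8):
    """MHE-style energy: sum of inverse pairwise distances between
    unit-normalized rows of W, analogous to electrostatic potential
    energy of charges on a sphere (the Thomson problem)."""
    Wn = W / (np.linalg.norm(W, axis=1, keepdims=True) + eps)
    n = Wn.shape[0]
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(Wn[i] - Wn[j])
            energy += 1.0 / (d ** s + eps)
    return energy

# Spread-out neuron directions have lower energy than clustered ones,
# so minimizing the energy pushes neurons apart (less redundancy).
spread = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
clustered = np.array([[1.0, 0.0], [0.99, 0.1], [1.0, 0.05], [0.98, 0.2]])
assert hyperspherical_energy(spread) < hyperspherical_energy(clustered)
```

In training, this energy would be added to the task loss as a regularizer over each layer's weight matrix.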
1 code implementation • CVPR 2018 • Weiyang Liu, Zhen Liu, Zhiding Yu, Bo Dai, Rongmei Lin, Yisen Wang, James M. Rehg, Le Song
Inner product-based convolution has been a central component of convolutional neural networks (CNNs) and the key to learning visual representations.
1 code implementation • CVPR 2018 • Xin Yu, Zhiding Yu, Srikumar Ramalingam
A family of super deep networks, referred to as residual networks or ResNet, achieved record-beating performance in various visual tasks such as image recognition, object detection, and semantic segmentation.
no code implementations • NeurIPS 2017 • Weiyang Liu, Yan-Ming Zhang, Xingguo Li, Zhiding Yu, Bo Dai, Tuo Zhao, Le Song
In light of such challenges, we propose hyperspherical convolution (SphereConv), a novel learning framework that gives angular representations on hyperspheres.
11 code implementations • CVPR 2017 • Zhiding Yu, Chen Feng, Ming-Yu Liu, Srikumar Ramalingam
To this end, we propose a novel end-to-end deep semantic edge learning architecture based on ResNet and a new skip-layer architecture where category-wise edge activations at the top convolution layer share and are fused with the same set of bottom layer features.
Ranked #1 on Edge Detection on SBD
22 code implementations • CVPR 2017 • Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, Le Song
This paper addresses the deep face recognition (FR) problem under the open-set protocol, where ideal face features are expected to have smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space.
Ranked #1 on Face Verification on CK+
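The angular-margin idea behind this line of work can be illustrated numerically (a simplified sketch: the target logit depends on cos(m*theta) instead of cos(theta); the monotonic extension of cos(m*theta) beyond [0, pi/m] used in the real A-Softmax is omitted here):

```python
import numpy as np

def angular_margin_logits(x, W, m=4):
    """Simplified angular-margin logits: normalize class weights,
    recover the angle to each class, and score with cos(m*theta)
    scaled by the feature norm."""
    xn = x / np.linalg.norm(x)
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    cos = Wn @ xn
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    return np.linalg.norm(x) * np.cos(m * theta)

x = np.array([1.0, 0.2])                    # feature close to class 0
W = np.array([[1.0, 0.0], [0.0, 1.0]])      # two class weight vectors
plain = (W / np.linalg.norm(W, axis=1, keepdims=True)) @ (x / np.linalg.norm(x))
margin = angular_margin_logits(x, W)

# The margin version shrinks the nearby class's logit more sharply,
# so the classifier must enforce a larger angular gap for the same score.
assert margin[0] < plain[0]
```

Multiplying the angle rather than subtracting a constant is what makes the margin geometric: it compresses each class's decision region on the hypersphere.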
2 code implementations • 7 Dec 2016 • Weiyang Liu, Yandong Wen, Zhiding Yu, Meng Yang
Cross-entropy loss together with softmax is arguably one of the most commonly used supervision components in convolutional neural networks (CNNs).
no code implementations • 14 Nov 2015 • Weiyang Liu, Zhiding Yu, Yandong Wen, Rongmei Lin, Meng Yang
Sparse coding with dictionary learning (DL) has shown excellent classification performance.
no code implementations • 18 Nov 2014 • Zhiding Yu, Wende Zhang, B. V. K. Vijaya Kumar, Dan Levi
We propose a vision-based highway border detection algorithm using structured Hough voting.
no code implementations • 17 Oct 2014 • Weiyang Liu, Zhiding Yu, Lijia Lu, Yandong Wen, Hui Li, Yuexian Zou
The LCD similarity measure can be kernelized under KCRC, which theoretically links CRC and LCD under the kernel method.
no code implementations • CVPR 2014 • Zhiding Yu, Chunjing Xu, Deyu Meng, Zhuo Hui, Fanyi Xiao, Wenbo Liu, Jianzhuang Liu
We propose a very intuitive and simple approximation for the conventional spectral clustering methods.
no code implementations • 21 Oct 2013 • Fanyi Xiao, Ruikun Luo, Zhiding Yu
In this paper we propose a multi-task linear classifier learning problem called D-SVM (Dictionary SVM).
no code implementations • 5 Sep 2012 • Xi Peng, Zhiding Yu, Huajin Tang, Zhang Yi
Under the framework of graph-based learning, the key to robust subspace clustering and subspace learning is to obtain a good similarity graph that eliminates the effects of errors and retains only connections between the data points from the same subspace (i.e., intra-subspace data points).