no code implementations • ICLR 2019 • Yutong Bai, Lingxi Xie
Reinforcement learning (RL) is a metaheuristic aiming at teaching an agent to interact with an environment and maximizing the reward in a complex task.
1 code implementation • 8 Apr 2024 • Alberto Hojel, Yutong Bai, Trevor Darrell, Amir Globerson, Amir Bar
In this work, we analyze the activations of MAE-VQGAN, a recent Visual Prompting model, and find task vectors, activations that encode task-specific information.
1 code implementation • 1 Dec 2023 • Yutong Bai, Xinyang Geng, Karttikeya Mangalam, Amir Bar, Alan Yuille, Trevor Darrell, Jitendra Malik, Alexei A Efros
We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data.
no code implementations • 4 Oct 2023 • Shiqi Liu, Yutong Bai, Xinyang Han, Alan Yuille
By the generalized inverse theory, we derived two forms of general inverse matrix formulations that can correspond to the two prominent classes of Pan-sharpening methods, that is, component substitution and multi-resolution analysis methods.
no code implementations • 1 Jun 2023 • Qihao Liu, Adam Kortylewski, Yutong Bai, Song Bai, Alan Yuille
(2) We find regions in the latent space that lead to distorted images independent of the text prompt, suggesting that parts of the latent space are not well-structured.
1 code implementation • 23 Oct 2022 • Junfei Xiao, Yutong Bai, Alan Yuille, Zongwei Zhou
We hope that this study can direct future research on the application of Transformers to a larger variety of medical imaging tasks.
1 code implementation • 5 Oct 2022 • Liangyu Chen, Yutong Bai, Siyu Huang, Yongyi Lu, Bihan Wen, Alan L. Yuille, Zongwei Zhou
However, we uncover a striking contradiction to this promise: active learning fails to select data as efficiently as random selection at the first few choices.
1 code implementation • CVPR 2023 • Yutong Bai, Zeyu Wang, Junfei Xiao, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie
For example, by distilling the knowledge from an MAE pre-trained ViT-L into a ViT-B, our method achieves 84.0% ImageNet top-1 accuracy, outperforming the baseline of directly distilling a fine-tuned ViT-L by 1.2%.
1 code implementation • 7 Jun 2022 • Zeyu Wang, Yutong Bai, Yuyin Zhou, Cihang Xie
The recent success of Vision Transformers is shaking the long dominance of Convolutional Neural Networks (CNNs) in image recognition for a decade.
1 code implementation • ICLR 2022 • Jieru Mei, Yucheng Han, Yutong Bai, Yixiao Zhang, Yingwei Li, Xianhang Li, Alan Yuille, Cihang Xie
Specifically, our modifications in Fast AdvProp are guided by the hypothesis that disentangled learning with adversarial examples is the key for performance improvements, while other training recipes (e.g., paired clean and adversarial training samples, multi-step adversarial attackers) could be largely simplified.
1 code implementation • CVPR 2022 • Yutong Bai, Xinlei Chen, Alexander Kirillov, Alan Yuille, Alexander C. Berg
In this work we present point-level region contrast, a self-supervised pre-training approach for the task of object detection.
1 code implementation • NeurIPS 2021 • Yutong Bai, Jieru Mei, Alan Yuille, Cihang Xie
Transformer emerges as a powerful tool for visual recognition.
Ranked #1 on Adversarial Robustness on Stylized ImageNet
1 code implementation • NeurIPS 2021 • Qihang Yu, Yingda Xia, Yutong Bai, Yongyi Lu, Alan Yuille, Wei Shen
It is motivated by the Glance and Gaze behavior of human beings when recognizing objects in natural scenes, with the ability to efficiently model both long-range dependencies and local context.
1 code implementation • 29 Mar 2021 • Junfei Xiao, Lequan Yu, Zongwei Zhou, Yutong Bai, Lei Xing, Alan Yuille, Yuyin Zhou
We propose a new normalization strategy, named categorical normalization (CateNorm), to normalize the activations according to categorical statistics.
2 code implementations • 14 Mar 2021 • Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang
Fine-grained visual classification (FGVC), which aims at recognizing objects from subcategories, is a very challenging task due to the inherently subtle inter-class differences.
Ranked #4 on Fine-Grained Image Classification on CUB-200-2011
1 code implementation • CVPR 2021 • Qihang Yu, Jianming Zhang, He Zhang, Yilin Wang, Zhe Lin, Ning Xu, Yutong Bai, Alan Yuille
We propose Mask Guided (MG) Matting, a robust matting framework that takes a general coarse mask as guidance.
no code implementations • 1 Dec 2020 • Mengqi Guo, Yutong Bai, Zhishuai Zhang, Adam Kortylewski, Alan Yuille
Specifically, given a training image, we find a set of similar images that show instances of the same object category in the same pose, through an affine alignment of their corresponding feature maps.
no code implementations • 25 Nov 2020 • Yutong Bai, Haoqi Fan, Ishan Misra, Ganesh Venkatesh, Yongyi Lu, Yuyin Zhou, Qihang Yu, Vikas Chandra, Alan Yuille
To this end, we present Temporal-aware Contrastive self-supervised learning (TaCo), as a general paradigm to enhance video CSL.
no code implementations • 29 Sep 2020 • Yutong Bai, Angtian Wang, Adam Kortylewski, Alan Yuille
In this paper, we introduce a contrastive learning framework for keypoint detection (CoKe).
no code implementations • CVPR 2020 • Qihang Yu, Dong Yang, Holger Roth, Yutong Bai, Yixiao Zhang, Alan L. Yuille, Daguang Xu
3D convolutional neural networks (CNNs) have proved very successful in parsing organs or tumours in 3D medical images, but it remains complicated and time-consuming to choose or design proper 3D networks given different task contexts.
3 code implementations • CVPR 2019 • Runtao Liu, Chenxi Liu, Yutong Bai, Alan Yuille
Yet there has been evidence that current benchmark datasets suffer from bias, and current state-of-the-art models cannot be easily evaluated on their intermediate reasoning process.
Ranked #1 on Referring Expression Segmentation on CLEVR-Ref+
1 code implementation • ICCV 2019 • Yutong Bai, Qing Liu, Lingxi Xie, Weichao Qiu, Yan Zheng, Alan Yuille
In particular, this enables images in the training dataset to be matched to a virtual 3D model of the object (for simplicity, we assume that the object viewpoint can be estimated by standard techniques).