1 code implementation • 13 Sep 2023 • Haoqin Tu, Bingchen Zhao, Chen Wei, Cihang Xie
Multi-modal large language models (MLLMs) are trained based on large language models (LLM), with an enhanced capability to comprehend multi-modal inputs and generate textual responses.
1 code implementation • 24 Jul 2023 • YiQing Wang, Zihan Li, Jieru Mei, Zihao Wei, Li Liu, Chen Wang, Shengtian Sang, Alan Yuille, Cihang Xie, Yuyin Zhou
To address this limitation, we present Masked Multi-view with Swin Transformers (SwinMM), a novel multi-view pipeline for enabling accurate and data-efficient self-supervised medical image analysis.
1 code implementation • 21 Jul 2023 • Qingyue Wei, Lequan Yu, Xianhang Li, Wei Shao, Cihang Xie, Lei Xing, Yuyin Zhou
Specifically, our approach first involves training a segmentation model on a small set of clean labeled images to generate initial labels for unlabeled data.
1 code implementation • 27 Jun 2023 • Xianhang Li, Zeyu Wang, Cihang Xie
The recent work CLIPA presents an inverse scaling law for CLIP training -- whereby the larger the image/text encoders used, the shorter the sequence length of image/text tokens that can be applied in training.
1 code implementation • 11 May 2023 • Xianhang Li, Zeyu Wang, Cihang Xie
CLIP, the first foundation model that connects images and text, has enabled many recent breakthroughs in computer vision.
Ranked #8 on
Zero-Shot Transfer Image Classification
on ImageNet
no code implementations • 6 Apr 2023 • Chen Wei, Karttikeya Mangalam, Po-Yao Huang, Yanghao Li, Haoqi Fan, Hu Xu, Huiyu Wang, Cihang Xie, Alan Yuille, Christoph Feichtenhofer
There has been a longstanding belief that generation can facilitate a true understanding of visual data.
no code implementations • 25 Jan 2023 • Shaoyuan Xie, Zichao Li, Zeyu Wang, Cihang Xie
In recent years, camera-based 3D object detection has gained widespread attention for its ability to achieve high performance with low computational cost.
no code implementations • 10 Jan 2023 • Chen Wang, Angtian Wang, Junbo Li, Alan Yuille, Cihang Xie
We find that NeRF-based models are significantly degraded in the presence of corruption, and are more sensitive to a different set of corruptions than image recognition models.
1 code implementation • 20 Dec 2022 • Junyang Wu, Xianhang Li, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie
This paper presents a simple and effective visual prompting method for adapting pre-trained models to downstream recognition tasks.
no code implementations • 1 Dec 2022 • Zhuowan Li, Cihang Xie, Benjamin Van Durme, Alan Yuille
In this work, we investigate how language can help with visual representation learning from a probing perspective.
no code implementations • 29 Nov 2022 • Nataniel Ruiz, Sarah Adel Bargal, Cihang Xie, Kate Saenko, Stan Sclaroff
One shortcoming of this is the fact that these deep neural networks cannot be easily evaluated for robustness issues with respect to specific scene variations.
no code implementations • 27 Nov 2022 • Yunchao Zhang, Zonglin Di, Kaiwen Zhou, Cihang Xie, Xin Eric Wang
However, since the local data is inaccessible to the server under federated learning, attackers may easily poison the training data of the local client to build a backdoor in the agent without notice.
no code implementations • 21 Nov 2022 • Yuanze Lin, Chen Wei, Huiyu Wang, Alan Yuille, Cihang Xie
Coupling all these designs allows our method to enjoy both competitive performances on text-to-video retrieval and video question answering tasks, and much less pre-training costs by 1. 9X or more.
no code implementations • 6 Sep 2022 • Zichao Li, Li Liu, Zeyu Wang, Yuyin Zhou, Cihang Xie
Adversarial training (AT) with samples generated by Fast Gradient Sign Method (FGSM), also known as FGSM-AT, is a computationally simple method to train robust networks.
1 code implementation • CVPR 2023 • Yutong Bai, Zeyu Wang, Junfei Xiao, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie
For example, by distilling the knowledge from an MAE pre-trained ViT-L into a ViT-B, our method achieves 84. 0% ImageNet top-1 accuracy, outperforming the baseline of directly distilling a fine-tuned ViT-L by 1. 2%.
1 code implementation • CVPR 2022 • Sucheng Ren, Huiyu Wang, Zhengqi Gao, Shengfeng He, Alan Yuille, Yuyin Zhou, Cihang Xie
More notably, our SDMP is the first method that successfully leverages data mixing to improve (rather than hurt) the performance of Vision Transformers in the self-supervised setting.
1 code implementation • 7 Jun 2022 • Zeyu Wang, Yutong Bai, Yuyin Zhou, Cihang Xie
The recent success of Vision Transformers is shaking the long dominance of Convolutional Neural Networks (CNNs) in image recognition for a decade.
1 code implementation • 24 May 2022 • Sizhe Chen, Zhehao Huang, Qinghua Tao, Yingwen Wu, Cihang Xie, Xiaolin Huang
The score-based query attacks (SQAs) pose practical threats to deep neural networks by crafting adversarial perturbations within dozens of queries, only using the model's output scores.
1 code implementation • 24 May 2022 • Shutong Wu, Sizhe Chen, Cihang Xie, Xiaolin Huang
Based on OPS, we introduce an unlearnable dataset called CIFAR-10-S, which is indistinguishable from CIFAR-10 by humans but induces the trained model to extremely low accuracy.
1 code implementation • 3 May 2022 • Xianhang Li, Huiyu Wang, Chen Wei, Jieru Mei, Alan Yuille, Yuyin Zhou, Cihang Xie
Inspired by this observation, we hypothesize that the key to effectively leveraging image pre-training lies in the decomposition of learning spatial and temporal features, and revisiting image pre-training as the appearance prior to initializing 3D kernels.
1 code implementation • ICLR 2022 • Jieru Mei, Yucheng Han, Yutong Bai, Yixiao Zhang, Yingwei Li, Xianhang Li, Alan Yuille, Cihang Xie
Specifically, our modifications in Fast AdvProp are guided by the hypothesis that disentangled learning with adversarial examples is the key for performance improvements, while other training recipes (e. g., paired clean and adversarial training samples, multi-step adversarial attackers) could be largely simplified.
1 code implementation • 9 Feb 2022 • Yuyin Zhou, Xianhang Li, Fengze Liu, Xuxi Chen, Lequan Yu, Cihang Xie, Matthew P. Lungren, Lei Xing
Specifically, our method dynamically adjusts the per-sample importance weight between the real observed labels and pseudo-labels, where the weights are efficiently determined in a meta process.
Ranked #8 on
Image Classification
on Clothing1M (using clean data)
(using extra training data)
1 code implementation • 15 Nov 2021 • Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, Tao Kong
We present a self-supervised framework iBOT that can perform masked prediction with an online tokenizer.
Ranked #1 on
Unsupervised Image Classification
on ImageNet
1 code implementation • NeurIPS 2021 • Yutong Bai, Jieru Mei, Alan Yuille, Cihang Xie
Transformer emerges as a powerful tool for visual recognition.
Ranked #1 on
Adversarial Robustness
on ImageNet-A
1 code implementation • ICCV 2021 • Zhuowan Li, Elias Stengel-Eskin, Yixiao Zhang, Cihang Xie, Quan Tran, Benjamin Van Durme, Alan Yuille
Our experiments show CCO substantially boosts the performance of neural symbolic methods on real images.
no code implementations • ICLR 2022 • Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, Tao Kong
The success of language Transformers is primarily attributed to the pretext task of masked language modeling (MLM), where texts are first tokenized into semantically meaningful pieces.
no code implementations • CVPR 2022 • Nataniel Ruiz, Adam Kortylewski, Weichao Qiu, Cihang Xie, Sarah Adel Bargal, Alan Yuille, Stan Sclaroff
In this work, we propose a framework for learning how to test machine learning algorithms using simulators in an adversarial manner in order to find weaknesses in the model before deploying it in critical scenarios.
1 code implementation • CVPR 2021 • Xiangning Chen, Cihang Xie, Mingxing Tan, Li Zhang, Cho-Jui Hsieh, Boqing Gong
Data augmentation has become a de facto component for training high-performance deep image classifiers, but its potential is under-explored for object detection.
Ranked #18 on
Object Detection
on COCO-O
1 code implementation • 28 Nov 2020 • Yuhui Xu, Lingxi Xie, Cihang Xie, Jieru Mei, Siyuan Qiao, Wei Shen, Hongkai Xiong, Alan Yuille
Batch normalization (BN) is a fundamental unit in modern deep networks, in which a linear transformation module was designed for improving BN's flexibility of fitting complex data distributions.
1 code implementation • ICLR 2021 • Yingwei Li, Qihang Yu, Mingxing Tan, Jieru Mei, Peng Tang, Wei Shen, Alan Yuille, Cihang Xie
To prevent models from exclusively attending on a single cue in representation learning, we augment training data with images with conflicting shape and texture information (eg, an image of chimpanzee shape but with lemon texture) and, most importantly, provide the corresponding supervisions from shape and texture simultaneously.
Ranked #559 on
Image Classification
on ImageNet
1 code implementation • 25 Jun 2020 • Cihang Xie, Mingxing Tan, Boqing Gong, Alan Yuille, Quoc V. Le
SAT also works well with larger networks: it helps EfficientNet-L1 to achieve 82. 2% accuracy and 58. 6% robustness on ImageNet, outperforming the previous state-of-the-art defense by 9. 5% for accuracy and 11. 6% for robustness.
2 code implementations • ECCV 2020 • Chenglin Yang, Adam Kortylewski, Cihang Xie, Yinzhi Cao, Alan Yuille
PatchAttack induces misclassifications by superimposing small textured patches on the input image.
2 code implementations • CVPR 2020 • Yingwei Li, Xiaojie Jin, Jieru Mei, Xiaochen Lian, Linjie Yang, Cihang Xie, Qihang Yu, Yuyin Zhou, Song Bai, Alan Yuille
However, it has been rarely explored to embed the NL blocks in mobile neural networks, mainly due to the following challenges: 1) NL blocks generally have heavy computation cost which makes it difficult to be applied in applications where computational resources are limited, and 2) it is an open problem to discover an optimal configuration to embed NL blocks into mobile neural networks.
Ranked #60 on
Neural Architecture Search
on ImageNet
6 code implementations • CVPR 2020 • Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan Yuille, Quoc V. Le
We show that AdvProp improves a wide range of models on various image recognition tasks and performs better when the models are bigger.
Ranked #6 on
Domain Generalization
on VizWiz-Classification
2 code implementations • CVPR 2020 • Lifeng Huang, Chengying Gao, Yuyin Zhou, Cihang Xie, Alan Yuille, Changqing Zou, Ning Liu
In this paper, we study physical adversarial attacks on object detectors in the wild.
no code implementations • ICLR 2020 • Cihang Xie, Alan Yuille
This two-domain hypothesis may explain the issue of BN when training with a mixture of clean and adversarial images, as estimating normalization statistics of this mixture distribution is challenging.
1 code implementation • ECCV 2020 • Yingwei Li, Song Bai, Cihang Xie, Zhenyu Liao, Xiaohui Shen, Alan L. Yuille
We observe the property of regional homogeneity in adversarial perturbations and suggest that the defenses are less robust to regionally homogeneous perturbations.
2 code implementations • CVPR 2019 • Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan Yuille, Kaiming He
This study suggests that adversarial perturbations on images lead to noise in the features constructed by these networks.
1 code implementation • 9 Dec 2018 • Yingwei Li, Song Bai, Yuyin Zhou, Cihang Xie, Zhishuai Zhang, Alan Yuille
The critical principle of ghost networks is to apply feature-level perturbations to an existing model to potentially create a huge set of diverse models.
1 code implementation • 31 Mar 2018 • Alexey Kurakin, Ian Goodfellow, Samy Bengio, Yinpeng Dong, Fangzhou Liao, Ming Liang, Tianyu Pang, Jun Zhu, Xiaolin Hu, Cihang Xie, Jian-Yu Wang, Zhishuai Zhang, Zhou Ren, Alan Yuille, Sangxia Huang, Yao Zhao, Yuzhe Zhao, Zhonglin Han, Junjiajia Long, Yerkebulan Berdibekov, Takuya Akiba, Seiya Tokui, Motoki Abe
To accelerate research on adversarial examples and robustness of machine learning classifiers, Google Brain organized a NIPS 2017 competition that encouraged researchers to develop new methods to generate adversarial examples as well as to develop new ways to defend against them.
1 code implementation • CVPR 2019 • Cihang Xie, Zhishuai Zhang, Yuyin Zhou, Song Bai, Jian-Yu Wang, Zhou Ren, Alan Yuille
We hope that our proposed attack strategy can serve as a strong benchmark baseline for evaluating the robustness of networks to adversaries and the effectiveness of different defense methods in the future.
no code implementations • CVPR 2018 • Zhishuai Zhang, Siyuan Qiao, Cihang Xie, Wei Shen, Bo wang, Alan L. Yuille
Our motivation is to enrich the semantics of object detection features within a typical deep detector, by a semantic segmentation branch and a global activation module.
no code implementations • 13 Nov 2017 • Jianyu Wang, Zhishuai Zhang, Cihang Xie, Yuyin Zhou, Vittal Premachandran, Jun Zhu, Lingxi Xie, Alan Yuille
We use clustering algorithms to study the population activities of the features and extract a set of visual concepts which we show are visually tight and correspond to semantic parts of vehicles.
2 code implementations • ICLR 2018 • Cihang Xie, Jian-Yu Wang, Zhishuai Zhang, Zhou Ren, Alan Yuille
Convolutional neural networks have demonstrated high accuracy on various tasks in recent years.
no code implementations • CVPR 2018 • Zhishuai Zhang, Cihang Xie, Jian-Yu Wang, Lingxi Xie, Alan L. Yuille
The first layer extracts the evidence of local visual cues, and the second layer performs a voting mechanism by utilizing the spatial relationship between visual cues and semantic parts.
no code implementations • 25 Jul 2017 • Jianyu Wang, Cihang Xie, Zhishuai Zhang, Jun Zhu, Lingxi Xie, Alan Yuille
Our approach detects semantic parts by accumulating the confidence of local visual cues.
2 code implementations • ICCV 2017 • Cihang Xie, Jian-Yu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, Alan Yuille
Our observation is that both segmentation and detection are based on classifying multiple targets on an image (e. g., the basic target is a pixel or a receptive field in segmentation, and an object proposal in detection), which inspires us to optimize a loss function over a set of pixels/proposals for generating adversarial perturbations.
13 code implementations • 3 Oct 2016 • Nicolas Papernot, Fartash Faghri, Nicholas Carlini, Ian Goodfellow, Reuben Feinman, Alexey Kurakin, Cihang Xie, Yash Sharma, Tom Brown, Aurko Roy, Alexander Matyasko, Vahid Behzadan, Karen Hambardzumyan, Zhishuai Zhang, Yi-Lin Juang, Zhi Li, Ryan Sheatsley, Abhibhav Garg, Jonathan Uesato, Willi Gierke, Yinpeng Dong, David Berthelot, Paul Hendricks, Jonas Rauber, Rujun Long, Patrick McDaniel
An adversarial example library for constructing attacks, building defenses, and benchmarking both
1 code implementation • 21 Nov 2015 • Jianyu Wang, Zhishuai Zhang, Cihang Xie, Vittal Premachandran, Alan Yuille
We address the key question of how object part representations can be found from the internal states of CNNs that are trained for high-level tasks, such as object classification.