no code implementations • 26 Nov 2024 • Jinrui Yang, Qing Liu, Yijun Li, Soo Ye Kim, Daniil Pakhomov, Mengwei Ren, Jianming Zhang, Zhe Lin, Cihang Xie, Yuyin Zhou
Layered representations, which allow for independent editing of image components, are essential for user-driven content creation, yet existing approaches often struggle to decompose images into plausible layers with accurately retained transparent visual effects such as shadows and reflections.
no code implementations • 25 Nov 2024 • Yanqing Liu, Xianhang Li, Zeyu Wang, Bingchen Zhao, Cihang Xie
Previous works show that noisy, web-crawled image-text pairs may limit vision-language pretraining like CLIP and propose learning with synthetic captions as a promising alternative.
1 code implementation • 15 Nov 2024 • Sucheng Ren, Yaodong Yu, Nataniel Ruiz, Feng Wang, Alan Yuille, Cihang Xie
In this paper, we show that this scale-wise autoregressive framework can be effectively decoupled into \textit{intra-scale modeling}, which captures local spatial dependencies within each scale, and \textit{inter-scale modeling}, which models cross-scale relationships progressively from coarse-to-fine scales.
1 code implementation • 11 Oct 2024 • Zijun Wang, Haoqin Tu, Jieru Mei, Bingchen Zhao, Yisen Wang, Cihang Xie
This paper studies the vulnerabilities of transformer-based Large Language Models (LLMs) to jailbreaking attacks, focusing specifically on the optimization-based Greedy Coordinate Gradient (GCG) strategy.
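The greedy coordinate search at the heart of GCG-style attacks can be sketched as follows: a discrete suffix is optimized one position at a time, keeping whichever token substitution lowers a loss. The real attack ranks candidates by token-embedding gradients against an LLM; here a fixed quadratic surrogate stands in for the model, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, SUFFIX_LEN = 50, 8
target = rng.integers(0, VOCAB, size=SUFFIX_LEN)  # "ideal" suffix for the toy loss

def surrogate_loss(suffix):
    # Stand-in for the LLM's loss on the attacker's target string.
    return float(np.sum((suffix - target) ** 2))

suffix = rng.integers(0, VOCAB, size=SUFFIX_LEN)
losses = [surrogate_loss(suffix)]
for _ in range(20):                                # optimization steps
    pos = rng.integers(0, SUFFIX_LEN)              # coordinate to update
    candidates = rng.integers(0, VOCAB, size=16)   # sampled substitutions
    best = suffix.copy()
    for tok in candidates:                         # greedy: keep the best swap
        trial = suffix.copy()
        trial[pos] = tok
        if surrogate_loss(trial) < surrogate_loss(best):
            best = trial
    suffix = best
    losses.append(surrogate_loss(suffix))
```

Because each step only accepts a swap that does not increase the loss, the loss sequence is non-increasing, mirroring the greedy acceptance rule in GCG.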
1 code implementation • 10 Oct 2024 • Feng Wang, Timing Yang, Yaodong Yu, Sucheng Ren, Guoyizhe Wei, Angtian Wang, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie
In this work, we present a comprehensive analysis of causal image modeling and introduce the Adventurer series models where we treat images as sequences of patch tokens and employ uni-directional language models to learn visual representations.
1 code implementation • 9 Oct 2024 • Tony Lee, Haoqin Tu, Chi Heem Wong, Wenhao Zheng, Yiyang Zhou, Yifan Mai, Josselin Somerville Roberts, Michihiro Yasunaga, Huaxiu Yao, Cihang Xie, Percy Liang
Current benchmarks for assessing vision-language models (VLMs) often focus on their perception or problem-solving capabilities and neglect other critical aspects such as fairness, multilinguality, or toxicity.
no code implementations • 23 Sep 2024 • Yunfei Xie, Juncheng Wu, Haoqin Tu, Siwei Yang, Bingchen Zhao, Yongshuo Zong, Qiao Jin, Cihang Xie, Yuyin Zhou
Large language models (LLMs) have exhibited remarkable capabilities across various domains and tasks, pushing the boundaries of our knowledge in learning and cognition.
no code implementations • 2 Sep 2024 • Yuxuan Wang, Cihang Xie, Yang Liu, Zilong Zheng
Recent advancements in large-scale video-language models have shown significant potential for real-time planning and detailed interactions.
no code implementations • 2 Sep 2024 • Yunfei Xie, Cihang Xie, Alan Yuille, Jieru Mei
Local aggregation is employed to form superpixels, leveraging the inherent redundancy of the image data to produce segments closely aligned with specific parts of the object, guided by object-level supervision.
1 code implementation • 6 Aug 2024 • Yunfei Xie, Ce Zhou, Lang Gao, Juncheng Wu, Xianhang Li, Hong-Yu Zhou, Sheng Liu, Lei Xing, James Zou, Cihang Xie, Yuyin Zhou
We then build a comprehensive knowledge base and prompt multimodal large language models to perform retrieval-augmented generation with the identified ROIs as guidance, resulting in multigranular textual descriptions.
Ranked #1 on Medical Visual Question Answering on SLAKE-English (using extra training data)
no code implementations • 24 Jun 2024 • Yuxuan Wang, Yueqian Wang, Dongyan Zhao, Cihang Xie, Zilong Zheng
Recent advancements in Multimodal Large Language Models (MLLMs) have extended their capabilities to video understanding.
no code implementations • 12 Jun 2024 • Xianhang Li, Haoqin Tu, Mude Hui, Zeyu Wang, Bingchen Zhao, Junfei Xiao, Sucheng Ren, Jieru Mei, Qing Liu, Huangjie Zheng, Yuyin Zhou, Cihang Xie
For discriminative models like CLIP, we observe enhanced zero-shot performance in cross-modal retrieval tasks.
Ranked #119 on Visual Question Answering on MM-Vet
1 code implementation • 11 Jun 2024 • Sucheng Ren, Xianhang Li, Haoqin Tu, Feng Wang, Fangxun Shu, Lei Zhang, Jieru Mei, Linjie Yang, Peng Wang, Heng Wang, Alan Yuille, Cihang Xie
The vision community has begun building on the recently developed state space model Mamba as a new backbone for a range of tasks.
no code implementations • 30 May 2024 • Jinrui Yang, Xianhang Li, Druv Pai, Yuyin Zhou, Yi Ma, Yaodong Yu, Cihang Xie
CRATE, a white-box transformer architecture designed to learn compressed and sparse representations, offers an intriguing alternative to standard vision transformers (ViTs) due to its inherent mathematical interpretability.
no code implementations • 24 May 2024 • Sucheng Ren, Hongru Zhu, Chen Wei, Yijiang Li, Alan Yuille, Cihang Xie
This paper presents a new self-supervised video representation learning framework, ARVideo, which autoregressively predicts the next video token in a tailored sequence order.
1 code implementation • 23 May 2024 • Feng Wang, Jiahao Wang, Sucheng Ren, Guoyizhe Wei, Jieru Mei, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie
This paper identifies artifacts within the feature maps of Vision Mamba, similar to those previously observed in Vision Transformers.
no code implementations • 15 Apr 2024 • Mude Hui, Siwei Yang, Bingchen Zhao, Yichun Shi, Heng Wang, Peng Wang, Yuyin Zhou, Cihang Xie
This study introduces HQ-Edit, a high-quality instruction-based image editing dataset with around 200,000 edits.
no code implementations • 12 Apr 2024 • Zichao Li, Cihang Xie, Ekin Dogus Cubuk
With regard to data, we demonstrate the significance of high-quality training data and show that a smaller dataset of high-quality data can outperform a larger dataset of lower quality.
1 code implementation • 23 Mar 2024 • Siwei Yang, Xianhang Li, Jieru Mei, Jieneng Chen, Cihang Xie, Yuyin Zhou
We identify that the decoder-only 3D-TransUNet model offers enhanced efficacy in the segmentation of brain metastases, as indicated by our 5-fold cross-validation on the training set.
1 code implementation • 14 Feb 2024 • Siwei Yang, Bingchen Zhao, Cihang Xie
This paper introduces AQA-Bench, a novel benchmark to assess the sequential reasoning capabilities of large language models (LLMs) in algorithmic contexts, such as depth-first search (DFS).
1 code implementation • CVPR 2024 • Zeyu Wang, Xianhang Li, Hongru Zhu, Cihang Xie
For example, by training on the DataComp-1B dataset, our AdvXL empowers a vanilla ViT-g model to substantially surpass the previous records of $l_{\infty}$-, $l_{2}$-, and $l_{1}$-robust accuracy by margins of 11.4%, 14.2%, and 12.9%, respectively.
no code implementations • 5 Jan 2024 • Jieru Mei, Liang-Chieh Chen, Alan Yuille, Cihang Xie
In this work, we introduce SPFormer, a novel Vision Transformer enhanced by superpixel representation.
1 code implementation • 21 Dec 2023 • Junfei Xiao, Ziqi Zhou, Wenxuan Li, Shiyi Lan, Jieru Mei, Zhiding Yu, Alan Yuille, Yuyin Zhou, Cihang Xie
Instead of relying solely on category-specific annotations, ProLab uses descriptive properties grounded in common sense knowledge for supervising segmentation models.
no code implementations • 18 Dec 2023 • Bingchen Zhao, Haoqin Tu, Chen Wei, Jieru Mei, Cihang Xie
This paper introduces an efficient strategy to transform Large Language Models (LLMs) into Multi-Modal Large Language Models (MLLMs).
no code implementations • 11 Dec 2023 • Fangxun Shu, Lei Zhang, Hao Jiang, Cihang Xie
This paper presents Audio-Visual LLM, a Multimodal Large Language Model that takes both visual and auditory inputs for holistic video understanding.
no code implementations • 11 Dec 2023 • Lei Zhang, Fangxun Shu, Tianyang Liu, Sucheng Ren, Hao Jiang, Cihang Xie
However, the vast scale of these datasets inevitably introduces significant variability in data quality, which can adversely affect the model performance.
4 code implementations • 4 Dec 2023 • Sucheng Ren, Zeyu Wang, Hongru Zhu, Junfei Xiao, Alan Yuille, Cihang Xie
This paper enhances image-GPT (iGPT), one of the pioneering works that introduced autoregressive pretraining to predict next pixels for visual representation learning.
1 code implementation • 27 Nov 2023 • Haoqin Tu, Chenhang Cui, Zijun Wang, Yiyang Zhou, Bingchen Zhao, Junlin Han, Wangchunshu Zhou, Huaxiu Yao, Cihang Xie
Different from prior studies, we shift our focus from evaluating standard performance to introducing a comprehensive safety evaluation suite, covering both out-of-distribution (OOD) generalization and adversarial robustness.
1 code implementation • CVPR 2024 • Yipeng Gao, Zeyu Wang, Wei-Shi Zheng, Cihang Xie, Yuyin Zhou
Contrastive learning has emerged as a promising paradigm for 3D open-world understanding, i.e., aligning point cloud representations to image and text embedding spaces individually.
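The alignment objective behind this kind of multi-modal contrastive learning can be sketched with a symmetric InfoNCE loss: matched pairs across the two embedding spaces are pulled together and mismatched pairs pushed apart. Random features stand in for real point-cloud and image encoders here, which is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of paired embeddings."""
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature            # pairwise cosine similarities
    labels = np.arange(len(a))                # the i-th a matches the i-th b
    def ce(l):
        l = l - l.max(axis=1, keepdims=True)  # numerically stable softmax
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()
    return (ce(logits) + ce(logits.T)) / 2    # both matching directions

point_feats = rng.normal(size=(8, 32))
image_feats = point_feats + 0.1 * rng.normal(size=(8, 32))  # nearly aligned
loss_aligned = info_nce(point_feats, image_feats)
loss_random = info_nce(point_feats, rng.normal(size=(8, 32)))
```

Well-aligned pairs yield a much lower loss than random pairings, which is what the training signal exploits.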
Ranked #1 on Zero-shot 3D classification on Objaverse LVIS (using extra training data)
1 code implementation • 6 Oct 2023 • Peiran Xu, Zeyu Wang, Jieru Mei, Liangqiong Qu, Alan Yuille, Cihang Xie, Yuyin Zhou
Federated learning (FL) is an emerging paradigm in machine learning, where a shared model is collaboratively learned using data from multiple devices to mitigate the risk of data leakage.
1 code implementation • ICCV 2023 • Zeyu Wang, Dingwen Li, Chenxu Luo, Cihang Xie, Xiaodong Yang
In this work, we propose to boost the representation learning of a multi-camera BEV based student detector by training it to imitate the features of a well-trained LiDAR based teacher detector.
1 code implementation • 13 Sep 2023 • Haoqin Tu, Bingchen Zhao, Chen Wei, Cihang Xie
Multi-modal large language models (MLLMs) are trained based on large language models (LLMs), with an enhanced capability to comprehend multi-modal inputs and generate textual responses.
1 code implementation • 24 Jul 2023 • YiQing Wang, Zihan Li, Jieru Mei, Zihao Wei, Li Liu, Chen Wang, Shengtian Sang, Alan Yuille, Cihang Xie, Yuyin Zhou
To address this limitation, we present Masked Multi-view with Swin Transformers (SwinMM), a novel multi-view pipeline for enabling accurate and data-efficient self-supervised medical image analysis.
1 code implementation • 21 Jul 2023 • Qingyue Wei, Lequan Yu, Xianhang Li, Wei Shao, Cihang Xie, Lei Xing, Yuyin Zhou
Specifically, our approach first involves training a segmentation model on a small set of clean labeled images to generate initial labels for unlabeled data.
2 code implementations • 27 Jun 2023 • Xianhang Li, Zeyu Wang, Cihang Xie
The recent work CLIPA presents an inverse scaling law for CLIP training -- whereby the larger the image/text encoders used, the shorter the sequence length of image/text tokens that can be applied in training.
1 code implementation • NeurIPS 2023 • Xianhang Li, Zeyu Wang, Cihang Xie
However, its associated training cost is prohibitively high, imposing a significant barrier to its widespread exploration.
no code implementations • ICCV 2023 • Chen Wei, Karttikeya Mangalam, Po-Yao Huang, Yanghao Li, Haoqi Fan, Hu Xu, Huiyu Wang, Cihang Xie, Alan Yuille, Christoph Feichtenhofer
There has been a longstanding belief that generation can facilitate a true understanding of visual data.
1 code implementation • 25 Jan 2023 • Shaoyuan Xie, Zichao Li, Zeyu Wang, Cihang Xie
In recent years, camera-based 3D object detection has gained widespread attention for its ability to achieve high performance with low computational cost.
no code implementations • 10 Jan 2023 • Chen Wang, Angtian Wang, Junbo Li, Alan Yuille, Cihang Xie
We find that NeRF-based models are significantly degraded in the presence of corruption, and are more sensitive to a different set of corruptions than image recognition models.
1 code implementation • 20 Dec 2022 • Junyang Wu, Xianhang Li, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie
This paper presents a simple and effective visual prompting method for adapting pre-trained models to downstream recognition tasks.
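One common form of visual prompting can be sketched as a learnable border of pixels overlaid on the frozen input image, with only the prompt updated for the downstream task. The border design and function names below are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def apply_border_prompt(image, prompt, pad=2):
    """Add the learnable `prompt` only on a border of width `pad` pixels."""
    mask = np.zeros(image.shape, dtype=bool)
    mask[:pad] = mask[-pad:] = True          # top and bottom strips
    mask[:, :pad] = mask[:, -pad:] = True    # left and right strips
    return np.where(mask, image + prompt, image)

image = np.zeros((32, 32, 3))                # stand-in for a real input
prompt = np.full((32, 32, 3), 0.5)           # stand-in for learned parameters
prompted = apply_border_prompt(image, prompt)
```

During training, gradients would flow only into `prompt` while the pre-trained backbone stays fixed.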
no code implementations • 1 Dec 2022 • Zhuowan Li, Cihang Xie, Benjamin Van Durme, Alan Yuille
Despite the impressive advancements achieved through vision-and-language pretraining, it remains unclear whether this joint learning paradigm can help understand each individual modality.
no code implementations • 29 Nov 2022 • Nataniel Ruiz, Sarah Adel Bargal, Cihang Xie, Kate Saenko, Stan Sclaroff
One shortcoming is that these deep neural networks cannot easily be evaluated for robustness with respect to specific scene variations.
no code implementations • 27 Nov 2022 • Yunchao Zhang, Zonglin Di, Kaiwen Zhou, Cihang Xie, Xin Eric Wang
However, since the local data is inaccessible to the server under federated learning, attackers may easily poison the training data of the local client to build a backdoor in the agent without notice.
no code implementations • ICCV 2023 • Yuanze Lin, Chen Wei, Huiyu Wang, Alan Yuille, Cihang Xie
Coupling all these designs allows our method to enjoy both competitive performance on text-to-video retrieval and video question answering tasks and pre-training costs reduced by 1.9X or more.
no code implementations • 6 Sep 2022 • Zichao Li, Li Liu, Zeyu Wang, Yuyin Zhou, Cihang Xie
Adversarial training (AT) with samples generated by Fast Gradient Sign Method (FGSM), also known as FGSM-AT, is a computationally simple method to train robust networks.
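The FGSM-AT recipe can be sketched in a few lines: each step perturbs the input by eps times the sign of the input gradient, then updates the model on the perturbed example. A numpy logistic-regression model and synthetic data stand in for the deep networks studied in the paper; these are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, eps):
    # gradient of the logistic loss w.r.t. the INPUT, then one signed step
    grad_x = (sigmoid(x @ w) - y)[:, None] * w
    return x + eps * np.sign(grad_x)

X = rng.normal(size=(64, 5))
w_true = rng.normal(size=5)
y = (X @ w_true > 0).astype(float)           # linearly separable toy labels

w = np.zeros(5)
lr, eps = 0.1, 0.05
for _ in range(200):
    X_adv = fgsm(X, y, w, eps)               # craft FGSM examples
    grad_w = X_adv.T @ (sigmoid(X_adv @ w) - y) / len(y)
    w -= lr * grad_w                         # train on the adversarial batch

clean_acc = ((sigmoid(X @ w) > 0.5) == y.astype(bool)).mean()
```

The single-step, single-gradient structure is what makes FGSM-AT computationally cheap relative to multi-step (PGD) adversarial training.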
1 code implementation • CVPR 2023 • Yutong Bai, Zeyu Wang, Junfei Xiao, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie
For example, by distilling the knowledge from an MAE pre-trained ViT-L into a ViT-B, our method achieves 84. 0% ImageNet top-1 accuracy, outperforming the baseline of directly distilling a fine-tuned ViT-L by 1. 2%.
1 code implementation • CVPR 2022 • Sucheng Ren, Huiyu Wang, Zhengqi Gao, Shengfeng He, Alan Yuille, Yuyin Zhou, Cihang Xie
More notably, our SDMP is the first method that successfully leverages data mixing to improve (rather than hurt) the performance of Vision Transformers in the self-supervised setting.
1 code implementation • 7 Jun 2022 • Zeyu Wang, Yutong Bai, Yuyin Zhou, Cihang Xie
The recent success of Vision Transformers is challenging the decade-long dominance of Convolutional Neural Networks (CNNs) in image recognition.
1 code implementation • 24 May 2022 • Sizhe Chen, Zhehao Huang, Qinghua Tao, Yingwen Wu, Cihang Xie, Xiaolin Huang
The score-based query attacks (SQAs) pose practical threats to deep neural networks by crafting adversarial perturbations within dozens of queries, only using the model's output scores.
1 code implementation • 24 May 2022 • Shutong Wu, Sizhe Chen, Cihang Xie, Xiaolin Huang
Based on OPS, we introduce an unlearnable dataset called CIFAR-10-S, which is indistinguishable from CIFAR-10 to humans but drives trained models to extremely low accuracy.
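The one-pixel shortcut idea can be sketched as follows: every image of a class gets the same single pixel set to a class-specific value, handing the model a trivially learnable (and semantically useless) feature. The pixel positions and values below are illustrative assumptions, not the searched ones from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES = 10

# one fixed (row, col, value) triple per class (hypothetical choices)
shortcut = {c: ((3 * c) % 32, (7 * c) % 32, 255) for c in range(NUM_CLASSES)}

def poison(images, labels):
    """Apply the per-class one-pixel shortcut to a batch of HxWxC uint8 images."""
    out = images.copy()
    for i, y in enumerate(labels):
        r, col, v = shortcut[y]
        out[i, r, col, :] = v
    return out

images = rng.integers(0, 255, size=(4, 32, 32, 3), dtype=np.uint8)
labels = [0, 1, 1, 9]
poisoned = poison(images, labels)
```

A classifier trained on such data can reach perfect training accuracy by reading one pixel, which is exactly why it fails on clean test images.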
1 code implementation • 3 May 2022 • Xianhang Li, Huiyu Wang, Chen Wei, Jieru Mei, Alan Yuille, Yuyin Zhou, Cihang Xie
Inspired by this observation, we hypothesize that the key to effectively leveraging image pre-training lies in the decomposition of learning spatial and temporal features, and revisiting image pre-training as the appearance prior to initializing 3D kernels.
1 code implementation • ICLR 2022 • Jieru Mei, Yucheng Han, Yutong Bai, Yixiao Zhang, Yingwei Li, Xianhang Li, Alan Yuille, Cihang Xie
Specifically, our modifications in Fast AdvProp are guided by the hypothesis that disentangled learning with adversarial examples is the key to performance improvements, while other training recipes (e.g., paired clean and adversarial training samples, multi-step adversarial attackers) could be largely simplified.
1 code implementation • CVPR 2024 • Yuyin Zhou, Xianhang Li, Fengze Liu, Qingyue Wei, Xuxi Chen, Lequan Yu, Cihang Xie, Matthew P. Lungren, Lei Xing
Extensive experiments demonstrate that our method effectively mitigates the challenges of noisy labels, often requiring few or no validation samples, and generalizes well to other tasks such as image segmentation.
Ranked #8 on Image Classification on Clothing1M (using clean data) (using extra training data)
1 code implementation • 15 Nov 2021 • Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, Tao Kong
We present a self-supervised framework iBOT that can perform masked prediction with an online tokenizer.
Ranked #3 on Unsupervised Image Classification on ImageNet
1 code implementation • NeurIPS 2021 • Yutong Bai, Jieru Mei, Alan Yuille, Cihang Xie
Transformers have emerged as a powerful tool for visual recognition.
Ranked #1 on Adversarial Robustness on Stylized ImageNet
1 code implementation • ICCV 2021 • Zhuowan Li, Elias Stengel-Eskin, Yixiao Zhang, Cihang Xie, Quan Tran, Benjamin Van Durme, Alan Yuille
Our experiments show CCO substantially boosts the performance of neural symbolic methods on real images.
no code implementations • ICLR 2022 • Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, Tao Kong
The success of language Transformers is primarily attributed to the pretext task of masked language modeling (MLM), where texts are first tokenized into semantically meaningful pieces.
no code implementations • CVPR 2022 • Nataniel Ruiz, Adam Kortylewski, Weichao Qiu, Cihang Xie, Sarah Adel Bargal, Alan Yuille, Stan Sclaroff
In this work, we propose a framework for learning how to test machine learning algorithms using simulators in an adversarial manner in order to find weaknesses in the model before deploying it in critical scenarios.
1 code implementation • CVPR 2021 • Xiangning Chen, Cihang Xie, Mingxing Tan, Li Zhang, Cho-Jui Hsieh, Boqing Gong
Data augmentation has become a de facto component for training high-performance deep image classifiers, but its potential is under-explored for object detection.
Ranked #17 on Object Detection on COCO-O
1 code implementation • 28 Nov 2020 • Yuhui Xu, Lingxi Xie, Cihang Xie, Jieru Mei, Siyuan Qiao, Wei Shen, Hongkai Xiong, Alan Yuille
Batch normalization (BN) is a fundamental unit in modern deep networks, in which a linear transformation module is designed to improve BN's flexibility in fitting complex data distributions.
1 code implementation • ICLR 2021 • Yingwei Li, Qihang Yu, Mingxing Tan, Jieru Mei, Peng Tang, Wei Shen, Alan Yuille, Cihang Xie
To prevent models from exclusively attending to a single cue in representation learning, we augment training data with images with conflicting shape and texture information (e.g., an image of chimpanzee shape but with lemon texture) and, most importantly, provide the corresponding supervision from shape and texture simultaneously.
Ranked #646 on Image Classification on ImageNet
1 code implementation • 25 Jun 2020 • Cihang Xie, Mingxing Tan, Boqing Gong, Alan Yuille, Quoc V. Le
SAT also works well with larger networks: it helps EfficientNet-L1 to achieve 82.2% accuracy and 58.6% robustness on ImageNet, outperforming the previous state-of-the-art defense by 9.5% for accuracy and 11.6% for robustness.
2 code implementations • ECCV 2020 • Chenglin Yang, Adam Kortylewski, Cihang Xie, Yinzhi Cao, Alan Yuille
PatchAttack induces misclassifications by superimposing small textured patches on the input image.
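The patch superimposition step can be sketched as pasting a small textured patch at a chosen location of the input image. Patch content and placement here are random assumptions; the paper selects both with reinforcement learning.

```python
import numpy as np

rng = np.random.default_rng(0)

def superimpose(image, patch, top, left):
    """Paste `patch` onto a copy of `image` with its corner at (top, left)."""
    out = image.copy()
    ph, pw = patch.shape[:2]
    out[top:top + ph, left:left + pw] = patch
    return out

image = np.zeros((64, 64, 3))            # stand-in for a real input
patch = rng.uniform(size=(8, 8, 3))      # stand-in for a textured patch
attacked = superimpose(image, patch, top=10, left=20)
```

Because only a small region changes, such attacks stay inconspicuous while still flipping the classifier's prediction.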
2 code implementations • CVPR 2020 • Yingwei Li, Xiaojie Jin, Jieru Mei, Xiaochen Lian, Linjie Yang, Cihang Xie, Qihang Yu, Yuyin Zhou, Song Bai, Alan Yuille
However, it has rarely been explored to embed NL blocks in mobile neural networks, mainly due to the following challenges: 1) NL blocks generally incur heavy computation costs, making them difficult to apply where computational resources are limited, and 2) it is an open problem to discover an optimal configuration for embedding NL blocks into mobile neural networks.
Ranked #60 on Neural Architecture Search on ImageNet
6 code implementations • CVPR 2020 • Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan Yuille, Quoc V. Le
We show that AdvProp improves a wide range of models on various image recognition tasks and performs better when the models are bigger.
Ranked #4 on Domain Generalization on VizWiz-Classification
2 code implementations • CVPR 2020 • Lifeng Huang, Chengying Gao, Yuyin Zhou, Cihang Xie, Alan Yuille, Changqing Zou, Ning Liu
In this paper, we study physical adversarial attacks on object detectors in the wild.
no code implementations • ICLR 2020 • Cihang Xie, Alan Yuille
This two-domain hypothesis may explain the issue of BN when training with a mixture of clean and adversarial images, as estimating normalization statistics of this mixture distribution is challenging.
1 code implementation • ECCV 2020 • Yingwei Li, Song Bai, Cihang Xie, Zhenyu Liao, Xiaohui Shen, Alan L. Yuille
We observe the property of regional homogeneity in adversarial perturbations and suggest that the defenses are less robust to regionally homogeneous perturbations.
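A regionally homogeneous perturbation can be sketched by making the perturbation constant within each cell of a coarse grid, then upsampling to image size within an $l_\infty$ budget. Generating it from random low-resolution noise (rather than the paper's learned generator) is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def regional_perturbation(h, w, cell=8, eps=8 / 255):
    """Perturbation that is constant inside each cell x cell block."""
    low = rng.uniform(-eps, eps, size=(h // cell, w // cell, 1))
    # nearest-neighbor upsample: every pixel in a block shares one value
    return np.kron(low, np.ones((cell, cell, 1)))

delta = regional_perturbation(32, 32, cell=8)
```

The block-constant structure is what makes these perturbations "regionally homogeneous," and the paper's observation is that many defenses transfer poorly against them.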
1 code implementation • 9 Dec 2018 • Yingwei Li, Song Bai, Yuyin Zhou, Cihang Xie, Zhishuai Zhang, Alan Yuille
The critical principle of ghost networks is to apply feature-level perturbations to an existing model to potentially create a huge set of diverse models.
2 code implementations • CVPR 2019 • Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan Yuille, Kaiming He
This study suggests that adversarial perturbations on images lead to noise in the features constructed by these networks.
Ranked #1 on Adversarial Defense on CAAD 2018
1 code implementation • 31 Mar 2018 • Alexey Kurakin, Ian Goodfellow, Samy Bengio, Yinpeng Dong, Fangzhou Liao, Ming Liang, Tianyu Pang, Jun Zhu, Xiaolin Hu, Cihang Xie, Jian-Yu Wang, Zhishuai Zhang, Zhou Ren, Alan Yuille, Sangxia Huang, Yao Zhao, Yuzhe Zhao, Zhonglin Han, Junjiajia Long, Yerkebulan Berdibekov, Takuya Akiba, Seiya Tokui, Motoki Abe
To accelerate research on adversarial examples and robustness of machine learning classifiers, Google Brain organized a NIPS 2017 competition that encouraged researchers to develop new methods to generate adversarial examples as well as to develop new ways to defend against them.
2 code implementations • CVPR 2019 • Cihang Xie, Zhishuai Zhang, Yuyin Zhou, Song Bai, Jian-Yu Wang, Zhou Ren, Alan Yuille
We hope that our proposed attack strategy can serve as a strong benchmark baseline for evaluating the robustness of networks to adversaries and the effectiveness of different defense methods in the future.
no code implementations • CVPR 2018 • Zhishuai Zhang, Siyuan Qiao, Cihang Xie, Wei Shen, Bo Wang, Alan L. Yuille
Our motivation is to enrich the semantics of object detection features within a typical deep detector, by a semantic segmentation branch and a global activation module.
no code implementations • 13 Nov 2017 • Jianyu Wang, Zhishuai Zhang, Cihang Xie, Yuyin Zhou, Vittal Premachandran, Jun Zhu, Lingxi Xie, Alan Yuille
We use clustering algorithms to study the population activities of the features and extract a set of visual concepts which we show are visually tight and correspond to semantic parts of vehicles.
2 code implementations • ICLR 2018 • Cihang Xie, Jian-Yu Wang, Zhishuai Zhang, Zhou Ren, Alan Yuille
Convolutional neural networks have demonstrated high accuracy on various tasks in recent years.
no code implementations • CVPR 2018 • Zhishuai Zhang, Cihang Xie, Jian-Yu Wang, Lingxi Xie, Alan L. Yuille
The first layer extracts the evidence of local visual cues, and the second layer performs a voting mechanism by utilizing the spatial relationship between visual cues and semantic parts.
no code implementations • 25 Jul 2017 • Jianyu Wang, Cihang Xie, Zhishuai Zhang, Jun Zhu, Lingxi Xie, Alan Yuille
Our approach detects semantic parts by accumulating the confidence of local visual cues.
2 code implementations • ICCV 2017 • Cihang Xie, Jian-Yu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, Alan Yuille
Our observation is that both segmentation and detection are based on classifying multiple targets on an image (e.g., the basic target is a pixel or a receptive field in segmentation, and an object proposal in detection), which inspires us to optimize a loss function over a set of pixels/proposals for generating adversarial perturbations.
13 code implementations • 3 Oct 2016 • Nicolas Papernot, Fartash Faghri, Nicholas Carlini, Ian Goodfellow, Reuben Feinman, Alexey Kurakin, Cihang Xie, Yash Sharma, Tom Brown, Aurko Roy, Alexander Matyasko, Vahid Behzadan, Karen Hambardzumyan, Zhishuai Zhang, Yi-Lin Juang, Zhi Li, Ryan Sheatsley, Abhibhav Garg, Jonathan Uesato, Willi Gierke, Yinpeng Dong, David Berthelot, Paul Hendricks, Jonas Rauber, Rujun Long, Patrick McDaniel
An adversarial example library for constructing attacks, building defenses, and benchmarking both
1 code implementation • 21 Nov 2015 • Jianyu Wang, Zhishuai Zhang, Cihang Xie, Vittal Premachandran, Alan Yuille
We address the key question of how object part representations can be found from the internal states of CNNs that are trained for high-level tasks, such as object classification.