Search Results for author: Cihang Xie

Found 80 papers, 51 papers with code

Generative Image Layer Decomposition with Visual Effects

no code implementations 26 Nov 2024 Jinrui Yang, Qing Liu, Yijun Li, Soo Ye Kim, Daniil Pakhomov, Mengwei Ren, Jianming Zhang, Zhe Lin, Cihang Xie, Yuyin Zhou

Layered representations, which allow for independent editing of image components, are essential for user-driven content creation, yet existing approaches often struggle to decompose an image into plausible layers with accurately retained transparent visual effects such as shadows and reflections.

CLIPS: An Enhanced CLIP Framework for Learning with Synthetic Captions

no code implementations 25 Nov 2024 Yanqing Liu, Xianhang Li, Zeyu Wang, Bingchen Zhao, Cihang Xie

Previous works show that noisy, web-crawled image-text pairs may limit vision-language pretraining like CLIP and propose learning with synthetic captions as a promising alternative.

Cross-Modal Retrieval

M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation

1 code implementation 15 Nov 2024 Sucheng Ren, Yaodong Yu, Nataniel Ruiz, Feng Wang, Alan Yuille, Cihang Xie

In this paper, we show that this scale-wise autoregressive framework can be effectively decoupled into intra-scale modeling, which captures local spatial dependencies within each scale, and inter-scale modeling, which models cross-scale relationships progressively from coarse-to-fine scales.

Image Generation Mamba

AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation

1 code implementation 11 Oct 2024 Zijun Wang, Haoqin Tu, Jieru Mei, Bingchen Zhao, Yisen Wang, Cihang Xie

This paper studies the vulnerabilities of transformer-based Large Language Models (LLMs) to jailbreaking attacks, focusing specifically on the optimization-based Greedy Coordinate Gradient (GCG) strategy.

Safety Alignment

Causal Image Modeling for Efficient Visual Understanding

1 code implementation 10 Oct 2024 Feng Wang, Timing Yang, Yaodong Yu, Sucheng Ren, Guoyizhe Wei, Angtian Wang, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie

In this work, we present a comprehensive analysis of causal image modeling and introduce the Adventurer series models where we treat images as sequences of patch tokens and employ uni-directional language models to learn visual representations.

Causal Inference

VHELM: A Holistic Evaluation of Vision Language Models

1 code implementation 9 Oct 2024 Tony Lee, Haoqin Tu, Chi Heem Wong, Wenhao Zheng, Yiyang Zhou, Yifan Mai, Josselin Somerville Roberts, Michihiro Yasunaga, Huaxiu Yao, Cihang Xie, Percy Liang

Current benchmarks for assessing vision-language models (VLMs) often focus on their perception or problem-solving capabilities and neglect other critical aspects such as fairness, multilinguality, or toxicity.

Fairness

A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?

no code implementations 23 Sep 2024 Yunfei Xie, Juncheng Wu, Haoqin Tu, Siwei Yang, Bingchen Zhao, Yongshuo Zong, Qiao Jin, Cihang Xie, Yuyin Zhou

Large language models (LLMs) have exhibited remarkable capabilities across various domains and tasks, pushing the boundaries of our knowledge in learning and cognition.

Hallucination MedQA +1

VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges

no code implementations 2 Sep 2024 Yuxuan Wang, Cihang Xie, Yang Liu, Zilong Zheng

Recent advancements in large-scale video-language models have shown significant potential for real-time planning and detailed interactions.

Video Understanding

From Pixels to Objects: A Hierarchical Approach for Part and Object Segmentation Using Local and Global Aggregation

no code implementations 2 Sep 2024 Yunfei Xie, Cihang Xie, Alan Yuille, Jieru Mei

Local aggregation is employed to form superpixels, leveraging the inherent redundancy of the image data to produce segments closely aligned with specific parts of the object, guided by object-level supervision.

Computational Efficiency Image Segmentation +4

MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine

1 code implementation 6 Aug 2024 Yunfei Xie, Ce Zhou, Lang Gao, Juncheng Wu, Xianhang Li, Hong-Yu Zhou, Sheng Liu, Lei Xing, James Zou, Cihang Xie, Yuyin Zhou

We then build a comprehensive knowledge base and prompt multimodal large language models to perform retrieval-augmented generation with the identified ROIs as guidance, resulting in multigranular textual descriptions.

 Ranked #1 on Medical Visual Question Answering on SLAKE-English (using extra training data)

Medical Visual Question Answering Visual Question Answering (VQA)

Autoregressive Pretraining with Mamba in Vision

1 code implementation 11 Jun 2024 Sucheng Ren, Xianhang Li, Haoqin Tu, Feng Wang, Fangxun Shu, Lei Zhang, Jieru Mei, Linjie Yang, Peng Wang, Heng Wang, Alan Yuille, Cihang Xie

The vision community has started to build with the recently developed state space model, Mamba, as the new backbone for a range of tasks.

Mamba

Scaling White-Box Transformers for Vision

no code implementations 30 May 2024 Jinrui Yang, Xianhang Li, Druv Pai, Yuyin Zhou, Yi Ma, Yaodong Yu, Cihang Xie

CRATE, a white-box transformer architecture designed to learn compressed and sparse representations, offers an intriguing alternative to standard vision transformers (ViTs) due to its inherent mathematical interpretability.

Semantic Segmentation Unsupervised Object Segmentation

ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning

no code implementations 24 May 2024 Sucheng Ren, Hongru Zhu, Chen Wei, Yijiang Li, Alan Yuille, Cihang Xie

This paper presents a new self-supervised video representation learning framework, ARVideo, which autoregressively predicts the next video token in a tailored sequence order.

Representation Learning

Mamba-R: Vision Mamba ALSO Needs Registers

1 code implementation 23 May 2024 Feng Wang, Jiahao Wang, Sucheng Ren, Guoyizhe Wei, Jieru Mei, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie

This paper identifies artifacts within the feature maps of Vision Mamba, similar to those previously observed in Vision Transformers.

Mamba Semantic Segmentation

HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

no code implementations 15 Apr 2024 Mude Hui, Siwei Yang, Bingchen Zhao, Yichun Shi, Heng Wang, Peng Wang, Yuyin Zhou, Cihang Xie

This study introduces HQ-Edit, a high-quality instruction-based image editing dataset with around 200,000 edits.

Attribute

Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies

no code implementations 12 Apr 2024 Zichao Li, Cihang Xie, Ekin Dogus Cubuk

With regard to data, we demonstrate the significance of high-quality training data and show that a smaller dataset of high-quality data can outperform a larger dataset of lower quality.

Data Augmentation

3D-TransUNet for Brain Metastases Segmentation in the BraTS2023 Challenge

1 code implementation 23 Mar 2024 Siwei Yang, Xianhang Li, Jieru Mei, Jieneng Chen, Cihang Xie, Yuyin Zhou

We identify that the Decoder-only 3D-TransUNet model should offer enhanced efficacy in the segmentation of brain metastases, as indicated by our 5-fold cross-validation on the training set.

Brain Tumor Segmentation Decoder +2

AQA-Bench: An Interactive Benchmark for Evaluating LLMs' Sequential Reasoning Ability

1 code implementation 14 Feb 2024 Siwei Yang, Bingchen Zhao, Cihang Xie

This paper introduces AQA-Bench, a novel benchmark to assess the sequential reasoning capabilities of large language models (LLMs) in algorithmic contexts, such as depth-first search (DFS).

Revisiting Adversarial Training at Scale

1 code implementation CVPR 2024 Zeyu Wang, Xianhang Li, Hongru Zhu, Cihang Xie

For example, by training on the DataComp-1B dataset, our AdvXL empowers a vanilla ViT-g model to substantially surpass the previous records of $l_{\infty}$-, $l_{2}$-, and $l_{1}$-robust accuracy by margins of 11.4%, 14.2% and 12.9%, respectively.

SPFormer: Enhancing Vision Transformer with Superpixel Representation

no code implementations 5 Jan 2024 Jieru Mei, Liang-Chieh Chen, Alan Yuille, Cihang Xie

In this work, we introduce SPFormer, a novel Vision Transformer enhanced by superpixel representation.

Superpixels

A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties

1 code implementation 21 Dec 2023 Junfei Xiao, Ziqi Zhou, Wenxuan Li, Shiyi Lan, Jieru Mei, Zhiding Yu, Alan Yuille, Yuyin Zhou, Cihang Xie

Instead of relying solely on category-specific annotations, ProLab uses descriptive properties grounded in common sense knowledge for supervising segmentation models.

Common Sense Reasoning Descriptive +1

Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning

no code implementations 18 Dec 2023 Bingchen Zhao, Haoqin Tu, Chen Wei, Jieru Mei, Cihang Xie

This paper introduces an efficient strategy to transform Large Language Models (LLMs) into Multi-Modal Large Language Models (MLLMs).

Domain Adaptation

Audio-Visual LLM for Video Understanding

no code implementations 11 Dec 2023 Fangxun Shu, Lei Zhang, Hao Jiang, Cihang Xie

This paper presents Audio-Visual LLM, a Multimodal Large Language Model that takes both visual and auditory inputs for holistic video understanding.

AudioCaps Language Modelling +3

Filter & Align: Leveraging Human Knowledge to Curate Image-Text Data

no code implementations 11 Dec 2023 Lei Zhang, Fangxun Shu, Tianyang Liu, Sucheng Ren, Hao Jiang, Cihang Xie

However, the vast scale of these datasets inevitably introduces significant variability in data quality, which can adversely affect the model performance.

Image Captioning Image-text Retrieval +1

Rejuvenating image-GPT as Strong Visual Representation Learners

4 code implementations 4 Dec 2023 Sucheng Ren, Zeyu Wang, Hongru Zhu, Junfei Xiao, Alan Yuille, Cihang Xie

This paper enhances image-GPT (iGPT), one of the pioneering works that introduce autoregressive pretraining to predict the next pixels for visual representation learning.

Representation Learning

How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

1 code implementation 27 Nov 2023 Haoqin Tu, Chenhang Cui, Zijun Wang, Yiyang Zhou, Bingchen Zhao, Junlin Han, Wangchunshu Zhou, Huaxiu Yao, Cihang Xie

Different from prior studies, we shift our focus from evaluating standard performance to introducing a comprehensive safety evaluation suite, covering both out-of-distribution (OOD) generalization and adversarial robustness.

Adversarial Robustness Visual Question Answering (VQA) +1

Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training

1 code implementation CVPR 2024 Yipeng Gao, Zeyu Wang, Wei-Shi Zheng, Cihang Xie, Yuyin Zhou

Contrastive learning has emerged as a promising paradigm for 3D open-world understanding, i.e., aligning point cloud representation to image and text embedding space individually.

 Ranked #1 on Zero-shot 3D classification on Objaverse LVIS (using extra training data)

Contrastive Learning Retrieval +3

FedConv: Enhancing Convolutional Neural Networks for Handling Data Heterogeneity in Federated Learning

1 code implementation 6 Oct 2023 Peiran Xu, Zeyu Wang, Jieru Mei, Liangqiong Qu, Alan Yuille, Cihang Xie, Yuyin Zhou

Federated learning (FL) is an emerging paradigm in machine learning, where a shared model is collaboratively learned using data from multiple devices to mitigate the risk of data leakage.

Federated Learning

DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation

1 code implementation ICCV 2023 Zeyu Wang, Dingwen Li, Chenxu Luo, Cihang Xie, Xiaodong Yang

In this work, we propose to boost the representation learning of a multi-camera BEV based student detector by training it to imitate the features of a well-trained LiDAR based teacher detector.

3D Object Detection Autonomous Driving +4

Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics

1 code implementation 13 Sep 2023 Haoqin Tu, Bingchen Zhao, Chen Wei, Cihang Xie

Multi-modal large language models (MLLMs) are trained based on large language models (LLMs), with an enhanced capability to comprehend multi-modal inputs and generate textual responses.

Ethics TruthfulQA

SwinMM: Masked Multi-view with Swin Transformers for 3D Medical Image Segmentation

1 code implementation 24 Jul 2023 YiQing Wang, Zihan Li, Jieru Mei, Zihao Wei, Li Liu, Chen Wang, Shengtian Sang, Alan Yuille, Cihang Xie, Yuyin Zhou

To address this limitation, we present Masked Multi-view with Swin Transformers (SwinMM), a novel multi-view pipeline for enabling accurate and data-efficient self-supervised medical image analysis.

Contrastive Learning Image Reconstruction +5

Consistency-guided Meta-Learning for Bootstrapping Semi-Supervised Medical Image Segmentation

1 code implementation 21 Jul 2023 Qingyue Wei, Lequan Yu, Xianhang Li, Wei Shao, Cihang Xie, Lei Xing, Yuyin Zhou

Specifically, our approach first involves training a segmentation model on a small set of clean labeled images to generate initial labels for unlabeled data.

Image Segmentation Meta-Learning +4

CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a $10,000 Budget; An Extra $4,000 Unlocks 81.8% Accuracy

2 code implementations 27 Jun 2023 Xianhang Li, Zeyu Wang, Cihang Xie

The recent work CLIPA presents an inverse scaling law for CLIP training -- whereby the larger the image/text encoders used, the shorter the sequence length of image/text tokens that can be applied in training.

An Inverse Scaling Law for CLIP Training

1 code implementation NeurIPS 2023 Xianhang Li, Zeyu Wang, Cihang Xie

However, its associated training cost is prohibitively high, imposing a significant barrier to its widespread exploration.

On the Adversarial Robustness of Camera-based 3D Object Detection

1 code implementation 25 Jan 2023 Shaoyuan Xie, Zichao Li, Zeyu Wang, Cihang Xie

In recent years, camera-based 3D object detection has gained widespread attention for its ability to achieve high performance with low computational cost.

3D Object Detection Adversarial Attack +5

Benchmarking Robustness in Neural Radiance Fields

no code implementations 10 Jan 2023 Chen Wang, Angtian Wang, Junbo Li, Alan Yuille, Cihang Xie

We find that NeRF-based models are significantly degraded in the presence of corruption, and are more sensitive to a different set of corruptions than image recognition models.

Benchmarking Camera Calibration +2

Unleashing the Power of Visual Prompting At the Pixel Level

1 code implementation 20 Dec 2022 Junyang Wu, Xianhang Li, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie

This paper presents a simple and effective visual prompting method for adapting pre-trained models to downstream recognition tasks.

Diversity Visual Prompting

Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models

no code implementations 1 Dec 2022 Zhuowan Li, Cihang Xie, Benjamin Van Durme, Alan Yuille

Despite the impressive advancements achieved through vision-and-language pretraining, it remains unclear whether this joint learning paradigm can help understand each individual modality.

Attribute Representation Learning

Finding Differences Between Transformers and ConvNets Using Counterfactual Simulation Testing

no code implementations 29 Nov 2022 Nataniel Ruiz, Sarah Adel Bargal, Cihang Xie, Kate Saenko, Stan Sclaroff

One shortcoming of this is the fact that these deep neural networks cannot be easily evaluated for robustness issues with respect to specific scene variations.

counterfactual Object

Navigation as Attackers Wish? Towards Building Robust Embodied Agents under Federated Learning

no code implementations 27 Nov 2022 Yunchao Zhang, Zonglin Di, Kaiwen Zhou, Cihang Xie, Xin Eric Wang

However, since the local data is inaccessible to the server under federated learning, attackers may easily poison the training data of the local client to build a backdoor in the agent without notice.

Federated Learning Navigate +1

SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training

no code implementations ICCV 2023 Yuanze Lin, Chen Wei, Huiyu Wang, Alan Yuille, Cihang Xie

Coupling all these designs allows our method to achieve both competitive performance on text-to-video retrieval and video question answering tasks and pre-training costs that are lower by 1.9X or more.

cross-modal alignment Question Answering +4

Bag of Tricks for FGSM Adversarial Training

no code implementations 6 Sep 2022 Zichao Li, Li Liu, Zeyu Wang, Yuyin Zhou, Cihang Xie

Adversarial training (AT) with samples generated by Fast Gradient Sign Method (FGSM), also known as FGSM-AT, is a computationally simple method to train robust networks.

Masked Autoencoders Enable Efficient Knowledge Distillers

1 code implementation CVPR 2023 Yutong Bai, Zeyu Wang, Junfei Xiao, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie

For example, by distilling the knowledge from an MAE pre-trained ViT-L into a ViT-B, our method achieves 84.0% ImageNet top-1 accuracy, outperforming the baseline of directly distilling a fine-tuned ViT-L by 1.2%.

Knowledge Distillation

A Simple Data Mixing Prior for Improving Self-Supervised Learning

1 code implementation CVPR 2022 Sucheng Ren, Huiyu Wang, Zhengqi Gao, Shengfeng He, Alan Yuille, Yuyin Zhou, Cihang Xie

More notably, our SDMP is the first method that successfully leverages data mixing to improve (rather than hurt) the performance of Vision Transformers in the self-supervised setting.

Representation Learning Self-Supervised Learning

Can CNNs Be More Robust Than Transformers?

1 code implementation 7 Jun 2022 Zeyu Wang, Yutong Bai, Yuyin Zhou, Cihang Xie

The recent success of Vision Transformers is shaking the decade-long dominance of Convolutional Neural Networks (CNNs) in image recognition.

Adversarial Attack on Attackers: Post-Process to Mitigate Black-Box Score-Based Query Attacks

1 code implementation 24 May 2022 Sizhe Chen, Zhehao Huang, Qinghua Tao, Yingwen Wu, Cihang Xie, Xiaolin Huang

The score-based query attacks (SQAs) pose practical threats to deep neural networks by crafting adversarial perturbations within dozens of queries, only using the model's output scores.

Adversarial Attack

One-Pixel Shortcut: on the Learning Preference of Deep Neural Networks

1 code implementation 24 May 2022 Shutong Wu, Sizhe Chen, Cihang Xie, Xiaolin Huang

Based on OPS, we introduce an unlearnable dataset called CIFAR-10-S, which is indistinguishable from CIFAR-10 by humans but induces the trained model to extremely low accuracy.

Data Augmentation

In Defense of Image Pre-Training for Spatiotemporal Recognition

1 code implementation 3 May 2022 Xianhang Li, Huiyu Wang, Chen Wei, Jieru Mei, Alan Yuille, Yuyin Zhou, Cihang Xie

Inspired by this observation, we hypothesize that the key to effectively leveraging image pre-training lies in the decomposition of learning spatial and temporal features, and revisiting image pre-training as the appearance prior to initializing 3D kernels.

STS Video Recognition

Fast AdvProp

1 code implementation ICLR 2022 Jieru Mei, Yucheng Han, Yutong Bai, Yixiao Zhang, Yingwei Li, Xianhang Li, Alan Yuille, Cihang Xie

Specifically, our modifications in Fast AdvProp are guided by the hypothesis that disentangled learning with adversarial examples is the key for performance improvements, while other training recipes (e.g., paired clean and adversarial training samples, multi-step adversarial attackers) could be largely simplified.

Data Augmentation object-detection +1

L2B: Learning to Bootstrap Robust Models for Combating Label Noise

1 code implementation CVPR 2024 Yuyin Zhou, Xianhang Li, Fengze Liu, Qingyue Wei, Xuxi Chen, Lequan Yu, Cihang Xie, Matthew P. Lungren, Lei Xing

Extensive experiments demonstrate that our method effectively mitigates the challenges of noisy labels, often necessitating few to no validation samples, and is well generalized to other tasks such as image segmentation.

Ranked #8 on Image Classification on Clothing1M (using clean data) (using extra training data)

Image Segmentation Learning with noisy labels +3

Image BERT Pre-training with Online Tokenizer

no code implementations ICLR 2022 Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, Tao Kong

The success of language Transformers is primarily attributed to the pretext task of masked language modeling (MLM), where texts are first tokenized into semantically meaningful pieces.

Image Classification Instance Segmentation +5

Simulated Adversarial Testing of Face Recognition Models

no code implementations CVPR 2022 Nataniel Ruiz, Adam Kortylewski, Weichao Qiu, Cihang Xie, Sarah Adel Bargal, Alan Yuille, Stan Sclaroff

In this work, we propose a framework for learning how to test machine learning algorithms using simulators in an adversarial manner in order to find weaknesses in the model before deploying it in critical scenarios.

BIG-bench Machine Learning Face Recognition

Robust and Accurate Object Detection via Adversarial Learning

1 code implementation CVPR 2021 Xiangning Chen, Cihang Xie, Mingxing Tan, Li Zhang, Cho-Jui Hsieh, Boqing Gong

Data augmentation has become a de facto component for training high-performance deep image classifiers, but its potential is under-explored for object detection.

AutoML Data Augmentation +3

Batch Normalization with Enhanced Linear Transformation

1 code implementation 28 Nov 2020 Yuhui Xu, Lingxi Xie, Cihang Xie, Jieru Mei, Siyuan Qiao, Wei Shen, Hongkai Xiong, Alan Yuille

Batch normalization (BN) is a fundamental unit in modern deep networks, in which a linear transformation module was designed for improving BN's flexibility of fitting complex data distributions.

Shape-Texture Debiased Neural Network Training

1 code implementation ICLR 2021 Yingwei Li, Qihang Yu, Mingxing Tan, Jieru Mei, Peng Tang, Wei Shen, Alan Yuille, Cihang Xie

To prevent models from exclusively attending to a single cue in representation learning, we augment training data with images with conflicting shape and texture information (e.g., an image of chimpanzee shape but with lemon texture) and, most importantly, provide the corresponding supervisions from shape and texture simultaneously.

Adversarial Robustness Data Augmentation +2

Smooth Adversarial Training

1 code implementation 25 Jun 2020 Cihang Xie, Mingxing Tan, Boqing Gong, Alan Yuille, Quoc V. Le

SAT also works well with larger networks: it helps EfficientNet-L1 to achieve 82.2% accuracy and 58.6% robustness on ImageNet, outperforming the previous state-of-the-art defense by 9.5% for accuracy and 11.6% for robustness.

Adversarial Defense Adversarial Robustness

Neural Architecture Search for Lightweight Non-Local Networks

2 code implementations CVPR 2020 Yingwei Li, Xiaojie Jin, Jieru Mei, Xiaochen Lian, Linjie Yang, Cihang Xie, Qihang Yu, Yuyin Zhou, Song Bai, Alan Yuille

However, it has been rarely explored to embed the NL blocks in mobile neural networks, mainly due to the following challenges: 1) NL blocks generally have heavy computation cost which makes it difficult to be applied in applications where computational resources are limited, and 2) it is an open problem to discover an optimal configuration to embed NL blocks into mobile neural networks.

Image Classification Neural Architecture Search

Adversarial Examples Improve Image Recognition

6 code implementations CVPR 2020 Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan Yuille, Quoc V. Le

We show that AdvProp improves a wide range of models on various image recognition tasks and performs better when the models are bigger.

Domain Generalization Image Classification

Intriguing properties of adversarial training at scale

no code implementations ICLR 2020 Cihang Xie, Alan Yuille

This two-domain hypothesis may explain the issue of BN when training with a mixture of clean and adversarial images, as estimating normalization statistics of this mixture distribution is challenging.

Adversarial Robustness

Regional Homogeneity: Towards Learning Transferable Universal Adversarial Perturbations Against Defenses

1 code implementation ECCV 2020 Yingwei Li, Song Bai, Cihang Xie, Zhenyu Liao, Xiaohui Shen, Alan L. Yuille

We observe the property of regional homogeneity in adversarial perturbations and suggest that the defenses are less robust to regionally homogeneous perturbations.

object-detection Object Detection +1

Learning Transferable Adversarial Examples via Ghost Networks

1 code implementation 9 Dec 2018 Yingwei Li, Song Bai, Yuyin Zhou, Cihang Xie, Zhishuai Zhang, Alan Yuille

The critical principle of ghost networks is to apply feature-level perturbations to an existing model to potentially create a huge set of diverse models.

Adversarial Attack

Adversarial Attacks and Defences Competition

1 code implementation 31 Mar 2018 Alexey Kurakin, Ian Goodfellow, Samy Bengio, Yinpeng Dong, Fangzhou Liao, Ming Liang, Tianyu Pang, Jun Zhu, Xiaolin Hu, Cihang Xie, Jian-Yu Wang, Zhishuai Zhang, Zhou Ren, Alan Yuille, Sangxia Huang, Yao Zhao, Yuzhe Zhao, Zhonglin Han, Junjiajia Long, Yerkebulan Berdibekov, Takuya Akiba, Seiya Tokui, Motoki Abe

To accelerate research on adversarial examples and robustness of machine learning classifiers, Google Brain organized a NIPS 2017 competition that encouraged researchers to develop new methods to generate adversarial examples as well as to develop new ways to defend against them.

BIG-bench Machine Learning

Improving Transferability of Adversarial Examples with Input Diversity

2 code implementations CVPR 2019 Cihang Xie, Zhishuai Zhang, Yuyin Zhou, Song Bai, Jian-Yu Wang, Zhou Ren, Alan Yuille

We hope that our proposed attack strategy can serve as a strong benchmark baseline for evaluating the robustness of networks to adversaries and the effectiveness of different defense methods in the future.

Adversarial Attack Diversity +1

Single-Shot Object Detection with Enriched Semantics

no code implementations CVPR 2018 Zhishuai Zhang, Siyuan Qiao, Cihang Xie, Wei Shen, Bo Wang, Alan L. Yuille

Our motivation is to enrich the semantics of object detection features within a typical deep detector, by a semantic segmentation branch and a global activation module.

Object object-detection +3

Visual Concepts and Compositional Voting

no code implementations 13 Nov 2017 Jianyu Wang, Zhishuai Zhang, Cihang Xie, Yuyin Zhou, Vittal Premachandran, Jun Zhu, Lingxi Xie, Alan Yuille

We use clustering algorithms to study the population activities of the features and extract a set of visual concepts which we show are visually tight and correspond to semantic parts of vehicles.

Clustering Semantic Part Detection

DeepVoting: A Robust and Explainable Deep Network for Semantic Part Detection under Partial Occlusion

no code implementations CVPR 2018 Zhishuai Zhang, Cihang Xie, Jian-Yu Wang, Lingxi Xie, Alan L. Yuille

The first layer extracts the evidence of local visual cues, and the second layer performs a voting mechanism by utilizing the spatial relationship between visual cues and semantic parts.

Semantic Part Detection

Adversarial Examples for Semantic Segmentation and Object Detection

2 code implementations ICCV 2017 Cihang Xie, Jian-Yu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, Alan Yuille

Our observation is that both segmentation and detection are based on classifying multiple targets on an image (e.g., the basic target is a pixel or a receptive field in segmentation, and an object proposal in detection), which inspires us to optimize a loss function over a set of pixels/proposals for generating adversarial perturbations.

Adversarial Attack Object +4

Unsupervised learning of object semantic parts from internal states of CNNs by population encoding

1 code implementation 21 Nov 2015 Jianyu Wang, Zhishuai Zhang, Cihang Xie, Vittal Premachandran, Alan Yuille

We address the key question of how object part representations can be found from the internal states of CNNs that are trained for high-level tasks, such as object classification.

Clustering Keypoint Detection +1
