Search Results for author: Xuwang Yin

Found 18 papers, 11 papers with code

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

1 code implementation • 6 Feb 2024 • Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, Dan Hendrycks

Automated red teaming holds substantial promise for uncovering and mitigating the risks associated with the malicious use of large language models (LLMs), yet the field lacks a standardized evaluation framework to rigorously assess new methods.

Learning Globally Optimized Language Structure via Adversarial Training

no code implementations • 12 Nov 2023 • Xuwang Yin

Key contributions include: (1) an adversarial attack strategy tailored to text to generate negative samples, circumventing MCMC limitations; (2) an adversarial training algorithm for EBMs leveraging these attacks; (3) empirical validation of performance improvements on a sequence generation task.

Adversarial Attack, Text Generation

Representation Engineering: A Top-Down Approach to AI Transparency

1 code implementation • 2 Oct 2023 • Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks

In this paper, we identify and characterize the emerging area of representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience.

Question Answering

Generative Robust Classification

no code implementations • 14 Dec 2022 • Xuwang Yin

Training adversarially robust discriminative (i.e., softmax) classifiers has been the dominant approach to robust classification.

Classification, Data Augmentation, +1

End-to-End Signal Classification in Signed Cumulative Distribution Transform Space

1 code implementation • 30 Apr 2022 • Abu Hasnat Mohammad Rubaiyat, Shiying Li, Xuwang Yin, Mohammad Shifat E Rabbi, Yan Zhuang, Gustavo K. Rohde

This paper presents a new end-to-end signal classification method using the signed cumulative distribution transform (SCDT).

Classification

Invariance encoding in sliced-Wasserstein space for image classification with limited training data

2 code implementations • 9 Jan 2022 • Mohammad Shifat E Rabbi, Yan Zhuang, Shiying Li, Abu Hasnat Mohammad Rubaiyat, Xuwang Yin, Gustavo K. Rohde

However, they are known to underperform when training data are limited and thus require data augmentation strategies that render the method computationally expensive and not always effective.

Data Augmentation, Image Classification

Learning Energy-Based Models With Adversarial Training

1 code implementation • 11 Dec 2020 • Xuwang Yin, Shiying Li, Gustavo K. Rohde

We study a new approach to learning energy-based models (EBMs) based on adversarial training (AT).

Adversarial Defense, Adversarial Robustness, +3

GAT: Generative Adversarial Training for Adversarial Example Detection and Classification

no code implementations • ICLR 2020 • Xuwang Yin, Soheil Kolouri, Gustavo K. Rohde

The vulnerabilities of deep neural networks against adversarial examples have become a significant concern for deploying these models in sensitive domains.

General Classification, Robust classification, +1

Testing Robustness Against Unforeseen Adversaries

3 code implementations • 21 Aug 2019 • Max Kaufmann, Daniel Kang, Yi Sun, Steven Basart, Xuwang Yin, Mantas Mazeika, Akul Arora, Adam Dziedzic, Franziska Boenisch, Tom Brown, Jacob Steinhardt, Dan Hendrycks

To narrow in on this discrepancy between research and reality we introduce ImageNet-UA, a framework for evaluating model robustness against a range of unforeseen adversaries, including eighteen new non-L_p attacks.

Adversarial Defense, Adversarial Robustness

Neural Networks, Hypersurfaces, and Radon Transforms

1 code implementation • 4 Jul 2019 • Soheil Kolouri, Xuwang Yin, Gustavo K. Rohde

Connections between integration along hypersurfaces, Radon transforms, and neural networks are exploited to highlight an integral geometric mathematical interpretation of neural networks.

Cell image classification: a comparative overview

1 code implementation • 7 Jun 2019 • Mohammad Shifat-E-Rabbi, Xuwang Yin, Cailey Elizabeth Fitzgerald, Gustavo K. Rohde

Cell image classification methods are currently being used in numerous applications in cell biology and medicine.

Classification, Image Classification

GAT: Generative Adversarial Training for Adversarial Example Detection and Robust Classification

1 code implementation • 27 May 2019 • Xuwang Yin, Soheil Kolouri, Gustavo K. Rohde

The vulnerabilities of deep neural networks against adversarial examples have become a significant concern for deploying these models in sensitive domains.

Classification, General Classification, +2

Chat-crowd: A Dialog-based Platform for Visual Layout Composition

no code implementations • NAACL 2019 • Paola Cascante-Bonilla, Xuwang Yin, Vicente Ordonez, Song Feng

In this paper we introduce Chat-crowd, an interactive environment for visual layout composition via conversational interactions.

Goal-Oriented Dialog

Privacy Partitioning: Protecting User Data During the Deep Learning Inference Phase

no code implementations • 7 Dec 2018 • Jianfeng Chi, Emmanuel Owusu, Xuwang Yin, Tong Yu, William Chan, Patrick Tague, Yuan Tian

We present a practical method for protecting data during the inference phase of deep learning based on bipartite topology threat modeling and an interactive adversarial deep network construction.

BIG-bench Machine Learning, Face Identification, +1

Robust Text Detection in Natural Scene Images

no code implementations • 11 Jan 2013 • Xu-Cheng Yin, Xuwang Yin, Kai-Zhu Huang, Hong-Wei Hao

Text detection in natural scene images is an important prerequisite for many content-based image analysis tasks.

Clustering, Metric Learning, +2
