Search Results for author: Yaodong Yu

Found 40 papers, 24 papers with code

Trading Inference-Time Compute for Adversarial Robustness

no code implementations 31 Jan 2025 Wojciech Zaremba, Evgenia Nitishinskaya, Boaz Barak, Stephanie Lin, Sam Toyer, Yaodong Yu, Rachel Dias, Eric Wallace, Kai Xiao, Johannes Heidecke, Amelia Glaese

We conduct experiments on the impact of increasing inference-time compute in reasoning models (specifically OpenAI o1-preview and o1-mini) on their robustness to adversarial attacks.

Adversarial Robustness

Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction

1 code implementation 23 Dec 2024 Ziyang Wu, Tianjiao Ding, Yifu Lu, Druv Pai, Jingyuan Zhang, Weida Wang, Yaodong Yu, Yi Ma, Benjamin D. Haeffele

Specifically, we derive a novel variational form of the MCR$^2$ objective and show that the architecture that results from unrolled gradient descent of this variational objective leads to a new attention module called Token Statistics Self-Attention (TSSA).

M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation

1 code implementation 15 Nov 2024 Sucheng Ren, Yaodong Yu, Nataniel Ruiz, Feng Wang, Alan Yuille, Cihang Xie

In this paper, we show that this scale-wise autoregressive framework can be effectively decoupled into \textit{intra-scale modeling}, which captures local spatial dependencies within each scale, and \textit{inter-scale modeling}, which models cross-scale relationships progressively from coarse-to-fine scales.

Image Generation Mamba

Causal Image Modeling for Efficient Visual Understanding

1 code implementation 10 Oct 2024 Feng Wang, Timing Yang, Yaodong Yu, Sucheng Ren, Guoyizhe Wei, Angtian Wang, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie

In this work, we present a comprehensive analysis of causal image modeling and introduce the Adventurer series models where we treat images as sequences of patch tokens and employ uni-directional language models to learn visual representations.

Causal Inference

Accuracy on the wrong line: On the pitfalls of noisy data for out-of-distribution generalisation

no code implementations 27 Jun 2024 Amartya Sanyal, Yaxi Hu, Yaodong Yu, Yian Ma, Yixin Wang, Bernhard Schölkopf

"Accuracy-on-the-line" is a widely observed phenomenon in machine learning, where a model's accuracy on in-distribution (ID) and out-of-distribution (OOD) data is positively correlated across different hyperparameters and data configurations.

A Global Geometric Analysis of Maximal Coding Rate Reduction

no code implementations 4 Jun 2024 Peng Wang, Huikang Liu, Druv Pai, Yaodong Yu, Zhihui Zhu, Qing Qu, Yi Ma

The maximal coding rate reduction (MCR$^2$) objective for learning structured and compact deep representations is drawing increasing attention, especially after its recent usage in the derivation of fully explainable and highly effective deep network architectures.

Scaling White-Box Transformers for Vision

no code implementations 30 May 2024 Jinrui Yang, Xianhang Li, Druv Pai, Yuyin Zhou, Yi Ma, Yaodong Yu, Cihang Xie

CRATE, a white-box transformer architecture designed to learn compressed and sparse representations, offers an intriguing alternative to standard vision transformers (ViTs) due to its inherent mathematical interpretability.

Semantic Segmentation Unsupervised Object Segmentation

Masked Completion via Structured Diffusion with White-Box Transformers

1 code implementation 3 Apr 2024 Druv Pai, Ziyang Wu, Sam Buchanan, Yaodong Yu, Yi Ma

We do this by exploiting a fundamental connection between diffusion, compression, and (masked) completion, deriving a deep transformer-like masked autoencoder architecture, called CRATE-MAE, in which the role of each layer is mathematically fully interpretable: each transforms the data distribution to and from a structured representation.

Representation Learning

Differentially Private Representation Learning via Image Captioning

1 code implementation 4 Mar 2024 Tom Sander, Yaodong Yu, Maziar Sanjabi, Alain Durmus, Yi Ma, Kamalika Chaudhuri, Chuan Guo

In this work, we show that effective DP representation learning can be done via image captioning and scaling up to internet-scale multimodal datasets.

Image Captioning Representation Learning

A Study on the Calibration of In-context Learning

1 code implementation 7 Dec 2023 Hanlin Zhang, Yi-Fan Zhang, Yaodong Yu, Dhruv Madeka, Dean Foster, Eric Xing, Himabindu Lakkaraju, Sham Kakade

Accurate uncertainty quantification is crucial for the safe deployment of machine learning models, and prior research has demonstrated improvements in the calibration of modern language models (LMs).

In-Context Learning Natural Language Understanding +1

White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?

1 code implementation 22 Nov 2023 Yaodong Yu, Sam Buchanan, Druv Pai, Tianzhe Chu, Ziyang Wu, Shengbang Tong, Hao Bai, Yuexiang Zhai, Benjamin D. Haeffele, Yi Ma

This leads to a family of white-box transformer-like deep network architectures, named CRATE, which are mathematically fully interpretable.

All Data Compression +2

Emergence of Segmentation with Minimalistic White-Box Transformers

1 code implementation 30 Aug 2023 Yaodong Yu, Tianzhe Chu, Shengbang Tong, Ziyang Wu, Druv Pai, Sam Buchanan, Yi Ma

Transformer-like models for vision tasks have recently proven effective for a wide range of downstream applications such as segmentation and detection.

Segmentation Self-Supervised Learning

Scaff-PD: Communication Efficient Fair and Robust Federated Learning

no code implementations 25 Jul 2023 Yaodong Yu, Sai Praneeth Karimireddy, Yi Ma, Michael I. Jordan

We present Scaff-PD, a fast and communication-efficient algorithm for distributionally robust federated learning.

Fairness Federated Learning

ViP: A Differentially Private Foundation Model for Computer Vision

1 code implementation 15 Jun 2023 Yaodong Yu, Maziar Sanjabi, Yi Ma, Kamalika Chaudhuri, Chuan Guo

In this work, we propose as a mitigation measure a recipe to train foundation vision models with differential privacy (DP) guarantee.

White-Box Transformers via Sparse Rate Reduction

1 code implementation NeurIPS 2023 Yaodong Yu, Sam Buchanan, Druv Pai, Tianzhe Chu, Ziyang Wu, Shengbang Tong, Benjamin D. Haeffele, Yi Ma

Particularly, we show that the standard transformer block can be derived from alternating optimization on complementary parts of this objective: the multi-head self-attention operator can be viewed as a gradient descent step to compress the token sets by minimizing their lossy coding rate, and the subsequent multi-layer perceptron can be viewed as attempting to sparsify the representation of the tokens.

Representation Learning
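
To make the sparsification claim above more concrete, here is a minimal sketch of the kind of step the paper identifies with the MLP block: a single nonnegative ISTA (proximal gradient) step toward sparse codes of the tokens with respect to a dictionary. The dictionary, step size, and threshold below are illustrative assumptions, not CRATE's exact parameterization.

```python
import torch
import torch.nn.functional as F

def ista_sparsify_step(Z, D, step=0.1, lam=0.1):
    """One nonnegative ISTA step on 0.5 * ||Z - C @ D||_F^2 + lam * ||C||_1 with
    C >= 0, initialized at C = Z. Z: (n_tokens, d) token features; D: (d, d)
    dictionary whose rows are atoms. Shapes and constants are illustrative."""
    grad = (Z @ D - Z) @ D.T                      # gradient of the reconstruction term at C = Z
    return F.relu(Z - step * grad - step * lam)   # proximal step: soft-threshold and clip at zero
```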

Federated Conformal Predictors for Distributed Uncertainty Quantification

1 code implementation 27 May 2023 Charles Lu, Yaodong Yu, Sai Praneeth Karimireddy, Michael I. Jordan, Ramesh Raskar

Conformal prediction is emerging as a popular paradigm for providing rigorous uncertainty quantification in machine learning since it can be easily applied as a post-processing step to already trained models.

Conformal Prediction Federated Learning +2
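
As a point of reference for the post-processing view described above, a minimal split-conformal sketch on a single machine (the centralized baseline, not the paper's federated quantile computation) could look like the following; the array names and the score choice are assumptions.

```python
import numpy as np

def split_conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction from softmax probabilities and integer labels."""
    n = len(cal_labels)
    # Nonconformity score: one minus the probability assigned to the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected empirical quantile of the calibration scores.
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    q_hat = np.sort(scores)[k - 1]
    # Prediction set for each test point: every class scoring below the threshold.
    return test_probs >= 1.0 - q_hat
```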

TCT: Convexifying Federated Learning using Bootstrapped Neural Tangent Kernels

1 code implementation 13 Jul 2022 Yaodong Yu, Alexander Wei, Sai Praneeth Karimireddy, Yi Ma, Michael I. Jordan

Leveraging this observation, we propose a Train-Convexify-Train (TCT) procedure to sidestep this issue: first, learn features using off-the-shelf methods (e.g., FedAvg); then, optimize a convexified problem obtained from the network's empirical neural tangent kernel approximation.

Federated Learning
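
A heavily simplified, centralized sketch of the "convexify" stage might look like the following; the empirical-NTK features are assumed to be precomputed (e.g., flattened per-example gradients of the trained network), and the closed-form ridge solve stands in for the federated optimization the paper actually performs.

```python
import numpy as np

def fit_linear_head_on_entk(features, y_onehot, reg=1e-3):
    """Ridge regression on empirical-NTK features.
    features: (n, p) per-example feature rows; y_onehot: (n, c) targets."""
    p = features.shape[1]
    gram = features.T @ features + reg * np.eye(p)   # regularized normal equations
    return np.linalg.solve(gram, features.T @ y_onehot)
```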

Robust Calibration with Multi-domain Temperature Scaling

no code implementations 6 Jun 2022 Yaodong Yu, Stephen Bates, Yi Ma, Michael I. Jordan

Uncertainty quantification is essential for the reliable deployment of machine learning models to high-stakes application domains.

Uncertainty Quantification
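
For context, plain single-domain temperature scaling, the baseline this paper extends to multiple domains, can be sketched as follows; the optimizer and hyperparameters are illustrative.

```python
import torch
import torch.nn.functional as F

def fit_temperature(logits, labels, steps=200, lr=0.01):
    """Learn a single scalar temperature T > 0 minimizing validation NLL.
    logits: (n, c) precomputed, detached model outputs; labels: (n,) int tensor."""
    log_t = torch.zeros(1, requires_grad=True)       # parameterize T = exp(log_t) > 0
    optimizer = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        optimizer.step()
    return log_t.exp().item()
```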

Conditional Supervised Contrastive Learning for Fair Text Classification

1 code implementation 23 May 2022 Jianfeng Chi, William Shand, Yaodong Yu, Kai-Wei Chang, Han Zhao, Yuan Tian

Contrastive representation learning has gained much attention due to its superior performance in learning representations from both image and sequential data.

Contrastive Learning Fairness +3

Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback

no code implementations 15 May 2022 Tianyi Lin, Aldo Pacchiano, Yaodong Yu, Michael I. Jordan

Motivated by applications to online learning in sparse estimation and Bayesian optimization, we consider the problem of online unconstrained nonsubmodular minimization with delayed costs in both full information and bandit feedback settings.

Bayesian Optimization

What You See is What You Get: Principled Deep Learning via Distributional Generalization

1 code implementation 7 Apr 2022 Bogdan Kulynych, Yao-Yuan Yang, Yaodong Yu, Jarosław Błasiok, Preetum Nakkiran

In contrast, we show that Differentially-Private (DP) training provably ensures the high-level WYSIWYG property, which we quantify using a notion of distributional generalization.

Predicting Out-of-Distribution Error with the Projection Norm

1 code implementation 11 Feb 2022 Yaodong Yu, Zitong Yang, Alexander Wei, Yi Ma, Jacob Steinhardt

Projection Norm first uses model predictions to pseudo-label test samples and then trains a new model on the pseudo-labels.

Pseudo Label text-classification +1
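
A hedged sketch of that recipe is shown below; the finetune_fn helper is hypothetical and stands in for the fine-tuning loop, and the parameter-space distance at the end is one natural way to turn the fine-tuned copy into a scalar score.

```python
import copy
import torch

def projection_norm(model, finetune_fn, test_loader):
    """Pseudo-label the unlabeled test set, fine-tune a fresh copy on the
    pseudo-labels, and measure how far its parameters move from the original."""
    model.eval()
    with torch.no_grad():
        # test_loader is assumed to yield (x, _) batches; labels are ignored.
        pseudo = [(x, model(x).argmax(dim=1)) for x, _ in test_loader]
    new_model = finetune_fn(copy.deepcopy(model), pseudo)
    # Parameter-space distance between the fine-tuned copy and the reference model.
    diffs = [(p - q).flatten() for p, q in zip(new_model.parameters(), model.parameters())]
    return torch.cat(diffs).norm().item()
```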

The Effect of Model Size on Worst-Group Generalization

no code implementations 8 Dec 2021 Alan Pham, Eunice Chan, Vikranth Srivatsa, Dhruba Ghosh, Yaoqing Yang, Yaodong Yu, Ruiqi Zhong, Joseph E. Gonzalez, Jacob Steinhardt

Overparameterization is shown to result in poor test accuracy on rare subgroups under a variety of settings where subgroup information is known.

Closed-Loop Data Transcription to an LDR via Minimaxing Rate Reduction

1 code implementation 12 Nov 2021 Xili Dai, Shengbang Tong, Mingyang Li, Ziyang Wu, Michael Psenka, Kwan Ho Ryan Chan, Pengyuan Zhai, Yaodong Yu, Xiaojun Yuan, Heung Yeung Shum, Yi Ma

In particular, we propose to learn a closed-loop transcription between a multi-class multi-dimensional data distribution and a linear discriminative representation (LDR) in the feature space that consists of multiple independent multi-dimensional linear subspaces.

Decoder

On the Convergence of Stochastic Extragradient for Bilinear Games using Restarted Iteration Averaging

no code implementations 30 Jun 2021 Chris Junchi Li, Yaodong Yu, Nicolas Loizou, Gauthier Gidel, Yi Ma, Nicolas Le Roux, Michael I. Jordan

We study the stochastic bilinear minimax optimization problem, presenting an analysis of the same-sample Stochastic ExtraGradient (SEG) method with constant step size together with variations of the method that yield favorable convergence.
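
For readers unfamiliar with the method, a deterministic extragradient sketch on the bilinear game min_x max_y x^T A y is given below; the stochastic gradients, restarting, and iteration averaging analyzed in the paper are omitted.

```python
import numpy as np

def extragradient_bilinear(A, steps=1000, eta=0.1):
    """Extragradient iterations for the bilinear saddle-point problem x^T A y."""
    x, y = np.ones(A.shape[0]), np.ones(A.shape[1])
    for _ in range(steps):
        # Extrapolate ("look ahead"), then update using gradients at the midpoint.
        x_mid, y_mid = x - eta * A @ y, y + eta * A.T @ x
        x, y = x - eta * A @ y_mid, y + eta * A.T @ x_mid
    return x, y
```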

ReduNet: A White-box Deep Network from the Principle of Maximizing Rate Reduction

2 code implementations 21 May 2021 Kwan Ho Ryan Chan, Yaodong Yu, Chong You, Haozhi Qi, John Wright, Yi Ma

This work attempts to provide a plausible theoretical framework that aims to interpret modern deep (convolutional) networks from the principles of data compression and discriminative representation.

All Data Compression

Fast Distributionally Robust Learning with Variance Reduced Min-Max Optimization

no code implementations 27 Apr 2021 Yaodong Yu, Tianyi Lin, Eric Mazumdar, Michael I. Jordan

Distributionally robust supervised learning (DRSL) is emerging as a key paradigm for building reliable machine learning systems for real-world applications -- reflecting the need for classifiers and predictive models that are robust to the distribution shifts that arise from phenomena such as selection bias or nonstationarity.

BIG-bench Machine Learning Selection bias

Understanding Generalization in Adversarial Training via the Bias-Variance Decomposition

1 code implementation 17 Mar 2021 Yaodong Yu, Zitong Yang, Edgar Dobriban, Jacob Steinhardt, Yi Ma

To investigate this gap, we decompose the test risk into its bias and variance components and study their behavior as a function of adversarial training perturbation radii ($\varepsilon$).

Deep Networks from the Principle of Rate Reduction

3 code implementations 27 Oct 2020 Kwan Ho Ryan Chan, Yaodong Yu, Chong You, Haozhi Qi, John Wright, Yi Ma

The layered architectures, linear and nonlinear operators, and even parameters of the network are all explicitly constructed layer-by-layer in a forward propagation fashion by emulating the gradient scheme.

Adversarial Robustness of Stabilized NeuralODEs Might be from Obfuscated Gradients

1 code implementation 28 Sep 2020 Yifei Huang, Yaodong Yu, Hongyang Zhang, Yi Ma, Yuan Yao

Even replacing only the first layer of a ResNet by such an ODE block can exhibit further improvement in robustness, e.g., under a PGD-20 ($\ell_\infty=0.031$) attack on the CIFAR-10 dataset it achieves 91.57\% natural accuracy and 62.35\% robust accuracy, while a counterpart ResNet architecture trained with TRADES achieves 76.29\% natural accuracy and 45.24\% robust accuracy, respectively.

Adversarial Defense Adversarial Robustness

Boundary thickness and robustness in learning models

1 code implementation NeurIPS 2020 Yaoqing Yang, Rajiv Khanna, Yaodong Yu, Amir Gholami, Kurt Keutzer, Joseph E. Gonzalez, Kannan Ramchandran, Michael W. Mahoney

Using these observations, we show that noise-augmentation on mixup training further increases boundary thickness, thereby combating vulnerability to various forms of adversarial attacks and OOD transforms.

Adversarial Defense Data Augmentation

Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction

2 code implementations NeurIPS 2020 Yaodong Yu, Kwan Ho Ryan Chan, Chong You, Chaobing Song, Yi Ma

To learn intrinsic low-dimensional structures from high-dimensional data that most discriminate between classes, we propose the principle of Maximal Coding Rate Reduction ($\text{MCR}^2$), an information-theoretic measure that maximizes the coding rate difference between the whole dataset and the sum of each individual class.

Clustering Contrastive Learning +1
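
The objective described above can be written down compactly; the sketch below uses the Gaussian rate-distortion expression from this line of work, with a row-per-sample convention and an illustrative distortion parameter eps.

```python
import torch

def coding_rate(Z, eps=0.5):
    """R(Z) = 1/2 * logdet(I + d/(n*eps^2) * Z^T Z) for n samples of dimension d
    stored as rows of Z."""
    n, d = Z.shape
    identity = torch.eye(d, dtype=Z.dtype, device=Z.device)
    return 0.5 * torch.logdet(identity + (d / (n * eps**2)) * Z.T @ Z)

def mcr2_objective(Z, labels, eps=0.5):
    """MCR^2 value: expand the coding rate of the whole set while compressing
    each class, i.e. R(Z) minus the size-weighted sum of per-class rates."""
    n = Z.shape[0]
    rate_whole = coding_rate(Z, eps)
    rate_classes = sum(
        (Z[labels == c].shape[0] / n) * coding_rate(Z[labels == c], eps)
        for c in labels.unique()
    )
    return rate_whole - rate_classes
```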

Rethinking Bias-Variance Trade-off for Generalization of Neural Networks

1 code implementation ICML 2020 Zitong Yang, Yaodong Yu, Chong You, Jacob Steinhardt, Yi Ma

We provide a simple explanation for this by measuring the bias and variance of neural networks: while the bias is monotonically decreasing as in the classical theory, the variance is unimodal or bell-shaped: it increases then decreases with the width of the network.
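
A minimal sketch of the kind of bias-variance estimate such a study relies on, computed from the predictions of several networks trained on independent data splits, is given below; the array layout and the MSE-style decomposition are my assumptions.

```python
import numpy as np

def bias_variance_mse(preds, y_onehot):
    """preds: (k, n, c) class-probability predictions of k independently trained
    models; y_onehot: (n, c) one-hot labels. Returns (squared bias, variance)."""
    mean_pred = preds.mean(axis=0)                                 # average predictor
    bias_sq = ((mean_pred - y_onehot) ** 2).sum(axis=1).mean()     # squared bias
    variance = ((preds - mean_pred) ** 2).sum(axis=2).mean()       # prediction variance
    return bias_sq, variance
```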

Theoretically Principled Trade-off between Robustness and Accuracy

9 code implementations 24 Jan 2019 Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, Michael I. Jordan

We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples.

Adversarial Attack Adversarial Defense +2
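
The trade-off is typically operationalized as a regularized training loss; below is a hedged sketch of a TRADES-style objective (natural cross-entropy plus a KL robustness term on PGD perturbations), assuming image inputs in [0, 1] and with illustrative hyperparameters.

```python
import torch
import torch.nn.functional as F

def trades_style_loss(model, x, y, beta=6.0, eps=0.031, step_size=0.007, steps=10):
    """Natural cross-entropy plus beta times a KL term that pulls predictions on
    adversarial perturbations toward the predictions on clean inputs."""
    model.eval()
    p_clean = F.softmax(model(x), dim=1).detach()
    # Inner maximization: PGD on the KL divergence within an l_inf ball of radius eps.
    x_adv = (x.detach() + 0.001 * torch.randn_like(x)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x_adv), dim=1), p_clean, reduction="sum")
        grad = torch.autograd.grad(kl, x_adv)[0]
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    model.train()
    logits = model(x)
    natural = F.cross_entropy(logits, y)
    robust = F.kl_div(F.log_softmax(model(x_adv), dim=1),
                      F.softmax(logits, dim=1), reduction="batchmean")
    return natural + beta * robust
```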

Third-order Smoothness Helps: Faster Stochastic Optimization Algorithms for Finding Local Minima

no code implementations NeurIPS 2018 Yaodong Yu, Pan Xu, Quanquan Gu

We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate saddle points more efficiently.

Stochastic Optimization

A Primal-Dual Analysis of Global Optimality in Nonconvex Low-Rank Matrix Recovery

no code implementations ICML 2018 Xiao Zhang, Lingxiao Wang, Yaodong Yu, Quanquan Gu

We propose a primal-dual based framework for analyzing the global optimality of nonconvex low-rank matrix recovery.

Matrix Completion

Learning One-hidden-layer ReLU Networks via Gradient Descent

no code implementations 20 Jun 2018 Xiao Zhang, Yaodong Yu, Lingxiao Wang, Quanquan Gu

We study the problem of learning one-hidden-layer neural networks with Rectified Linear Unit (ReLU) activation function, where the inputs are sampled from standard Gaussian distribution and the outputs are generated from a noisy teacher network.

Third-order Smoothness Helps: Even Faster Stochastic Optimization Algorithms for Finding Local Minima

no code implementations 18 Dec 2017 Yaodong Yu, Pan Xu, Quanquan Gu

We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate saddle points more efficiently.

Stochastic Optimization

Saving Gradient and Negative Curvature Computations: Finding Local Minima More Efficiently

no code implementations 11 Dec 2017 Yaodong Yu, Difan Zou, Quanquan Gu

We propose a family of nonconvex optimization algorithms that are able to save gradient and negative curvature computations to a large extent, and are guaranteed to find an approximate local minimum with improved runtime complexity.
