no code implementations • 31 Jan 2025 • Wojciech Zaremba, Evgenia Nitishinskaya, Boaz Barak, Stephanie Lin, Sam Toyer, Yaodong Yu, Rachel Dias, Eric Wallace, Kai Xiao, Johannes Heidecke, Amelia Glaese
We conduct experiments on the impact of increasing inference-time compute in reasoning models (specifically OpenAI o1-preview and o1-mini) on their robustness to adversarial attacks.
1 code implementation • 23 Dec 2024 • Ziyang Wu, Tianjiao Ding, Yifu Lu, Druv Pai, Jingyuan Zhang, Weida Wang, Yaodong Yu, Yi Ma, Benjamin D. Haeffele
Specifically, we derive a novel variational form of the MCR$^2$ objective and show that the architecture that results from unrolled gradient descent of this variational objective leads to a new attention module called Token Statistics Self-Attention (TSSA).
1 code implementation • 15 Nov 2024 • Sucheng Ren, Yaodong Yu, Nataniel Ruiz, Feng Wang, Alan Yuille, Cihang Xie
In this paper, we show that this scale-wise autoregressive framework can be effectively decoupled into \textit{intra-scale modeling}, which captures local spatial dependencies within each scale, and \textit{inter-scale modeling}, which models cross-scale relationships progressively from coarse-to-fine scales.
1 code implementation • 10 Oct 2024 • Feng Wang, Timing Yang, Yaodong Yu, Sucheng Ren, Guoyizhe Wei, Angtian Wang, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie
In this work, we present a comprehensive analysis of causal image modeling and introduce the Adventurer series models where we treat images as sequences of patch tokens and employ uni-directional language models to learn visual representations.
no code implementations • 27 Jun 2024 • Amartya Sanyal, Yaxi Hu, Yaodong Yu, Yian Ma, Yixin Wang, Bernhard Schölkopf
"Accuracy-on-the-line" is a widely observed phenomenon in machine learning, where a model's accuracy on in-distribution (ID) and out-of-distribution (OOD) data is positively correlated across different hyperparameters and data configurations.
no code implementations • 4 Jun 2024 • Peng Wang, Huikang Liu, Druv Pai, Yaodong Yu, Zhihui Zhu, Qing Qu, Yi Ma
The maximal coding rate reduction (MCR$^2$) objective for learning structured and compact deep representations is drawing increasing attention, especially after its recent usage in the derivation of fully explainable and highly effective deep network architectures.
no code implementations • 30 May 2024 • Jinrui Yang, Xianhang Li, Druv Pai, Yuyin Zhou, Yi Ma, Yaodong Yu, Cihang Xie
CRATE, a white-box transformer architecture designed to learn compressed and sparse representations, offers an intriguing alternative to standard vision transformers (ViTs) due to its inherent mathematical interpretability.
1 code implementation • 3 Apr 2024 • Druv Pai, Ziyang Wu, Sam Buchanan, Yaodong Yu, Yi Ma
We do this by exploiting a fundamental connection between diffusion, compression, and (masked) completion, deriving a deep transformer-like masked autoencoder architecture, called CRATE-MAE, in which the role of each layer is mathematically fully interpretable: they transform the data distribution to and from a structured representation.
1 code implementation • 4 Mar 2024 • Tom Sander, Yaodong Yu, Maziar Sanjabi, Alain Durmus, Yi Ma, Kamalika Chaudhuri, Chuan Guo
In this work, we show that effective DP representation learning can be done via image captioning and scaling up to internet-scale multimodal datasets.
1 code implementation • 7 Dec 2023 • HANLIN ZHANG, Yi-Fan Zhang, Yaodong Yu, Dhruv Madeka, Dean Foster, Eric Xing, Himabindu Lakkaraju, Sham Kakade
Accurate uncertainty quantification is crucial for the safe deployment of machine learning models, and prior research has demonstrated improvements in the calibration of modern language models (LMs).
1 code implementation • 22 Nov 2023 • Yaodong Yu, Sam Buchanan, Druv Pai, Tianzhe Chu, Ziyang Wu, Shengbang Tong, Hao Bai, Yuexiang Zhai, Benjamin D. Haeffele, Yi Ma
This leads to a family of white-box transformer-like deep network architectures, named CRATE, which are mathematically fully interpretable.
1 code implementation • 30 Aug 2023 • Yaodong Yu, Tianzhe Chu, Shengbang Tong, Ziyang Wu, Druv Pai, Sam Buchanan, Yi Ma
Transformer-like models for vision tasks have recently proven effective for a wide range of downstream applications such as segmentation and detection.
no code implementations • 25 Jul 2023 • Yaodong Yu, Sai Praneeth Karimireddy, Yi Ma, Michael I. Jordan
We present Scaff-PD, a fast and communication-efficient algorithm for distributionally robust federated learning.
1 code implementation • 15 Jun 2023 • Yaodong Yu, Maziar Sanjabi, Yi Ma, Kamalika Chaudhuri, Chuan Guo
In this work, we propose as a mitigation measure a recipe to train foundation vision models with differential privacy (DP) guarantee.
1 code implementation • NeurIPS 2023 • Yaodong Yu, Sam Buchanan, Druv Pai, Tianzhe Chu, Ziyang Wu, Shengbang Tong, Benjamin D. Haeffele, Yi Ma
Particularly, we show that the standard transformer block can be derived from alternating optimization on complementary parts of this objective: the multi-head self-attention operator can be viewed as a gradient descent step to compress the token sets by minimizing their lossy coding rate, and the subsequent multi-layer perceptron can be viewed as attempting to sparsify the representation of the tokens.
1 code implementation • 27 May 2023 • Charles Lu, Yaodong Yu, Sai Praneeth Karimireddy, Michael I. Jordan, Ramesh Raskar
Conformal prediction is emerging as a popular paradigm for providing rigorous uncertainty quantification in machine learning since it can be easily applied as a post-processing step to already trained models.
1 code implementation • 13 Jul 2022 • Yaodong Yu, Alexander Wei, Sai Praneeth Karimireddy, Yi Ma, Michael I. Jordan
Leveraging this observation, we propose a Train-Convexify-Train (TCT) procedure to sidestep this issue: first, learn features using off-the-shelf methods (e. g., FedAvg); then, optimize a convexified problem obtained from the network's empirical neural tangent kernel approximation.
no code implementations • 6 Jun 2022 • Yaodong Yu, Stephen Bates, Yi Ma, Michael I. Jordan
Uncertainty quantification is essential for the reliable deployment of machine learning models to high-stakes application domains.
1 code implementation • 23 May 2022 • Jianfeng Chi, William Shand, Yaodong Yu, Kai-Wei Chang, Han Zhao, Yuan Tian
Contrastive representation learning has gained much attention due to its superior performance in learning representations from both image and sequential data.
no code implementations • 15 May 2022 • Tianyi Lin, Aldo Pacchiano, Yaodong Yu, Michael I. Jordan
Motivated by applications to online learning in sparse estimation and Bayesian optimization, we consider the problem of online unconstrained nonsubmodular minimization with delayed costs in both full information and bandit feedback settings.
1 code implementation • 7 Apr 2022 • Bogdan Kulynych, Yao-Yuan Yang, Yaodong Yu, Jarosław Błasiok, Preetum Nakkiran
In contrast, we show that Differentially-Private (DP) training provably ensures the high-level WYSIWYG property, which we quantify using a notion of distributional generalization.
1 code implementation • 11 Feb 2022 • Yaodong Yu, Zitong Yang, Alexander Wei, Yi Ma, Jacob Steinhardt
Projection Norm first uses model predictions to pseudo-label test samples and then trains a new model on the pseudo-labels.
no code implementations • 8 Dec 2021 • Alan Pham, Eunice Chan, Vikranth Srivatsa, Dhruba Ghosh, Yaoqing Yang, Yaodong Yu, Ruiqi Zhong, Joseph E. Gonzalez, Jacob Steinhardt
Overparameterization is shown to result in poor test accuracy on rare subgroups under a variety of settings where subgroup information is known.
1 code implementation • 12 Nov 2021 • Xili Dai, Shengbang Tong, Mingyang Li, Ziyang Wu, Michael Psenka, Kwan Ho Ryan Chan, Pengyuan Zhai, Yaodong Yu, Xiaojun Yuan, Heung Yeung Shum, Yi Ma
In particular, we propose to learn a closed-loop transcription between a multi-class multi-dimensional data distribution and a linear discriminative representation (LDR) in the feature space that consists of multiple independent multi-dimensional linear subspaces.
no code implementations • 29 Sep 2021 • Yaodong Yu, Heinrich Jiang, Dara Bahri, Hossein Mobahi, Seungyeon Kim, Ankit Singh Rawat, Andreas Veit, Yi Ma
Concretely, we show that larger models and larger datasets need to be simultaneously leveraged to improve OOD performance.
no code implementations • 30 Jun 2021 • Chris Junchi Li, Yaodong Yu, Nicolas Loizou, Gauthier Gidel, Yi Ma, Nicolas Le Roux, Michael I. Jordan
We study the stochastic bilinear minimax optimization problem, presenting an analysis of the same-sample Stochastic ExtraGradient (SEG) method with constant step size, and presenting variations of the method that yield favorable convergence.
2 code implementations • 21 May 2021 • Kwan Ho Ryan Chan, Yaodong Yu, Chong You, Haozhi Qi, John Wright, Yi Ma
This work attempts to provide a plausible theoretical framework that aims to interpret modern deep (convolutional) networks from the principles of data compression and discriminative representation.
no code implementations • 27 Apr 2021 • Yaodong Yu, Tianyi Lin, Eric Mazumdar, Michael I. Jordan
Distributionally robust supervised learning (DRSL) is emerging as a key paradigm for building reliable machine learning systems for real-world applications -- reflecting the need for classifiers and predictive models that are robust to the distribution shifts that arise from phenomena such as selection bias or nonstationarity.
1 code implementation • 17 Mar 2021 • Yaodong Yu, Zitong Yang, Edgar Dobriban, Jacob Steinhardt, Yi Ma
To investigate this gap, we decompose the test risk into its bias and variance components and study their behavior as a function of adversarial training perturbation radii ($\varepsilon$).
3 code implementations • 27 Oct 2020 • Kwan Ho Ryan Chan, Yaodong Yu, Chong You, Haozhi Qi, John Wright, Yi Ma
The layered architectures, linear and nonlinear operators, and even parameters of the network are all explicitly constructed layer-by-layer in a forward propagation fashion by emulating the gradient scheme.
1 code implementation • 28 Sep 2020 • Yifei Huang, Yaodong Yu, Hongyang Zhang, Yi Ma, Yuan YAO
Even replacing only the first layer of a ResNet by such a ODE block can exhibit further improvement in robustness, e. g., under PGD-20 ($\ell_\infty=0. 031$) attack on CIFAR-10 dataset, it achieves 91. 57\% and natural accuracy and 62. 35\% robust accuracy, while a counterpart architecture of ResNet trained with TRADES achieves natural and robust accuracy 76. 29\% and 45. 24\%, respectively.
1 code implementation • NeurIPS 2020 • Yaoqing Yang, Rajiv Khanna, Yaodong Yu, Amir Gholami, Kurt Keutzer, Joseph E. Gonzalez, Kannan Ramchandran, Michael W. Mahoney
Using these observations, we show that noise-augmentation on mixup training further increases boundary thickness, thereby combating vulnerability to various forms of adversarial attacks and OOD transforms.
2 code implementations • NeurIPS 2020 • Yaodong Yu, Kwan Ho Ryan Chan, Chong You, Chaobing Song, Yi Ma
To learn intrinsic low-dimensional structures from high-dimensional data that most discriminate between classes, we propose the principle of Maximal Coding Rate Reduction ($\text{MCR}^2$), an information-theoretic measure that maximizes the coding rate difference between the whole dataset and the sum of each individual class.
Ranked #23 on
Image Clustering
on STL-10
1 code implementation • ICML 2020 • Zitong Yang, Yaodong Yu, Chong You, Jacob Steinhardt, Yi Ma
We provide a simple explanation for this by measuring the bias and variance of neural networks: while the bias is monotonically decreasing as in the classical theory, the variance is unimodal or bell-shaped: it increases then decreases with the width of the network.
9 code implementations • 24 Jan 2019 • Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, Michael. I. Jordan
We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples.
Ranked #3 on
Adversarial Attack
on CIFAR-10
no code implementations • NeurIPS 2018 • Yaodong Yu, Pan Xu, Quanquan Gu
We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate saddle points more efficiently.
no code implementations • ICML 2018 • Xiao Zhang, Lingxiao Wang, Yaodong Yu, Quanquan Gu
We propose a primal-dual based framework for analyzing the global optimality of nonconvex low-rank matrix recovery.
no code implementations • 20 Jun 2018 • Xiao Zhang, Yaodong Yu, Lingxiao Wang, Quanquan Gu
We study the problem of learning one-hidden-layer neural networks with Rectified Linear Unit (ReLU) activation function, where the inputs are sampled from standard Gaussian distribution and the outputs are generated from a noisy teacher network.
no code implementations • 18 Dec 2017 • Yaodong Yu, Pan Xu, Quanquan Gu
We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate saddle points more efficiently.
no code implementations • 11 Dec 2017 • Yaodong Yu, Difan Zou, Quanquan Gu
We propose a family of nonconvex optimization algorithms that are able to save gradient and negative curvature computations to a large extent, and are guaranteed to find an approximate local minimum with improved runtime complexity.