Search Results for author: Song Mei

Found 35 papers, 5 papers with code

An Overview of Diffusion Models: Applications, Guided Generation, Statistical Rates and Optimization

no code implementations • 11 Apr 2024 • Minshuo Chen, Song Mei, Jianqing Fan, Mengdi Wang

In this paper, we review emerging applications of diffusion models, with a focus on understanding their sample generation under various forms of control.
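
A common form of controlled sample generation covered by such surveys is classifier-free guidance; below is a minimal sketch of the guidance step, where `eps_model`, `cond`, and the guidance scale are hypothetical placeholders rather than anything from this paper.

```python
def guided_noise_estimate(eps_model, x_t, t, cond, guidance_scale=3.0):
    """Classifier-free guidance: blend conditional and unconditional noise
    predictions; eps_model, cond, and guidance_scale are placeholders."""
    eps_uncond = eps_model(x_t, t, None)   # unconditional prediction
    eps_cond = eps_model(x_t, t, cond)     # conditional prediction
    # Extrapolate from the unconditional estimate toward the conditional one.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```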

Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning

no code implementations • 8 Apr 2024 • Ruiqi Zhang, Licong Lin, Yu Bai, Song Mei

LLM unlearning aims to eliminate the influence of undesirable data from the pre-trained model while preserving the model's utilities on other tasks.
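
A minimal sketch of a negative-preference-style unlearning objective consistent with this abstract, assuming per-response log-probabilities under the current model and a frozen reference model; the function name, reduction, and default beta are assumptions, not the authors' released code.

```python
import torch.nn.functional as F

def npo_style_loss(logp_theta, logp_ref, beta=0.1):
    """Negative-preference-style unlearning loss on forget-set responses.
    logp_theta / logp_ref: summed token log-probabilities of each response
    under the current and the frozen reference (pre-trained) model, shape (B,)."""
    log_ratio = logp_theta - logp_ref
    # -log sigmoid(-beta * log_ratio) is large while the model still assigns
    # high probability to forget data, and saturates once it no longer does,
    # avoiding the divergence of plain gradient ascent.
    return (2.0 / beta) * (-F.logsigmoid(-beta * log_ratio)).mean()
```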

Mean-field variational inference with the TAP free energy: Geometric and statistical properties in linear models

no code implementations • 14 Nov 2023 • Michael Celentano, Zhou Fan, Licong Lin, Song Mei

In settings where it is conjectured that no efficient algorithm can find this local neighborhood, we prove analogous geometric properties for a local minimizer of the TAP free energy reachable by AMP, and show that posterior inference based on this minimizer remains correctly calibrated.

Variational Inference

How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations

no code implementations • 16 Oct 2023 • Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai

Through extensive probing and a new pasting experiment, we further reveal several mechanisms within the trained transformers, such as concrete copying behaviors on both the inputs and the representations, linear ICL capability of the upper layers alone, and a post-ICL representation selection mechanism in a harder mixture setting.

In-Context Learning

Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining

no code implementations • 12 Oct 2023 • Licong Lin, Yu Bai, Song Mei

This provides the first quantitative analysis of the ICRL capabilities of transformers pretrained from offline trajectories.

reinforcement-learning • Thompson Sampling

Deep Networks as Denoising Algorithms: Sample-Efficient Learning of Diffusion Models in High-Dimensional Graphical Models

no code implementations • 20 Sep 2023 • Song Mei, Yuchen Wu

We investigate the approximation efficiency of score functions by deep neural networks in diffusion-based generative modeling.
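
For background, score networks in diffusion models are typically trained as denoisers via the standard denoising score matching objective; the sketch below illustrates that objective and is not the paper's construction.

```python
import torch

def denoising_score_matching_loss(score_net, x0, sigma):
    """Standard denoising score matching: for x = x0 + sigma * z with
    z ~ N(0, I), the regression target for the score network is -z / sigma."""
    z = torch.randn_like(x0)
    x_noisy = x0 + sigma * z
    target = -z / sigma
    pred = score_net(x_noisy, sigma)
    return ((pred - target) ** 2).sum(dim=-1).mean()
```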

Denoising • Efficient Neural Network +1

Lower Bounds for Learning in Revealing POMDPs

no code implementations • 2 Feb 2023 • Fan Chen, Huan Wang, Caiming Xiong, Song Mei, Yu Bai

However, the fundamental limits for learning in revealing POMDPs are much less understood, with existing lower bounds being rather preliminary and having substantial gaps from the current best upper bounds.

Reinforcement Learning (RL)

Near-optimal multiple testing in Bayesian linear models with finite-sample FDR control

1 code implementation • 4 Nov 2022 • Taejoo Ahn, Licong Lin, Song Mei

In this paper, we develop near-optimal multiple testing procedures for high dimensional Bayesian linear models with isotropic covariates.
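
As a point of reference, a classical frequentist procedure with finite-sample FDR control is Benjamini-Hochberg; the sketch below implements that standard baseline, not the paper's Bayesian procedure.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.1):
    """Benjamini-Hochberg step-up procedure: returns a boolean rejection
    mask controlling FDR at level alpha (under independence/PRDS)."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest rank passing its threshold
        reject[order[: k + 1]] = True
    return reject
```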

Open-Ended Question Answering • Variable Selection

Partially Observable RL with B-Stability: Unified Structural Condition and Sharp Sample-Efficient Algorithms

no code implementations • 29 Sep 2022 • Fan Chen, Yu Bai, Song Mei

Recent work has identified several tractable subclasses that are learnable with polynomial samples, such as Partially Observable Markov Decision Processes (POMDPs) with certain revealing or decodability conditions.

Reinforcement Learning (RL)

Unified Algorithms for RL with Decision-Estimation Coefficients: No-Regret, PAC, and Reward-Free Learning

no code implementations • 23 Sep 2022 • Fan Chen, Song Mei, Yu Bai

Finding unified complexity measures and algorithms for sample-efficient learning is a central topic of research in reinforcement learning (RL).

PAC learning • Reinforcement Learning (RL)

Efficient Phi-Regret Minimization in Extensive-Form Games via Online Mirror Descent

no code implementations • 30 May 2022 • Yu Bai, Chi Jin, Song Mei, Ziang Song, Tiancheng Yu

A conceptually appealing approach for learning Extensive-Form Games (EFGs) is to convert them to Normal-Form Games (NFGs).

Sample-Efficient Learning of Correlated Equilibria in Extensive-Form Games

no code implementations • 15 May 2022 • Ziang Song, Song Mei, Yu Bai

We then design an uncoupled no-regret algorithm that finds an $\varepsilon$-approximate $K$-EFCE within $\widetilde{\mathcal{O}}(\max_{i}X_iA_i^{K}/\varepsilon^2)$ iterations in the full feedback setting, where $X_i$ and $A_i$ are the number of information sets and actions for the $i$-th player.

Efficient and Differentiable Conformal Prediction with General Function Classes

1 code implementation • ICLR 2022 • Yu Bai, Song Mei, Huan Wang, Yingbo Zhou, Caiming Xiong

Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly over existing approaches in several applications such as prediction intervals with improved length, minimum-volume prediction sets for multi-output regression, and label prediction sets for image classification.
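
For comparison, the classical non-learned baseline is split conformal prediction; a minimal sketch of split-conformal intervals is below, with argument names chosen for illustration.

```python
import numpy as np

def split_conformal_interval(residuals_cal, y_pred_test, alpha=0.1):
    """Split conformal prediction intervals around point predictions.
    residuals_cal: |y - y_hat| on a held-out calibration set."""
    n = len(residuals_cal)
    # Index of the ceil((n+1)(1-alpha))-th smallest residual (finite-sample valid).
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    q = np.sort(residuals_cal)[k - 1]
    return y_pred_test - q, y_pred_test + q
```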

Conformal Prediction • Image Classification +2

Near-Optimal Learning of Extensive-Form Games with Imperfect Information

no code implementations • 3 Feb 2022 • Yu Bai, Chi Jin, Song Mei, Tiancheng Yu

This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$, and matches the information-theoretic lower bound up to logarithmic factors.

counterfactual • Open-Ended Question Answering

Learning with convolution and pooling operations in kernel methods

no code implementations • 16 Nov 2021 • Theodor Misiakiewicz, Song Mei

Recent empirical work has shown that hierarchical convolutional kernels inspired by convolutional neural networks (CNNs) significantly improve the performance of kernel methods in image classification tasks.

Image Classification

When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently?

no code implementations • ICLR 2022 • Ziang Song, Song Mei, Yu Bai

First, we design algorithms for learning an $\epsilon$-Coarse Correlated Equilibrium (CCE) in $\widetilde{\mathcal{O}}(H^5S\max_{i\le m} A_i / \epsilon^2)$ episodes, and an $\epsilon$-Correlated Equilibrium (CE) in $\widetilde{\mathcal{O}}(H^6S\max_{i\le m} A_i^2 / \epsilon^2)$ episodes.

Multi-agent Reinforcement Learning

Spectral Multiplicity Entails Sample-wise Multiple Descent

no code implementations • 29 Sep 2021 • Lin Chen, Song Mei

Moreover, we theoretically show that the ridge estimator with optimal regularization can result in a monotone generalization risk curve and thereby eliminate multiple descent under some assumptions.

Local convexity of the TAP free energy and AMP convergence for Z2-synchronization

no code implementations • 21 Jun 2021 • Michael Celentano, Zhou Fan, Song Mei

This provides a rigorous foundation for variational inference in high dimensions via minimization of the TAP free energy.

Bayesian Inference • Variational Inference

Understanding the Under-Coverage Bias in Uncertainty Estimation

no code implementations • NeurIPS 2021 • Yu Bai, Song Mei, Huan Wang, Caiming Xiong

Estimating the data uncertainty in regression tasks is often done by learning a quantile function or a prediction interval of the true label conditioned on the input.
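
Such quantile functions are usually fit with the pinball (quantile) loss; a minimal sketch is below, independent of the paper's analysis.

```python
import torch

def pinball_loss(pred_quantile, y, tau=0.9):
    """Pinball (quantile) loss; its population minimizer is the tau-quantile of y | x."""
    diff = y - pred_quantile
    return torch.mean(torch.maximum(tau * diff, (tau - 1.0) * diff))
```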

regression

Exact Gap between Generalization Error and Uniform Convergence in Random Feature Models

no code implementations • 8 Mar 2021 • Zitong Yang, Yu Bai, Song Mei

We show that, in the setting where the classical uniform convergence bound is vacuous (diverges to $\infty$), uniform convergence over the interpolators still gives a non-trivial bound of the test error of interpolating solutions.

Learning with invariances in random features and kernel models

no code implementations • 25 Feb 2021 • Song Mei, Theodor Misiakiewicz, Andrea Montanari

Certain neural network architectures -- for instance, convolutional networks -- are believed to owe their success to the fact that they exploit such invariance properties.

Data Augmentation

Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification

no code implementations • 15 Feb 2021 • Yu Bai, Song Mei, Huan Wang, Caiming Xiong

Modern machine learning models with high accuracy are often miscalibrated -- the predicted top probability does not reflect the actual accuracy, and tends to be over-confident.
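
Over-confidence of the predicted top probability is commonly measured with the expected calibration error; below is a minimal binned-ECE sketch, a standard diagnostic rather than the paper's theoretical tool.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Binned ECE: size-weighted average of |accuracy - confidence| per bin.
    confidences: predicted top probabilities; correct: 0/1 hit indicators."""
    confidences = np.asarray(confidences)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece
```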

Binary Classification

Generalization error of random features and kernel methods: hypercontractivity and kernel matrix concentration

no code implementations • 26 Jan 2021 • Song Mei, Theodor Misiakiewicz, Andrea Montanari

We show that the test error of random features ridge regression is dominated by its approximation error and is larger than the error of KRR as long as $N\le n^{1-\delta}$ for some $\delta>0$.

regression

When Do Neural Networks Outperform Kernel Methods?

1 code implementation • NeurIPS 2020 • Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari

Recent empirical work showed that, for some classification tasks, RKHS methods can replace NNs without a large loss in performance.

Image Classification

Limitations of Lazy Training of Two-layers Neural Network

1 code implementation • NeurIPS 2019 • Song Mei, Theodor Misiakiewicz, Behrooz Ghorbani, Andrea Montanari

We study the supervised learning problem under either of the following two models: (1) Feature vectors x_i are d-dimensional Gaussian and responses are y_i = f_*(x_i) for f_* an unknown quadratic function; (2) Feature vectors x_i are distributed as a mixture of two d-dimensional centered Gaussians, and y_i's are the corresponding class labels.

Vocal Bursts Valence Prediction

The generalization error of random features regression: Precise asymptotics and double descent curve

no code implementations • 14 Aug 2019 • Song Mei, Andrea Montanari

We compute the precise asymptotics of the test error, in the limit $N, n, d\to \infty$ with $N/d$ and $n/d$ fixed.
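
The resulting double-descent behavior can also be observed numerically; below is a minimal random-features ridge regression simulation in which the linear teacher, ReLU features, and hyperparameters are illustrative assumptions, not the paper's asymptotic setting.

```python
import numpy as np

def rf_ridge_test_error(n=300, d=30, N=600, lam=1e-4, n_test=2000, seed=0):
    """Test error of random-features ridge regression on a linear teacher;
    sweeping N around n traces out the double-descent curve."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d) / np.sqrt(d)                 # teacher weights
    W = rng.standard_normal((N, d)) / np.sqrt(d)            # random first layer
    X, Xte = rng.standard_normal((n, d)), rng.standard_normal((n_test, d))
    y = X @ w + 0.1 * rng.standard_normal(n)
    Phi, Phite = np.maximum(X @ W.T, 0.0), np.maximum(Xte @ W.T, 0.0)  # ReLU features
    a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y)      # ridge solution
    return float(np.mean((Phite @ a - Xte @ w) ** 2))
```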

regression

Limitations of Lazy Training of Two-layers Neural Networks

1 code implementation • 21 Jun 2019 • Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari

We study the supervised learning problem under either of the following two models: (1) Feature vectors ${\boldsymbol x}_i$ are $d$-dimensional Gaussians and responses are $y_i = f_*({\boldsymbol x}_i)$ for $f_*$ an unknown quadratic function; (2) Feature vectors ${\boldsymbol x}_i$ are distributed as a mixture of two $d$-dimensional centered Gaussians, and $y_i$'s are the corresponding class labels.
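
Model (1) is straightforward to instantiate for experiments; below is a minimal data-generation sketch in which the particular quadratic target is an arbitrary placeholder.

```python
import numpy as np

def quadratic_teacher_data(n=1000, d=50, seed=0):
    """Model (1) from the abstract: x_i ~ N(0, I_d), y_i = f_*(x_i) for a
    quadratic f_*; the f_*(x) = x^T B x used here is an illustrative choice."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((d, d))
    B = (A + A.T) / (2.0 * np.sqrt(d))       # random symmetric quadratic form
    X = rng.standard_normal((n, d))
    y = np.einsum("ni,ij,nj->n", X, B, X)
    return X, y
```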

Vocal Bursts Valence Prediction

Linearized two-layers neural networks in high dimension

no code implementations • 27 Apr 2019 • Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari

Both these approaches can also be regarded as randomized approximations of kernel ridge regression (with respect to different kernels), and enjoy universal approximation properties when the number of neurons $N$ diverges, for a fixed dimension $d$.

regression • Vocal Bursts Intensity Prediction +1

Proximal algorithms for constrained composite optimization, with applications to solving low-rank SDPs

no code implementations • 1 Mar 2019 • Yu Bai, John Duchi, Song Mei

We study a family of (potentially non-convex) constrained optimization problems with convex composite structure.

Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit

no code implementations • 16 Feb 2019 • Song Mei, Theodor Misiakiewicz, Andrea Montanari

Earlier work shows that (under some regularity assumptions), the mean field description is accurate as soon as the number of hidden units is much larger than the dimension $D$.

A Mean Field View of the Landscape of Two-Layers Neural Networks

no code implementations • 18 Apr 2018 • Song Mei, Andrea Montanari, Phan-Minh Nguyen

Does SGD converge to a global optimum of the risk or only to a local optimum?

The landscape of the spiked tensor model

no code implementations • 15 Nov 2017 • Gerard Ben Arous, Song Mei, Andrea Montanari, Mihai Nica

We compute the expected number of critical points and local maxima of this objective function, show that it is exponential in the dimension $n$, and give exact formulas for the exponential growth rate.

The Landscape of Empirical Risk for Non-convex Losses

no code implementations • 22 Jul 2016 • Song Mei, Yu Bai, Andrea Montanari

We establish uniform convergence of the gradient and Hessian of the empirical risk to their population counterparts, as soon as the number of samples becomes larger than the number of unknown parameters (modulo logarithmic factors).

Binary Classification • General Classification +1
