no code implementations • 13 Oct 2024 • Dhruv Rohatgi, Tanya Marwah, Zachary Chase Lipton, Jianfeng Lu, Ankur Moitra, Andrej Risteski
In this paper, we consider the benefits of architectures that maintain and update edge embeddings.
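For concreteness, a minimal NumPy sketch of one message-passing layer that maintains and updates per-edge states alongside node states; the toy graph, weights, and update rule are hypothetical illustrations, not the paper's architecture:

    import numpy as np

    rng = np.random.default_rng(0)

    # One layer with per-edge states: each edge state is updated from its
    # endpoints, then nodes aggregate their incoming edge states.
    n, d = 6, 4
    h = rng.standard_normal((n, d))                    # node embeddings
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
    e = {uv: rng.standard_normal(d) for uv in edges}   # edge embeddings
    W_e = rng.standard_normal((3 * d, d)) / np.sqrt(3 * d)
    W_h = rng.standard_normal((2 * d, d)) / np.sqrt(2 * d)

    for u, v in edges:                 # edge update: e_uv <- g([e_uv, h_u, h_v])
        e[(u, v)] = np.tanh(np.concatenate([e[(u, v)], h[u], h[v]]) @ W_e)

    msg = np.zeros((n, d))
    for (u, v), ev in e.items():       # node update: aggregate incoming edge states
        msg[v] += ev
    h = np.tanh(np.concatenate([h, msg], axis=1) @ W_h)
    print(h.shape, len(e))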
no code implementations • 7 Oct 2024 • Abhishek Panigrahi, Bingbin Liu, Sadhika Malladi, Andrej Risteski, Surbhi Goel
Our theoretical and empirical findings on sparse parity, complemented by empirical observations on more complex tasks, highlight the benefit of progressive distillation via implicit curriculum across setups.
no code implementations • 3 Sep 2024 • Ricardo Buitrago Ruiz, Tanya Marwah, Albert Gu, Andrej Risteski
Data-driven techniques have emerged as a promising alternative to traditional numerical methods for solving partial differential equations (PDEs).
1 code implementation • 22 Jul 2024 • Yuchen Li, Alexandre Kirchmeyer, Aashay Mehta, Yilong Qin, Boris Dadachev, Kishore Papineni, Sanjiv Kumar, Andrej Risteski
While alternate classes of models have been explored, we have limited mathematical understanding of their fundamental power and limitations.
no code implementations • NeurIPS 2023 • Kaiyue Wen, Yuchen Li, Bingbin Liu, Andrej Risteski
Interpretability methods aim to understand the algorithm implemented by a trained model (e.g., a Transformer) by examining various aspects of the model, such as the weight matrices or the attention patterns.
no code implementations • NeurIPS 2023 • Tanya Marwah, Ashwini Pokle, J. Zico Kolter, Zachary C. Lipton, Jianfeng Lu, Andrej Risteski
Motivated by this observation, we propose FNO-DEQ, a deep equilibrium variant of the FNO architecture that directly solves for the solution of a steady-state PDE as the infinite-depth fixed point of an implicit operator layer, using a black-box root solver and differentiating analytically through this fixed point, which results in $\mathcal{O}(1)$ training memory.
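To illustrate the deep-equilibrium mechanics (the fixed-point solve and the implicit, constant-memory backward pass), here is a minimal NumPy sketch with a toy contractive layer; the layer, loss, and tolerances are hypothetical stand-ins for the FNO-based components in the paper:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 8
    # Contractive toy layer f(z) = tanh(W z + x); W is scaled so the map contracts
    W = 0.4 * rng.standard_normal((d, d)) / np.sqrt(d)
    x = rng.standard_normal(d)

    def f(z):
        return np.tanh(W @ z + x)

    # Forward: solve z = f(z) by fixed-point iteration (a black-box solver would do)
    z = np.zeros(d)
    for _ in range(200):
        z_new = f(z)
        if np.linalg.norm(z_new - z) < 1e-10:
            break
        z = z_new

    # Backward via the implicit function theorem: dz*/dx = (I - D W)^{-1} D,
    # with D = diag(1 - z*^2). Memory does not grow with forward iterations.
    g = 2 * z                                        # gradient of toy loss ||z*||^2
    D = np.diag(1.0 - z**2)
    v = np.linalg.solve((np.eye(d) - D @ W).T, g)    # one adjoint linear solve
    grad_x = D @ v                                   # dL/dx without unrolling

    print("residual:", np.linalg.norm(z - f(z)), " grad_x[:3]:", grad_x[:3])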
no code implementations • 7 Nov 2023 • Elan Rosenfeld, Andrej Risteski
We identify a new phenomenon in neural network optimization which arises from the interaction of depth and a particular heavy-tailed structure in natural data.
no code implementations • 15 Jun 2023 • Yilong Qin, Andrej Risteski
Moreover, we show that if the distribution being learned is a finite mixture of Gaussians in $d$ dimensions with a shared covariance, the sample complexity of annealed score matching is polynomial in the ambient dimension, the diameter of the means, and the smallest and largest eigenvalues of the covariance -- obviating the Poincaré constant-based lower bounds for the basic score matching loss shown in Koehler et al. (2022).
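As a reminder of the objective being analyzed (not this paper's proof), a small denoising-score-matching sketch in the annealed regime; the cubic-polynomial score model and the noise ladder are purely illustrative choices:

    import numpy as np

    rng = np.random.default_rng(0)

    # Data: 1-D mixture of two Gaussians with shared variance
    x = np.concatenate([rng.normal(-3, 1, 5000), rng.normal(3, 1, 5000)])

    # Denoising score matching: the minimizer of E||s(x + sigma*eps) + eps/sigma||^2
    # is the score of the sigma-smoothed density. Annealing sweeps sigma downward.
    for sigma in [4.0, 2.0, 1.0, 0.5]:
        eps = rng.standard_normal(x.shape)
        xt = x + sigma * eps
        feats = np.stack([np.ones_like(xt), xt, xt**2, xt**3], axis=1)
        coef, *_ = np.linalg.lstsq(feats, -eps / sigma, rcond=None)  # least-squares fit
        print(f"sigma={sigma}: fitted score coefficients {np.round(coef, 3)}")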
no code implementations • 1 Jun 2023 • Runtian Zhai, Bingbin Liu, Andrej Risteski, Zico Kolter, Pradeep Ravikumar
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator, suggesting that learning a linear probe atop such representation can be connected to RKHS regression.
1 code implementation • 7 Mar 2023 • Yuchen Li, Yuanzhi Li, Andrej Risteski
While the successes of transformers across many domains are indisputable, an accurate understanding of their learning mechanics is still largely lacking.
no code implementations • 21 Oct 2022 • Tanya Marwah, Zachary C. Lipton, Jianfeng Lu, Andrej Risteski
We show that if composing a function with Barron norm $b$ with partial derivatives of $L$ produces a function of Barron norm at most $B_L b^p$, the solution to the PDE can be $\epsilon$-approximated in the $L^2$ sense by a function with Barron norm $O\left(\left(dB_L\right)^{\max\{p \log(1/ \epsilon), p^{\log(1/\epsilon)}\}}\right)$.
no code implementations • 3 Oct 2022 • Frederic Koehler, Alexander Heckett, Andrej Risteski
Roughly, we show that the score matching estimator is statistically comparable to the maximum likelihood estimator when the distribution has a small isoperimetric constant.
no code implementations • 1 Oct 2022 • Holden Lee, Chirag Pabbaraju, Anish Sevekari, Andrej Risteski
Noise Contrastive Estimation (NCE) is a popular approach for learning probability density functions parameterized up to a constant of proportionality.
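A minimal sketch of the NCE mechanic: fit an unnormalized model by logistic regression between data and noise samples, treating the log-partition as a free parameter. The one-parameter Gaussian-shaped model and $N(0,1)$ noise below are illustrative choices, not the paper's setting:

    import numpy as np

    rng = np.random.default_rng(0)

    # Data from N(0, 0.5^2); unnormalized model log p(x) = -theta*x^2 + c,
    # with the log-partition c learned jointly, as in standard NCE
    x_data = 0.5 * rng.standard_normal(2000)
    x_noise = rng.standard_normal(2000)              # noise distribution N(0, 1)

    def log_noise(x):
        return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

    def grads(theta, c):
        s_d = (-theta * x_data**2 + c) - log_noise(x_data)   # log p_model - log p_noise
        s_n = (-theta * x_noise**2 + c) - log_noise(x_noise)
        p_d = 1 / (1 + np.exp(-s_d))                 # P(label = data)
        p_n = 1 / (1 + np.exp(-s_n))
        g_theta = np.mean((1 - p_d) * x_data**2) - np.mean(p_n * x_noise**2)
        g_c = -np.mean(1 - p_d) + np.mean(p_n)
        return g_theta, g_c

    theta, c = 0.0, 0.0
    for _ in range(2000):                            # gradient descent on the NCE loss
        g_t, g_c = grads(theta, c)
        theta, c = theta - 0.1 * g_t, c - 0.1 * g_c

    print("theta:", theta)                           # should approach 1/(2 * 0.5^2) = 2.0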
1 code implementation • 29 Mar 2022 • Ashwini Pokle, Jinjin Tian, Yuchen Li, Andrej Risteski
Some recent works however have shown promising results for non-contrastive learning, which does not require negative samples.
no code implementations • 27 Mar 2022 • Binghui Peng, Andrej Risteski
When the features are linear, we design an efficient gradient-based algorithm, $\mathsf{DPGD}$, which is guaranteed to perform well on the current environment and to avoid catastrophic forgetting.
no code implementations • 18 Feb 2022 • Bingbin Liu, Daniel Hsu, Pradeep Ravikumar, Andrej Risteski
This lens is undoubtedly very interesting, but suffers from the problem that there isn't a "canonical" set of downstream tasks to focus on -- in practice, this problem is usually resolved by competing on the benchmark dataset du jour.
no code implementations • 17 Feb 2022 • Frederic Koehler, Holden Lee, Andrej Risteski
We consider Ising models on the hypercube with a general interaction matrix $J$, and give a polynomial time sampling algorithm when all but $O(1)$ eigenvalues of $J$ lie in an interval of length one, a situation which occurs in many models of interest.
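For contrast, the standard baseline sampler is Glauber dynamics, which resamples one coordinate at a time from its conditional; it can mix slowly in exactly the regimes this paper's algorithm is designed for. A minimal sketch (the convention $p(x) \propto \exp(x^\top J x)$ is assumed):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 20
    M = rng.standard_normal((n, n)) / np.sqrt(n)
    J = (M + M.T) / 2                    # symmetric interaction matrix
    np.fill_diagonal(J, 0.0)

    # Glauber dynamics for p(x) ∝ exp(x^T J x) on {-1, +1}^n
    x = rng.choice([-1.0, 1.0], size=n)
    for _ in range(20000):
        i = rng.integers(n)
        h_i = 2.0 * (J[i] @ x)           # local field (J_ii = 0, so x_i drops out)
        x[i] = 1.0 if rng.uniform() < 1.0 / (1.0 + np.exp(-2.0 * h_i)) else -1.0
    print("magnetization:", x.mean())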
2 code implementations • 14 Feb 2022 • Elan Rosenfeld, Pradeep Ravikumar, Andrej Risteski
Towards this end, we introduce Domain-Adjusted Regression (DARE), a convex objective for learning a linear predictor that is provably robust under a new model of distribution shift.
1 code implementation • ICLR 2022 • Frederic Koehler, Viraj Mehta, Chenghui Zhou, Andrej Risteski
Recent work by Dai and Wipf (2020) proposes a two-stage training algorithm for VAEs, based on a conjecture that in standard VAE training the generator will converge to a solution with zero variance that is correctly supported on the ground-truth manifold.
no code implementations • NeurIPS 2021 • Holden Lee, Chirag Pabbaraju, Anish Prasad Sevekari, Andrej Risteski
As ill-conditioned Jacobians are an obstacle for likelihood-based training, the fundamental question remains: which distributions can be approximated using well-conditioned affine coupling flows?
no code implementations • ICLR 2022 • Bingbin Liu, Elan Rosenfeld, Pradeep Ravikumar, Andrej Risteski
Noise-contrastive estimation (NCE) is a statistically consistent method for learning unnormalized probabilistic models.
no code implementations • ICML Workshop INNF 2021 • Divyansh Pareek, Andrej Risteski
Training and using modern neural-network-based latent-variable generative models (like Variational Autoencoders) often require simultaneously training a generative direction along with an inferential (encoding) direction, which approximates the posterior distribution over the latent variables.
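As a concrete anchor, the quantity both directions are trained on is the ELBO; below is a minimal Monte Carlo evaluation of it for a toy linear-Gaussian model, with all weights and variances chosen purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy Gaussian VAE: decoder p(x|z) = N(W z, I), prior p(z) = N(0, I),
    # encoder ("inferential direction") q(z|x) = N(V x, diag(sig2))
    d_x, d_z = 5, 2
    W = rng.standard_normal((d_x, d_z))          # decoder weights (hypothetical)
    V = rng.standard_normal((d_z, d_x))          # encoder weights (hypothetical)
    sig2 = np.full(d_z, 0.5)                     # encoder variance (hypothetical)

    def elbo(x, n_mc=1000):
        mu = V @ x
        # Reparameterized samples z = mu + sig * eps
        z = mu + np.sqrt(sig2) * rng.standard_normal((n_mc, d_z))
        log_px_z = -0.5 * np.sum((x - z @ W.T) ** 2, axis=1)  # reconstruction (up to a constant)
        kl = 0.5 * np.sum(sig2 + mu**2 - 1.0 - np.log(sig2))  # KL(q(z|x) || p(z)), closed form
        return log_px_z.mean() - kl

    print("ELBO:", elbo(rng.standard_normal(d_x)))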
no code implementations • ICML Workshop INNF 2021 • Holden Lee, Chirag Pabbaraju, Anish Sevekari, Andrej Risteski
As ill-conditioned Jacobians are an obstacle for likelihood-based training, the fundamental question remains: which distributions can be approximated using well-conditioned affine coupling flows?
no code implementations • 18 Jun 2021 • Yining Chen, Elan Rosenfeld, Mark Sellke, Tengyu Ma, Andrej Risteski
Domain generalization aims at performing well on unseen test environments with data from a limited number of training environments.
no code implementations • ACL 2021 • Yuchen Li, Andrej Risteski
Concretely, we ground this question in the sandbox of probabilistic context-free grammars (PCFGs), and identify a key aspect of the representational power of these approaches: the amount and directionality of context that the predictor has access to when forced to make a parsing decision.
no code implementations • 3 Mar 2021 • Bingbin Liu, Pradeep Ravikumar, Andrej Risteski
Contrastive learning is a family of self-supervised methods where a model is trained to solve a classification task constructed from unlabeled data.
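A minimal sketch of one such constructed classification task (an InfoNCE-style loss, with a linear toy encoder and additive-noise "augmentations" as illustrative choices): each anchor must be matched to its own augmented view, with the other examples in the batch acting as negatives.

    import numpy as np

    rng = np.random.default_rng(0)

    def encode(x, W):
        # Toy encoder: linear map followed by L2 normalization
        z = x @ W
        return z / np.linalg.norm(z, axis=1, keepdims=True)

    def info_nce(z1, z2, tau=0.1):
        # Row i of z1 must be classified as matching row i of z2,
        # with all other rows serving as negatives
        logits = z1 @ z2.T / tau
        logits -= logits.max(axis=1, keepdims=True)            # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Unlabeled data; "augmentation" = small additive noise (illustrative)
    x = rng.standard_normal((128, 16))
    W = rng.standard_normal((16, 8))
    z1 = encode(x + 0.05 * rng.standard_normal(x.shape), W)
    z2 = encode(x + 0.05 * rng.standard_normal(x.shape), W)
    print("InfoNCE loss:", info_nce(z1, z2))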
no code implementations • NeurIPS 2021 • Tanya Marwah, Zachary C. Lipton, Andrej Risteski
Recent experiments have shown that deep networks can approximate solutions to high-dimensional PDEs, seemingly escaping the curse of dimensionality.
no code implementations • 25 Feb 2021 • Elan Rosenfeld, Pradeep Ravikumar, Andrej Risteski
A popular assumption for out-of-distribution generalization is that the training data comprises sub-datasets, each drawn from a distinct distribution; the goal is then to "interpolate" these distributions and "extrapolate" beyond them -- this objective is broadly known as domain generalization.
no code implementations • ICLR 2021 • Elan Rosenfeld, Pradeep Ravikumar, Andrej Risteski
We furthermore present the very first results in the non-linear regime: we demonstrate that IRM can fail catastrophically unless the test data are sufficiently similar to the training distribution -- this is precisely the issue that it was intended to solve.
no code implementations • 2 Oct 2020 • Frederic Koehler, Viraj Mehta, Andrej Risteski
Normalizing flows are among the most popular paradigms in generative modeling, especially for images, primarily because we can efficiently evaluate the likelihood of a data point.
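The efficient likelihood evaluation comes from the change-of-variables formula; here is a minimal sketch with a single affine coupling layer, whose scale/shift maps are toy stand-ins for the usual neural networks:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 4   # even dimension, split in half by the coupling

    # Toy scale/shift maps for one affine coupling layer (hypothetical)
    A = 0.3 * rng.standard_normal((d // 2, d // 2))
    B = rng.standard_normal((d // 2, d // 2))

    def to_latent(x):
        # Normalizing direction: z1 = x1;  z2 = x2 * exp(s(x1)) + t(x1)
        x1, x2 = x[: d // 2], x[d // 2:]
        s, t = np.tanh(A @ x1), B @ x1
        return np.concatenate([x1, x2 * np.exp(s) + t]), np.sum(s)   # log|det J|

    def log_likelihood(x):
        # Change of variables: log p(x) = log N(f(x); 0, I) + log|det Jf(x)|
        z, logdet = to_latent(x)
        return -0.5 * z @ z - 0.5 * d * np.log(2 * np.pi) + logdet

    print("log p(x):", log_likelihood(rng.standard_normal(d)))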
no code implementations • 30 Sep 2020 • Rong Ge, Holden Lee, Jianfeng Lu, Andrej Risteski
We give an algorithm for exact sampling from the Bingham distribution $p(x)\propto \exp(x^\top A x)$ on the sphere $\mathcal S^{d-1}$ with expected runtime of $\operatorname{poly}(d, \lambda_{\max}(A)-\lambda_{\min}(A))$.
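For intuition only (and emphatically not the paper's polynomial-time algorithm), a naive rejection sampler is easy to write: propose uniformly on the sphere and accept with probability $\exp(x^\top A x - \lambda_{\max}(A)) \le 1$. It is exact, but its runtime can degrade exponentially in the eigenvalue spread that the paper's method handles polynomially.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 3
    M = rng.standard_normal((d, d))
    A = (M + M.T) / 2                          # symmetric parameter matrix
    lam_max = np.linalg.eigvalsh(A)[-1]

    def sample_bingham():
        # Propose uniformly on the sphere; accept w.p. exp(x^T A x - lam_max) <= 1
        while True:
            x = rng.standard_normal(d)
            x /= np.linalg.norm(x)
            if np.log(rng.uniform()) < x @ A @ x - lam_max:
                return x

    samples = np.array([sample_bingham() for _ in range(1000)])
    print("sample mean (~0 by antipodal symmetry):", samples.mean(axis=0))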
no code implementations • ICML 2020 • Han Zhao, Junjie Hu, Andrej Risteski
The goal of universal machine translation is to learn to translate between any pair of languages, given a corpus of paired translated documents for a small subset of all pairs of languages.
no code implementations • ICLR Workshop DeepDiffEq 2019 • Ankur Moitra, Andrej Risteski
In this paper, we study one aspect of nonconvexity relevant for modern machine learning applications: the existence of invariances (symmetries) in the function $f$, as a result of which the distribution $p$ will have manifolds of points with equal probability.
no code implementations • 13 Feb 2020 • Ankur Moitra, Andrej Risteski
In this paper, we focus on an aspect of nonconvexity relevant for modern machine learning applications: the existence of invariances (symmetries) in the function $f$, as a result of which the distribution $p$ will have manifolds of points with equal probability.
no code implementations • 25 Sep 2019 • Rares-Darius Buhai, Andrej Risteski, Yoni Halpern, David Sontag
One of the most surprising and exciting discoveries in supervised learning was the benefit of overparameterization (i.e., training a very large model) in improving the optimization landscape of a problem, with minimal effect on statistical performance (i.e., generalization).
1 code implementation • ICML 2020 • Rares-Darius Buhai, Yoni Halpern, Yoon Kim, Andrej Risteski, David Sontag
One of the most surprising and exciting discoveries in supervised learning was the benefit of overparameterization (i.e., training a very large model) in improving the optimization landscape of a problem, with minimal effect on statistical performance (i.e., generalization).
no code implementations • 30 May 2019 • Dylan J. Foster, Andrej Risteski
In agnostic tensor completion, we make no assumption on the rank of the unknown tensor, but attempt to predict unknown entries as well as the best rank-$r$ tensor.
no code implementations • ICLR 2019 • Frederic Koehler, Andrej Risteski
We give an almost-tight theoretical analysis of the performance of both neural networks and polynomials for this problem, as well as verify our theory with simulations.
no code implementations • 29 Nov 2018 • Rong Ge, Holden Lee, Andrej Risteski
Previous approaches rely on decomposing the state space as a partition of sets, while our approach can be thought of as decomposing the stationary measure as a mixture of distributions (a "soft partition").
no code implementations • 22 Aug 2018 • Vishesh Jain, Frederic Koehler, Andrej Risteski
More precisely, we show that the mean-field approximation is within $O((n\|J\|_{F})^{2/3})$ of the free energy, where $\|J\|_F$ denotes the Frobenius norm of the interaction matrix of the Ising model.
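To make the two sides of this bound concrete, a small NumPy sketch that computes the naive mean-field lower bound on $\log Z$ (via damped fixed-point iteration on the magnetizations) and compares it to the exact $\log Z$ by brute force; the convention $p(x) \propto \exp(x^\top J x)$ and all constants are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10
    M = rng.standard_normal((n, n)) / np.sqrt(n)
    J = (M + M.T) / 2
    np.fill_diagonal(J, 0.0)                   # symmetric interactions, no self-loops

    def binary_entropy(p):
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return -(p * np.log(p) + (1 - p) * np.log(1 - p))

    # Naive mean-field: damped fixed-point iteration m_i = tanh(2 * sum_j J_ij m_j)
    m = 0.01 * rng.standard_normal(n)
    for _ in range(1000):
        m = 0.5 * m + 0.5 * np.tanh(2 * J @ m)

    # Variational lower bound on log Z over product distributions
    F_mf = m @ J @ m + binary_entropy((1 + m) / 2).sum()

    # Exact log Z by brute-force enumeration (feasible only for small n)
    xs = np.array(np.meshgrid(*([[-1.0, 1.0]] * n))).reshape(n, -1).T
    logZ = np.log(np.exp(np.einsum('bi,ij,bj->b', xs, J, xs)).sum())

    print(f"log Z = {logZ:.3f}, mean-field bound = {F_mf:.3f}")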
no code implementations • ICLR 2019 • Yu Bai, Tengyu Ma, Andrej Risteski
Our preliminary experiments show that on synthetic datasets the test IPM is well correlated with KL divergence or the Wasserstein distance, indicating that the lack of diversity in GANs may be caused by the sub-optimality in optimization instead of statistical inefficiency.
no code implementations • 29 May 2018 • Frederic Koehler, Andrej Risteski
We give almost-tight bounds on the performance of both neural networks and low degree polynomials for this problem.
no code implementations • ICLR 2018 • Sanjeev Arora, Andrej Risteski, Yi Zhang
Using this, evidence is presented that well-known GAN approaches do learn distributions of fairly low support.
no code implementations • 7 Nov 2017 • Sanjeev Arora, Andrej Risteski, Yi Zhang
Encoder-decoder GAN architectures (e.g., BiGAN and ALI) seek to add an inference mechanism to the GAN setup, consisting of a small encoder deep net that maps data points to their succinct encodings.
no code implementations • NeurIPS 2018 • Rong Ge, Holden Lee, Andrej Risteski
We analyze this Markov chain for the canonical multi-modal distribution: a mixture of Gaussians (of equal variance).
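A minimal simulated-tempering sketch on such a target (a two-mode, equal-variance mixture): a MALA move at the current temperature, followed by a Metropolis move on the temperature index. The temperature ladder and step size are ad hoc choices, and the level weights are taken uniform for simplicity (they affect only how often each level is visited, not the correctness of level-0 samples):

    import numpy as np

    rng = np.random.default_rng(0)

    # Target: equal-variance mixture 0.5*N(-4,1) + 0.5*N(4,1), unnormalized
    def log_p(x):
        return np.logaddexp(-0.5 * (x - 4.0) ** 2, -0.5 * (x + 4.0) ** 2)

    def grad_log_p(x, eps=1e-5):
        return (log_p(x + eps) - log_p(x - eps)) / (2 * eps)

    def mala_step(x, beta, step=0.5):
        # One Metropolis-adjusted Langevin step targeting p(x)^beta
        prop = x + step * beta * grad_log_p(x) + np.sqrt(2 * step) * rng.standard_normal()
        log_acc = beta * (log_p(prop) - log_p(x)) \
            - ((x - prop - step * beta * grad_log_p(prop)) ** 2
               - (prop - x - step * beta * grad_log_p(x)) ** 2) / (4 * step)
        return prop if np.log(rng.uniform()) < log_acc else x

    betas = [1.0, 0.3, 0.1]          # inverse-temperature ladder (ad hoc)
    x, k = 0.0, 0
    samples = []
    for _ in range(20000):
        x = mala_step(x, betas[k])
        j = min(max(k + rng.choice([-1, 1]), 0), len(betas) - 1)
        # Metropolis move on the temperature index (uniform level weights)
        if np.log(rng.uniform()) < (betas[j] - betas[k]) * log_p(x):
            k = j
        if k == 0:
            samples.append(x)

    print("fraction of level-0 samples in the right mode:",
          np.mean(np.array(samples) > 0))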
no code implementations • 14 Jun 2017 • Sanjeev Arora, Andrej Risteski
There is general consensus that learning representations is useful for a variety of reasons, e.g., efficient use of labeled data (semi-supervised learning), transfer learning, and understanding the hidden structure of data.
no code implementations • 29 Apr 2017 • Mikhail Khodak, Andrej Risteski, Christiane Fellbaum, Sanjeev Arora
Our methods require very few linguistic resources, making them applicable to Wordnet construction in low-resource languages, and may further be applied to sense clustering and other Wordnet improvements.
1 code implementation • WS 2017 • Mikhail Khodak, Andrej Risteski, Christiane Fellbaum, Sanjeev Arora
To evaluate our method, we construct two 600-word test sets for word-to-synset matching in French and Russian using native speakers, and evaluate the performance of our method along with several other recent approaches.
no code implementations • 22 Feb 2017 • Holden Lee, Rong Ge, Tengyu Ma, Andrej Risteski, Sanjeev Arora
We take a first cut at explaining the expressivity of multilayer nets by giving a sufficient criterion for a function to be approximable by a neural network with $n$ hidden layers.
no code implementations • 28 Dec 2016 • Sanjeev Arora, Rong Ge, Tengyu Ma, Andrej Risteski
Many machine learning applications use latent variable models to explain structure in data, whereby visible variables (= coordinates of the given datapoint) are explained as a probabilistic function of some hidden variables.
no code implementations • NeurIPS 2016 • Andrej Risteski, Yuanzhi Li
In recent years, a rapidly increasing number of applications in practice require solving non-convex objectives, such as training neural networks, learning graphical models, and maximum likelihood estimation.
no code implementations • NeurIPS 2016 • Yuanzhi Li, Yingyu Liang, Andrej Risteski
Non-negative matrix factorization is a popular tool for decomposing data into feature and weight matrices under non-negativity constraints.
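For reference, the classical baseline here is the Lee-Seung multiplicative update, which is simple but comes with no global guarantees (unlike the algorithms studied in this line of work); a minimal sketch on synthetic low-rank data:

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic non-negative data with rank-5 structure
    n, m, r = 50, 40, 5
    X = rng.random((n, r)) @ rng.random((r, m))

    # Lee-Seung multiplicative updates for min ||X - W H||_F^2 with W, H >= 0
    W, H = rng.random((n, r)), rng.random((r, m))
    for _ in range(500):
        H *= (W.T @ X) / (W.T @ W @ H + 1e-12)
        W *= (X @ H.T) / (W @ H @ H.T + 1e-12)

    print("relative error:", np.linalg.norm(X - W @ H) / np.linalg.norm(X))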
no code implementations • NeurIPS 2016 • Yuanzhi Li, Andrej Risteski
The well-known maximum-entropy principle due to Jaynes, which states that given mean parameters, the maximum-entropy distribution matching them is in an exponential family, has been very popular in machine learning due to its "Occam's razor" interpretation.
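For completeness, the textbook Lagrangian argument behind the principle: maximizing $H(p)$ subject to $\mathbb{E}_p[\phi(X)] = \mu$ and $\int p = 1$ yields the stationarity condition $-\log p(x) - 1 + \lambda^\top \phi(x) + c = 0$, hence $p(x) \propto \exp(\lambda^\top \phi(x))$ -- an exponential family whose natural parameter $\lambda$ is chosen so that the mean parameters match $\mu$.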
no code implementations • 11 Jul 2016 • Andrej Risteski
We make use of recent tools in combinatorial optimization -- the Sherali-Adams and Lasserre convex programming hierarchies -- in combination with variational methods, to obtain algorithms for calculating partition functions in these families.
no code implementations • 6 Feb 2016 • Yuanzhi Li, Yingyu Liang, Andrej Risteski
We show that the properties only need to hold in an average sense and can be achieved by the clipping step.
1 code implementation • TACL 2018 • Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski
A novel aspect of our technique is that each extracted word sense is accompanied by one of about 2000 "discourse atoms" that gives a succinct description of which other words co-occur with that word sense.
no code implementations • NeurIPS 2015 • Pranjal Awasthi, Andrej Risteski
The assumptions on the topic priors are related to the well known Dirichlet prior, introduced to the area of topic modeling by (Blei et al., 2003).
no code implementations • 7 Mar 2015 • Pranjal Awasthi, Moses Charikar, Kevin A. Lai, Andrej Risteski
We resolve an open question from (Christiano, 2014b) posed in COLT'14 regarding the optimal dependency of the regret achievable for online local learning on the size of the label set.
4 code implementations • TACL 2016 • Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski
Semantic word embeddings represent the meaning of a word via a vector, and are created by diverse methods.