
no code implementations • 15 Jun 2023 • Yilong Qin, Andrej Risteski

If $\mathcal{L}$ corresponds to a Markov process corresponding to a continuous version of simulated tempering, we show the corresponding generalized score matching loss is a Gaussian-convolution annealed score matching loss, akin to the one proposed in Song and Ermon 2019.
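As a rough illustration of the loss referenced here (a sketch in the style of Song and Ermon 2019, not this paper's construction), a Gaussian-convolution annealed score matching loss can be written as a $\sigma^2$-weighted average of per-noise-level denoising score matching losses:

```python
import numpy as np

def annealed_dsm_loss(score_fn, data, sigmas, rng):
    """Annealed denoising score matching loss (Song & Ermon 2019 style):
    average the per-noise-level DSM loss, weighting level sigma by sigma^2."""
    total = 0.0
    for sigma in sigmas:
        noise = rng.normal(scale=sigma, size=data.shape)
        perturbed = data + noise
        # DSM target: score of the Gaussian smoothing kernel, -(x_tilde - x)/sigma^2
        target = -noise / sigma**2
        residual = score_fn(perturbed, sigma) - target
        total += sigma**2 * np.mean(np.sum(residual**2, axis=1))
    return total / len(sigmas)

# sanity check: for standard normal data smoothed with noise level sigma,
# the true score of the smoothed density N(0, (1 + sigma^2) I) is -x / (1 + sigma^2)
rng = np.random.default_rng(0)
data = rng.normal(size=(5000, 2))
true_score = lambda x, sigma: -x / (1.0 + sigma**2)
loss = annealed_dsm_loss(true_score, data, sigmas=[1.0, 0.5, 0.1], rng=rng)
```

The sanity check plugs in the closed-form score of the smoothed density, for which the loss is finite and strictly positive (the residual variance term does not vanish).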

no code implementations • 3 Jun 2023 • Chirag Pabbaraju, Dhruv Rohatgi, Anish Sevekari, Holden Lee, Ankur Moitra, Andrej Risteski

In this work, we give the first example of a natural exponential family of distributions such that the score matching loss is computationally efficient to optimize, and has a comparable statistical efficiency to ML, while the ML loss is intractable to optimize using a gradient-based method.

no code implementations • 1 Jun 2023 • Runtian Zhai, Bingbin Liu, Andrej Risteski, Zico Kolter, Pradeep Ravikumar

Our first main theorem provides, for an arbitrary encoder, near tight bounds for both the estimation error incurred by fitting the linear probe on top of the encoder, and the approximation error entailed by the fitness of the RKHS the encoder learns.

1 code implementation • 7 Mar 2023 • Yuchen Li, Yuanzhi Li, Andrej Risteski

While the successes of transformers across many domains are indisputable, an accurate understanding of their learning mechanics is still largely lacking.

no code implementations • 21 Oct 2022 • Tanya Marwah, Zachary C. Lipton, Jianfeng Lu, Andrej Risteski

We show that if composing a function with Barron norm $b$ with partial derivatives of $L$ produces a function of Barron norm at most $B_L b^p$, the solution to the PDE can be $\epsilon$-approximated in the $L^2$ sense by a function with Barron norm $O\left(\left(dB_L\right)^{\max\{p \log(1/ \epsilon), p^{\log(1/\epsilon)}\}}\right)$.

no code implementations • 3 Oct 2022 • Frederic Koehler, Alexander Heckett, Andrej Risteski

Roughly, we show that the score matching estimator is statistically comparable to the maximum likelihood when the distribution has a small isoperimetric constant.

no code implementations • 1 Oct 2022 • Holden Lee, Chirag Pabbaraju, Anish Sevekari, Andrej Risteski

Noise Contrastive Estimation (NCE) is a popular approach for learning probability density functions parameterized up to a constant of proportionality.
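A minimal sketch of the basic NCE objective (the binary-classification form with an equal number of data and noise samples, not the specific estimator analyzed in this paper):

```python
import numpy as np

def nce_loss(log_model, log_noise, x_data, x_noise):
    """NCE loss: classify samples as data vs. noise from a known density.
    log_model may return an *unnormalized* log-density log p_theta(x) + c."""
    def log_sigmoid(z):
        return -np.logaddexp(0.0, -z)  # numerically stable log(sigmoid(z))
    # logit of "this sample came from the data rather than the noise"
    logit_data = log_model(x_data) - log_noise(x_data)
    logit_noise = log_model(x_noise) - log_noise(x_noise)
    return -(np.mean(log_sigmoid(logit_data)) + np.mean(log_sigmoid(-logit_noise)))

# sanity check: when the model density equals the noise density, every logit
# is zero and the loss is exactly 2*log(2)
rng = np.random.default_rng(0)
x_data, x_noise = rng.normal(size=200), rng.normal(size=200)
log_gauss = lambda x: -0.5 * x**2  # unnormalized standard normal log-density
loss = nce_loss(log_gauss, log_gauss, x_data, x_noise)
```

Because the classifier only ever sees density *ratios*, the unknown normalizing constant of the model cancels out of nothing and is instead learned implicitly, which is the point of the method.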

1 code implementation • 29 Mar 2022 • Ashwini Pokle, Jinjin Tian, Yuchen Li, Andrej Risteski

Some recent works however have shown promising results for non-contrastive learning, which does not require negative samples.

no code implementations • 27 Mar 2022 • Binghui Peng, Andrej Risteski

When the features are linear, we design an efficient gradient-based algorithm $\mathsf{DPGD}$, that is guaranteed to perform well on the current environment, as well as avoid catastrophic forgetting.

no code implementations • 18 Feb 2022 • Bingbin Liu, Daniel Hsu, Pradeep Ravikumar, Andrej Risteski

This lens is undoubtedly very interesting, but suffers from the problem that there isn't a "canonical" set of downstream tasks to focus on -- in practice, this problem is usually resolved by competing on the benchmark dataset du jour.

no code implementations • 17 Feb 2022 • Frederic Koehler, Holden Lee, Andrej Risteski

We consider Ising models on the hypercube with a general interaction matrix $J$, and give a polynomial time sampling algorithm when all but $O(1)$ eigenvalues of $J$ lie in an interval of length one, a situation which occurs in many models of interest.

1 code implementation • 14 Feb 2022 • Elan Rosenfeld, Pradeep Ravikumar, Andrej Risteski

Towards this end, we introduce Domain-Adjusted Regression (DARE), a convex objective for learning a linear predictor that is provably robust under a new model of distribution shift.

1 code implementation • ICLR 2022 • Frederic Koehler, Viraj Mehta, Chenghui Zhou, Andrej Risteski

Recent work by Dai and Wipf (2020) proposes a two-stage training algorithm for VAEs, based on a conjecture that in standard VAE training the generator will converge to a solution with 0 variance which is correctly supported on the ground truth manifold.

no code implementations • NeurIPS 2021 • Holden Lee, Chirag Pabbaraju, Anish Prasad Sevekari, Andrej Risteski

As ill-conditioned Jacobians are an obstacle for likelihood-based training, the fundamental question remains: which distributions can be approximated using well-conditioned affine coupling flows?

no code implementations • ICLR 2022 • Bingbin Liu, Elan Rosenfeld, Pradeep Ravikumar, Andrej Risteski

Noise-contrastive estimation (NCE) is a statistically consistent method for learning unnormalized probabilistic models.

no code implementations • ICML Workshop INNF 2021 • Divyansh Pareek, Andrej Risteski

Training and using modern neural-network based latent-variable generative models (like Variational Autoencoders) often require simultaneously training a generative direction along with an inferential (encoding) direction, which approximates the posterior distribution over the latent variables.

no code implementations • ICML Workshop INNF 2021 • Holden Lee, Chirag Pabbaraju, Anish Sevekari, Andrej Risteski

As ill-conditioned Jacobians are an obstacle for likelihood-based training, the fundamental question remains: which distributions can be approximated using well-conditioned affine coupling flows?

no code implementations • 18 Jun 2021 • Yining Chen, Elan Rosenfeld, Mark Sellke, Tengyu Ma, Andrej Risteski

Domain generalization aims at performing well on unseen test environments with data from a limited number of training environments.

no code implementations • ACL 2021 • Yuchen Li, Andrej Risteski

Concretely, we ground this question in the sandbox of probabilistic context-free grammars (PCFGs), and identify a key aspect of the representational power of these approaches: the amount and directionality of context that the predictor has access to when forced to make parsing decisions.

no code implementations • NeurIPS 2021 • Tanya Marwah, Zachary C. Lipton, Andrej Risteski

Recent experiments have shown that deep networks can approximate solutions to high-dimensional PDEs, seemingly escaping the curse of dimensionality.

no code implementations • 3 Mar 2021 • Bingbin Liu, Pradeep Ravikumar, Andrej Risteski

Contrastive learning is a family of self-supervised methods where a model is trained to solve a classification task constructed from unlabeled data.
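A minimal sketch of one such classification task, an InfoNCE-style loss with in-batch negatives (illustrative, not the specific formulation studied in the paper):

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss: each anchor must classify its own
    positive among the positives of the whole batch (in-batch negatives)."""
    def normalize(v):
        return v / np.linalg.norm(v, axis=1, keepdims=True)
    a, p = normalize(anchors), normalize(positives)
    logits = a @ p.T / temperature               # cosine similarity of every pair
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # the correct "class" for anchor i is its own positive i (the diagonal)
    return -np.mean(np.diag(log_probs))

# sanity check: with identical, mutually orthogonal anchor/positive pairs,
# each anchor classifies its positive almost perfectly and the loss is tiny
loss = info_nce(np.eye(4), np.eye(4))
```

The unlabeled data supplies both the "classes" (each anchor's own positive) and the "negatives" (everyone else's positives), which is what makes this a classification task constructed without labels.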

no code implementations • 25 Feb 2021 • Elan Rosenfeld, Pradeep Ravikumar, Andrej Risteski

A popular assumption for out-of-distribution generalization is that the training data comprises sub-datasets, each drawn from a distinct distribution; the goal is then to "interpolate" these distributions and "extrapolate" beyond them -- this objective is broadly known as domain generalization.

no code implementations • ICLR 2021 • Elan Rosenfeld, Pradeep Ravikumar, Andrej Risteski

We furthermore present the very first results in the non-linear regime: we demonstrate that IRM can fail catastrophically unless the test data are sufficiently similar to the training distribution -- this is precisely the issue that it was intended to solve.

no code implementations • 2 Oct 2020 • Frederic Koehler, Viraj Mehta, Andrej Risteski

Normalizing flows are among the most popular paradigms in generative modeling, especially for images, primarily because we can efficiently evaluate the likelihood of a data point.

no code implementations • 30 Sep 2020 • Rong Ge, Holden Lee, Jianfeng Lu, Andrej Risteski

We give an algorithm for exact sampling from the Bingham distribution $p(x)\propto \exp(x^\top A x)$ on the sphere $\mathcal S^{d-1}$ with expected runtime of $\operatorname{poly}(d, \lambda_{\max}(A)-\lambda_{\min}(A))$.
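For intuition, exact Bingham samples can also be drawn by naive rejection from the uniform distribution on the sphere; this baseline is *not* the paper's polynomial-time algorithm (its acceptance rate degrades with the eigenvalue gap), but it is a correct sampler:

```python
import numpy as np

def bingham_rejection_sample(A, rng, max_tries=100000):
    """Naive rejection sampler for p(x) proportional to exp(x^T A x) on the
    unit sphere: propose uniformly, accept with prob exp(x^T A x - lambda_max)."""
    d = A.shape[0]
    lam_max = np.linalg.eigvalsh(A)[-1]  # eigvalsh returns ascending order
    for _ in range(max_tries):
        x = rng.normal(size=d)
        x /= np.linalg.norm(x)           # normalized Gaussian is uniform on S^{d-1}
        if rng.random() < np.exp(x @ A @ x - lam_max):
            return x
    raise RuntimeError("rejection sampling failed")

rng = np.random.default_rng(0)
x = bingham_rejection_sample(np.diag([1.0, 0.0, 0.0]), rng)
```

The acceptance probability is valid because $x^\top A x \le \lambda_{\max}(A)$ for every unit vector $x$, so the ratio is always at most one.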

no code implementations • ICML 2020 • Han Zhao, Junjie Hu, Andrej Risteski

The goal of universal machine translation is to learn to translate between any pair of languages, given a corpus of paired translated documents for \emph{a small subset} of all pairs of languages.

no code implementations • ICLR Workshop DeepDiffEq 2019 • Ankur Moitra, Andrej Risteski

In this paper, we study one aspect of nonconvexity relevant for modern machine learning applications: existence of invariances (symmetries) in the function f, as a result of which the distribution p will have manifolds of points with equal probability.

no code implementations • 13 Feb 2020 • Ankur Moitra, Andrej Risteski

In this paper, we focus on an aspect of nonconvexity relevant for modern machine learning applications: existence of invariances (symmetries) in the function f, as a result of which the distribution p will have manifolds of points with equal probability.

no code implementations • 25 Sep 2019 • Rares-Darius Buhai, Andrej Risteski, Yoni Halpern, David Sontag

One of the most surprising and exciting discoveries in supervised learning was the benefit of overparameterization (i.e., training a very large model) to improving the optimization landscape of a problem, with minimal effect on statistical performance (i.e., generalization).

1 code implementation • ICML 2020 • Rares-Darius Buhai, Yoni Halpern, Yoon Kim, Andrej Risteski, David Sontag

One of the most surprising and exciting discoveries in supervised learning was the benefit of overparameterization (i.e., training a very large model) to improving the optimization landscape of a problem, with minimal effect on statistical performance (i.e., generalization).

no code implementations • 30 May 2019 • Dylan J. Foster, Andrej Risteski

In agnostic tensor completion, we make no assumption on the rank of the unknown tensor, but attempt to predict unknown entries as well as the best rank-$r$ tensor.

no code implementations • ICLR 2019 • Frederic Koehler, Andrej Risteski

We give an almost-tight theoretical analysis of the performance of both neural networks and polynomials for this problem, as well as verify our theory with simulations.

no code implementations • 29 Nov 2018 • Rong Ge, Holden Lee, Andrej Risteski

Previous approaches rely on decomposing the state space as a partition of sets, while our approach can be thought of as decomposing the stationary measure as a mixture of distributions (a "soft partition").

no code implementations • 22 Aug 2018 • Vishesh Jain, Frederic Koehler, Andrej Risteski

More precisely, we show that the mean-field approximation is within $O((n\|J\|_{F})^{2/3})$ of the free energy, where $\|J\|_F$ denotes the Frobenius norm of the interaction matrix of the Ising model.
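For reference, the naive mean-field lower bound on $\log Z$ that the quoted bound concerns can be computed by fixed-point iteration over product-distribution means; the following sketch assumes $J$ is symmetric with zero diagonal, and uses plain parallel updates (which may oscillate for some $J$):

```python
import numpy as np

def mean_field_free_energy(J, h, iters=200):
    """Naive mean-field lower bound on log Z for the Ising model
    Z = sum over x in {-1,1}^n of exp(x^T J x + h^T x),
    via the fixed-point updates m_i = tanh(2 (J m)_i + h_i)."""
    m = np.zeros(len(h))
    for _ in range(iters):
        m = np.tanh(2.0 * (J @ m) + h)
    p = (1.0 + m) / 2.0
    # entropy of each independent +/-1 coordinate with mean m_i
    ent = -(p * np.log(np.clip(p, 1e-12, 1.0)) +
            (1 - p) * np.log(np.clip(1 - p, 1e-12, 1.0)))
    return m @ J @ m + h @ m + ent.sum()

# sanity check: with J = 0 the spins are independent and mean-field is exact,
# log Z = sum_i log(2 cosh(h_i))
h = np.array([0.3, -0.7])
fe = mean_field_free_energy(np.zeros((2, 2)), h)
exact = np.sum(np.log(2.0 * np.cosh(h)))
```

The paper's bound controls how far below the true $\log Z$ this variational quantity can fall in terms of $\|J\|_F$.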

no code implementations • ICLR 2019 • Yu Bai, Tengyu Ma, Andrej Risteski

Our preliminary experiments show that on synthetic datasets the test IPM is well correlated with KL divergence or the Wasserstein distance, indicating that the lack of diversity in GANs may be caused by the sub-optimality in optimization instead of statistical inefficiency.

no code implementations • 29 May 2018 • Frederic Koehler, Andrej Risteski

We give almost-tight bounds on the performance of both neural networks and low degree polynomials for this problem.

no code implementations • ICLR 2018 • Sanjeev Arora, Andrej Risteski, Yi Zhang

Using this, evidence is presented that well-known GAN approaches do learn distributions of fairly low support.

no code implementations • 7 Nov 2017 • Sanjeev Arora, Andrej Risteski, Yi Zhang

Encoder-decoder GAN architectures (e.g., BiGAN and ALI) seek to add an inference mechanism to the GAN setup, consisting of a small encoder deep net that maps data points to their succinct encodings.

no code implementations • NeurIPS 2018 • Rong Ge, Holden Lee, Andrej Risteski

We analyze this Markov chain for the canonical multi-modal distribution: a mixture of Gaussians (of equal variance).

no code implementations • 14 Jun 2017 • Sanjeev Arora, Andrej Risteski

There is general consensus that learning representations is useful for a variety of reasons, e.g., efficient use of labeled data (semi-supervised learning), transfer learning, and understanding the hidden structure of data.

no code implementations • 29 Apr 2017 • Mikhail Khodak, Andrej Risteski, Christiane Fellbaum, Sanjeev Arora

Our methods require very few linguistic resources, thus being applicable for Wordnet construction in low-resources languages, and may further be applied to sense clustering and other Wordnet improvements.

1 code implementation • WS 2017 • Mikhail Khodak, Andrej Risteski, Christiane Fellbaum, Sanjeev Arora

To evaluate our method we construct two 600-word testsets for word-to-synset matching in French and Russian using native speakers and evaluate the performance of our method along with several other recent approaches.

no code implementations • 22 Feb 2017 • Holden Lee, Rong Ge, Tengyu Ma, Andrej Risteski, Sanjeev Arora

We take a first cut at explaining the expressivity of multilayer nets by giving a sufficient criterion for a function to be approximable by a neural network with $n$ hidden layers.

no code implementations • 28 Dec 2016 • Sanjeev Arora, Rong Ge, Tengyu Ma, Andrej Risteski

Many machine learning applications use latent variable models to explain structure in data, whereby visible variables (= coordinates of the given datapoint) are explained as a probabilistic function of some hidden variables.

no code implementations • NeurIPS 2016 • Andrej Risteski, Yuanzhi Li

In recent years, a rapidly increasing number of practical applications require solving non-convex objectives, such as training neural networks, learning graphical models, and maximum likelihood estimation.

no code implementations • NeurIPS 2016 • Yuanzhi Li, Yingyu Liang, Andrej Risteski

Non-negative matrix factorization is a popular tool for decomposing data into feature and weight matrices under non-negativity constraints.
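A minimal sketch of the classical Lee-Seung multiplicative updates for NMF (a standard baseline with the guarantees this line of work improves on, not this paper's algorithm):

```python
import numpy as np

def nmf_multiplicative(X, r, iters=3000, seed=0):
    """Lee-Seung multiplicative updates for non-negative matrix factorization:
    find W >= 0 (features) and H >= 0 (weights) minimizing ||X - W H||_F.
    The update ratios keep both factors elementwise non-negative."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.random((n, r)) + 0.1
    H = rng.random((r, d)) + 0.1
    eps = 1e-12  # avoid division by zero
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# sanity check: recover an exactly rank-2 non-negative matrix
rng0 = np.random.default_rng(1)
W0, H0 = rng0.random((8, 2)), rng0.random((2, 6))
X = W0 @ H0
W, H = nmf_multiplicative(X, r=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Multiplicative updates only guarantee convergence to a stationary point, which is exactly the kind of gap that motivates provable alternatives under extra structural assumptions.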

no code implementations • NeurIPS 2016 • Yuanzhi Li, Andrej Risteski

The well known maximum-entropy principle due to Jaynes, which states that given mean parameters, the maximum entropy distribution matching them is in an exponential family, has been very popular in machine learning due to its "Occam's razor" interpretation.
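The principle can be stated as a constrained optimization; introducing Lagrange multipliers $\theta$ for the mean constraints (a standard derivation, not this paper's contribution) yields the exponential-family form:

```latex
\max_{p}\; H(p)
\quad \text{subject to} \quad
\mathbb{E}_{p}[T(x)] = \mu
\qquad\Longrightarrow\qquad
p_{\theta}(x) = \exp\!\big(\langle \theta, T(x)\rangle - A(\theta)\big),
```

where $T$ collects the sufficient statistics whose means are constrained, and $A(\theta)$ is the log-partition function chosen so that $p_\theta$ normalizes.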

no code implementations • 11 Jul 2016 • Andrej Risteski

We make use of recent tools in combinatorial optimization: the Sherali-Adams and Lasserre convex programming hierarchies, in combination with variational methods to get algorithms for calculating partition functions in these families.

no code implementations • 6 Feb 2016 • Yuanzhi Li, Yingyu Liang, Andrej Risteski

We show that the properties only need to hold in an average sense and can be achieved by the clipping step.

1 code implementation • TACL 2018 • Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski

A novel aspect of our technique is that each extracted word sense is accompanied by one of about 2000 "discourse atoms" that gives a succinct description of which other words co-occur with that word sense.

no code implementations • NeurIPS 2015 • Pranjal Awasthi, Andrej Risteski

The assumptions on the topic priors are related to the well known Dirichlet prior, introduced to the area of topic modeling by (Blei et al., 2003).

no code implementations • 7 Mar 2015 • Pranjal Awasthi, Moses Charikar, Kevin A. Lai, Andrej Risteski

We resolve an open question from (Christiano, 2014b) posed in COLT'14 regarding the optimal dependency of the regret achievable for online local learning on the size of the label set.

4 code implementations • TACL 2016 • Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski

Semantic word embeddings represent the meaning of a word via a vector, and are created by diverse methods.
