Search Results for author: Andrej Risteski

Found 41 papers, 4 papers with code

Variational autoencoders in the presence of low-dimensional data: landscape and implicit bias

no code implementations • 13 Dec 2021 • Frederic Koehler, Viraj Mehta, Andrej Risteski, Chenghui Zhou

Recent work by Dai and Wipf (2019) suggests that on low-dimensional data, the generator will converge to a solution with zero variance that is correctly supported on the ground-truth manifold.

Universal Approximation Using Well-Conditioned Normalizing Flows

no code implementations NeurIPS 2021 Holden Lee, Chirag Pabbaraju, Anish Prasad Sevekari, Andrej Risteski

As ill-conditioned Jacobians are an obstacle for likelihood-based training, the fundamental question remains: which distributions can be approximated using well-conditioned affine coupling flows?

Analyzing and Improving the Optimization Landscape of Noise-Contrastive Estimation

no code implementations • 21 Oct 2021 • Bingbin Liu, Elan Rosenfeld, Pradeep Ravikumar, Andrej Risteski

Noise-contrastive estimation (NCE) is a statistically consistent method for learning unnormalized probabilistic models.
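
As a concrete illustration, here is a minimal NumPy/SciPy sketch of the generic NCE objective: fit an unnormalized model by logistic classification of data against noise samples. The 1-D Gaussian model with a free log-normalizer c and the Gaussian noise distribution are toy assumptions, not the paper's setup.

```python
# Minimal sketch of the NCE objective for a 1-D unnormalized model.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=2000)    # samples from the unknown distribution
noise = rng.normal(0.0, 3.0, size=2000)   # samples from a known noise density q

def log_q(x):                             # log-density of the noise distribution
    return -0.5 * (x / 3.0) ** 2 - np.log(3.0) - 0.5 * np.log(2 * np.pi)

def nce_loss(params):
    mu, log_sigma, c = params             # c absorbs the unknown normalizer
    log_model = lambda x: -0.5 * ((x - mu) / np.exp(log_sigma)) ** 2 - log_sigma + c
    # Logistic loss on the log-density ratio: classify data vs. noise.
    r_data = log_model(data) - log_q(data)
    r_noise = log_model(noise) - log_q(noise)
    return np.mean(np.logaddexp(0.0, -r_data)) + np.mean(np.logaddexp(0.0, r_noise))

res = minimize(nce_loss, x0=np.array([0.0, 0.0, 0.0]))
print(res.x)                              # mu, log_sigma recover roughly (2, 0)
```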

The Effects of Invertibility on the Representational Complexity of Encoders in Variational Autoencoders

no code implementations ICML Workshop INNF 2021 Divyansh Pareek, Andrej Risteski

Training and using modern neural-network-based latent-variable generative models (like variational autoencoders) often requires simultaneously training a generative direction along with an inferential (encoding) direction, which approximates the posterior distribution over the latent variables.
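
For reference, the two directions are usually coupled through the standard evidence lower bound (the textbook VAE objective, not specific to this paper):

$$\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x)\,\big\|\,p(z)\big),$$

where the decoder $p_\theta$ is the generative direction and the encoder $q_\phi$ is the inferential one.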

Universal Approximation for Log-concave Distributions using Well-conditioned Normalizing Flows

no code implementations ICML Workshop INNF 2021 Holden Lee, Chirag Pabbaraju, Anish Sevekari, Andrej Risteski

As ill-conditioned Jacobians are an obstacle for likelihood-based training, the fundamental question remains: which distributions can be approximated using well-conditioned affine coupling flows?

Iterative Feature Matching: Toward Provable Domain Generalization with Logarithmic Environments

no code implementations • 18 Jun 2021 • Yining Chen, Elan Rosenfeld, Mark Sellke, Tengyu Ma, Andrej Risteski

Domain generalization aims at performing well on unseen test environments with data from a limited number of training environments.

Domain Generalization

The Limitations of Limited Context for Constituency Parsing

no code implementations ACL 2021 Yuchen Li, Andrej Risteski

Concretely, we ground this question in the sandbox of probabilistic context-free grammars (PCFGs), and identify a key aspect of the representational power of these approaches: the amount and directionality of context that the predictor has access to when forced to make parsing decisions.

Constituency Parsing, Language Modelling

Parametric Complexity Bounds for Approximating PDEs with Neural Networks

no code implementations NeurIPS 2021 Tanya Marwah, Zachary C. Lipton, Andrej Risteski

Recent experiments have shown that deep networks can approximate solutions to high-dimensional PDEs, seemingly escaping the curse of dimensionality.

Contrastive learning of strong-mixing continuous-time stochastic processes

no code implementations • 3 Mar 2021 • Bingbin Liu, Pradeep Ravikumar, Andrej Risteski

Contrastive learning is a family of self-supervised methods where a model is trained to solve a classification task constructed from unlabeled data.
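
A minimal sketch of that recipe, under toy assumptions (an AR(1) surrogate for a strong-mixing process, quadratic features, scikit-learn's logistic regression; none of this is the paper's construction):

```python
# Turn an unlabeled trajectory into a binary classification problem.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x = np.zeros(5000)
for t in range(1, 5000):                  # AR(1) stand-in for a strong-mixing process
    x[t] = 0.9 * x[t - 1] + 0.1 * rng.normal()

pos = np.stack([x[:-1], x[1:]], axis=1)   # "real" consecutive pairs
neg = np.stack([x[:-1], rng.permutation(x[1:])], axis=1)  # broken pairs
pairs = np.concatenate([pos, neg])
y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])

def feats(P):                             # quadratic features: for Gaussian pairs the
    a, b = P[:, 0], P[:, 1]               # optimal logit is quadratic, not linear
    return np.column_stack([a, b, a * b, a**2, b**2])

clf = LogisticRegression(max_iter=1000).fit(feats(pairs), y)
print(clf.score(feats(pairs), y))         # well above chance: the classifier's
                                          # logits estimate a density ratio
```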

Contrastive Learning, Time Series

An Online Learning Approach to Interpolation and Extrapolation in Domain Generalization

no code implementations • 25 Feb 2021 • Elan Rosenfeld, Pradeep Ravikumar, Andrej Risteski

A popular assumption for out-of-distribution generalization is that the training data comprises sub-datasets, each drawn from a distinct distribution; the goal is then to "interpolate" these distributions and "extrapolate" beyond them -- this objective is broadly known as domain generalization.

Domain Generalization

The Risks of Invariant Risk Minimization

no code implementations ICLR 2021 Elan Rosenfeld, Pradeep Ravikumar, Andrej Risteski

We furthermore present the very first results in the non-linear regime: we demonstrate that IRM can fail catastrophically unless the test data are sufficiently similar to the training distribution -- this is precisely the issue that it was intended to solve.

Representational aspects of depth and conditioning in normalizing flows

no code implementations • 2 Oct 2020 • Frederic Koehler, Viraj Mehta, Andrej Risteski

Normalizing flows are among the most popular paradigms in generative modeling, especially for images, primarily because we can efficiently evaluate the likelihood of a data point.
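
That efficiency comes from the change-of-variables identity every normalizing flow is built on:

$$\log p_X(x) \;=\; \log p_Z\big(f(x)\big) + \log\big|\det J_f(x)\big|,$$

where $f$ is the invertible network mapping data to a simple base distribution $p_Z$; coupling layers keep the Jacobian determinant cheap to evaluate.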

Efficient sampling from the Bingham distribution

no code implementations • 30 Sep 2020 • Rong Ge, Holden Lee, Jianfeng Lu, Andrej Risteski

We give an algorithm for exact sampling from the Bingham distribution $p(x)\propto \exp(x^\top A x)$ on the sphere $\mathcal S^{d-1}$ with expected runtime of $\operatorname{poly}(d, \lambda_{\max}(A)-\lambda_{\min}(A))$.
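
The paper's exact sampler is more involved; for intuition only, here is a crude random-walk Metropolis sketch targeting the same density (treating the projected proposal as approximately symmetric, an assumption made purely for illustration):

```python
# Naive Metropolis on the sphere targeting p(x) ∝ exp(x^T A x).
# NOT the paper's exact sampler; a baseline illustrating the target.
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = rng.normal(size=(d, d)); A = (A + A.T) / 2   # symmetric interaction matrix

def log_p(x):
    return x @ A @ x

x = rng.normal(size=d); x /= np.linalg.norm(x)
samples = []
for _ in range(10000):
    prop = x + 0.3 * rng.normal(size=d)          # random-walk step...
    prop /= np.linalg.norm(prop)                 # ...projected back to the sphere
    if np.log(rng.random()) < log_p(prop) - log_p(x):
        x = prop
    samples.append(x)
```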

On Learning Language-Invariant Representations for Universal Machine Translation

no code implementations ICML 2020 Han Zhao, Junjie Hu, Andrej Risteski

The goal of universal machine translation is to learn to translate between any pair of languages, given a corpus of paired translated documents for a small subset of all pairs of languages.

Machine Translation, Translation

Fast Convergence for Langevin with Matrix Manifold Structure

no code implementations ICLR Workshop DeepDiffEq 2019 Ankur Moitra, Andrej Risteski

In this paper, we study one aspect of nonconvexity relevant for modern machine learning applications: existence of invariances (symmetries) in the function f, as a result of which the distribution p will have manifolds of points with equal probability.

Bayesian Inference

Fast Convergence for Langevin Diffusion with Manifold Structure

no code implementations • 13 Feb 2020 • Ankur Moitra, Andrej Risteski

In this paper, we focus on an aspect of nonconvexity relevant for modern machine learning applications: existence of invariances (symmetries) in the function f, as a result of which the distribution p will have manifolds of points with equal probability.
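
For context, a minimal sketch of (unadjusted) Langevin dynamics on a toy $f$ with a rotational symmetry, so that $p \propto e^{-f}$ is constant on a whole circle of minimizers; the step size and potential are illustrative assumptions, not the paper's setting:

```python
# Unadjusted Langevin dynamics for p(x) ∝ exp(-f(x)).
import numpy as np

rng = np.random.default_rng(0)

def grad_f(x):
    # f(x) = (||x||^2 - 1)^2 : rotation-invariant, minimized on the unit circle
    return 4 * (x @ x - 1) * x

eta, x = 1e-3, rng.normal(size=2)
samples = np.empty((20000, 2))
for k in range(20000):
    x = x - eta * grad_f(x) + np.sqrt(2 * eta) * rng.normal(size=2)
    samples[k] = x
# The chain wanders along the circle of minimizers rather than a single mode.
```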

Bayesian Inference

Benefits of Overparameterization in Single-Layer Latent Variable Generative Models

no code implementations • 25 Sep 2019 • Rares-Darius Buhai, Andrej Risteski, Yoni Halpern, David Sontag

One of the most surprising and exciting discoveries in supervised learning was the benefit of overparameterization (i.e., training a very large model) for improving the optimization landscape of a problem, with minimal effect on statistical performance (i.e., generalization).

Variational Inference

Empirical Study of the Benefits of Overparameterization in Learning Latent Variable Models

1 code implementation ICML 2020 Rares-Darius Buhai, Yoni Halpern, Yoon Kim, Andrej Risteski, David Sontag

One of the most surprising and exciting discoveries in supervised learning was the benefit of overparameterization (i.e., training a very large model) for improving the optimization landscape of a problem, with minimal effect on statistical performance (i.e., generalization).

Variational Inference

Sum-of-squares meets square loss: Fast rates for agnostic tensor completion

no code implementations • 30 May 2019 • Dylan J. Foster, Andrej Risteski

In agnostic tensor completion, we make no assumption on the rank of the unknown tensor, but attempt to predict unknown entries as well as the best rank-$r$ tensor.
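
Schematically, the agnostic goal is a guarantee of the form

$$\mathrm{err}(\widehat{T}) \;\le\; \min_{\operatorname{rank}(T)\le r} \mathrm{err}(T) \;+\; \varepsilon,$$

with err the squared prediction error on unobserved entries; the paper's contribution is fast rates for $\varepsilon$ under the square loss.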

Matrix Completion

The Comparative Power of ReLU Networks and Polynomial Kernels in the Presence of Sparse Latent Structure

no code implementations ICLR 2019 Frederic Koehler, Andrej Risteski

We give an almost-tight theoretical analysis of the performance of both neural networks and polynomials for this problem, and verify our theory with simulations.

Simulated Tempering Langevin Monte Carlo II: An Improved Proof using Soft Markov Chain Decomposition

no code implementations • 29 Nov 2018 • Rong Ge, Holden Lee, Andrej Risteski

Previous approaches rely on decomposing the state space as a partition of sets, while our approach can be thought of as decomposing the stationary measure as a mixture of distributions (a "soft partition").

Mean-field approximation, convex hierarchies, and the optimality of correlation rounding: a unified perspective

no code implementations • 22 Aug 2018 • Vishesh Jain, Frederic Koehler, Andrej Risteski

More precisely, we show that the mean-field approximation is within $O((n\|J\|_{F})^{2/3})$ of the free energy, where $\|J\|_F$ denotes the Frobenius norm of the interaction matrix of the Ising model.
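
For concreteness, the mean-field quantity in question can be computed by the classical coordinate-ascent fixed-point iteration; the sizes and random $J$ below are illustrative assumptions:

```python
# Naive mean-field for an Ising model p(x) ∝ exp(x^T J x), x ∈ {±1}^n.
import numpy as np

rng = np.random.default_rng(0)
n = 10
J = rng.normal(size=(n, n)) / np.sqrt(n)
J = (J + J.T) / 2
np.fill_diagonal(J, 0.0)

def entropy(m):                        # entropy of a ±1 variable with mean m
    p = (1 + m) / 2
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

m = np.full(n, 0.01)                   # mean-field magnetizations E[x_i]
for _ in range(200):                   # coordinate-ascent fixed-point updates
    for i in range(n):
        m[i] = np.clip(np.tanh(2 * J[i] @ m), -1 + 1e-9, 1 - 1e-9)

F_mf = m @ J @ m + entropy(m).sum()    # mean-field lower bound on log Z
print(F_mf)
```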

Approximability of Discriminators Implies Diversity in GANs

no code implementations ICLR 2019 Yu Bai, Tengyu Ma, Andrej Risteski

Our preliminary experiments show that on synthetic datasets the test IPM is well correlated with KL divergence or the Wasserstein distance, indicating that the lack of diversity in GANs may be caused by the sub-optimality in optimization instead of statistical inefficiency.

Representational Power of ReLU Networks and Polynomial Kernels: Beyond Worst-Case Analysis

no code implementations • 29 May 2018 • Frederic Koehler, Andrej Risteski

We give almost-tight bounds on the performance of both neural networks and low degree polynomials for this problem.

Do GANs learn the distribution? Some Theory and Empirics

no code implementations ICLR 2018 Sanjeev Arora, Andrej Risteski, Yi Zhang

Using this, evidence is presented that well-known GAN approaches do learn distributions of fairly low support.

Theoretical limitations of Encoder-Decoder GAN architectures

no code implementations • 7 Nov 2017 • Sanjeev Arora, Andrej Risteski, Yi Zhang

Encoder-decoder GAN architectures (e.g., BiGAN and ALI) seek to add an inference mechanism to the GAN setup, consisting of a small encoder deep net that maps data points to their succinct encodings.

Provable benefits of representation learning

no code implementations • 14 Jun 2017 • Sanjeev Arora, Andrej Risteski

There is general consensus that learning representations is useful for a variety of reasons, e.g., efficient use of labeled data (semi-supervised learning), transfer learning, and understanding the hidden structure of data.

Representation Learning, Transfer Learning

Extending and Improving Wordnet via Unsupervised Word Embeddings

no code implementations • 29 Apr 2017 • Mikhail Khodak, Andrej Risteski, Christiane Fellbaum, Sanjeev Arora

Our methods require very few linguistic resources, thus being applicable for Wordnet construction in low-resource languages, and may further be applied to sense clustering and other Wordnet improvements.

Word Embeddings

Automated WordNet Construction Using Word Embeddings

1 code implementation WS 2017 Mikhail Khodak, Andrej Risteski, Christiane Fellbaum, Sanjeev Arora

To evaluate our method we construct two 600-word test sets for word-to-synset matching in French and Russian using native speakers and evaluate the performance of our method along with several other recent approaches.

Information Retrieval, Machine Translation +3

On the ability of neural nets to express distributions

no code implementations • 22 Feb 2017 • Holden Lee, Rong Ge, Tengyu Ma, Andrej Risteski, Sanjeev Arora

We take a first cut at explaining the expressivity of multilayer nets by giving a sufficient criterion for a function to be approximable by a neural network with $n$ hidden layers.

Provable learning of Noisy-or Networks

no code implementations • 28 Dec 2016 • Sanjeev Arora, Rong Ge, Tengyu Ma, Andrej Risteski

Many machine learning applications use latent variable models to explain structure in data, whereby visible variables (= coordinates of the given datapoint) are explained as a probabilistic function of some hidden variables.
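
In the noisy-OR case, one common parameterization of that probabilistic function is (the standard textbook form; real networks often add a leak term):

$$\Pr[x_i = 0 \mid h] \;=\; \prod_{j \,:\, h_j = 1} (1 - p_{ij}),$$

where $h_j \in \{0,1\}$ are the hidden causes and $p_{ij}$ is the probability that cause $j$ alone activates the visible variable $x_i$.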

Tensor Decomposition, Topic Models

Algorithms and matching lower bounds for approximately-convex optimization

no code implementations NeurIPS 2016 Andrej Risteski, Yuanzhi Li

In recent years, a rapidly increasing number of practical applications require solving non-convex objectives, such as training neural networks, learning graphical models, and maximum likelihood estimation.

Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates

no code implementations NeurIPS 2016 Yuanzhi Li, Yingyu Liang, Andrej Risteski

Non-negative matrix factorization is a popular tool for decomposing data into feature and weight matrices under non-negativity constraints.
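
For orientation, the classic alternating scheme is Lee-Seung's multiplicative updates, sketched below; the paper analyzes a different alternating algorithm with recovery guarantees, so this is background rather than their method:

```python
# Alternating multiplicative updates for NMF (Lee-Seung, Frobenius loss).
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((50, 40))              # non-negative data matrix
r = 5
W = rng.random((50, r)); H = rng.random((r, 40))

for _ in range(500):
    H *= (W.T @ M) / (W.T @ W @ H + 1e-12)    # update weights, stays non-negative
    W *= (M @ H.T) / (W @ H @ H.T + 1e-12)    # update features, stays non-negative

print(np.linalg.norm(M - W @ H) / np.linalg.norm(M))   # relative reconstruction error
```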

Approximate maximum entropy principles via Goemans-Williamson with applications to provable variational methods

no code implementations NeurIPS 2016 Yuanzhi Li, Andrej Risteski

The well-known maximum-entropy principle due to Jaynes, which states that given mean parameters, the maximum entropy distribution matching them is in an exponential family, has been very popular in machine learning due to its "Occam's razor" interpretation.
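
In symbols (the classical statement): given a feature map $\phi$ and mean parameters $\mu$,

$$\max_{p}\; H(p) \quad \text{s.t.} \quad \mathbb{E}_{p}[\phi(x)] = \mu \quad\Longrightarrow\quad p^*(x) \propto \exp\big(\lambda^\top \phi(x)\big)$$

for some dual variables $\lambda$, i.e., the maximizer lies in the exponential family with sufficient statistics $\phi$.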

How to calculate partition functions using convex programming hierarchies: provable bounds for variational methods

no code implementations • 11 Jul 2016 • Andrej Risteski

We make use of recent tools in combinatorial optimization: the Sherali-Adams and Lasserre convex programming hierarchies, in combination with variational methods to get algorithms for calculating partition functions in these families.

Combinatorial Optimization

Recovery guarantee of weighted low-rank approximation via alternating minimization

no code implementations • 6 Feb 2016 • Yuanzhi Li, Yingyu Liang, Andrej Risteski

We show that the properties only need to hold in an average sense and can be achieved by the clipping step.

Matrix Completion

Linear Algebraic Structure of Word Senses, with Applications to Polysemy

1 code implementation TACL 2018 Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski

A novel aspect of our technique is that each extracted word sense is accompanied by one of about 2000 "discourse atoms" that gives a succinct description of which other words co-occur with that word sense.
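
The atoms come from sparse coding of the word vectors; here is a minimal scikit-learn sketch with random stand-in vectors (the paper uses roughly 2000 atoms over real embeddings, and its solver may differ):

```python
# Sparse coding: write each word vector as a sparse combination of shared atoms.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
vecs = rng.normal(size=(300, 50))     # stand-in for 300 word vectors in R^50

dl = DictionaryLearning(n_components=40, transform_algorithm='omp',
                        transform_n_nonzero_coefs=5, random_state=0)
codes = dl.fit_transform(vecs)        # sparse coefficients, one row per word
atoms = dl.components_                # the learned "discourse atoms"
# A polysemous word's few nonzero coefficients point at its distinct senses.
```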

Information Retrieval, Word Embeddings

On some provably correct cases of variational inference for topic models

no code implementations NeurIPS 2015 Pranjal Awasthi, Andrej Risteski

The assumptions on the topic priors are related to the well known Dirichlet prior, introduced to the area of topic modeling by (Blei et al., 2003).

Dictionary Learning, Topic Models +1

Label optimal regret bounds for online local learning

no code implementations • 7 Mar 2015 • Pranjal Awasthi, Moses Charikar, Kevin A. Lai, Andrej Risteski

We resolve an open question from (Christiano, 2014b) posed in COLT'14 regarding the optimal dependency of the regret achievable for online local learning on the size of the label set.

Collaborative Filtering

A Latent Variable Model Approach to PMI-based Word Embeddings

4 code implementations TACL 2016 Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski

Semantic word embeddings represent the meaning of a word via a vector, and are created by diverse methods.
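
As background for the PMI connection, here is the empirical recipe in miniature: build a co-occurrence matrix, take (positive) PMI, and factor it. The toy corpus, window size, and rank are assumptions for illustration; the paper's contribution is a generative model explaining why low-rank fits of PMI work.

```python
# PMI-based embeddings from co-occurrence counts, in miniature.
import numpy as np

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

C = np.zeros((len(vocab), len(vocab)))           # co-occurrence counts, window = 1
for a, b in zip(corpus, corpus[1:]):
    C[idx[a], idx[b]] += 1
    C[idx[b], idx[a]] += 1

p_xy = C / C.sum()
p_x = p_xy.sum(axis=1)
with np.errstate(divide="ignore"):
    pmi = np.log(p_xy / np.outer(p_x, p_x))
pmi = np.maximum(pmi, 0)                          # positive PMI avoids -inf entries

U, S, _ = np.linalg.svd(pmi)
embeddings = U[:, :2] * np.sqrt(S[:2])            # rank-2 embeddings, one row per word
```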

Word Embeddings
