Search Results for author: Krzysztof Choromanski

Found 66 papers, 19 papers with code

Quasi-Monte Carlo Graph Random Features

no code implementations 21 May 2023 Isaac Reid, Krzysztof Choromanski, Adrian Weller

We present a novel mechanism to improve the accuracy of the recently-introduced class of graph random features (GRFs).

Taming graph kernels with random features

no code implementations 29 Apr 2023 Krzysztof Choromanski

We also introduce a (still unbiased) quasi Monte Carlo variant of GRFs, q-GRFs, relying on the so-called reinforced random walks, that might be used to optimize the variance of GRFs.

Graph Clustering

Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR

no code implementations 31 Mar 2023 Rami Botros, Anmol Gulati, Tara N. Sainath, Krzysztof Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu

Conformer models maintain a large number of internal states, the vast majority of which are associated with self-attention layers.

FAVOR#: Sharp Attention Kernel Approximations via New Classes of Positive Random Features

no code implementations 1 Feb 2023 Valerii Likhosherstov, Krzysztof Choromanski, Avinava Dubey, Frederick Liu, Tamas Sarlos, Adrian Weller

The problem of efficient approximation of a linear operator induced by the Gaussian or softmax kernel is often addressed using random features (RFs) which yield an unbiased approximation of the operator's result.

Simplex Random Features

no code implementations 31 Jan 2023 Isaac Reid, Krzysztof Choromanski, Valerii Likhosherstov, Adrian Weller

We present Simplex Random Features (SimRFs), a new random feature (RF) mechanism for unbiased approximation of the softmax and Gaussian kernels by geometrical correlation of random projection vectors.

Learning Model Predictive Controllers with Real-Time Attention for Real-World Navigation

no code implementations 22 Sep 2022 Xuesu Xiao, Tingnan Zhang, Krzysztof Choromanski, Edward Lee, Anthony Francis, Jake Varley, Stephen Tu, Sumeet Singh, Peng Xu, Fei Xia, Sven Mikael Persson, Dmitry Kalashnikov, Leila Takayama, Roy Frostig, Jie Tan, Carolina Parada, Vikas Sindhwani

Despite decades of research, existing navigation systems still face real-world challenges when deployed in the wild, e.g., in cluttered home environments or in human-occupied public spaces.

Imitation Learning

Multiple View Performers for Shape Completion

no code implementations 13 Sep 2022 David Watkins-Valls, Peter Allen, Krzysztof Choromanski, Jacob Varley, Nicholas Waytowich

We propose the Multiple View Performer (MVP) - a new architecture for 3D shape completion from a series of temporally sequential views.

Implicit Two-Tower Policies

no code implementations 2 Aug 2022 Yunfan Zhao, Qingkai Pan, Krzysztof Choromanski, Deepali Jain, Vikas Sindhwani

We present a new class of structured reinforcement learning policy-architectures, Implicit Two-Tower (ITT) policies, where the actions are chosen based on the attention scores of their learnable latent representations with those of the input states.

OpenAI Gym Vocal Bursts Valence Prediction

Chefs' Random Tables: Non-Trigonometric Random Features

1 code implementation 30 May 2022 Valerii Likhosherstov, Krzysztof Choromanski, Avinava Dubey, Frederick Liu, Tamas Sarlos, Adrian Weller

We introduce chefs' random tables (CRTs), a new class of non-trigonometric random features (RFs) to approximate Gaussian and softmax kernels.

PolyViT: Co-training Vision Transformers on Images, Videos and Audio

no code implementations 25 Nov 2021 Valerii Likhosherstov, Anurag Arnab, Krzysztof Choromanski, Mario Lucic, Yi Tay, Adrian Weller, Mostafa Dehghani

Can we train a single transformer model capable of processing multiple modalities and datasets, whilst sharing almost all of its learnable parameters?

Audio Classification

Hybrid Random Features

1 code implementation ICLR 2022 Krzysztof Choromanski, Haoxian Chen, Han Lin, Yuanzhe Ma, Arijit Sehanobish, Deepali Jain, Michael S Ryoo, Jake Varley, Andy Zeng, Valerii Likhosherstov, Dmitry Kalashnikov, Vikas Sindhwani, Adrian Weller

We propose a new class of random feature methods for linearizing softmax and Gaussian kernels, called hybrid random features (HRFs), that automatically adapt the quality of kernel estimation to provide the most accurate approximation in the defined regions of interest.

Benchmarking

From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked Transformers

1 code implementation 16 Jul 2021 Krzysztof Choromanski, Han Lin, Haoxian Chen, Tianyi Zhang, Arijit Sehanobish, Valerii Likhosherstov, Jack Parker-Holder, Tamas Sarlos, Adrian Weller, Thomas Weingarten

In this paper we provide, to the best of our knowledge, the first comprehensive approach for incorporating various masking mechanisms into Transformer architectures in a scalable way.

Graph Attention

On the Expressive Power of Self-Attention Matrices

no code implementations 7 Jun 2021 Valerii Likhosherstov, Krzysztof Choromanski, Adrian Weller

Our proof is constructive, enabling us to propose an algorithm for finding adaptive inputs and fixed self-attention parameters in order to approximate a given matrix.

LEMMA

Debiasing a First-order Heuristic for Approximate Bi-level Optimization

1 code implementation 4 Jun 2021 Valerii Likhosherstov, Xingyou Song, Krzysztof Choromanski, Jared Davis, Adrian Weller

Approximate bi-level optimization (ABLO) consists of (outer-level) optimization problems, involving numerical (inner-level) optimization loops.

ES-ENAS: Efficient Evolutionary Optimization for Large Hybrid Search Spaces

1 code implementation 19 Jan 2021 Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Qiuyi Zhang, Daiyi Peng, Deepali Jain, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Yuxiang Yang

In this paper, we approach the problem of optimizing blackbox functions over large hybrid search spaces consisting of both combinatorial and continuous parameters.

Combinatorial Optimization Continuous Control +3

MLGO: a Machine Learning Guided Compiler Optimizations Framework

1 code implementation 13 Jan 2021 Mircea Trofin, Yundi Qian, Eugene Brevdo, Zinan Lin, Krzysztof Choromanski, David Li

Leveraging machine-learning (ML) techniques for compiler optimizations has been widely studied and explored in academia.

BIG-bench Machine Learning

Sub-Linear Memory: How to Make Performers SLiM

2 code implementations NeurIPS 2021 Valerii Likhosherstov, Krzysztof Choromanski, Jared Davis, Xingyou Song, Adrian Weller

Recent works proposed various linear self-attention mechanisms, scaling only as $O(L)$ for serial computation.

Rethinking Attention with Performers

12 code implementations ICLR 2021 Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller

We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness.

Ranked #15 on Image Generation on ImageNet 64x64 (Bits per dim metric)

Image Generation

Towards Tractable Optimism in Model-Based Reinforcement Learning

no code implementations 21 Jun 2020 Aldo Pacchiano, Philip J. Ball, Jack Parker-Holder, Krzysztof Choromanski, Stephen Roberts

The principle of optimism in the face of uncertainty is prevalent throughout sequential decision making problems such as multi-armed bandits and reinforcement learning (RL).

Continuous Control Decision Making +4

An Ode to an ODE

no code implementations NeurIPS 2020 Krzysztof Choromanski, Jared Quincy Davis, Valerii Likhosherstov, Xingyou Song, Jean-Jacques Slotine, Jacob Varley, Honglak Lee, Adrian Weller, Vikas Sindhwani

We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d).

Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies

no code implementations 13 Jun 2020 Yunhao Tang, Krzysztof Choromanski

Off-policy learning algorithms have been known to be sensitive to the choice of hyper-parameters.

Continuous Control

UFO-BLO: Unbiased First-Order Bilevel Optimization

no code implementations 5 Jun 2020 Valerii Likhosherstov, Xingyou Song, Krzysztof Choromanski, Jared Davis, Adrian Weller

Bilevel optimization (BLO) is a popular approach with many applications including hyperparameter optimization, neural architecture search, adversarial robustness and model-agnostic meta-learning.

Adversarial Robustness Bilevel Optimization +4

Demystifying Orthogonal Monte Carlo and Beyond

no code implementations NeurIPS 2020 Han Lin, Haoxian Chen, Tianyi Zhang, Clement Laroche, Krzysztof Choromanski

Orthogonal Monte Carlo (OMC) is a very effective sampling algorithm imposing structural geometric conditions (orthogonality) on samples for variance reduction.

Time Dependence in Non-Autonomous Neural ODEs

no code implementations ICLR Workshop DeepDiffEq 2019 Jared Quincy Davis, Krzysztof Choromanski, Jake Varley, Honglak Lee, Jean-Jacques Slotine, Valerii Likhosherstov, Adrian Weller, Ameesh Makadia, Vikas Sindhwani

Neural Ordinary Differential Equations (ODEs) are elegant reinterpretations of deep networks where continuous time can replace the discrete notion of depth, ODE solvers perform forward propagation, and the adjoint method enables efficient, constant memory backpropagation.

Image Classification Video Prediction

CWY Parametrization: a Solution for Parallelized Optimization of Orthogonal and Stiefel Matrices

no code implementations 18 Apr 2020 Valerii Likhosherstov, Jared Davis, Krzysztof Choromanski, Adrian Weller

We introduce an efficient approach for optimization over orthogonal groups on highly parallel computation units such as GPUs or TPUs.

Machine Translation Translation +1

Robotic Table Tennis with Model-Free Reinforcement Learning

no code implementations 31 Mar 2020 Wenbo Gao, Laura Graesser, Krzysztof Choromanski, Xingyou Song, Nevena Lazic, Pannag Sanketi, Vikas Sindhwani, Navdeep Jaitly

We propose a model-free algorithm for learning efficient policies capable of returning table tennis balls by controlling robot joints at a rate of 100Hz.

reinforcement-learning Reinforcement Learning (RL)

Stochastic Flows and Geometric Optimization on the Orthogonal Group

no code implementations ICML 2020 Krzysztof Choromanski, David Cheikhi, Jared Davis, Valerii Likhosherstov, Achille Nazaret, Achraf Bahamou, Xingyou Song, Mrugank Akarte, Jack Parker-Holder, Jacob Bergquist, Yuan Gao, Aldo Pacchiano, Tamas Sarlos, Adrian Weller, Vikas Sindhwani

We present a new class of stochastic, geometrically-driven optimization algorithms on the orthogonal group $O(d)$ and naturally reductive homogeneous manifolds obtained from the action of the rotation group $SO(d)$.

Metric Learning Stochastic Optimization

Rapidly Adaptable Legged Robots via Evolutionary Meta-Learning

no code implementations 2 Mar 2020 Xingyou Song, Yuxiang Yang, Krzysztof Choromanski, Ken Caluwaerts, Wenbo Gao, Chelsea Finn, Jie Tan

Learning adaptable policies is crucial for robots to operate autonomously in our complex and quickly changing world.

Meta-Learning

Ready Policy One: World Building Through Active Learning

no code implementations ICML 2020 Philip Ball, Jack Parker-Holder, Aldo Pacchiano, Krzysztof Choromanski, Stephen Roberts

Model-Based Reinforcement Learning (MBRL) offers a promising direction for sample efficient learning, often achieving state of the art results for continuous control tasks.

Active Learning Continuous Control +1

Behavior-Guided Reinforcement Learning

no code implementations 25 Sep 2019 Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Anna Choromanska, Krzysztof Choromanski, Michael I. Jordan

We introduce a new approach for comparing reinforcement learning policies, using Wasserstein distances (WDs) in a newly defined latent behavioral space.

reinforcement-learning Reinforcement Learning (RL)

Reinforcement Learning with Chromatic Networks

no code implementations 25 Sep 2019 Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Deepali Jain, Yuxiang Yang

We present a neural architecture search algorithm to construct compact reinforcement learning (RL) policies, by combining ENAS and ES in a highly scalable and intuitive way.

Neural Architecture Search reinforcement-learning +1

ES-MAML: Simple Hessian-Free Meta Learning

1 code implementation ICLR 2020 Xingyou Song, Wenbo Gao, Yuxiang Yang, Krzysztof Choromanski, Aldo Pacchiano, Yunhao Tang

We introduce ES-MAML, a new framework for solving the model agnostic meta learning (MAML) problem based on Evolution Strategies (ES).

Meta-Learning

Reinforcement Learning with Chromatic Networks for Compact Architecture Search

no code implementations 10 Jul 2019 Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Deepali Jain, Yuxiang Yang

We present a neural architecture search algorithm to construct compact reinforcement learning (RL) policies, by combining ENAS and ES in a highly scalable and intuitive way.

Combinatorial Optimization Neural Architecture Search +2

Learning to Score Behaviors for Guided Policy Optimization

1 code implementation ICML 2020 Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Anna Choromanska, Krzysztof Choromanski, Michael I. Jordan

We introduce a new approach for comparing reinforcement learning policies, using Wasserstein distances (WDs) in a newly defined latent behavioral space.

Efficient Exploration Imitation Learning +2

Structured Monte Carlo Sampling for Nonisotropic Distributions via Determinantal Point Processes

no code implementations 29 May 2019 Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang

We propose a new class of structured methods for Monte Carlo (MC) sampling, called DPPMC, designed for high-dimensional nonisotropic distributions where samples are correlated to reduce the variance of the estimator via determinantal point processes.

Point Processes

Linear interpolation gives better gradients than Gaussian smoothing in derivative-free optimization

no code implementations 29 May 2019 Albert S. Berahas, Liyuan Cao, Krzysztof Choromanski, Katya Scheinberg

We then demonstrate via rigorous analysis of the variance and by numerical comparisons on reinforcement learning tasks that the Gaussian sampling method used in [Salimans et al. 2016] is significantly inferior to the orthogonal sampling used in [Choromanski et al. 2018] as well as more general interpolation methods.

reinforcement-learning Reinforcement Learning (RL)

Variance Reduction for Evolution Strategies via Structured Control Variates

no code implementations 29 May 2019 Yunhao Tang, Krzysztof Choromanski, Alp Kucukelbir

Evolution Strategies (ES) are a powerful class of blackbox optimization techniques that recently became a competitive alternative to state-of-the-art policy gradient (PG) algorithms for reinforcement learning (RL).

Reinforcement Learning (RL)

A Theoretical and Empirical Comparison of Gradient Approximations in Derivative-Free Optimization

no code implementations 3 May 2019 Albert S. Berahas, Liyuan Cao, Krzysztof Choromanski, Katya Scheinberg

To this end, we use the results in [Berahas et al., 2019] and show how each method can satisfy the sufficient conditions, possibly only with some sufficiently large probability at each iteration, as happens to be the case with Gaussian smoothing and smoothing on a sphere.

Optimization and Control

Provably Robust Blackbox Optimization for Reinforcement Learning

no code implementations 7 Mar 2019 Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Deepali Jain, Yuxiang Yang, Atil Iscen, Jasmine Hsu, Vikas Sindhwani

Interest in derivative-free optimization (DFO) and "evolutionary strategies" (ES) has recently surged in the Reinforcement Learning (RL) community, with growing evidence that they can match state of the art methods for policy optimization problems in Robotics.

reinforcement-learning Reinforcement Learning (RL) +1

From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization

1 code implementation NeurIPS 2019 Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang

ASEBO adapts to the geometry of the function and learns optimal sets of sensing directions, which are used to probe it, on-the-fly.

Multi-Armed Bandits

Structured Evolution with Compact Architectures for Scalable Policy Optimization

no code implementations ICML 2018 Krzysztof Choromanski, Mark Rowland, Vikas Sindhwani, Richard E. Turner, Adrian Weller

We present a new method of blackbox optimization via gradient approximation with the use of structured random orthogonal matrices, providing more accurate estimators than baselines and with provable theoretical guarantees.

OpenAI Gym Text-to-Image Generation

On the Needs for Rotations in Hypercubic Quantization Hashing

no code implementations 12 Feb 2018 Anne Morvan, Antoine Souloumiac, Krzysztof Choromanski, Cédric Gouy-Pailler, Jamal Atif

The aim of this paper is to endow the well-known family of hypercubic quantization hashing methods with theoretical guarantees.

Dimensionality Reduction Quantization

Initialization matters: Orthogonal Predictive State Recurrent Neural Networks

no code implementations ICLR 2018 Krzysztof Choromanski, Carlton Downey, Byron Boots

In this paper, we extend the theory of ORFs to Kernel Ridge Regression and show that ORFs can be used to obtain Orthogonal PSRNNs (OPSRNNs), which are smaller and faster than PSRNNs.

regression Time Series Analysis

Manifold Regularization for Kernelized LSTD

no code implementations 15 Oct 2017 Xinyan Yan, Krzysztof Choromanski, Byron Boots, Vikas Sindhwani

Policy evaluation, i.e., value function or Q-function approximation, is a key procedure in reinforcement learning (RL).

Policy Gradient Methods Reinforcement Learning (RL)

Explaining How a Deep Neural Network Trained with End-to-End Learning Steers a Car

8 code implementations 25 Apr 2017 Mariusz Bojarski, Philip Yeres, Anna Choromanska, Krzysztof Choromanski, Bernhard Firner, Lawrence Jackel, Urs Muller

This eliminates the need for human engineers to anticipate what is important in an image and foresee all the necessary rules for safe driving.

Autonomous Driving Self-Driving Cars

Graph sketching-based Space-efficient Data Clustering

1 code implementation 7 Mar 2017 Anne Morvan, Krzysztof Choromanski, Cédric Gouy-Pailler, Jamal Atif

In this paper, we address the problem of recovering arbitrary-shaped data clusters from datasets under high space constraints, as is the case, for instance, in many real-world applications where analysis algorithms are deployed directly on resource-limited mobile devices collecting the data.

The Unreasonable Effectiveness of Structured Random Orthogonal Embeddings

2 code implementations NeurIPS 2017 Krzysztof Choromanski, Mark Rowland, Adrian Weller

We examine a class of embeddings based on structured random matrices with orthogonal rows which can be applied in many machine learning applications including dimensionality reduction and kernel approximation.

BIG-bench Machine Learning Dimensionality Reduction

VisualBackProp: efficient visualization of CNNs

4 code implementations 16 Nov 2016 Mariusz Bojarski, Anna Choromanska, Krzysztof Choromanski, Bernhard Firner, Larry Jackel, Urs Muller, Karol Zieba

We furthermore justify our approach with theoretical arguments and theoretically confirm that the proposed method identifies sets of input pixels, rather than individual pixels, that collaboratively contribute to the prediction.

Self-Driving Cars

Orthogonal Random Features

no code implementations NeurIPS 2016 Felix X. Yu, Ananda Theertha Suresh, Krzysztof Choromanski, Daniel Holtmann-Rice, Sanjiv Kumar

We present an intriguing discovery related to Random Fourier Features: in Gaussian kernel approximation, replacing the random Gaussian matrix by a properly scaled random orthogonal matrix significantly decreases kernel approximation error.

Recycling Randomness with Structure for Sublinear time Kernel Expansions

no code implementations 29 May 2016 Krzysztof Choromanski, Vikas Sindhwani

We propose a scheme for recycling Gaussian random vectors into structured matrices to approximate various kernel functions in sublinear time via random embeddings.

TripleSpin - a generic compact paradigm for fast machine learning computations

no code implementations 29 May 2016 Krzysztof Choromanski, Francois Fagan, Cedric Gouy-Pailler, Anne Morvan, Tamas Sarlos, Jamal Atif

In particular, as a byproduct of the presented techniques and by using a relatively new Berry-Esseen-type CLT for random vectors, we give the first theoretical guarantees for one of the most efficient existing LSH algorithms based on the $\textbf{HD}_{3}\textbf{HD}_{2}\textbf{HD}_{1}$ structured matrix ("Practical and Optimal LSH for Angular Distance").

BIG-bench Machine Learning Quantization

On the boosting ability of top-down decision tree learning algorithm for multiclass classification

no code implementations 17 May 2016 Anna Choromanska, Krzysztof Choromanski, Mariusz Bojarski

We analyze the performance of LOMtree, a top-down multiclass classification algorithm for decision tree learning recently proposed by Choromanska and Langford (2014) for efficiently solving classification problems with a very large number of classes.

General Classification

Fast nonlinear embeddings via structured matrices

no code implementations 25 Apr 2016 Krzysztof Choromanski, Francois Fagan

Our framework covers, as special cases, already known structured approaches such as the Fast Johnson-Lindenstrauss Transform, but is much more general since it can also be applied to highly nonlinear embeddings.

Binary embeddings with structured hashed projections

no code implementations 16 Nov 2015 Anna Choromanska, Krzysztof Choromanski, Mariusz Bojarski, Tony Jebara, Sanjiv Kumar, Yann LeCun

We prove several theoretical results showing that projections via various structured matrices followed by nonlinear mappings accurately preserve the angular distance between input high-dimensional vectors.

LEMMA

Quantization based Fast Inner Product Search

no code implementations 4 Sep 2015 Ruiqi Guo, Sanjiv Kumar, Krzysztof Choromanski, David Simcha

We propose a quantization based approach for fast approximate Maximum Inner Product Search (MIPS).

Quantization

Fast Online Clustering with Randomized Skeleton Sets

no code implementations 10 Jun 2015 Krzysztof Choromanski, Sanjiv Kumar, Xiaofeng Liu

To achieve fast clustering, we propose to represent each cluster by a skeleton set which is updated continuously as new data is seen.

Nonparametric Clustering Online Clustering

Notes on using Determinantal Point Processes for Clustering with Applications to Text Clustering

no code implementations 26 Oct 2014 Apoorv Agarwal, Anna Choromanska, Krzysztof Choromanski

In this paper, we compare three initialization schemes for the KMEANS clustering algorithm: 1) random initialization (KMEANSRAND), 2) KMEANS++, and 3) KMEANSD++.

Point Processes Text Clustering

Differentially- and non-differentially-private random decision trees

no code implementations 26 Oct 2014 Mariusz Bojarski, Anna Choromanska, Krzysztof Choromanski, Yann LeCun

We consider supervised learning with random decision trees, where the tree construction is completely random.

On Learning from Label Proportions

1 code implementation 24 Feb 2014 Felix X. Yu, Krzysztof Choromanski, Sanjiv Kumar, Tony Jebara, Shih-Fu Chang

Learning from Label Proportions (LLP) is a learning setting, where the training data is provided in groups, or "bags", and only the proportion of each class in each bag is known.

Marketing
