Search Results for author: Sanjiv Kumar

Found 92 papers, 15 papers with code

TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s

no code implementations 28 Jun 2022 Felix Chern, Blake Hechtman, Andy Davis, Ruiqi Guo, David Majnemer, Sanjiv Kumar

This paper presents a novel nearest neighbor search algorithm achieving TPU (Google Tensor Processing Unit) peak performance, outperforming state-of-the-art GPU algorithms at a similar level of recall.

ELM: Embedding and Logit Margins for Long-Tail Learning

no code implementations 27 Apr 2022 Wittawat Jitkrittum, Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar

Long-tail learning is the problem of learning under skewed label distributions, which pose a challenge for standard learners.

Contrastive Learning Long-tail Learning

Predicting on the Edge: Identifying Where a Larger Model Does Better

no code implementations 15 Feb 2022 Taman Narayan, Heinrich Jiang, Sen Zhao, Sanjiv Kumar

Much effort has been devoted to making larger and more accurate models, but relatively little has been put into understanding which examples benefit from the added complexity.

Robust Training of Neural Networks Using Scale Invariant Architectures

no code implementations 2 Feb 2022 Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank J. Reddi, Sanjiv Kumar

In contrast to SGD, adaptive gradient methods like Adam allow robust training of modern deep networks, especially large language models.

Efficient Training of Retrieval Models using Negative Cache

2 code implementations NeurIPS 2021 Erik Lindgren, Sashank Reddi, Ruiqi Guo, Sanjiv Kumar

These models are typically trained by optimizing the model parameters to score relevant "positive" pairs higher than the irrelevant "negative" ones.

Information Retrieval

When in Doubt, Summon the Titans: Efficient Inference with Large Models

no code implementations 19 Oct 2021 Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Amr Ahmed, Sanjiv Kumar

In a nutshell, we use the large teacher models to guide the lightweight student models to make correct predictions only on a subset of "easy" examples; for the "hard" examples, we fall back to the teacher.

Image Classification Natural Language Processing
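
The easy/hard routing described above can be pictured with a simple confidence-threshold rule. This is a minimal sketch of one natural instantiation, not necessarily the routing criterion used in the paper; the threshold and the confidence measure are assumptions:

```python
import numpy as np

def route_prediction(student_probs, teacher_predict_fn, x, threshold=0.9):
    """Serve the cheap student when it is confident ("easy" example) and
    fall back to the expensive teacher otherwise ("hard" example)."""
    if np.max(student_probs) >= threshold:    # assumed confidence measure
        return int(np.argmax(student_probs))
    return teacher_predict_fn(x)              # summon the titan
```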

Leveraging redundancy in attention with Reuse Transformers

no code implementations 13 Oct 2021 Srinadh Bhojanapalli, Ayan Chakrabarti, Andreas Veit, Michal Lukasik, Himanshu Jain, Frederick Liu, Yin-Wen Chang, Sanjiv Kumar

Pairwise dot product-based attention allows Transformers to exchange information between tokens in an input-dependent way, and is key to their success across diverse applications in language and vision.

Model-Efficient Deep Learning with Kernelized Classification

no code implementations 29 Sep 2021 Sadeep Jayasumana, Srikumar Ramalingam, Sanjiv Kumar

We investigate the possibility of using the embeddings produced by a lightweight network more effectively with a nonlinear classification layer.

Classification Natural Language Processing

When in Doubt, Summon the Titans: A Framework for Efficient Inference with Large Models

no code implementations 29 Sep 2021 Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Amr Ahmed, Sanjiv Kumar

In a nutshell, we use the large teacher models to guide the lightweight student models to make correct predictions only on a subset of "easy" examples; for the "hard" examples, we fall back to the teacher.

Image Classification Natural Language Processing

In defense of dual-encoders for neural ranking

no code implementations 29 Sep 2021 Aditya Krishna Menon, Sadeep Jayasumana, Seungyeon Kim, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar

Transformer-based models such as BERT have proven successful in information retrieval problems, which seek to identify relevant documents for a given query.

Information Retrieval Natural Questions

Batch Active Learning at Scale

no code implementations NeurIPS 2021 Gui Citovsky, Giulia Desalvo, Claudio Gentile, Lazaros Karydas, Anand Rajagopalan, Afshin Rostamizadeh, Sanjiv Kumar

The ability to train complex and highly effective models often requires an abundance of training data, which can easily become a bottleneck in cost, time, and computational resources.

Active Learning

Teacher's pet: understanding and mitigating biases in distillation

no code implementations 19 Jun 2021 Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

Knowledge distillation is widely used as a means of improving the performance of a relatively simple student model using the predictions from a complex teacher model.

Knowledge Distillation

Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation

no code implementations 16 Jun 2021 Srinadh Bhojanapalli, Ayan Chakrabarti, Himanshu Jain, Sanjiv Kumar, Michal Lukasik, Andreas Veit

State-of-the-art transformer models use pairwise dot-product based self-attention, which comes at a computational cost quadratic in the input sequence length.

Balancing Robustness and Sensitivity using Feature Contrastive Learning

no code implementations 19 May 2021 Seungyeon Kim, Daniel Glasner, Srikumar Ramalingam, Cho-Jui Hsieh, Kishore Papineni, Sanjiv Kumar

It is generally believed that robust training of extremely large networks is critical to their success in real-world applications.

Contrastive Learning

Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces

no code implementations 12 May 2021 Ankit Singh Rawat, Aditya Krishna Menon, Wittawat Jitkrittum, Sadeep Jayasumana, Felix X. Yu, Sashank Reddi, Sanjiv Kumar

Negative sampling schemes enable efficient training given a large number of classes, by offering a means to approximate a computationally expensive loss function that takes all labels into account.

RankDistil: Knowledge Distillation for Ranking

no code implementations AISTATS 2021 Sashank J. Reddi, Rama Kumar Pasumarthi, Aditya Krishna Menon, Ankit Singh Rawat, Felix Yu, Seungyeon Kim, Andreas Veit, Sanjiv Kumar

Knowledge distillation is an approach to improve the performance of a student model by using the knowledge of a complex teacher. Despite its success in several deep learning applications, the study of distillation is mostly confined to classification settings.

Document Ranking Knowledge Distillation

On the Reproducibility of Neural Network Predictions

no code implementations 5 Feb 2021 Srinadh Bhojanapalli, Kimberly Wilber, Andreas Veit, Ankit Singh Rawat, Seungyeon Kim, Aditya Menon, Sanjiv Kumar

By analyzing the relationship between churn and prediction confidences, we pursue an approach with two components for churn reduction.

Data Augmentation Image Classification

Overparameterisation and worst-case generalisation: friend or foe?

no code implementations ICLR 2021 Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar

Overparameterised neural networks have demonstrated the remarkable ability to perfectly fit training samples, while still generalising to unseen test samples.

Structured Prediction

O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers

no code implementations NeurIPS 2020 Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar

We propose sufficient conditions under which we prove that a sparse attention model can universally approximate any sequence-to-sequence function.

Modifying Memories in Transformer Models

no code implementations 1 Dec 2020 Chen Zhu, Ankit Singh Rawat, Manzil Zaheer, Srinadh Bhojanapalli, Daliang Li, Felix Yu, Sanjiv Kumar

In this paper, we propose a new task of explicitly modifying specific factual knowledge in Transformer models while ensuring the model performance does not degrade on the unmodified facts.

Coping with Label Shift via Distributionally Robust Optimisation

no code implementations ICLR 2021 Jingzhao Zhang, Aditya Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra

The label shift problem refers to the supervised learning setting where the train and test label distributions do not match.

Semantic Label Smoothing for Sequence to Sequence Problems

no code implementations EMNLP 2020 Michal Lukasik, Himanshu Jain, Aditya Krishna Menon, Seungyeon Kim, Srinadh Bhojanapalli, Felix Yu, Sanjiv Kumar

Label smoothing has been shown to be an effective regularization strategy in classification that prevents overfitting and helps in label de-noising.

Machine Translation Translation

Learning discrete distributions: user vs item-level privacy

no code implementations NeurIPS 2020 Yuhan Liu, Ananda Theertha Suresh, Felix Yu, Sanjiv Kumar, Michael Riley

If each user has $m$ samples, we show that straightforward applications of Laplace or Gaussian mechanisms require the number of users to be $\mathcal{O}(k/(m\alpha^2) + k/(\epsilon\alpha))$ to achieve an $\ell_1$ distance of $\alpha$ between the true and estimated distributions, with the privacy-induced penalty $k/(\epsilon\alpha)$ independent of the number of samples per user $m$.

Federated Learning
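
As a concrete reference point for the "straightforward application" mentioned above, here is a sketch of a user-level Laplace mechanism for estimating a k-ary distribution. The sensitivity argument in the comment is standard central-DP reasoning, not a description of the paper's improved estimator:

```python
import numpy as np

def dp_discrete_distribution(user_samples, k, epsilon, rng):
    """Aggregate all users' samples into an empirical distribution and add
    Laplace noise calibrated to user-level sensitivity (a baseline sketch)."""
    n = len(user_samples)                    # number of users
    m = len(user_samples[0])                 # samples per user (assumed equal)
    counts = np.zeros(k)
    for s in user_samples:                   # s: integer samples in {0, ..., k-1}
        counts += np.bincount(s, minlength=k)
    p_hat = counts / (n * m)
    # Replacing one user's m samples changes p_hat by at most 2/n in L1 norm,
    # so per-coordinate Laplace noise of scale 2/(n * epsilon) gives epsilon-DP.
    return p_hat + rng.laplace(scale=2.0 / (n * epsilon), size=k)
```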

Multi-Stage Influence Function

no code implementations NeurIPS 2020 Hongge Chen, Si Si, Yang Li, Ciprian Chelba, Sanjiv Kumar, Duane Boning, Cho-Jui Hsieh

With this score, we can identify the pretraining examples in the pretraining task that contribute most to a prediction in the finetuning task.

Natural Language Processing Transfer Learning

Long-tail learning via logit adjustment

2 code implementations ICLR 2021 Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

Real-world classification problems typically exhibit an imbalanced or long-tailed label distribution, wherein many labels are associated with only a few samples.

Long-tail Learning
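
The post-hoc variant of logit adjustment is easy to state: shift each class's logit by a scaled log of its training prior before taking the argmax. A minimal sketch from memory; the temperature tau and the post-hoc framing are my reading of the method, so treat details as assumptions:

```python
import numpy as np

def logit_adjusted_predict(logits, class_priors, tau=1.0):
    """Subtract tau * log(prior) from each logit so rare (tail) classes are
    not systematically under-predicted, then take the argmax."""
    return np.argmax(logits - tau * np.log(class_priors), axis=-1)
```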

$O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers

no code implementations NeurIPS 2020 Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar

We propose sufficient conditions under which we prove that a sparse attention model can universally approximate any sequence-to-sequence function.

Why distillation helps: a statistical perspective

no code implementations 21 May 2020 Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Seungyeon Kim, Sanjiv Kumar

In this paper, we present a statistical perspective on distillation which addresses this question, and provides a novel connection to extreme multiclass retrieval techniques.

Knowledge Distillation

Can gradient clipping mitigate label noise?

1 code implementation ICLR 2020 Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar

Gradient clipping is a widely-used technique in the training of deep networks, and is generally motivated from an optimisation lens: informally, it controls the dynamics of iterates, thus enhancing the rate of convergence to a local minimum.
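
For reference, the generic technique the abstract is analyzing looks like the following; this is a standard global-norm clipping sketch, not the specific variant the paper studies:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their combined L2 norm is at most
    max_norm, leaving small updates untouched."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-12))
    return [g * scale for g in grads]
```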

Doubly-stochastic mining for heterogeneous retrieval

no code implementations 23 Apr 2020 Ankit Singh Rawat, Aditya Krishna Menon, Andreas Veit, Felix Yu, Sashank J. Reddi, Sanjiv Kumar

Modern retrieval problems are characterised by training sets with potentially billions of labels, and heterogeneous data distributions across subpopulations (e.g., users of a retrieval system may be from different countries), each of which poses a challenge.

Stochastic Optimization

Federated Learning with Only Positive Labels

no code implementations ICML 2020 Felix X. Yu, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

We consider learning a multi-class classification model in the federated setting, where each user has access to the positive data associated with only a single class.

Federated Learning Multi-class Classification

Robust Large-Margin Learning in Hyperbolic Space

no code implementations NeurIPS 2020 Melanie Weber, Manzil Zaheer, Ankit Singh Rawat, Aditya Menon, Sanjiv Kumar

In this paper, we present, to our knowledge, the first theoretical guarantees for learning a classifier in hyperbolic rather than Euclidean space.

Representation Learning

Does label smoothing mitigate label noise?

no code implementations ICML 2020 Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

Label smoothing is commonly used in training deep learning models, wherein one-hot training labels are mixed with uniform label vectors.

Learning with noisy labels
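
The mixing step described in the abstract is a one-liner; a minimal sketch with a hypothetical smoothing weight eps:

```python
import numpy as np

def smooth_labels(y, num_classes, eps=0.1):
    """Mix one-hot training labels with the uniform label vector."""
    one_hot = np.eye(num_classes)[y]                 # y: integer class indices
    return (1.0 - eps) * one_hot + eps / num_classes
```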

Adaptive Federated Optimization

3 code implementations ICLR 2021 Sashank Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, H. Brendan McMahan

Federated learning is a distributed machine learning paradigm in which a large number of clients coordinate with a central server to learn a model without sharing their own training data.

Federated Learning

Pre-training Tasks for Embedding-based Large-scale Retrieval

no code implementations ICLR 2020 Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar

We consider the large-scale query-document retrieval problem: given a query (e.g., a question), return the set of relevant documents (e.g., paragraphs containing the answer) from a large document corpus.

Information Retrieval Link Prediction

New Loss Functions for Fast Maximum Inner Product Search

no code implementations ICLR 2020 Ruiqi Guo, Quan Geng, David Simcha, Felix Chern, Phil Sun, Sanjiv Kumar

In this work, we focus directly on minimizing error in inner product approximation and derive a new class of quantization loss functions.

Quantization

Are Transformers universal approximators of sequence-to-sequence functions?

no code implementations ICLR 2020 Chulhee Yun, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar

In this paper, we establish that Transformer models are universal approximators of continuous permutation equivariant sequence-to-sequence functions with compact support, which is quite surprising given the amount of shared parameters in these models.

Why are Adaptive Methods Good for Attention Models?

no code implementations NeurIPS 2020 Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J. Reddi, Sanjiv Kumar, Suvrit Sra

While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Clipped SGD/Adam have been observed to outperform SGD across important tasks, such as attention models.

Multilabel reductions: what is my loss optimising?

no code implementations NeurIPS 2019 Aditya K. Menon, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar

Multilabel classification is a challenging problem arising in applications ranging from information retrieval to image tagging.

General Classification Information Retrieval

Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces

no code implementations NeurIPS 2019 Chuan Guo, Ali Mousavi, Xiang Wu, Daniel N. Holtmann-Rice, Satyen Kale, Sashank Reddi, Sanjiv Kumar

In extreme classification settings, embedding-based neural network models are currently not competitive with sparse linear and tree-based methods in terms of accuracy.

Classification Data Augmentation +1

Learning to Learn by Zeroth-Order Oracle

1 code implementation ICLR 2020 Yangjun Ruan, Yuanhao Xiong, Sashank Reddi, Sanjiv Kumar, Cho-Jui Hsieh

In the learning to learn (L2L) framework, we cast the design of optimization algorithms as a machine learning problem and use deep neural networks to learn the update rules.

Adversarial Attack

Learning to Learn with Better Convergence

no code implementations 25 Sep 2019 Patrick H. Chen, Sashank Reddi, Sanjiv Kumar, Cho-Jui Hsieh

We consider the learning to learn problem, where the goal is to leverage deep learning models to automatically learn (iterative) optimization algorithms for training machine learning models.

Why ADAM Beats SGD for Attention Models

no code implementations 25 Sep 2019 Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J Reddi, Sanjiv Kumar, Suvrit Sra

While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Adam have been observed to outperform SGD across important tasks, such as attention models.

Concise Multi-head Attention Models

no code implementations 25 Sep 2019 Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar

The attention-based Transformer architecture has enabled significant advances in the field of natural language processing.

Natural Language Processing

Online Hierarchical Clustering Approximations

no code implementations 20 Sep 2019 Aditya Krishna Menon, Anand Rajagopalan, Baris Sumengen, Gui Citovsky, Qin Cao, Sanjiv Kumar

The second algorithm, OHAC, is an online counterpart to offline HAC, which is known to yield a 1/3-approximation to the MW revenue and to produce good-quality clusters in practice.

Accelerating Large-Scale Inference with Anisotropic Vector Quantization

2 code implementations ICML 2020 Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, Sanjiv Kumar

Based on the observation that for a given query, the database points that have the largest inner products are more relevant, we develop a family of anisotropic quantization loss functions.

Quantization
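
The anisotropic idea can be sketched by splitting the quantization residual into components parallel and orthogonal to the datapoint and penalizing them differently. The weighting below is illustrative only; the paper derives the actual weights from the inner-product recall objective:

```python
import numpy as np

def anisotropic_quantization_loss(x, x_quantized, eta=4.0):
    """Penalize the residual's component along the datapoint direction more
    heavily than the orthogonal component (eta is an assumed weight)."""
    r = x - x_quantized
    u = x / np.linalg.norm(x)
    r_parallel = (r @ u) * u          # error along the datapoint direction
    r_orthogonal = r - r_parallel
    return eta * np.sum(r_parallel ** 2) + np.sum(r_orthogonal ** 2)
```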

AdaCliP: Adaptive Clipping for Private SGD

1 code implementation 20 Aug 2019 Venkatadheeraj Pichapati, Ananda Theertha Suresh, Felix X. Yu, Sashank J. Reddi, Sanjiv Kumar

Motivated by this, differentially private stochastic gradient descent (SGD) algorithms for training machine learning models have been proposed.

Machine Learning Privacy Preserving

Sampled Softmax with Random Fourier Features

no code implementations NeurIPS 2019 Ankit Singh Rawat, Jiecao Chen, Felix Yu, Ananda Theertha Suresh, Sanjiv Kumar

For the settings where a large number of classes are involved, a common method to speed up training is to sample a subset of classes and utilize an estimate of the loss gradient based on these classes, known as the sampled softmax method.
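
A textbook sampled-softmax sketch for context: the loss is computed over the true class plus a sampled subset of classes, with each sampled score corrected by its log proposal probability. This shows the generic estimator, not the Random Fourier Features sampler proposed in the paper:

```python
import numpy as np

def sampled_softmax_loss(score_true, scores_sampled, log_q_sampled):
    """Cross-entropy over {true class} U {sampled classes}, correcting each
    sampled score by its log proposal probability (generic recipe)."""
    logits = np.concatenate(([score_true], scores_sampled - log_q_sampled))
    logits = logits - logits.max()            # numerical stability
    return -(logits[0] - np.log(np.sum(np.exp(logits))))
```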

Neural SDE: Stabilizing Neural ODE Networks with Stochastic Noise

1 code implementation 5 Jun 2019 Xuanqing Liu, Tesi Xiao, Si Si, Qin Cao, Sanjiv Kumar, Cho-Jui Hsieh

In this paper, we propose a new continuous neural network framework called Neural Stochastic Differential Equation (Neural SDE) network, which naturally incorporates various commonly used regularization mechanisms based on random noise injection.

On the Convergence of Adam and Beyond

2 code implementations ICLR 2018 Sashank J. Reddi, Satyen Kale, Sanjiv Kumar

Several recently proposed stochastic optimization methods that have been successfully used in training deep networks, such as RMSProp, Adam, Adadelta, and Nadam, are based on using gradient updates scaled by square roots of exponential moving averages of squared past gradients.

Stochastic Optimization
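
For context, here is a sketch of the update family the abstract refers to, including the non-decreasing second-moment estimate this paper is associated with (the AMSGrad-style correction). Written from memory as a sketch; bias correction is omitted:

```python
import numpy as np

def amsgrad_step(x, g, m, v, vhat, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Scale the step by the square root of an exponential moving average of
    squared past gradients, kept non-decreasing via a running maximum."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    vhat = np.maximum(vhat, v)                 # non-decreasing second moment
    x = x - lr * m / (np.sqrt(vhat) + eps)
    return x, m, v, vhat
```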

Local Orthogonal Decomposition for Maximum Inner Product Search

no code implementations 25 Mar 2019 Xiang Wu, Ruiqi Guo, Sanjiv Kumar, David Simcha

More specifically, we decompose a residual vector locally into two orthogonal components and perform uniform quantization and multiscale quantization to each component respectively.

Quantization

Efficient Inner Product Approximation in Hybrid Spaces

no code implementations 20 Mar 2019 Xiang Wu, Ruiqi Guo, David Simcha, Dave Dopson, Sanjiv Kumar

In this paper, we propose a technique that approximates the inner product computation in hybrid vectors, leading to substantial speedup in search while maintaining high accuracy.

Network Embedding

Escaping Saddle Points with Adaptive Gradient Methods

no code implementations 26 Jan 2019 Matthew Staib, Sashank J. Reddi, Satyen Kale, Sanjiv Kumar, Suvrit Sra

Adaptive methods such as Adam and RMSProp are widely used in deep learning but are not well understood.

Adaptive Methods for Nonconvex Optimization

1 code implementation NeurIPS 2018 Manzil Zaheer, Sashank Reddi, Devendra Sachan, Satyen Kale, Sanjiv Kumar

In this work, we provide a new analysis of such methods applied to nonconvex stochastic optimization problems, characterizing the effect of increasing minibatch size.

Stochastic Optimization

Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks

no code implementations ICLR 2019 Patrick H. Chen, Si Si, Sanjiv Kumar, Yang Li, Cho-Jui Hsieh

The algorithm achieves an order of magnitude faster inference than the original softmax layer for predicting top-$k$ words in various tasks such as beam search in machine translation or next words prediction.

Machine Translation Translation

Stochastic Negative Mining for Learning with Large Output Spaces

no code implementations 16 Oct 2018 Sashank J. Reddi, Satyen Kale, Felix Yu, Dan Holtmann-Rice, Jiecao Chen, Sanjiv Kumar

Furthermore, we identify a particularly intuitive class of loss functions in the aforementioned family and show that they are amenable to practical implementation in the large output space setting (i.e., computation is possible without evaluating scores of all labels) by developing a technique called Stochastic Negative Mining.
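
One way to picture the recipe: sample a subset of negative classes, keep only the highest-scoring ("hardest") ones in the sample, and apply a margin-style loss to them. A sketch under those assumptions; the exact loss family is defined in the paper:

```python
import numpy as np

def mined_negative_loss(scores, positive_idx, num_sampled, k, rng):
    """Hinge-style loss on the top-k scoring negatives within a random sample,
    so the full output space is never scored."""
    negatives = np.delete(np.arange(len(scores)), positive_idx)
    sampled = rng.choice(negatives, size=num_sampled, replace=False)
    hardest = np.sort(scores[sampled])[-k:]          # top-k sampled negatives
    return np.sum(np.maximum(0.0, 1.0 + hardest - scores[positive_idx]))
```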

Privacy and Utility Tradeoff in Approximate Differential Privacy

no code implementations 1 Oct 2018 Quan Geng, Wei Ding, Ruiqi Guo, Sanjiv Kumar

We show that the multiplicative gap of the lower bounds and upper bounds goes to zero in various high privacy regimes, proving the tightness of the lower and upper bounds and thus establishing the optimality of the truncated Laplacian mechanism.

Optimal Noise-Adding Mechanism in Additive Differential Privacy

no code implementations 26 Sep 2018 Quan Geng, Wei Ding, Ruiqi Guo, Sanjiv Kumar

We derive the optimal $(0, \delta)$-differentially private query-output independent noise-adding mechanism for single real-valued query function under a general cost-minimization framework.

Loss Decomposition for Fast Learning in Large Output Spaces

no code implementations ICML 2018 Ian En-Hsu Yen, Satyen Kale, Felix Yu, Daniel Holtmann-Rice, Sanjiv Kumar, Pradeep Ravikumar

For problems with large output spaces, evaluation of the loss function and its gradient are expensive, typically taking linear time in the size of the output space.

Word Embeddings

Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling

1 code implementation 26 Jun 2018 Shanshan Wu, Alexandros G. Dimakis, Sujay Sanghavi, Felix X. Yu, Daniel Holtmann-Rice, Dmitry Storcheus, Afshin Rostamizadeh, Sanjiv Kumar

Our experiments show that there is indeed additional structure beyond sparsity in the real datasets; our method is able to discover it and exploit it to create excellent reconstructions with fewer measurements (by a factor of 1.1-3x) compared to the previous state-of-the-art methods.

Extreme Multi-Label Classification Multi-Label Classification +1

Nonlinear Online Learning with Adaptive Nyström Approximation

no code implementations 21 Feb 2018 Si Si, Sanjiv Kumar, Yang Li

Use of nonlinear feature maps via kernel approximation has led to success in many online learning tasks.

online learning

Now Playing: Continuous low-power music recognition

no code implementations 29 Nov 2017 Blaise Agüera y Arcas, Beat Gfeller, Ruiqi Guo, Kevin Kilgour, Sanjiv Kumar, James Lyon, Julian Odell, Marvin Ritter, Dominik Roblek, Matthew Sharifi, Mihajlo Velimirović

To reduce battery consumption, a small music detector runs continuously on the mobile device's DSP chip and wakes up the main application processor only when it is confident that music is present.

Learning Spread-out Local Feature Descriptors

2 code implementations ICCV 2017 Xu Zhang, Felix X. Yu, Sanjiv Kumar, Shih-Fu Chang

We propose a simple, yet powerful regularization technique that can be used to significantly improve both the pairwise and triplet losses in learning local feature descriptors.

Efficient Natural Language Response Suggestion for Smart Reply

no code implementations 1 May 2017 Matthew Henderson, Rami Al-Rfou, Brian Strope, Yun-Hsuan Sung, Laszlo Lukacs, Ruiqi Guo, Sanjiv Kumar, Balint Miklos, Ray Kurzweil

This paper presents a computationally efficient machine-learned method for natural language response suggestion.

Stochastic Generative Hashing

2 code implementations ICML 2017 Bo Dai, Ruiqi Guo, Sanjiv Kumar, Niao He, Le Song

Learning-based binary hashing has become a powerful paradigm for fast search and retrieval in massive databases.

Distributed Mean Estimation with Limited Communication

no code implementations ICML 2017 Ananda Theertha Suresh, Felix X. Yu, Sanjiv Kumar, H. Brendan McMahan

Motivated by the need for distributed learning and optimization algorithms with low communication cost, we study communication efficient algorithms for distributed mean estimation.

Quantization
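
A common baseline in this setting is stochastic binary quantization: each client sends one bit per coordinate (plus its min and max), and the estimate is unbiased. This is a sketch of that baseline, not necessarily the paper's final scheme:

```python
import numpy as np

def stochastic_binary_quantize(x, rng):
    """Round each coordinate to min(x) or max(x) with probabilities chosen so
    the quantized vector is unbiased for x."""
    lo, hi = float(x.min()), float(x.max())
    if hi == lo:
        return np.full_like(x, lo)
    p = (x - lo) / (hi - lo)                 # probability of sending hi
    return np.where(rng.random(x.shape) < p, hi, lo)

rng = np.random.default_rng(0)
clients = [rng.standard_normal(8) for _ in range(1000)]
estimate = np.mean([stochastic_binary_quantize(c, rng) for c in clients], axis=0)
# `estimate` approaches the true mean of `clients` as the number of clients grows
```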

Orthogonal Random Features

no code implementations NeurIPS 2016 Felix X. Yu, Ananda Theertha Suresh, Krzysztof Choromanski, Daniel Holtmann-Rice, Sanjiv Kumar

We present an intriguing discovery related to Random Fourier Features: in Gaussian kernel approximation, replacing the random Gaussian matrix by a properly scaled random orthogonal matrix significantly decreases kernel approximation error.
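
The substitution described above can be sketched directly: build random Fourier features, but draw the projection rows from a random orthogonal matrix whose rows are rescaled to have Gaussian-like norms. A sketch for a single orthogonal block (D <= d); stacking blocks for larger D and the exact scaling are assumptions:

```python
import numpy as np

def orthogonal_random_features(X, D, sigma=1.0, seed=0):
    """Random Fourier features for the Gaussian kernel with the i.i.d. Gaussian
    projection replaced by a scaled random orthogonal matrix."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    assert D <= d, "single-block sketch; stack blocks for D > d"
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthogonal matrix
    S = np.sqrt(rng.chisquare(df=d, size=D))           # Gaussian-like row norms
    W = (S[:, None] * Q[:D]) / sigma
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)
```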

Spherical Random Features for Polynomial Kernels

no code implementations NeurIPS 2015 Jeffrey Pennington, Felix Xinnan X. Yu, Sanjiv Kumar

Among the commonly used kernels for nonlinear classification are polynomial kernels, for which low approximation error has thus far necessitated explicit feature maps of large dimensionality, especially for higher-order polynomials.

General Classification

Fast Orthogonal Projection Based on Kronecker Product

no code implementations ICCV 2015 Xu Zhang, Felix X. Yu, Ruiqi Guo, Sanjiv Kumar, Shengjin Wang, Shih-Fu Chang

We propose a family of structured matrices to speed up orthogonal projections for high-dimensional data commonly seen in computer vision applications.

Image Retrieval Quantization

On Binary Embedding using Circulant Matrices

no code implementations 20 Nov 2015 Felix X. Yu, Aditya Bhaskara, Sanjiv Kumar, Yunchao Gong, Shih-Fu Chang

To address this problem, we propose Circulant Binary Embedding (CBE) which generates binary codes by projecting the data with a circulant matrix.
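
The computational payoff of a circulant projection is that the matrix-vector product is a circular convolution, so it runs in O(d log d) time with FFTs. A sketch of that core trick; the random sign flipping is my reading of the usual construction and should be treated as an assumption:

```python
import numpy as np

def circulant_binary_embedding(x, c, signs):
    """Binary code from sign(C D x), where C is the circulant matrix defined by
    c and D is a diagonal matrix of random signs; computed via FFT."""
    proj = np.fft.ifft(np.fft.fft(c) * np.fft.fft(signs * x)).real
    return np.sign(proj)

rng = np.random.default_rng(0)
d = 16
x, c = rng.standard_normal(d), rng.standard_normal(d)
signs = rng.choice([-1.0, 1.0], size=d)
code = circulant_binary_embedding(x, c, signs)      # d-bit binary code
```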

Binary embeddings with structured hashed projections

no code implementations 16 Nov 2015 Anna Choromanska, Krzysztof Choromanski, Mariusz Bojarski, Tony Jebara, Sanjiv Kumar, Yann Lecun

We prove several theoretical results showing that projections via various structured matrices followed by nonlinear mappings accurately preserve the angular distance between input high-dimensional vectors.

Structured Transforms for Small-Footprint Deep Learning

no code implementations NeurIPS 2015 Vikas Sindhwani, Tara N. Sainath, Sanjiv Kumar

We consider the task of building compact deep learning pipelines suitable for deployment on storage and power constrained mobile devices.

Keyword Spotting speech-recognition +1

Learning to Hash for Indexing Big Data - A Survey

no code implementations 17 Sep 2015 Jun Wang, Wei Liu, Sanjiv Kumar, Shih-Fu Chang

Such learning to hash methods exploit information such as data distributions or class labels when optimizing the hash codes or functions.

Quantization based Fast Inner Product Search

no code implementations 4 Sep 2015 Ruiqi Guo, Sanjiv Kumar, Krzysztof Choromanski, David Simcha

We propose a quantization based approach for fast approximate Maximum Inner Product Search (MIPS).

Quantization

Fast Online Clustering with Randomized Skeleton Sets

no code implementations 10 Jun 2015 Krzysztof Choromanski, Sanjiv Kumar, Xiaofeng Liu

To achieve fast clustering, we propose to represent each cluster by a skeleton set which is updated continuously as new data is seen.

Nonparametric Clustering Online Clustering

Compact Nonlinear Maps and Circulant Extensions

no code implementations 12 Mar 2015 Felix X. Yu, Sanjiv Kumar, Henry Rowley, Shih-Fu Chang

This leads to much more compact maps without hurting the performance.

An exploration of parameter redundancy in deep networks with circulant projections

no code implementations ICCV 2015 Yu Cheng, Felix X. Yu, Rogerio S. Feris, Sanjiv Kumar, Alok Choudhary, Shih-Fu Chang

We explore the redundancy of parameters in deep neural networks by replacing the conventional linear projection in fully-connected layers with the circulant projection.

Discrete Graph Hashing

no code implementations NeurIPS 2014 Wei Liu, Cun Mu, Sanjiv Kumar, Shih-Fu Chang

Hashing has emerged as a popular technique for fast nearest neighbor search in gigantic databases.

Circulant Binary Embedding

no code implementations 13 May 2014 Felix X. Yu, Sanjiv Kumar, Yunchao Gong, Shih-Fu Chang

To address this problem, we propose Circulant Binary Embedding (CBE) which generates binary codes by projecting the data with a circulant matrix.

On Learning from Label Proportions

1 code implementation 24 Feb 2014 Felix X. Yu, Krzysztof Choromanski, Sanjiv Kumar, Tony Jebara, Shih-Fu Chang

Learning from Label Proportions (LLP) is a learning setting, where the training data is provided in groups, or "bags", and only the proportion of each class in each bag is known.

Marketing
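
A generic baseline for this setting (not the paper's method) is to match each bag's average prediction to its known label proportions; a minimal sketch:

```python
import numpy as np

def bag_proportion_loss(pred_probs, bag_proportions):
    """Squared gap between a bag's average predicted class distribution and its
    known label proportions; pred_probs has shape (bag_size, num_classes)."""
    return np.sum((pred_probs.mean(axis=0) - bag_proportions) ** 2)
```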

$\propto$SVM for learning with label proportions

no code implementations 4 Jun 2013 Felix X. Yu, Dong Liu, Sanjiv Kumar, Tony Jebara, Shih-Fu Chang

We study the problem of learning with label proportions in which the training data is provided in groups and only the proportion of each class in each group is known.

Learning Binary Codes for High-Dimensional Data Using Bilinear Projections

no code implementations CVPR 2013 Yunchao Gong, Sanjiv Kumar, Henry A. Rowley, Svetlana Lazebnik

Recent advances in visual recognition indicate that to achieve good retrieval and classification accuracy on large-scale datasets like ImageNet, extremely high-dimensional visual descriptors, e.g., Fisher Vectors, are needed.

Classification Code Generation +2

Angular Quantization-based Binary Codes for Fast Similarity Search

no code implementations NeurIPS 2012 Yunchao Gong, Sanjiv Kumar, Vishal Verma, Svetlana Lazebnik

Such data typically arises in a large number of vision and text applications where counts or frequencies are used as features.

Quantization

Ensemble Nystrom Method

no code implementations NeurIPS 2009 Sanjiv Kumar, Mehryar Mohri, Ameet Talwalkar

A crucial technique for scaling kernel methods to very large data sets reaching or exceeding millions of instances is based on low-rank approximation of kernel matrices.
