no code implementations • 28 Jun 2022 • Felix Chern, Blake Hechtman, Andy Davis, Ruiqi Guo, David Majnemer, Sanjiv Kumar
This paper presents a novel nearest neighbor search algorithm achieving TPU (Google Tensor Processing Unit) peak performance, outperforming state-of-the-art GPU algorithms at a similar level of recall.
no code implementations • 27 Apr 2022 • Wittawat Jitkrittum, Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar
Long-tail learning is the problem of learning under skewed label distributions, which pose a challenge for standard learners.
no code implementations • 15 Feb 2022 • Taman Narayan, Heinrich Jiang, Sen Zhao, Sanjiv Kumar
Much effort has been devoted to building larger and more accurate models, but relatively little has gone into understanding which examples benefit from the added complexity.
no code implementations • 2 Feb 2022 • Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank J. Reddi, Sanjiv Kumar
In contrast to SGD, adaptive gradient methods like Adam allow robust training of modern deep networks, especially large language models.
2 code implementations • NeurIPS 2021 • Erik Lindgren, Sashank Reddi, Ruiqi Guo, Sanjiv Kumar
These models are typically trained by optimizing the model parameters to score relevant "positive" pairs higher than the irrelevant "negative" ones.
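For intuition, the sketch below scores every query in a batch against every document and treats the matching pair as the positive, with the rest of the batch serving as negatives. The function name and temperature are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def in_batch_softmax_loss(query_emb, doc_emb, temperature=0.05):
    # Entry (i, j) scores query i against document j; the diagonal entries are
    # the relevant "positive" pairs, everything else acts as a "negative".
    scores = query_emb @ doc_emb.T / temperature
    targets = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, targets)
```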
no code implementations • 19 Oct 2021 • Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Amr Ahmed, Sanjiv Kumar
In a nutshell, we use the large teacher models to guide the lightweight student models to only make correct predictions on a subset of "easy" examples; for the "hard" examples, we fall back to the teacher.
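As a rough sketch of such a student-teacher cascade (the names and confidence threshold are assumptions for illustration, not the paper's routing rule):

```python
import torch

def cascade_predict(student, teacher, x, threshold=0.9):
    # Answer "easy" examples with the cheap student; defer to the expensive
    # teacher whenever the student's confidence falls below the threshold.
    with torch.no_grad():
        probs = torch.softmax(student(x), dim=-1)
        confidence, preds = probs.max(dim=-1)
        hard = confidence < threshold
        if hard.any():
            preds[hard] = teacher(x[hard]).argmax(dim=-1)
    return preds
```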
no code implementations • 13 Oct 2021 • Srinadh Bhojanapalli, Ayan Chakrabarti, Andreas Veit, Michal Lukasik, Himanshu Jain, Frederick Liu, Yin-Wen Chang, Sanjiv Kumar
Pairwise dot product-based attention allows Transformers to exchange information between tokens in an input-dependent way, and is key to their success across diverse applications in language and vision.
no code implementations • 29 Sep 2021 • Srikumar Ramalingam, Daniel Glasner, Kaushal Patel, Raviteja Vemulapalli, Sadeep Jayasumana, Sanjiv Kumar
Deep learning has yielded extraordinary results in vision and natural language processing, but this achievement comes at a cost.
no code implementations • 29 Sep 2021 • Sadeep Jayasumana, Srikumar Ramalingam, Sanjiv Kumar
We investigate the possibility of using the embeddings produced by a lightweight network more effectively with a nonlinear classification layer.
no code implementations • 29 Sep 2021 • Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Amr Ahmed, Sanjiv Kumar
In a nutshell, we use the large teacher models to guide the lightweight student models to only make correct predictions on a subset of "easy" examples; for the "hard" examples, we fall back to the teacher.
no code implementations • 29 Sep 2021 • Aditya Krishna Menon, Sadeep Jayasumana, Seungyeon Kim, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar
Transformer-based models such as BERT have proven successful in information retrieval problems, which seek to identify relevant documents for a given query.
no code implementations • NeurIPS 2021 • Gui Citovsky, Giulia Desalvo, Claudio Gentile, Lazaros Karydas, Anand Rajagopalan, Afshin Rostamizadeh, Sanjiv Kumar
The ability to train complex and highly effective models often requires an abundance of training data, which can easily become a bottleneck in cost, time, and computational resources.
no code implementations • 19 Jun 2021 • Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar
Knowledge distillation is widely used as a means of improving the performance of a relatively simple student model using the predictions from a complex teacher model.
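For reference, a minimal sketch of the standard distillation objective this line of work builds on (the temperature T and mixing weight alpha are illustrative defaults):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Cross-entropy on the hard labels plus KL divergence to the teacher's
    # temperature-softened distribution, scaled by T^2 as is conventional.
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return alpha * hard + (1.0 - alpha) * soft
```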
no code implementations • 16 Jun 2021 • Srinadh Bhojanapalli, Ayan Chakrabarti, Himanshu Jain, Sanjiv Kumar, Michal Lukasik, Andreas Veit
State-of-the-art transformer models use pairwise dot-product based self-attention, which comes at a computational cost quadratic in the input sequence length.
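A minimal sketch of vanilla dot-product attention, showing where the quadratic cost comes from; this is the baseline being approximated, not the paper's proposed method:

```python
import torch

def dot_product_attention(q, k, v):
    # Building the full (n, n) score matrix is what makes the cost quadratic
    # in the sequence length n.
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    weights = torch.softmax(scores, dim=-1)
    return weights @ v
```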
no code implementations • 25 May 2021 • Baris Sumengen, Anand Rajagopalan, Gui Citovsky, David Simcha, Olivier Bachem, Pradipta Mitra, Sam Blasiak, Mason Liang, Sanjiv Kumar
Hierarchical Agglomerative Clustering (HAC) is one of the oldest but still most widely used clustering methods.
no code implementations • 19 May 2021 • Seungyeon Kim, Daniel Glasner, Srikumar Ramalingam, Cho-Jui Hsieh, Kishore Papineni, Sanjiv Kumar
It is generally believed that robust training of extremely large networks is critical to their success in real-world applications.
no code implementations • 12 May 2021 • Ankit Singh Rawat, Aditya Krishna Menon, Wittawat Jitkrittum, Sadeep Jayasumana, Felix X. Yu, Sashank Reddi, Sanjiv Kumar
Negative sampling schemes enable efficient training given a large number of classes, by offering a means to approximate a computationally expensive loss function that takes all labels into account.
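A hedged sketch of one such scheme, a uniform sampled softmax with no correction term (a deliberate simplification; the paper analyzes the behavior of schemes like this more carefully):

```python
import torch
import torch.nn.functional as F

def sampled_softmax_loss(hidden, class_weights, pos_labels, num_neg=100):
    # Score only the positive class plus a handful of uniformly sampled
    # negatives instead of all classes (collisions with the positive ignored).
    num_classes = class_weights.size(0)
    neg = torch.randint(0, num_classes, (hidden.size(0), num_neg),
                        device=hidden.device)
    cols = torch.cat([pos_labels.unsqueeze(1), neg], dim=1)  # (B, 1 + num_neg)
    logits = torch.einsum("bd,bkd->bk", hidden, class_weights[cols])
    targets = torch.zeros(hidden.size(0), dtype=torch.long, device=hidden.device)
    return F.cross_entropy(logits, targets)
```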
no code implementations • 26 Apr 2021 • Srikumar Ramalingam, Daniel Glasner, Kaushal Patel, Raviteja Vemulapalli, Sadeep Jayasumana, Sanjiv Kumar
Deep learning has yielded extraordinary results in vision and natural language processing, but this achievement comes at a cost.
no code implementations • AISTATS 2021 • Sashank J. Reddi, Rama Kumar Pasumarthi, Aditya Krishna Menon, Ankit Singh Rawat, Felix Yu, Seungyeon Kim, Andreas Veit, Sanjiv Kumar
Knowledge distillation is an approach to improve the performance of a student model by using the knowledge of a complex teacher. Despite its success in several deep learning applications, the study of distillation is mostly confined to classification settings.
no code implementations • 5 Feb 2021 • Srinadh Bhojanapalli, Kimberly Wilber, Andreas Veit, Ankit Singh Rawat, Seungyeon Kim, Aditya Menon, Sanjiv Kumar
By analyzing the relationship between churn and prediction confidences, we pursue an approach with two components for churn reduction.
no code implementations • ICLR 2021 • Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar
Overparameterised neural networks have demonstrated the remarkable ability to perfectly fit training samples, while still generalising to unseen test samples.
no code implementations • 8 Dec 2020 • Sadeep Jayasumana, Srikumar Ramalingam, Sanjiv Kumar
We propose a kernelized classification layer for deep networks.
no code implementations • NeurIPS 2020 • Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar
We propose sufficient conditions under which we prove that a sparse attention model can universally approximate any sequence-to-sequence function.
no code implementations • 1 Dec 2020 • Chen Zhu, Ankit Singh Rawat, Manzil Zaheer, Srinadh Bhojanapalli, Daliang Li, Felix Yu, Sanjiv Kumar
In this paper, we propose a new task of explicitly modifying specific factual knowledge in Transformer models while ensuring the model performance does not degrade on the unmodified facts.
no code implementations • ICLR 2021 • Jingzhao Zhang, Aditya Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra
The label shift problem refers to the supervised learning setting where the train and test label distributions do not match.
no code implementations • EMNLP 2020 • Michal Lukasik, Himanshu Jain, Aditya Krishna Menon, Seungyeon Kim, Srinadh Bhojanapalli, Felix Yu, Sanjiv Kumar
Label smoothing has been shown to be an effective regularization strategy in classification that prevents overfitting and helps with label de-noising.
no code implementations • NeurIPS 2020 • Yuhan Liu, Ananda Theertha Suresh, Felix Yu, Sanjiv Kumar, Michael Riley
If each user has $m$ samples, we show that straightforward applications of Laplace or Gaussian mechanisms require the number of users to be $\mathcal{O}(k/(m\alpha^2) + k/\epsilon\alpha)$ to achieve an $\ell_1$ distance of $\alpha$ between the true and estimated distributions, with the privacy-induced penalty $k/\epsilon\alpha$ independent of the number of samples per user $m$.
no code implementations • NeurIPS 2020 • Hongge Chen, Si Si, Yang Li, Ciprian Chelba, Sanjiv Kumar, Duane Boning, Cho-Jui Hsieh
With this score, we can identify the pretraining examples in the pretraining task that contribute most to a prediction in the finetuning task.
2 code implementations • ICLR 2021 • Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar
Real-world classification problems typically exhibit an imbalanced or long-tailed label distribution, wherein many labels are associated with only a few samples.
Ranked #33 on Long-tail Learning on ImageNet-LT
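A minimal sketch of a logit-adjusted cross-entropy in the spirit of this work, assuming class priors are estimated from training-label counts (tau = 1.0 is just a reasonable default here):

```python
import torch
import torch.nn.functional as F

def logit_adjusted_loss(logits, labels, class_counts, tau=1.0):
    # Offset each class's logit by tau * log(prior) so head classes no longer
    # dominate the softmax for tail-class examples.
    priors = class_counts.float() / class_counts.sum()
    adjusted = logits + tau * torch.log(priors + 1e-12)
    return F.cross_entropy(adjusted, labels)
```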
no code implementations • ICLR 2021 • Cheng-Yu Hsieh, Chih-Kuan Yeh, Xuanqing Liu, Pradeep Ravikumar, Seungyeon Kim, Sanjiv Kumar, Cho-Jui Hsieh
In this paper, we establish a novel set of evaluation criteria for such feature based explanations by robustness analysis.
no code implementations • 21 May 2020 • Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Seungyeon Kim, Sanjiv Kumar
In this paper, we present a statistical perspective on distillation which addresses this question, and provides a novel connection to extreme multiclass retrieval techniques.
1 code implementation • ICLR 2020 • Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar
Gradient clipping is a widely-used technique in the training of deep networks, and is generally motivated from an optimisation lens: informally, it controls the dynamics of iterates, thus enhancing the rate of convergence to a local minimum.
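A minimal sketch of the optimisation-lens view: clip the global gradient norm, then take a plain SGD step (the learning rate and clipping threshold are illustrative):

```python
import torch

def clipped_sgd_step(params, lr=0.1, max_norm=1.0):
    # Rescale the global gradient whenever its norm exceeds max_norm, bounding
    # the size of the update, then take an ordinary SGD step.
    params = list(params)
    torch.nn.utils.clip_grad_norm_(params, max_norm)
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p.add_(p.grad, alpha=-lr)
```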
no code implementations • 23 Apr 2020 • Ankit Singh Rawat, Aditya Krishna Menon, Andreas Veit, Felix Yu, Sashank J. Reddi, Sanjiv Kumar
Modern retrieval problems are characterised by training sets with potentially billions of labels, and heterogeneous data distributions across subpopulations (e.g., users of a retrieval system may be from different countries), each of which poses a challenge.
no code implementations • ICML 2020 • Felix X. Yu, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar
We consider learning a multi-class classification model in the federated setting, where each user has access to the positive data associated with only a single class.
no code implementations • NeurIPS 2020 • Melanie Weber, Manzil Zaheer, Ankit Singh Rawat, Aditya Menon, Sanjiv Kumar
In this paper, we present, to our knowledge, the first theoretical guarantees for learning a classifier in hyperbolic rather than Euclidean space.
no code implementations • ICML 2020 • Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar
Label smoothing is commonly used in training deep learning models, wherein one-hot training labels are mixed with uniform label vectors.
Ranked #8 on Learning with noisy labels on CIFAR-10N-Aggregate
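For reference, a minimal sketch of the smoothing itself, mixing the one-hot target with a uniform distribution (epsilon = 0.1 is a common, illustrative choice):

```python
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, labels, epsilon=0.1):
    # Target = (1 - epsilon) * one_hot + epsilon / num_classes.
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    one_hot = F.one_hot(labels, num_classes).float()
    smooth = (1.0 - epsilon) * one_hot + epsilon / num_classes
    return -(smooth * log_probs).sum(dim=-1).mean()
```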
3 code implementations • ICLR 2021 • Sashank Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, H. Brendan McMahan
Federated learning is a distributed machine learning paradigm in which a large number of clients coordinate with a central server to learn a model without sharing their own training data.
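A minimal FedAvg-style sketch of one such round, under the assumption of a plain averaging server step (the paper studies adaptive server optimizers applied to exactly this kind of aggregated update):

```python
import copy
import torch
import torch.nn.functional as F

def federated_round(server_model, client_loaders, local_steps=5, client_lr=0.01):
    # Each client trains locally on its own data and returns only a model
    # delta; the server averages the deltas and applies them in place.
    server_params = list(server_model.parameters())
    deltas = []
    for loader in client_loaders:
        client = copy.deepcopy(server_model)
        opt = torch.optim.SGD(client.parameters(), lr=client_lr)
        for step, (x, y) in enumerate(loader):
            if step >= local_steps:
                break
            opt.zero_grad()
            F.cross_entropy(client(x), y).backward()
            opt.step()
        deltas.append([c.detach() - s.detach()
                       for c, s in zip(client.parameters(), server_params)])
    with torch.no_grad():
        for i, p in enumerate(server_params):
            p.add_(torch.stack([d[i] for d in deltas]).mean(dim=0))
    return server_model
```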
no code implementations • ICML 2020 • Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar
Attention based Transformer architecture has enabled significant advances in the field of natural language processing.
no code implementations • ICLR 2020 • Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar
We consider the large-scale query-document retrieval problem: given a query (e.g., a question), return the set of relevant documents (e.g., paragraphs containing the answer) from a large document corpus.
no code implementations • ICLR 2020 • Ruiqi Guo, Quan Geng, David Simcha, Felix Chern, Phil Sun, Sanjiv Kumar
In this work, we focus directly on minimizing error in inner product approximation and derive a new class of quantization loss functions.
no code implementations • ICLR 2020 • Chulhee Yun, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar
In this paper, we establish that Transformer models are universal approximators of continuous permutation equivariant sequence-to-sequence functions with compact support, which is quite surprising given the amount of shared parameters in these models.
no code implementations • NeurIPS 2020 • Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J. Reddi, Sanjiv Kumar, Suvrit Sra
While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Clipped SGD/Adam have been observed to outperform SGD across important tasks, such as attention models.
no code implementations • NeurIPS 2019 • Aditya K. Menon, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar
Multilabel classification is a challenging problem arising in applications ranging from information retrieval to image tagging.
no code implementations • NeurIPS 2019 • Chuan Guo, Ali Mousavi, Xiang Wu, Daniel N. Holtmann-Rice, Satyen Kale, Sashank Reddi, Sanjiv Kumar
In extreme classification settings, embedding-based neural network models are currently not competitive with sparse linear and tree-based methods in terms of accuracy.
1 code implementation • ICLR 2020 • Yangjun Ruan, Yuanhao Xiong, Sashank Reddi, Sanjiv Kumar, Cho-Jui Hsieh
In the learning to learn (L2L) framework, we cast the design of optimization algorithms as a machine learning problem and use deep neural networks to learn the update rules.
no code implementations • 25 Sep 2019 • Patrick H. Chen, Sashank Reddi, Sanjiv Kumar, Cho-Jui Hsieh
We consider the learning to learn problem, where the goal is to leverage deep learning models to automatically learn (iterative) optimization algorithms for training machine learning models.
no code implementations • 25 Sep 2019 • Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J Reddi, Sanjiv Kumar, Suvrit Sra
While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Adam have been observed to outperform SGD across important tasks, such as attention models.
no code implementations • 25 Sep 2019 • Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar
Attention based Transformer architecture has enabled significant advances in the field of natural language processing.
no code implementations • 20 Sep 2019 • Aditya Krishna Menon, Anand Rajagopalan, Baris Sumengen, Gui Citovsky, Qin Cao, Sanjiv Kumar
The second algorithm, OHAC, is an online counterpart to offline HAC, which is known to yield a 1/3-approximation to the MW revenue and to produce good-quality clusters in practice.
2 code implementations • ICML 2020 • Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, Sanjiv Kumar
Based on the observation that for a given query, the database points that have the largest inner products are more relevant, we develop a family of anisotropic quantization loss functions.
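A hedged sketch of what "anisotropic" means here: weight the residual component parallel to the datapoint more than the orthogonal one, since parallel error is what perturbs inner products with aligned queries (the weight eta is an illustrative choice, not the paper's derived value):

```python
import numpy as np

def anisotropic_quantization_loss(x, x_quantized, eta=4.0):
    # Decompose the quantization residual into components parallel and
    # orthogonal to x, and penalize the parallel part more heavily.
    residual = x - x_quantized
    direction = x / (np.linalg.norm(x) + 1e-12)
    parallel = np.dot(residual, direction) * direction
    orthogonal = residual - parallel
    return eta * np.dot(parallel, parallel) + np.dot(orthogonal, orthogonal)
```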
1 code implementation • 20 Aug 2019 • Venkatadheeraj Pichapati, Ananda Theertha Suresh, Felix X. Yu, Sashank J. Reddi, Sanjiv Kumar
Motivated by this, differentially private stochastic gradient descent (SGD) algorithms for training machine learning models have been proposed.
no code implementations • NeurIPS 2019 • Ankit Singh Rawat, Jiecao Chen, Felix Yu, Ananda Theertha Suresh, Sanjiv Kumar
For the settings where a large number of classes are involved, a common method to speed up training is to sample a subset of classes and utilize an estimate of the loss gradient based on these classes, known as the sampled softmax method.
1 code implementation • 5 Jun 2019 • Xuanqing Liu, Tesi Xiao, Si Si, Qin Cao, Sanjiv Kumar, Cho-Jui Hsieh
In this paper, we propose a new continuous neural network framework called Neural Stochastic Differential Equation (Neural SDE) network, which naturally incorporates various commonly used regularization mechanisms based on random noise injection.
2 code implementations • ICLR 2018 • Sashank J. Reddi, Satyen Kale, Sanjiv Kumar
Several recently proposed stochastic optimization methods that have been successfully used in training deep networks such as RMSProp, Adam, Adadelta, Nadam are based on using gradient updates scaled by square roots of exponential moving averages of squared past gradients.
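A minimal sketch of an AMSGrad-style step of the kind analyzed here, with bias correction omitted for brevity (the `state` dictionary of zero-initialized tensors is an assumed convention, not a library API):

```python
import torch

def amsgrad_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Like Adam, but divide by the running *maximum* of the squared-gradient
    # average, so the effective per-coordinate step size never increases.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    state["v_hat"] = torch.maximum(state["v_hat"], state["v"])
    param = param - lr * state["m"] / (state["v_hat"].sqrt() + eps)
    return param, state
```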
22 code implementations • ICLR 2020 • Yang You, Jing Li, Sashank Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, James Demmel, Kurt Keutzer, Cho-Jui Hsieh
In this paper, we first study a principled layerwise adaptation strategy to accelerate training of deep neural networks using large mini-batches.
Ranked #11 on Question Answering on SQuAD1.1 dev (F1 metric)
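A hedged sketch of the layerwise idea: rescale each layer's adaptive update by a trust ratio ||w|| / ||update|| so all layers move a comparable relative amount under very large batches (hyperparameters and the handling of the ratio are simplifications):

```python
import torch

def layerwise_adaptive_update(w, m_hat, v_hat, lr=1e-3, eps=1e-6, weight_decay=0.01):
    # m_hat and v_hat are assumed to be bias-corrected Adam moments for this
    # layer's weights; the Adam-style direction is rescaled by the trust ratio.
    update = m_hat / (v_hat.sqrt() + eps) + weight_decay * w
    w_norm, u_norm = w.norm(), update.norm()
    trust_ratio = (w_norm / u_norm) if (w_norm > 0 and u_norm > 0) else 1.0
    return w - lr * trust_ratio * update
```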
no code implementations • 25 Mar 2019 • Xiang Wu, Ruiqi Guo, Sanjiv Kumar, David Simcha
More specifically, we decompose a residual vector locally into two orthogonal components and perform uniform quantization and multiscale quantization to each component respectively.
no code implementations • 20 Mar 2019 • Xiang Wu, Ruiqi Guo, David Simcha, Dave Dopson, Sanjiv Kumar
In this paper, we propose a technique that approximates the inner product computation in hybrid vectors, leading to substantial speedup in search while maintaining high accuracy.
no code implementations • 26 Jan 2019 • Matthew Staib, Sashank J. Reddi, Satyen Kale, Sanjiv Kumar, Suvrit Sra
Adaptive methods such as Adam and RMSProp are widely used in deep learning but are not well understood.
1 code implementation • NeurIPS 2018 • Manzil Zaheer, Sashank Reddi, Devendra Sachan, Satyen Kale, Sanjiv Kumar
In this work, we provide a new analysis of such methods applied to nonconvex stochastic optimization problems, characterizing the effect of increasing minibatch size.
no code implementations • ICLR 2019 • Patrick H. Chen, Si Si, Sanjiv Kumar, Yang Li, Cho-Jui Hsieh
The algorithm achieves an order of magnitude faster inference than the original softmax layer for predicting top-$k$ words in various tasks such as beam search in machine translation or next words prediction.
no code implementations • 16 Oct 2018 • Sashank J. Reddi, Satyen Kale, Felix Yu, Dan Holtmann-Rice, Jiecao Chen, Sanjiv Kumar
Furthermore, we identify a particularly intuitive class of loss functions in the aforementioned family and show that they are amenable to practical implementation in the large output space setting (i.e., computation is possible without evaluating scores of all labels) by developing a technique called Stochastic Negative Mining.
no code implementations • 1 Oct 2018 • Quan Geng, Wei Ding, Ruiqi Guo, Sanjiv Kumar
We show that the multiplicative gap of the lower bounds and upper bounds goes to zero in various high privacy regimes, proving the tightness of the lower and upper bounds and thus establishing the optimality of the truncated Laplacian mechanism.
no code implementations • 26 Sep 2018 • Quan Geng, Wei Ding, Ruiqi Guo, Sanjiv Kumar
We derive the optimal $(0, \delta)$-differentially private query-output independent noise-adding mechanism for single real-valued query function under a general cost-minimization framework.
no code implementations • ICML 2018 • Ian En-Hsu Yen, Satyen Kale, Felix Yu, Daniel Holtmann-Rice, Sanjiv Kumar, Pradeep Ravikumar
For problems with large output spaces, evaluation of the loss function and its gradient are expensive, typically taking linear time in the size of the output space.
1 code implementation • 26 Jun 2018 • Shanshan Wu, Alexandros G. Dimakis, Sujay Sanghavi, Felix X. Yu, Daniel Holtmann-Rice, Dmitry Storcheus, Afshin Rostamizadeh, Sanjiv Kumar
Our experiments show that there is indeed additional structure beyond sparsity in the real datasets; our method is able to discover it and exploit it to create excellent reconstructions with fewer measurements (by a factor of 1.1-3x) compared to the previous state-of-the-art methods.
Extreme Multi-Label Classification • Multi-Label Classification • +1
no code implementations • NeurIPS 2018 • Naman Agarwal, Ananda Theertha Suresh, Felix Yu, Sanjiv Kumar, H. Brendan McMahan
Distributed stochastic gradient descent is an important subroutine in distributed learning.
no code implementations • 21 Feb 2018 • Si Si, Sanjiv Kumar, Yang Li
Use of nonlinear feature maps via kernel approximation has led to success in many online learning tasks.
no code implementations • NeurIPS 2017 • Xiang Wu, Ruiqi Guo, Ananda Theertha Suresh, Sanjiv Kumar, Daniel N. Holtmann-Rice, David Simcha, Felix Yu
We propose a multiscale quantization approach for fast similarity search on large, high-dimensional datasets.
no code implementations • 29 Nov 2017 • Blaise Agüera y Arcas, Beat Gfeller, Ruiqi Guo, Kevin Kilgour, Sanjiv Kumar, James Lyon, Julian Odell, Marvin Ritter, Dominik Roblek, Matthew Sharifi, Mihajlo Velimirović
To reduce battery consumption, a small music detector runs continuously on the mobile device's DSP chip and wakes up the main application processor only when it is confident that music is present.
2 code implementations • ICCV 2017 • Xu Zhang, Felix X. Yu, Sanjiv Kumar, Shih-Fu Chang
We propose a simple, yet powerful regularization technique that can be used to significantly improve both the pairwise and triplet losses in learning local feature descriptors.
no code implementations • 1 May 2017 • Matthew Henderson, Rami Al-Rfou, Brian Strope, Yun-Hsuan Sung, Laszlo Lukacs, Ruiqi Guo, Sanjiv Kumar, Balint Miklos, Ray Kurzweil
This paper presents a computationally efficient machine-learned method for natural language response suggestion.
2 code implementations • ICML 2017 • Bo Dai, Ruiqi Guo, Sanjiv Kumar, Niao He, Le Song
Learning-based binary hashing has become a powerful paradigm for fast search and retrieval in massive databases.
no code implementations • ICML 2017 • Ananda Theertha Suresh, Felix X. Yu, Sanjiv Kumar, H. Brendan McMahan
Motivated by the need for distributed learning and optimization algorithms with low communication cost, we study communication efficient algorithms for distributed mean estimation.
no code implementations • NeurIPS 2016 • Felix X. Yu, Ananda Theertha Suresh, Krzysztof Choromanski, Daniel Holtmann-Rice, Sanjiv Kumar
We present an intriguing discovery related to Random Fourier Features: in Gaussian kernel approximation, replacing the random Gaussian matrix by a properly scaled random orthogonal matrix significantly decreases kernel approximation error.
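A minimal sketch of the construction, assuming the number of features does not exceed the input dimension (real implementations stack several orthogonal blocks):

```python
import numpy as np

def orthogonal_random_features(X, num_features, sigma=1.0, seed=0):
    # Random Fourier features for the Gaussian kernel, with the i.i.d. Gaussian
    # projection replaced by a random orthogonal matrix (QR of a Gaussian)
    # whose rows are rescaled to Gaussian-like norms.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    row_norms = np.linalg.norm(rng.standard_normal((d, d)), axis=1)
    W = (Q * row_norms[:, None]) / sigma
    proj = X @ W[:num_features].T
    b = rng.uniform(0.0, 2.0 * np.pi, num_features)
    return np.sqrt(2.0 / num_features) * np.cos(proj + b)
```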
no code implementations • NeurIPS 2015 • Jeffrey Pennington, Felix Xinnan X. Yu, Sanjiv Kumar
Among the commonly used kernels for nonlinear classification are polynomial kernels, for which low approximation error has thus far necessitated explicit feature maps of large dimensionality, especially for higher-order polynomials.
no code implementations • ICCV 2015 • Xu Zhang, Felix X. Yu, Ruiqi Guo, Sanjiv Kumar, Shengjin Wang, Shih-Fu Chang
We propose a family of structured matrices to speed up orthogonal projections for high-dimensional data commonly seen in computer vision applications.
no code implementations • 20 Nov 2015 • Felix X. Yu, Aditya Bhaskara, Sanjiv Kumar, Yunchao Gong, Shih-Fu Chang
To address this problem, we propose Circulant Binary Embedding (CBE) which generates binary codes by projecting the data with a circulant matrix.
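The computational appeal is that a circulant projection is a circular convolution, so it can be applied with FFTs in O(d log d) time and O(d) memory; a minimal sketch (the random sign flip is a common ingredient, and the exact recipe here is an assumption):

```python
import numpy as np

def circulant_binary_embedding(x, r, signs):
    # circ(r) @ (signs * x) computed via FFT instead of a dense d x d matrix,
    # then binarized by taking the sign of each projected coordinate.
    xd = x * signs
    proj = np.fft.ifft(np.fft.fft(r) * np.fft.fft(xd)).real
    return np.sign(proj)
```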
no code implementations • 16 Nov 2015 • Anna Choromanska, Krzysztof Choromanski, Mariusz Bojarski, Tony Jebara, Sanjiv Kumar, Yann Lecun
We prove several theoretical results showing that projections via various structured matrices followed by nonlinear mappings accurately preserve the angular distance between input high-dimensional vectors.
no code implementations • NeurIPS 2015 • Vikas Sindhwani, Tara N. Sainath, Sanjiv Kumar
We consider the task of building compact deep learning pipelines suitable for deployment on storage and power constrained mobile devices.
no code implementations • 17 Sep 2015 • Jun Wang, Wei Liu, Sanjiv Kumar, Shih-Fu Chang
Such learning to hash methods exploit information such as data distributions or class labels when optimizing the hash codes or functions.
no code implementations • 4 Sep 2015 • Ruiqi Guo, Sanjiv Kumar, Krzysztof Choromanski, David Simcha
We propose a quantization based approach for fast approximate Maximum Inner Product Search (MIPS).
no code implementations • 10 Jun 2015 • Krzysztof Choromanski, Sanjiv Kumar, Xiaofeng Liu
To achieve fast clustering, we propose to represent each cluster by a skeleton set which is updated continuously as new data is seen.
no code implementations • 12 Mar 2015 • Felix X. Yu, Sanjiv Kumar, Henry Rowley, Shih-Fu Chang
This leads to much more compact maps without hurting the performance.
no code implementations • ICCV 2015 • Yu Cheng, Felix X. Yu, Rogerio S. Feris, Sanjiv Kumar, Alok Choudhary, Shih-Fu Chang
We explore the redundancy of parameters in deep neural networks by replacing the conventional linear projection in fully-connected layers with the circulant projection.
no code implementations • NeurIPS 2014 • Wei Liu, Cun Mu, Sanjiv Kumar, Shih-Fu Chang
Hashing has emerged as a popular technique for fast nearest neighbor search in gigantic databases.
no code implementations • 13 May 2014 • Felix X. Yu, Sanjiv Kumar, Yunchao Gong, Shih-Fu Chang
To address this problem, we propose Circulant Binary Embedding (CBE) which generates binary codes by projecting the data with a circulant matrix.
1 code implementation • 24 Feb 2014 • Felix X. Yu, Krzysztof Choromanski, Sanjiv Kumar, Tony Jebara, Shih-Fu Chang
Learning from Label Proportions (LLP) is a learning setting, where the training data is provided in groups, or "bags", and only the proportion of each class in each bag is known.
no code implementations • 4 Jun 2013 • Felix X. Yu, Dong Liu, Sanjiv Kumar, Tony Jebara, Shih-Fu Chang
We study the problem of learning with label proportions in which the training data is provided in groups and only the proportion of each class in each group is known.
no code implementations • CVPR 2013 • Yunchao Gong, Sanjiv Kumar, Henry A. Rowley, Svetlana Lazebnik
Recent advances in visual recognition indicate that to achieve good retrieval and classification accuracy on large-scale datasets like ImageNet, extremely high-dimensional visual descriptors, e.g., Fisher Vectors, are needed.
no code implementations • NeurIPS 2012 • Yunchao Gong, Sanjiv Kumar, Vishal Verma, Svetlana Lazebnik
Such data typically arises in a large number of vision and text applications where counts or frequencies are used as features.
no code implementations • NeurIPS 2009 • Sanjiv Kumar, Mehryar Mohri, Ameet Talwalkar
A crucial technique for scaling kernel methods to very large data sets reaching or exceeding millions of instances is based on low-rank approximation of kernel matrices.