no code implementations • ICML 2020 • Michal Moshkovitz, Sanjoy Dasgupta, Cyrus Rashtchian, Nave Frost
In terms of negative results, we show that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost, and we prove that any explainable clustering must incur an $\Omega(\log k)$ approximation compared to the optimal clustering.
no code implementations • 2 Dec 2024 • Akash Kumar, Sanjoy Dasgupta
In this work, we investigate the problem of learning distance functions within the query-based learning framework, where a learner is able to pose triplet queries of the form: "Is $x_i$ closer to $x_j$ or $x_k$?"
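For concreteness, here is a minimal sketch of the triplet-query interface (the oracle below and the Euclidean stand-in for the hidden distance are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np

def triplet_oracle(X, dist, i, j, k):
    """Answers the triplet query "Is x_i closer to x_j or to x_k?"
    using a hidden ground-truth distance; the learner sees only
    these answers, never the distances themselves."""
    return "j" if dist(X[i], X[j]) < dist(X[i], X[k]) else "k"

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
euclidean = lambda a, b: float(np.linalg.norm(a - b))
print(triplet_oracle(X, euclidean, 0, 1, 2))
```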
no code implementations • 31 Oct 2024 • Sanjoy Dasgupta, Geelon So
In the realizable online setting, a learner is tasked with making predictions for a stream of instances, where the correct answer is revealed after each prediction.
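The protocol itself is simple to state in code; the `learner` object with `predict`/`update` methods is a hypothetical placeholder:

```python
def realizable_online_loop(stream, learner, target):
    """Realizable online protocol: commit to a prediction on each
    instance, then receive the correct answer (from a fixed target
    consistent with the concept class) and update."""
    mistakes = 0
    for x in stream:
        y_hat = learner.predict(x)  # predict before seeing the label
        y = target(x)               # correct answer revealed afterwards
        mistakes += int(y_hat != y)
        learner.update(x, y)
    return mistakes
```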
no code implementations • 2 May 2024 • Sanjoy Dasgupta, Eduardo Laber
Linkage methods are among the most popular algorithms for hierarchical clustering.
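As a reminder of how linkage methods operate, here is a naive single-linkage sketch (Euclidean distance and the cubic-time implementation are illustrative choices):

```python
import numpy as np

def single_linkage(points, num_clusters):
    """Agglomerative clustering with the single-linkage rule: repeatedly
    merge the two clusters whose closest pair of points is closest."""
    clusters = [[i] for i in range(len(points))]
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    link = lambda A, B: min(D[i, j] for i in A for j in B)
    while len(clusters) > num_clusters:
        a, b = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: link(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a].extend(clusters.pop(b))
    return clusters
```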
no code implementations • 3 Jul 2023 • Sanjoy Dasgupta, Geelon So
We study an instance of online non-parametric classification in the realizable setting.
no code implementations • 5 Mar 2023 • Sanjoy Dasgupta, Yoav Freund
We present a general-purpose active learning scheme for data in metric spaces.
no code implementations • 25 Feb 2023 • Robi Bhattacharjee, Sanjoy Dasgupta, Kamalika Chaudhuri
There has been some recent interest in detecting and addressing memorization of training data by deep neural networks.
no code implementations • 20 Sep 2022 • Anthony Thomas, Behnam Khaleghi, Gopi Krishna Jha, Sanjoy Dasgupta, Nageen Himayat, Ravi Iyer, Nilesh Jain, Tajana Rosing
Hyperdimensional computing (HDC) is a paradigm for data representation and learning originating in computational neuroscience.
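A tiny sketch of the two core HDC operations, binding and bundling, with random bipolar hypervectors (the dimensionality and the bind/bundle choices below are standard in the HDC literature, not specific to this paper):

```python
import numpy as np

D = 10_000  # hypervector dimensionality; HDC typically uses thousands
rng = np.random.default_rng(0)
random_hv = lambda: rng.choice([-1, 1], size=D)

role_a, role_b = random_hv(), random_hv()
val_a, val_b = random_hv(), random_hv()

# Bind (elementwise product) pairs a role with a value; bundle
# (sum, then sign) superposes the bound pairs into one record.
record = np.sign(role_a * val_a + role_b * val_b)

# Unbinding with role_a approximately recovers val_a: similarity is
# far above chance for the stored value, near zero otherwise.
cos = lambda u, v: u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
print(cos(record * role_a, val_a))  # ~0.7
print(cos(record * role_a, val_b))  # ~0.0
```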
no code implementations • 22 Feb 2022 • Sanjoy Dasgupta, Gaurav Mahajan, Geelon So
We prove asymptotic convergence for a general class of $k$-means algorithms performed over streaming data from a distribution: the centers asymptotically converge to the set of stationary points of the $k$-means cost function.
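One member of this class, for concreteness: the classic online update in which each arriving point pulls its nearest center toward it (the running-mean step size below is an illustrative choice):

```python
import numpy as np

def streaming_kmeans(stream, centers):
    """Streaming k-means: assign each arriving point to its nearest
    center and move that center by a decaying step, so that each
    center tracks the running mean of the points assigned to it."""
    counts = np.zeros(len(centers))
    for x in stream:
        j = int(np.argmin(np.linalg.norm(centers - x, axis=1)))
        counts[j] += 1
        centers[j] += (x - centers[j]) / counts[j]
    return centers
```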
no code implementations • 1 Feb 2022 • Sanjoy Dasgupta, Nave Frost, Michal Moshkovitz
We study the faithfulness of an explanation system to the underlying prediction model.
1 code implementation • 15 Jul 2021 • Yang Shen, Sanjoy Dasgupta, Saket Navlakha
We discovered a two-layer neural circuit in the fruit fly olfactory system that addresses this challenge by uniquely combining sparse coding and associative learning.
no code implementations • 18 Feb 2021 • Robi Bhattacharjee, Jacob Imola, Michal Moshkovitz, Sanjoy Dasgupta
We propose a data parameter, $\Lambda(X)$, such that for any algorithm maintaining $O(k \cdot \text{poly}(\log n))$ centers at time $n$, there exists a data stream $X$ for which a loss of $\Omega(\Lambda(X))$ is inevitable.
no code implementations • 14 Oct 2020 • Anthony Thomas, Sanjoy Dasgupta, Tajana Rosing
Hyperdimensional (HD) computing is a set of neurally inspired methods for obtaining high-dimensional, low-precision, distributed representations of data.
no code implementations • 5 Jun 2020 • Sanjoy Dasgupta, Christopher Tosh
The linear functions can be specified explicitly and are easy to learn; we give bounds on how large $m$ needs to be as a function of the input dimension $d$ and the smoothness of the target function.
1 code implementation • 12 Apr 2020 • Casey Meehan, Kamalika Chaudhuri, Sanjoy Dasgupta
Detecting overfitting in generative models is an important challenge in machine learning.
no code implementations • 9 Mar 2020 • Sanjoy Dasgupta, Sivan Sabato
We show how such errors can be handled algorithmically, in both an adversarial and a stochastic setting.
3 code implementations • 28 Feb 2020 • Sanjoy Dasgupta, Nave Frost, Michal Moshkovitz, Cyrus Rashtchian
In terms of negative results, we show, first, that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost, and second, that any tree-induced clustering must in general incur an $\Omega(\log k)$ approximation factor compared to the optimal clustering.
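A minimal illustration of the object being studied, with an off-the-shelf decision tree standing in for the paper's greedy algorithm (the sklearn usage here is an assumption for illustration): run k-means, then approximate it with a threshold tree having one leaf per cluster.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
k = 4

labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
tree = DecisionTreeClassifier(max_leaf_nodes=k).fit(X, labels)  # k leaves = k clusters
print((tree.predict(X) == labels).mean())  # agreement with the reference clustering
```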
no code implementations • 18 Jun 2019 • Sanjoy Dasgupta, Stefanos Poulis, Christopher Tosh
The formalism of anchor words has enabled the development of fast topic modeling algorithms with provable guarantees.
1 code implementation • NeurIPS 2019 • Akshay Balsubramani, Sanjoy Dasgupta, Yoav Freund, Shay Moran
We introduce a variant of the $k$-nearest neighbor classifier in which $k$ is chosen adaptively for each query, rather than supplied as a parameter.
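A simplified sketch of the idea (the particular confidence test for stopping the growth of $k$ is an assumption, not the paper's exact rule):

```python
import numpy as np

def adaptive_knn_predict(Xtr, ytr, x, margin=1.0):
    """Grow k until some label's majority among the k nearest neighbors
    looks statistically convincing, then predict that label."""
    order = np.argsort(np.linalg.norm(Xtr - x, axis=1))
    for k in range(1, len(Xtr) + 1):
        votes = np.bincount(ytr[order[:k]])
        if votes.max() / k - 0.5 > margin / np.sqrt(k):
            return int(np.argmax(votes))
    return int(np.argmax(np.bincount(ytr)))  # fall back to global majority
```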
no code implementations • 13 Mar 2019 • Robi Bhattacharjee, Sanjoy Dasgupta
We consider the problem of embedding a relation, represented as a directed graph, into Euclidean space.
no code implementations • NeurIPS 2018 • Sanjoy Dasgupta, Akansha Dey, Nicholas Roberts, Sivan Sabato
We consider the problem of learning a multi-class classifier from labels as well as simple explanations that we call "discriminative features".
no code implementations • NeurIPS 2018 • Christopher Tosh, Sanjoy Dasgupta
In this work, we introduce interactive structure learning, a framework that unifies many different interactive learning tasks.
no code implementations • 17 Mar 2018 • Christopher Tosh, Sanjoy Dasgupta
In this work, we describe a framework that unifies many different interactive learning tasks.
no code implementations • 20 Feb 2018 • Ehsan Kazemi, Lin Chen, Sanjoy Dasgupta, Amin Karbasi
More specifically, we aim to devise efficient algorithms that locate a target object in a database equipped with a dissimilarity metric via invocations of a weak comparison oracle.
no code implementations • 23 May 2017 • Sanjoy Dasgupta, Michael Luby
We introduce a new model of interactive learning in which an expert examines the predictions of a learner and partially fixes them if they are wrong.
no code implementations • ICML 2017 • Christopher Tosh, Sanjoy Dasgupta
To date, the tightest upper and lower bounds for the active learning of general concept classes have been in terms of a parameter of the learning problem called the splitting index.
no code implementations • NeurIPS 2016 • Xinan Wang, Sanjoy Dasgupta
Fast algorithms for nearest neighbor (NN) search have in large part focused on L2 distance.
no code implementations • 10 Feb 2016 • Sharad Vikram, Sanjoy Dasgupta
Clustering is a powerful tool in data analysis, but it is often difficult to find a grouping that aligns with a user's needs.
no code implementations • 16 Oct 2015 • Sanjoy Dasgupta
The development of algorithms for hierarchical clustering has been hampered by a shortage of precise objective functions.
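The cost function proposed here charges every pair of points by the size of the subtree in which they are first split apart, so that highly similar pairs should be separated as deep in the tree as possible:

```latex
% For a similarity graph with weights w_{ij} and a candidate tree T,
% T[i \vee j] is the subtree rooted at the least common ancestor of
% leaves i and j; a good tree makes this small for similar pairs.
\mathrm{cost}(T) \;=\; \sum_{i<j} w_{ij}\,\bigl|\mathrm{leaves}\bigl(T[i \vee j]\bigr)\bigr|
```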
no code implementations • NeurIPS 2013 • Akshay Balsubramani, Sanjoy Dasgupta, Yoav Freund
We consider a situation in which we see samples in $\mathbb{R}^d$ drawn i.i.d.
no code implementations • NeurIPS 2014 • Sanjoy Dasgupta, Samory Kpotufe
We present two related contributions of independent interest: (1) high-probability finite sample rates for $k$-NN density estimation, and (2) practical mode estimators -- based on $k$-NN -- which attain minimax-optimal rates under surprisingly general distributional conditions.
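The classical $k$-NN density estimate that such rates concern: divide $k/n$ by the volume of the ball that reaches the $k$-th nearest neighbor.

```python
import numpy as np
from math import gamma, pi

def knn_density(X, x, k):
    """Classical k-NN density estimate at a query point x."""
    n, d = X.shape
    r_k = np.sort(np.linalg.norm(X - x, axis=1))[k - 1]  # distance to k-th NN
    unit_ball_vol = pi ** (d / 2) / gamma(d / 2 + 1)
    return k / (n * unit_ball_vol * r_k ** d)
```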
no code implementations • NeurIPS 2014 • Kamalika Chaudhuri, Sanjoy Dasgupta
Nearest neighbor methods are a popular class of nonparametric estimators with several desirable properties, such as adaptivity to different distance scales in different regions of space.
no code implementations • NeurIPS 2014 • Margareta Ackerman, Sanjoy Dasgupta
The explosion in the amount of data available for analysis often necessitates a transition from batch to incremental clustering methods, which process one element at a time and typically store only a small subset of the data.
no code implementations • 5 Jun 2014 • Kamalika Chaudhuri, Sanjoy Dasgupta, Samory Kpotufe, Ulrike von Luxburg
For a density $f$ on $\mathbb{R}^d$, a high-density cluster is any connected component of $\{x: f(x) \geq \lambda\}$, for some $\lambda > 0$.
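Empirically, a $\lambda$-level-set cluster can be approximated in a few lines; the plug-in density estimate and the $\epsilon$-neighborhood graph below are illustrative choices (the paper's subject is when such procedures are consistent, not this specific recipe):

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import cdist

def level_set_clusters(X, density, lam, eps):
    """Keep points whose estimated density is at least lam, connect
    points within distance eps, and return the connected components
    as empirical high-density clusters."""
    keep = np.where(density >= lam)[0]
    adjacency = cdist(X[keep], X[keep]) <= eps
    _, labels = connected_components(adjacency, directed=False)
    return keep, labels
```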
no code implementations • NeurIPS 2013 • Matus J. Telgarsky, Sanjoy Dasgupta
Suppose $k$ centers are fit to $m$ points by heuristically minimizing the $k$-means cost; what is the corresponding fit over the source distribution?
1 code implementation • 8 Nov 2013 • Matus Telgarsky, Sanjoy Dasgupta
Suppose $k$ centers are fit to $m$ points by heuristically minimizing the $k$-means cost; what is the corresponding fit over the source distribution?
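The "fit over the source distribution" being asked about is the population $k$-means cost of the fitted centers:

```latex
% Population k-means cost of a center set C under source distribution P;
% uniform deviation bounds control the gap between this quantity and
% the empirical cost computed on the m sample points.
\Phi_P(C) \;=\; \mathbb{E}_{x \sim P}\Bigl[\min_{c \in C}\, \lVert x - c \rVert_2^2\Bigr]
```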
no code implementations • NeurIPS 2010 • Kamalika Chaudhuri, Sanjoy Dasgupta
For a density $f$ on $\mathbb{R}^d$, a high-density cluster is any connected component of $\{x: f(x) \geq c\}$, for some $c > 0$.
no code implementations • NeurIPS 2007 • Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma
We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data.
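The key departure from a standard k-d tree is the split rule: project onto a random direction rather than a coordinate axis. A simplified sketch of one such split (the paper's version also perturbs the split point, which is omitted here):

```python
import numpy as np

def rp_tree_split(X, rng):
    """One node split of a random-projection tree: project the data
    onto a random unit direction and split at the median projection."""
    direction = rng.normal(size=X.shape[1])
    direction /= np.linalg.norm(direction)
    proj = X @ direction
    go_left = proj <= np.median(proj)
    return go_left, direction
```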
no code implementations • NeurIPS 2007 • Lawrence Cayton, Sanjoy Dasgupta
Can we leverage learning techniques to build a fast nearest-neighbor (NN) retrieval data structure?