Search Results for author: Sanjoy Dasgupta

Found 36 papers, 5 papers with code

Explainable k-Means and k-Medians Clustering

no code implementations ICML 2020 Michal Moshkovitz, Sanjoy Dasgupta, Cyrus Rashtchian, Nave Frost

In terms of negative results, we show that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost, and we prove that any explainable clustering must incur an $\Omega(\log k)$ approximation compared to the optimal clustering.

Clustering

Online nearest neighbor classification

no code implementations 3 Jul 2023 Sanjoy Dasgupta, Geelon So

We study an instance of online non-parametric classification in the realizable setting.

Classification

Active learning using region-based sampling

no code implementations 5 Mar 2023 Sanjoy Dasgupta, Yoav Freund

We present a general-purpose active learning scheme for data in metric spaces.

Active Learning

Data-Copying in Generative Models: A Formal Framework

no code implementations 25 Feb 2023 Robi Bhattacharjee, Sanjoy Dasgupta, Kamalika Chaudhuri

There has been some recent interest in detecting and addressing memorization of training data by deep neural networks.

Memorization

Streaming Encoding Algorithms for Scalable Hyperdimensional Computing

no code implementations 20 Sep 2022 Anthony Thomas, Behnam Khaleghi, Gopi Krishna Jha, Sanjoy Dasgupta, Nageen Himayat, Ravi Iyer, Nilesh Jain, Tajana Rosing

Hyperdimensional computing (HDC) is a paradigm for data representation and learning originating in computational neuroscience.

Convergence of online $k$-means

no code implementations 22 Feb 2022 Sanjoy Dasgupta, Gaurav Mahajan, Geelon So

We prove asymptotic convergence for a general class of $k$-means algorithms performed over streaming data from a distribution: the centers asymptotically converge to the set of stationary points of the $k$-means cost function.
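
For concreteness, the sketch below is standard online $k$-means (MacQueen-style updates), presumably one instance of the general class considered; the seeding from the first $k$ points and the $1/\text{count}$ step size are illustrative choices, not the paper's full generality.

```python
import numpy as np

def online_kmeans(stream, k, d):
    """Minimal online k-means sketch: seed centers from the first k points,
    then move the nearest center toward each new point with a 1/count step."""
    centers = np.empty((k, d))
    counts = np.zeros(k)
    for t, x in enumerate(stream):
        x = np.asarray(x, dtype=float)
        if t < k:                                   # the first k points become the initial centers
            centers[t], counts[t] = x, 1
            continue
        j = int(np.argmin(((centers - x) ** 2).sum(axis=1)))   # nearest current center
        counts[j] += 1
        centers[j] += (x - centers[j]) / counts[j]              # decaying per-center step size
    return centers

# toy usage: a stream drawn from three well-separated 2-D Gaussians
rng = np.random.default_rng(1)
means = rng.choice([-5.0, 0.0, 5.0], size=10000)
data = np.stack([rng.normal(means, 0.5), rng.normal(means, 0.5)], axis=1)
print(online_kmeans(data, k=3, d=2))
```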

Framework for Evaluating Faithfulness of Local Explanations

no code implementations 1 Feb 2022 Sanjoy Dasgupta, Nave Frost, Michal Moshkovitz

We study the faithfulness of an explanation system to the underlying prediction model.

Algorithmic insights on continual learning from fruit flies

1 code implementation 15 Jul 2021 Yang Shen, Sanjoy Dasgupta, Saket Navlakha

We discovered a two-layer neural circuit in the fruit fly olfactory system that addresses this challenge by uniquely combining sparse coding and associative learning.

Continual Learning
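
A rough sketch of those two ingredients in isolation: a random expand-and-winner-take-all sparse code followed by a per-class associative weight update. The dimensions, sparsity level, and update rule below are illustrative stand-ins, not the circuit model from the paper.

```python
import numpy as np

def sparse_code(x, proj, top_k):
    """Expand x with a random projection, then keep only the top_k largest
    responses (winner-take-all), giving a sparse binary code."""
    h = proj @ x
    code = np.zeros_like(h)
    code[np.argsort(h)[-top_k:]] = 1.0
    return code

class AssociativeMemory:
    """Per-class weights that are strengthened only on the active (sparse) units,
    so earlier classes are largely left untouched -- an illustrative sketch."""
    def __init__(self, n_classes, dim, lr=0.1):
        self.W = np.zeros((n_classes, dim))
        self.lr = lr

    def update(self, code, label):
        self.W[label] += self.lr * code        # Hebbian-style strengthening

    def predict(self, code):
        return int(np.argmax(self.W @ code))

# toy usage with hypothetical dimensions
rng = np.random.default_rng(0)
proj = rng.standard_normal((2000, 50))          # expand 50 -> 2000 dimensions
mem = AssociativeMemory(n_classes=10, dim=2000)
x, y = rng.standard_normal(50), 3
mem.update(sparse_code(x, proj, top_k=100), y)
print(mem.predict(sparse_code(x, proj, top_k=100)))
```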

Online $k$-means Clustering on Arbitrary Data Streams

no code implementations 18 Feb 2021 Robi Bhattacharjee, Jacob Imola, Michal Moshkovitz, Sanjoy Dasgupta

We propose a data parameter, $\Lambda(X)$, such that for any algorithm maintaining $O(k\,\text{poly}(\log n))$ centers at time $n$, there exists a data stream $X$ for which a loss of $\Omega(\Lambda(X))$ is inevitable.

Clustering

A Theoretical Perspective on Hyperdimensional Computing

no code implementations 14 Oct 2020 Anthony Thomas, Sanjoy Dasgupta, Tajana Rosing

Hyperdimensional (HD) computing is a set of neurally inspired methods for obtaining high-dimensional, low-precision, distributed representations of data.
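
For concreteness, here is a small sketch of one common encoding scheme in this family: random bipolar codes per symbol, bundled into a single high-dimensional, low-precision representation by majority vote. The dimension and the particular operations are illustrative, not the specific setting analyzed in the paper.

```python
import numpy as np

D = 10_000                      # hyperdimensional code length (illustrative)
rng = np.random.default_rng(0)

def random_hv():
    """A random bipolar (+1/-1) hypervector."""
    return rng.choice([-1, 1], size=D)

def bundle(hvs):
    """Superpose several hypervectors by elementwise majority (sign of the sum)."""
    s = np.sign(np.sum(hvs, axis=0))
    s[s == 0] = 1
    return s

def similarity(a, b):
    """Normalized dot product; near 0 for unrelated random hypervectors."""
    return float(a @ b) / D

# encode a set of symbolic features as the bundle of their codes
codebook = {f: random_hv() for f in ["red", "round", "small", "heavy"]}
apple = bundle([codebook["red"], codebook["round"], codebook["small"]])
print(similarity(apple, codebook["red"]), similarity(apple, codebook["heavy"]))
```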

Expressivity of expand-and-sparsify representations

no code implementations 5 Jun 2020 Sanjoy Dasgupta, Christopher Tosh

The linear functions can be specified explicitly and are easy to learn, and we give bounds on how large $m$ needs to be as a function of the input dimension $d$ and the smoothness of the target function.
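
A minimal sketch of the representation in question, assuming a Gaussian random expansion from $d$ to $m$ dimensions followed by top-$k$ sparsification, with a ridge-regularized linear fit on the sparse code; the specific map, sparsity level, and regularizer are illustrative choices.

```python
import numpy as np

def expand_and_sparsify(X, proj, top_k):
    """Random expansion followed by winner-take-all sparsification."""
    H = X @ proj.T                                   # (n, m) expanded responses
    Z = np.zeros_like(H)
    idx = np.argsort(H, axis=1)[:, -top_k:]          # indices of the top_k responses per row
    np.put_along_axis(Z, idx, 1.0, axis=1)
    return Z

rng = np.random.default_rng(0)
d, m, top_k, n = 5, 2000, 40, 500
proj = rng.standard_normal((m, d))

X = rng.uniform(-1, 1, size=(n, d))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1]              # a smooth target function

Z = expand_and_sparsify(X, proj, top_k)
w = np.linalg.solve(Z.T @ Z + 1e-3 * np.eye(m), Z.T @ y)   # ridge-regularized linear fit
print(np.mean((Z @ w - y) ** 2))                     # training error of the linear fit
```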

A Non-Parametric Test to Detect Data-Copying in Generative Models

1 code implementation 12 Apr 2020 Casey Meehan, Kamalika Chaudhuri, Sanjoy Dasgupta

Detecting overfitting in generative models is an important challenge in machine learning.

BIG-bench Machine Learning

Robust Learning from Discriminative Feature Feedback

no code implementations 9 Mar 2020 Sanjoy Dasgupta, Sivan Sabato

We show how such errors can be handled algorithmically, in both an adversarial and a stochastic setting.

Explainable $k$-Means and $k$-Medians Clustering

3 code implementations 28 Feb 2020 Sanjoy Dasgupta, Nave Frost, Michal Moshkovitz, Cyrus Rashtchian

In terms of negative results, we show, first, that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost, and second, that any tree-induced clustering must in general incur an $\Omega(\log k)$ approximation factor compared to the optimal clustering.

Clustering
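
On the positive side, the explainable clusterings studied here are small threshold trees with one leaf per cluster, built on top of reference centers. The sketch below builds such a tree with a simple greedy axis-aligned split that separates the centers; it illustrates the tree-induced clustering format, not the paper's algorithm or its approximation guarantee.

```python
import numpy as np

def build_tree(centers, ids):
    """Recursively separate reference centers with axis-aligned threshold cuts
    until each leaf holds one center (a greedy illustrative rule, not the paper's)."""
    if len(ids) == 1:
        return ("leaf", ids[0])
    C = centers[ids]
    feat = int(np.argmax(C.max(axis=0) - C.min(axis=0)))   # coordinate with the widest spread
    vals = np.sort(C[:, feat])
    g = int(np.argmax(np.diff(vals)))                      # split at the largest gap
    thresh = 0.5 * (vals[g] + vals[g + 1])
    left = [i for i in ids if centers[i, feat] <= thresh]
    right = [i for i in ids if centers[i, feat] > thresh]
    return ("split", feat, thresh, build_tree(centers, left), build_tree(centers, right))

def assign(tree, x):
    """Route a point down the threshold tree to its cluster label."""
    while tree[0] == "split":
        _, feat, thresh, left, right = tree
        tree = left if x[feat] <= thresh else right
    return tree[1]

# usage: the reference centers would normally come from an ordinary k-means run
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]])
tree = build_tree(centers, list(range(len(centers))))
print(assign(tree, np.array([4.6, 0.3])))    # routed to the leaf for center 1
```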

Interactive Topic Modeling with Anchor Words

no code implementations 18 Jun 2019 Sanjoy Dasgupta, Stefanos Poulis, Christopher Tosh

The formalism of anchor words has enabled the development of fast topic modeling algorithms with provable guarantees.

Topic Models

An adaptive nearest neighbor rule for classification

1 code implementation NeurIPS 2019 Akshay Balsubramani, Sanjoy Dasgupta, Yoav Freund, Shay Moran

We introduce a variant of the $k$-nearest neighbor classifier in which $k$ is chosen adaptively for each query, rather than supplied as a parameter.

Classification, General Classification +1
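
A rough sketch of the query-adaptive idea: instead of fixing $k$, grow the neighborhood around the query until one label dominates by a confidence margin that shrinks like $1/\sqrt{k}$. The specific stopping rule below is an illustrative heuristic for binary labels, not the admissibility condition analyzed in the paper.

```python
import numpy as np

def adaptive_nn_predict(X, y, query, margin_const=1.0, k_max=None):
    """Grow k until the leading label's vote share beats 1/2 plus a
    1/sqrt(k)-style margin (illustrative stopping rule, binary labels in {0, 1})."""
    n = len(X)
    k_max = k_max or n
    order = np.argsort(((X - query) ** 2).sum(axis=1))    # neighbors sorted by distance
    for k in range(1, k_max + 1):
        share = y[order[:k]].mean()                       # fraction of label-1 votes
        if abs(share - 0.5) > margin_const / np.sqrt(k):
            return int(share > 0.5), k                    # confident prediction at this k
    return int(y[order[:k_max]].mean() > 0.5), k_max      # fall back to k_max-NN

# toy usage: two 1-D clusters with labels 0 and 1
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 1, size=(200, 1)), rng.normal(2, 1, size=(200, 1))])
y = np.array([0] * 200 + [1] * 200)
print(adaptive_nn_predict(X, y, np.array([1.5])))         # (predicted label, chosen k)
```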

What relations are reliably embeddable in Euclidean space?

no code implementations 13 Mar 2019 Robi Bhattacharjee, Sanjoy Dasgupta

We consider the problem of embedding a relation, represented as a directed graph, into Euclidean space.

Knowledge Graphs, Relation

Learning from discriminative feature feedback

no code implementations NeurIPS 2018 Sanjoy Dasgupta, Akansha Dey, Nicholas Roberts, Sivan Sabato

We consider the problem of learning a multi-class classifier from labels as well as simple explanations that we call "discriminative features".

Interactive Structure Learning with Structural Query-by-Committee

no code implementations NeurIPS 2018 Christopher Tosh, Sanjoy Dasgupta

In this work, we introduce interactive structure learning, a framework that unifies many different interactive learning tasks.

Active Learning

Structural query-by-committee

no code implementations 17 Mar 2018 Christopher Tosh, Sanjoy Dasgupta

In this work, we describe a framework that unifies many different interactive learning tasks.

Active Learning

Comparison Based Learning from Weak Oracles

no code implementations 20 Feb 2018 Ehsan Kazemi, Lin Chen, Sanjoy Dasgupta, Amin Karbasi

More specifically, we aim to devise efficient algorithms that locate a target object in a database equipped with a dissimilarity metric via invocation of the weak comparison oracle.

Learning from partial correction

no code implementations 23 May 2017 Sanjoy Dasgupta, Michael Luby

We introduce a new model of interactive learning in which an expert examines the predictions of a learner and partially fixes them if they are wrong.

Generalization Bounds

Diameter-Based Active Learning

no code implementations ICML 2017 Christopher Tosh, Sanjoy Dasgupta

To date, the tightest upper and lower-bounds for the active learning of general concept classes have been in terms of a parameter of the learning problem called the splitting index.

Active Learning

An algorithm for L1 nearest neighbor search via monotonic embedding

no code implementations NeurIPS 2016 Xinan Wang, Sanjoy Dasgupta

Fast algorithms for nearest neighbor (NN) search have in large part focused on L2 distance.

Interactive Bayesian Hierarchical Clustering

no code implementations 10 Feb 2016 Sharad Vikram, Sanjoy Dasgupta

Clustering is a powerful tool in data analysis, but it is often difficult to find a grouping that aligns with a user's needs.

Clustering

A cost function for similarity-based hierarchical clustering

no code implementations 16 Oct 2015 Sanjoy Dasgupta

The development of algorithms for hierarchical clustering has been hampered by a shortage of precise objective functions.

Clustering
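
The cost function proposed here charges each pair of points $i, j$ their similarity $w_{ij}$ times the number of leaves under the subtree rooted at their lowest common ancestor, so a good tree merges similar pairs low down. A small sketch of evaluating that cost for a binary tree given as nested tuples:

```python
import numpy as np

def dasgupta_cost(tree, W):
    """Cost of a hierarchical clustering: sum over internal nodes of
    (number of leaves below the node) * (total similarity crossing the split)."""
    def walk(node):
        if isinstance(node, int):            # a leaf is just the point's index
            return [node], 0.0
        left, right = node
        L, cost_l = walk(left)
        R, cost_r = walk(right)
        cross = sum(W[i, j] for i in L for j in R)
        return L + R, cost_l + cost_r + (len(L) + len(R)) * cross
    return walk(tree)[1]

# usage: 4 points, pairwise similarities, two candidate binary trees
W = np.array([[0, 9, 1, 1],
              [9, 0, 1, 1],
              [1, 1, 0, 9],
              [1, 1, 9, 0]], dtype=float)
good = ((0, 1), (2, 3))          # merges the similar pairs first -> lower cost
bad = ((0, 2), (1, 3))
print(dasgupta_cost(good, W), dasgupta_cost(bad, W))
```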

The Fast Convergence of Incremental PCA

no code implementations NeurIPS 2013 Akshay Balsubramani, Sanjoy Dasgupta, Yoav Freund

We consider a situation in which we see samples in $\mathbb{R}^d$ drawn i.i.d.
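
The incremental estimators in this line of work update a candidate direction one sample at a time. Below is a minimal Oja-style sketch, assuming zero-mean samples and a $c/(t+1)$ step size; both choices are illustrative rather than the exact schedule analyzed in the paper.

```python
import numpy as np

def incremental_top_eigvec(samples, d, c=1.0):
    """Estimate the top principal direction from a stream of zero-mean samples
    using an Oja-style update with step size c / (t + 1)."""
    rng = np.random.default_rng(0)
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for t, x in enumerate(samples):
        eta = c / (t + 1)
        w += eta * x * (x @ w)          # move toward the sample's direction
        w /= np.linalg.norm(w)          # keep the iterate on the unit sphere
    return w

# toy usage: covariance with a dominant direction along (1, 1)/sqrt(2)
rng = np.random.default_rng(1)
A = np.array([[2.0, 1.5], [1.5, 2.0]])
samples = rng.multivariate_normal(np.zeros(2), A, size=20000)
print(incremental_top_eigvec(samples, d=2))
```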

Optimal rates for k-NN density and mode estimation

no code implementations NeurIPS 2014 Sanjoy Dasgupta, Samory Kpotufe

We present two related contributions of independent interest: (1) high-probability finite sample rates for $k$-NN density estimation, and (2) practical mode estimators -- based on $k$-NN -- which attain minimax-optimal rates under surprisingly general distributional conditions.

Density Estimation
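
The $k$-NN density estimate behind both contributions has a closed form: $\hat f_k(x) = k / (n \, v_d \, r_k(x)^d)$, where $r_k(x)$ is the distance from $x$ to its $k$-th nearest sample point and $v_d$ is the volume of the unit ball in $\mathbb{R}^d$. A small sketch; the naive argmax mode estimator at the end is only illustrative, the paper's estimators are more refined.

```python
import numpy as np
from math import gamma, pi

def knn_density(X, queries, k):
    """k-NN density estimate: f_hat(x) = k / (n * v_d * r_k(x)^d)."""
    n, d = X.shape
    v_d = pi ** (d / 2) / gamma(d / 2 + 1)                 # volume of the unit ball in R^d
    dists = np.linalg.norm(queries[:, None, :] - X[None, :, :], axis=2)
    r_k = np.sort(dists, axis=1)[:, k - 1]                 # distance to the k-th nearest sample
    return k / (n * v_d * r_k ** d)                        # (a query that is itself a sample counts as its own neighbor)

# usage: density at the sample points themselves; the argmax is a naive mode estimate
rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(1000, 2))
f_hat = knn_density(X, X, k=25)
print(X[np.argmax(f_hat)])          # should land near the origin
```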

Rates of Convergence for Nearest Neighbor Classification

no code implementations NeurIPS 2014 Kamalika Chaudhuri, Sanjoy Dasgupta

Nearest neighbor methods are a popular class of nonparametric estimators with several desirable properties, such as adaptivity to different distance scales in different regions of space.

Classification, General Classification

Incremental Clustering: The Case for Extra Clusters

no code implementations NeurIPS 2014 Margareta Ackerman, Sanjoy Dasgupta

The explosion in the amount of data available for analysis often necessitates a transition from batch to incremental clustering methods, which process one element at a time and typically store only a small subset of the data.

Clustering

Consistent procedures for cluster tree estimation and pruning

no code implementations 5 Jun 2014 Kamalika Chaudhuri, Sanjoy Dasgupta, Samory Kpotufe, Ulrike von Luxburg

For a density $f$ on $\mathbb{R}^d$, a high-density cluster is any connected component of $\{x: f(x) \geq \lambda\}$, for some $\lambda > 0$.

Clustering
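
A rough empirical analogue of this definition, loosely in the spirit of the single-linkage-style procedures studied here: keep the sample points whose estimated density is at least $\lambda$ and take connected components of a radius-$r$ graph on them. The density proxy, $\lambda$, and $r$ in the usage below are illustrative choices.

```python
import numpy as np

def level_set_clusters(X, f_hat, lam, r):
    """Connected components of {x_i : f_hat(x_i) >= lam} under a radius-r graph --
    an empirical stand-in for the high-density clusters of the underlying density."""
    keep = np.where(f_hat >= lam)[0]
    labels = {i: None for i in keep}
    comp = 0
    for seed in keep:
        if labels[seed] is not None:
            continue
        stack = [seed]                          # flood-fill one component
        labels[seed] = comp
        while stack:
            i = stack.pop()
            for j in keep:
                if labels[j] is None and np.linalg.norm(X[i] - X[j]) <= r:
                    labels[j] = comp
                    stack.append(j)
        comp += 1
    return labels                                # point index -> component id

# usage: two separated blobs; the density proxy is the negative distance to the
# 10th nearest neighbor (a monotone stand-in for a proper density estimate)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 0.5, size=(100, 2)), rng.normal(3, 0.5, size=(100, 2))])
d10 = np.sort(np.linalg.norm(X[:, None] - X[None, :], axis=2), axis=1)[:, 10]
labels = level_set_clusters(X, -d10, lam=np.quantile(-d10, 0.3), r=1.0)
print(len(set(labels.values())))                # expect 2 components
```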

Moment-based Uniform Deviation Bounds for k-means and Friends

no code implementations NeurIPS 2013 Matus J. Telgarsky, Sanjoy Dasgupta

Suppose $k$ centers are fit to $m$ points by heuristically minimizing the $k$-means cost; what is the corresponding fit over the source distribution?

Clustering

Moment-based Uniform Deviation Bounds for $k$-means and Friends

1 code implementation 8 Nov 2013 Matus Telgarsky, Sanjoy Dasgupta

Suppose $k$ centers are fit to $m$ points by heuristically minimizing the $k$-means cost; what is the corresponding fit over the source distribution?

Clustering
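
The question in both versions of this paper concerns the gap between the empirical $k$-means cost and the cost over the source distribution. A tiny sketch that measures that gap directly on a held-out sample; the paper's contribution is the uniform deviation bound, not this estimator, and the Lloyd's-heuristic fit below is only used to produce some centers.

```python
import numpy as np

def kmeans_cost(X, centers):
    """Average squared distance from each point to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

def lloyd(X, k, iters=50, seed=0):
    """Plain Lloyd's heuristic, used here only to produce fitted centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        assign = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        centers = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                            else centers[j] for j in range(k)])
    return centers

rng = np.random.default_rng(1)
source = lambda m: rng.normal(0, 1, size=(m, 5))
train, heldout = source(200), source(100000)
centers = lloyd(train, k=10)
print(kmeans_cost(train, centers), kmeans_cost(heldout, centers))   # empirical vs. (approximate) population cost
```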

Rates of convergence for the cluster tree

no code implementations NeurIPS 2010 Kamalika Chaudhuri, Sanjoy Dasgupta

For a density $f$ on $\mathbb{R}^d$, a high-density cluster is any connected component of $\{x: f(x) \geq c\}$, for some $c > 0$.

A learning framework for nearest neighbor search

no code implementations NeurIPS 2007 Lawrence Cayton, Sanjoy Dasgupta

Can we leverage learning techniques to build a fast nearest-neighbor (NN) retrieval data structure?

Retrieval

Learning the structure of manifolds using random projections

no code implementations NeurIPS 2007 Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma

We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data.
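
One way such a variant can work is to split along a random direction rather than a coordinate axis. The sketch below shows only that basic recursion; the median split, stopping rule, and toy data are illustrative choices and omit the additional split types used in the paper.

```python
import numpy as np

def build_rp_tree(X, idx, min_leaf, rng):
    """Recursively split the points along a random direction at the median
    projection -- the basic split of a random-projection tree (simplified)."""
    if len(idx) <= min_leaf:
        return ("leaf", idx)
    u = rng.standard_normal(X.shape[1])
    u /= np.linalg.norm(u)                       # random unit direction
    proj = X[idx] @ u
    thresh = np.median(proj)
    left = idx[proj <= thresh]
    right = idx[proj > thresh]
    if len(left) == 0 or len(right) == 0:        # degenerate split; stop here
        return ("leaf", idx)
    return ("node", u, thresh,
            build_rp_tree(X, left, min_leaf, rng),
            build_rp_tree(X, right, min_leaf, rng))

# usage: data near a 1-D curve embedded in 10 dimensions
rng = np.random.default_rng(0)
t = rng.uniform(0, 1, size=2000)
X = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)]
             + [0.01 * rng.standard_normal(2000) for _ in range(8)], axis=1)
tree = build_rp_tree(X, np.arange(len(X)), min_leaf=50, rng=rng)
```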
