no code implementations • 26 May 2023 • Amir Abboud, Nick Fischer, Elazar Goldenberg, Karthik C. S., Ron Safier
We study the fundamental problem of finding the best string to represent a given set, in the form of the Closest String problem: Given a set $X \subseteq \Sigma^d$ of $n$ strings, find the string $x^*$ minimizing the radius of the smallest Hamming ball around $x^*$ that encloses all the strings in $X$.
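To make the problem definition concrete, here is a minimal brute-force sketch. It is not the paper's method: it simply tries every candidate center in $\Sigma^d$, so it is only usable for tiny inputs. The function names `hamming` and `closest_string` are illustrative assumptions.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_string(strings, alphabet):
    """Exhaustive search over all |alphabet|**d candidate centers.

    Illustrative only: the running time is exponential in the length d.
    Returns (center, radius), where radius is the largest Hamming
    distance from the center to any input string.
    """
    d = len(strings[0])
    assert all(len(s) == d for s in strings), "all strings must have length d"

    best, best_radius = None, d + 1
    for cand in product(alphabet, repeat=d):
        radius = max(hamming(cand, s) for s in strings)
        if radius < best_radius:
            best, best_radius = "".join(cand), radius
    return best, best_radius


center, radius = closest_string(["000", "011", "101"], "01")
print(center, radius)
```

Note that the returned center need not belong to the input set; it may be any string over the alphabet.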
no code implementations • 4 May 2023 • Chengyuan Deng, Surya Teja Gavva, Karthik C. S., Parth Patel, Adarsh Srinivasan
Formally, we show that there exists a data set $X$ in the Euclidean plane, for which there is a decision tree of depth $k-1$ whose $k$-means/$k$-median cost matches the optimal clustering cost of $X$, but every decision tree of depth less than $k-1$ has unbounded cost w.r.t.
1 code implementation • 18 Oct 2022 • Surya Teja Gavva, Karthik C. S., Sharath Punna
Over the last three decades, researchers have intensively explored various clustering tools for categorical data analysis.
no code implementations • 6 Dec 2021 • Amir Abboud, Mohammad Hossein Bateni, Vincent Cohen-Addad, Karthik C. S., Saeed Seddighin
Moreover, we extend one of our hardness results to rule out subquartic algorithms for the well-studied 1-median problem in the edit metric, where given a set of $n$ strings each of length $n$, the goal is to find a string in the set that minimizes the sum of the edit distances to the rest of the strings in the set.
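For reference, the straightforward baseline for this 1-median problem can be sketched as follows, assuming the standard unit-cost edit (Levenshtein) distance. With $n$ strings of length $n$, it computes on the order of $n^2$ pairwise distances at $O(n^2)$ time each, for roughly $n^4$ total, which is the quartic bound the abstract refers to. The names `edit_distance` and `one_median` are illustrative, not taken from the paper.

```python
def edit_distance(a, b):
    """Levenshtein distance via the textbook dynamic program, O(|a| * |b|)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def one_median(strings):
    """Return the input string minimizing the sum of edit distances to the rest."""
    return min(strings, key=lambda s: sum(edit_distance(s, t) for t in strings))


print(one_median(["kitten", "sitten", "sitting"]))
```

The distance from a string to itself is zero, so including it in the sum does not change which string is selected.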
no code implementations • ICML 2020 • Vincent Cohen-Addad, Karthik C. S., Guillaume Lagarde
In this paper, we provide a new algorithm which takes as input a set of points $P$ in $\mathbb{R}^d$, and for every $c\ge 1$, runs in time $n^{1+\frac{\rho}{c^2}}$ (for some universal constant $\rho>1$) to output an ultrametric $\Delta$ such that for any two points $u, v$ in $P$, $\Delta(u, v)$ is within a multiplicative factor of $5c$ of the distance between $u$ and $v$ in the "best" ultrametric representation of $P$.